After a lot of research, I think I have a reliable enough method, using Aubio, of identifying speaker gender. This is not crucial to the system, but it’s a nice touch.

The fundamental frequency of a voice can be used to estimate the speaker’s gender. Men typically fall within the 75–150 Hz range and women within the 150–300 Hz range. The two ranges overlap in practice, so any voice in the 140–160 Hz range will be treated as ‘undetermined’.
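
To make the thresholds concrete, here is a minimal sketch of the classification step in Python. It only covers the mapping from a frequency to a label and assumes the fundamental frequency has already been estimated (by Aubio, in my case). The function name `classify_gender` is made up for illustration, and two details are my own assumptions rather than anything established above: the 140–160 Hz band is treated as inclusive at both ends, and anything outside 75–300 Hz is also reported as ‘undetermined’.

```python
def classify_gender(f0_hz: float) -> str:
    """Map a fundamental frequency in Hz to 'male', 'female' or 'undetermined'."""
    # Assumption: values outside the 75-300 Hz range are not classified at all.
    if f0_hz < 75.0 or f0_hz > 300.0:
        return "undetermined"
    # Overlap band between the two ranges (assumed inclusive at both ends).
    if 140.0 <= f0_hz <= 160.0:
        return "undetermined"
    # Below the overlap band is the typical male range, above it the female range.
    return "male" if f0_hz < 140.0 else "female"
```

Calling `classify_gender(120.0)` returns `"male"`, and `classify_gender(150.0)` returns `"undetermined"` because it sits inside the overlap band.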

This approach is language-neutral: it depends only on the pitch of the voice, so it can be used with any language.

My initial test results (from random YouTube clips):

Male 01: 131 Hz fundamental frequency
Male 02: 117 Hz fundamental frequency

Female 01: 191 Hz fundamental frequency
Female 02: 194 Hz fundamental frequency

Very low female voice: 147 Hz fundamental frequency

The last sample, at 147 Hz, falls in the ‘undetermined’ range.

Now that I have this working, the next step is to get proper speaker identification working.