After a lot of research, I think I have a reliable enough method of identifying speaker gender using Aubio. This is not crucial to the system, but it’s a nice touch.
The fundamental frequency of a voice can be used to estimate the speaker’s gender. Men’s voices typically fall within the 75–150 Hz range and women’s within the 150–300 Hz range. The two ranges can overlap, so any voice in the 140–160 Hz range is treated as ‘undetermined’.
The approach is language-neutral: it relies only on pitch, so it works regardless of the language spoken.
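To make the thresholds concrete, here is a minimal Python sketch of the classification rule described above. The function name `classify_gender` and the constant names are my own for illustration, and treating anything outside 75–300 Hz as ‘undetermined’ is an assumption on my part, since the rule above only covers the ranges given.

```python
# Thresholds taken from the ranges described in the post (in Hz).
VOICE_MIN_HZ = 75.0
VOICE_MAX_HZ = 300.0
OVERLAP_MIN_HZ = 140.0
OVERLAP_MAX_HZ = 160.0


def classify_gender(f0_hz):
    """Map a fundamental frequency in Hz to 'male', 'female' or 'undetermined'."""
    # Assumption: values outside the typical speech ranges are not classified.
    if f0_hz < VOICE_MIN_HZ or f0_hz > VOICE_MAX_HZ:
        return "undetermined"
    # The overlap band is deliberately left unclassified.
    if OVERLAP_MIN_HZ <= f0_hz <= OVERLAP_MAX_HZ:
        return "undetermined"
    return "male" if f0_hz < OVERLAP_MIN_HZ else "female"


if __name__ == "__main__":
    for f0 in (117.0, 147.0, 191.0):
        print(f0, classify_gender(f0))
```

Keeping the thresholds as named constants makes it easy to widen or narrow the ‘undetermined’ band once more samples are available.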
My initial test results (from random YouTube clips):
Male 01: 131 Hz fundamental frequency
Male 02: 117 Hz fundamental frequency
Female 01: 191 Hz fundamental frequency
Female 02: 194 Hz fundamental frequency
Very low female voice: 147 Hz fundamental frequency
The last sample falls in the ‘undetermined’ range.
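For anyone wanting to reproduce figures like these, the following is a rough sketch of one way to get a single fundamental frequency per clip with aubio's Python bindings. It is not necessarily how the numbers above were produced: the choice of the `yinfft` detector, the buffer and hop sizes, the confidence cut-off and the use of a median over voiced frames are all assumptions, and the function names `summarize_f0` and `estimate_f0` are hypothetical.

```python
import statistics


def summarize_f0(pitches, confidences, min_confidence=0.5,
                 fmin=75.0, fmax=300.0):
    """Return the median pitch (Hz) of frames that look voiced, or None."""
    # Keep only frames the detector is reasonably sure about and whose
    # pitch lies inside the speech range used in this post (assumption).
    voiced = [p for p, c in zip(pitches, confidences)
              if c >= min_confidence and fmin <= p <= fmax]
    return statistics.median(voiced) if voiced else None


def estimate_f0(path, buf_size=2048, hop_size=512):
    """Estimate a clip's fundamental frequency from an audio file."""
    import aubio  # imported here so summarize_f0 works without aubio installed

    src = aubio.source(path, 0, hop_size)  # 0 = keep the file's sample rate
    detector = aubio.pitch("yinfft", buf_size, hop_size, src.samplerate)
    detector.set_unit("Hz")

    pitches, confidences = [], []
    while True:
        samples, read = src()
        pitches.append(float(detector(samples)[0]))
        confidences.append(float(detector.get_confidence()))
        if read < hop_size:  # a short read means the end of the file
            break
    return summarize_f0(pitches, confidences)
```

The median is used rather than the mean so that a few octave errors or noisy frames do not pull the estimate far from the speaker's usual pitch.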
Now that I have this working, the next step is to get proper speaker identification working.