I am currently working on a project in which I am trying to use DeepSpeech on a Raspberry Pi with live microphone audio, but I keep getting an Invalid Sample Rate error. Using pyAudio I create a stream that uses the sample rate the model wants, which is 16000 Hz, but the microphone I am using has a sample rate of 44100 Hz. When I run the Python script no rate conversion is done, and the mismatch between the microphone's rate and the sample rate the model expects produces the Invalid Sample Rate error.
The microphone info is listed like this by pyaudio:
{'index': 1, 'structVersion': 2, 'name': 'Logitech USB Microphone: Audio (hw:1,0)', 'hostApi': 0, 'maxInputChannels': 1, 'maxOutputChannels': 0, 'defaultLowInputLatency': 0.008684807256235827, 'defaultLowOutputLatency': -1.0, 'defaultHighInputLatency': 0.034829931972789115, 'defaultHighOutputLatency': -1.0, 'defaultSampleRate': 44100.0}
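(For reference, that info dictionary comes from a pyAudio device query roughly like the one below; the exact call is paraphrased:)
import pyaudio

pa = pyaudio.PyAudio()
# index 1 is the Logitech USB microphone
print(pa.get_device_info_by_index(1))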
The first thing I tried was setting the pyAudio stream's sample rate to 44100 and feeding the model that audio. After testing, however, I found that the model does not work well when it receives a rate other than its requested 16000.
I have been trying to find a way to change the microphone's rate to 16000, or at least to have its audio converted to 16000 when it is used in the Python script, but to no avail.
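What I mean by converting it in the script is something along the lines of the sketch below, using the standard-library audioop module on a stream opened at the microphone's native 44100 Hz; this is only a rough idea and I have not gotten it to work:
import audioop

# read raw 16-bit mono audio at the microphone's native 44100 Hz
# (4096 is an arbitrary chunk size for this sketch)
data = self.paStream.read(4096, exception_on_overflow=False)
# downsample to the 16000 Hz the model expects
# (2 = bytes per sample, 1 = number of channels, None = no previous state)
converted, state = audioop.ratecv(data, 2, 1, 44100, 16000, None)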
The latest thing I have tried is changing the .asoundrc file to find a way to change the rate, but I don't know if it is possible to change the microphone's rate to 16000 within this file. This is what the file currently looks like:
pcm.!default {
    type asym
    playback.pcm {
        type plug
        slave.pcm "dmix"
    }
    capture.pcm {
        type plug
        slave.pcm "usb"
    }
}
ctl.!default {
    type hw
    card 0
}
pcm.usb {
    type hw
    card 1
    device 0
    rate 16000
}
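What I was hoping to express in that file is something like the variant below, where the capture device goes through a rate-converting plug, but I don't know whether this is correct or whether pyAudio would even use it when opening the device by index:
pcm.usb {
    type plug
    slave {
        pcm "hw:1,0"
        rate 16000
    }
}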
The Python code I made works on Windows, which I guess is because Windows converts the input's rate to the sample rate requested in the code. Linux does not seem to do this.
tl;dr: the microphone's rate is 44100, but it has to be converted to 16000 to be usable. How do you do this on Linux?
Edit 1:
I create the pyAudio stream like this:
self.paStream = self.pa.open(rate=self.model.sampleRate(), channels=1, format=pyaudio.paInt16, input=True, input_device_index=1, frames_per_buffer=self.model.beamWidth())
It uses the model's sample rate and beam width, together with the microphone's number of channels and its device index.
To get the next audio frame and format it properly for the stream I create for the model, I do this:
def __get_next_audio_frame__(self):
    # read one buffer of samples from the pyAudio stream
    audio_frame = self.paStream.read(self.model.beamWidth(), exception_on_overflow=False)
    # unpack the raw bytes into a tuple of 16-bit integers
    audio_frame = struct.unpack_from("h" * self.model.beamWidth(), audio_frame)
    return audio_frame
exception_on_overflow=False was used to test the model with an input rate of 44100; without it set to False, the same error I am currently dealing with would occur. model.beamWidth holds the number of chunks the model expects, so I read that many chunks and reformat them before feeding them to the model's stream, which happens like this:
modelStream.feedAudioContent(self.__get_next_audio_frame__())