Misty Community Forum

Sending Misty Recorded file from CaptureSpeech

Hi, I would like to ask whether there is any way for Misty to send the recorded audio file over a physical medium (e.g. to a Raspberry Pi) for offline, off-board processing.

Also, is there any offline speech-to-text support yet? The last I heard, Misty supported DeepSpeech and PocketSphinx internally.

My project is to run Misty entirely offline, using an offline speech recognition library such as voskJS on the Raspberry Pi. Is that possible?

Hi @believeitcode ,

If I’m understanding correctly, you’re looking to enable communication over the USB port in the back of the robot, is that correct? If so, that’s something we have not tried yet.

One thing we have tried (with relative success) is setting up an STT service on the local network and having the robot send its transcription requests there. That way the robot doesn’t need to send a request outside of its local network.

For this approach, we’ve found that NVIDIA’s NeMo achieves a decent balance between speed and reliability. One minor caveat: we have not tried it on a Raspberry Pi, so we can’t speak to how well it performs there.

Regarding the DeepSpeech and PocketSphinx integrations, we have had mixed results, which is why we are steering you toward NeMo for the time being.

I hope this guidance helps but let us know if you have any further questions.

Could you share more about how you set up the STT service on a local network connection?
Do you run the STT service on a server on a different PC?
If that is the case, does Misty support sending its audio to cloud storage, or sending the audio file via FTP?

We set up an app that was essentially a wrapper around the NeMo STT transcription service using Python. It accepted an audio file on a POST endpoint that the robot could send a request to, and the response was the transcribed speech.

The app runs on a different PC on the local network, or on a machine connected directly to the robot over WiFi, as long as the robot knows which IP address to send the requests to. There is no need to send the audio over cloud storage or FTP. Audio files can be sent as part of an HTTP request. There is an example of something similar in the NeMo tools on their GitHub.
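To illustrate what "sent as part of an HTTP request" can look like, here is a minimal Python sketch of a client that POSTs raw WAV bytes to a transcription service on the local network. It is not the code we ran: the IP address, port, `/transcribe` path, and the JSON response shape are all assumptions for illustration, and on the robot itself the request would be made with whatever HTTP facility your skill uses.

```python
import json
import urllib.request


def build_transcription_request(audio: bytes, url: str) -> urllib.request.Request:
    """Build a POST request that carries the audio bytes as the request body."""
    return urllib.request.Request(
        url,
        data=audio,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


def request_transcription(audio: bytes, url: str) -> dict:
    """Send the audio and decode the reply (assumed here to be JSON)."""
    request = build_transcription_request(audio, url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical address of the PC running the STT service.
STT_URL = "http://192.168.1.50:8000/transcribe"
```

With a recording saved as `speech.wav`, calling `request_transcription(open("speech.wav", "rb").read(), STT_URL)` would send it to the service without the audio ever leaving the local network.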

The example on their GitHub is similar to, but not exactly like, the implementation we ran. The GitHub example is designed around a web front-end instead of just a REST API, so it separates uploading and transcribing into different sections of the app. However, it can be simplified down to a single request that

  1. accepts a file as a parameter
  2. does checks on the file to determine if it can be transcribed
  3. then, if all the checks are passed it will transcribe the audio and return the transcription
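The three steps above can be sketched as a single POST endpoint. This is a rough outline using only the Python standard library, not our actual implementation: the `/transcribe` path, the size limit, the mono 16 kHz requirement, and the JSON response shape are assumptions, and `transcribe` is a placeholder where a call to your STT model (for example a NeMo ASR model) would go.

```python
import io
import json
import wave
from http.server import BaseHTTPRequestHandler, HTTPServer

MAX_BYTES = 10 * 1024 * 1024  # assumed upload limit
EXPECTED_RATE = 16000         # assumed sample rate the model expects


def check_wav(data: bytes):
    """Step 2: return an error message if the audio can't be transcribed, else None."""
    if not data:
        return "empty request body"
    if len(data) > MAX_BYTES:
        return "file too large"
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            if wav.getnchannels() != 1:
                return "expected mono audio"
            if wav.getframerate() != EXPECTED_RATE:
                return "expected 16 kHz audio"
            if wav.getnframes() == 0:
                return "audio contains no frames"
    except (wave.Error, EOFError):
        return "not a valid WAV file"
    return None


def transcribe(data: bytes) -> str:
    """Step 3 placeholder: replace with a call to your STT model."""
    return "(transcription placeholder)"


def handle_upload(data: bytes):
    """Run the checks, then transcribe; returns (HTTP status, JSON payload)."""
    error = check_wav(data)
    if error:
        return 400, {"error": error}
    return 200, {"text": transcribe(data)}


class TranscribeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/transcribe":
            self.send_error(404)
            return
        # Step 1: accept the file as the raw request body.
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_upload(self.rfile.read(length))
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "0.0.0.0", port: int = 8000) -> None:
    """Start the service; the robot then POSTs audio to http://<host>:<port>/transcribe."""
    HTTPServer((host, port), TranscribeHandler).serve_forever()
```

Calling `serve()` on the PC starts the service. Keeping the checks and the transcription in plain functions (`check_wav`, `transcribe`) means the model can be swapped without touching the HTTP handling.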

Hope this helps!

Thanks. Is it possible to share the code for your implementation for my reference? I need to implement an STT service without using third-party cloud services. My overall objective is to build a conversational AI system (currently using Rasa Open Source) with ASR and TTS. If sharing the code is not possible, a guide would be sufficient.