Misty Community Forum

Misty ASR and TTS

Any plans to have a version of this built in that runs on local hardware? Or would you put it in the category of something the community comes up with?

@station I’m curious what you have in mind?

I would love to be able to give voice commands, or just talk and have it understand with NLU to some extent. It seems like there has been enough progress in this space to get something better than just sox and flite, which is what others use. It should work even when there is no internet connection: say I’m in the car and I don’t want the kids playing an app or watching something, they could play and talk with Misty. Kids love the Tom the talking cat thing, and Cozmo has a basic TTS that is fun to play with.

Step 1 was high-level RESTful APIs to get folks started.
Step 2 is high-level on-robot APIs so the robot can operate independently.
Step 3 is enabling local community code that calls the on-robot API, so that you or anyone else can experiment with any local processing (TTS, neural nets, object rec, sound processing, etc.).
Step 4 is low-level on-robot access.

If you were in our shoes, would you sequence this differently?

That sequence makes sense for sure. I was just curious about the roadmap for those features.

I can certainly say that this is on the roadmap in some form, though I can’t speak to the priority. One thing we are working on is finding a way to share the roadmap with all of you, so that you can help us build the things you’d really like. Again, I’m not sure on the timing, but it is in the works.

This fits with what I have been learning. I did see the mistyvoice sample for the Python library (which uses Google for TTS).

There are also JavaScript flavors of these APIs. A few, for Google alone:
- Cloud Speech-to-Text (Google Cloud): speech recognition
- Cloud Text-to-Speech API (Google Cloud): speech synthesis
- googleapis/nodejs-speech (GitHub): Node.js client for Google Cloud Speech, speech-to-text conversion powered by machine learning
- hiddentao/google-tts (GitHub): JavaScript API for the Google Text-to-Speech engine

First basics in case others are not familiar:
NLP - Natural Language Processing
NLU - Natural Language Understanding
STT - Speech to Text
TTS - Text to Speech

Is there a planned solution for this? (Google, Alexa, AT&T, Cortana, etc.)
Some other solutions:
- Jasper (see the Jasper documentation), which can use a mix of the above and others
- Rasa (https://rasa.com/)
- Snips (https://snips.ai/), whose snips-nlu Python library (snipsco/snips-nlu on GitHub) extracts meaning from text

I would add:
Wit.ai - they have a simple API that does STT when you send an audio file as binary, as well as a decent intent classifier and entity extraction system.
Rasa is pretty decent, but it has a crap ton of dependencies (Keras / TensorFlow, spaCy) that I’m pretty sure won’t run on the robot for a while, and it also doesn’t do TTS or STT.
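
For anyone who wants to try the Wit.ai route mentioned above, here is a minimal sketch of the "send an audio file as binary" call. It only builds the HTTP request and does not send it. The endpoint URL, the bearer-token header and the `audio/wav` content type are my understanding of the Wit.ai HTTP API and should be checked against their current docs; `build_wit_speech_request` is a hypothetical helper name.

```python
import urllib.request

# Assumed Wit.ai speech endpoint; verify against the current Wit.ai docs.
WIT_SPEECH_URL = "https://api.wit.ai/speech"


def build_wit_speech_request(wav_bytes: bytes, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST that uploads raw WAV bytes for STT."""
    return urllib.request.Request(
        WIT_SPEECH_URL,
        data=wav_bytes,  # the audio file goes in the body as plain binary
        headers={
            "Authorization": f"Bearer {token}",  # token from your Wit.ai app
            "Content-Type": "audio/wav",
        },
        method="POST",
    )
```

To actually run it you would pass the request to `urllib.request.urlopen` and parse the JSON that comes back; the response shape is not shown here because I have not verified it.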

Jasper is OK, but I never got it to run very well. That was a long time ago, though.
Snips is interesting, but I haven’t dug too deep into it yet.

I think that until it’s running on the robot, you could get away with just using browser-based STT and TTS, since the other samples are in JS anyway. I could whip something up pretty quickly for testing, actually, since I’m using it for a different project right now.

TTS right now would be odd, since it would be:
create audio in browser -> save as binary string array -> load tmp file through Misty API -> play file -> delete file -> repeat

But this should work.
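
The upload/play/delete loop described above could be sketched roughly like this. This is only an illustration: the endpoint paths (`/api/audio`, `/api/audio/play`), the `FileName`/`Data` field names and the base64 encoding are assumptions about the Misty REST API rather than something confirmed in this thread, and `speak_via_upload` and `send` are hypothetical names. The transport is passed in as a function so the sequence can be exercised without a robot.

```python
import base64
from typing import Any, Callable, Dict

# send(method, path, json_payload) performs one HTTP call against the robot.
Send = Callable[[str, str, Dict[str, Any]], None]


def speak_via_upload(audio: bytes, send: Send, filename: str = "tts_tmp.wav") -> None:
    """Upload a temporary audio clip, play it, then remove it again."""
    encoded = base64.b64encode(audio).decode("ascii")
    # 1. Load the temp file through the API (assumed endpoint and fields).
    send("POST", "/api/audio", {"FileName": filename, "Data": encoded})
    try:
        # 2. Ask the robot to play the clip that was just uploaded.
        send("POST", "/api/audio/play", {"FileName": filename})
    finally:
        # 3. Clean up the temp file. A real client would first wait for
        #    playback to finish; that wait is left out of this sketch.
        send("DELETE", "/api/audio", {"FileName": filename})
```

Calling it once per utterance gives the "repeat" step; `send` would typically wrap an HTTP client pointed at the robot's IP address.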

ok @markwdalton Here is what I came up with:

Give it a shot

@station you are my hero!

I had a good role model :wink:

I replied on Slack, but so that others know: it was working fine.

I did get Jasper working on my Raspberry Pi Zero W, but my microphone is not great, even with my new 2-mic array. (It could be my now-sloppy soldering; I am a bit out of practice, as I have not been teaching many kids since around 2001.)

So I will get Jasper running on my Linux notebook with a good mic. But if there are no privacy concerns, I prefer Google/Alexa, since they do most of the hard work and the internationalization, and the TTS voices are better. That said, I have not played with Festival for a few decades, and only a little with MBROLA. (I remember doing speech synthesis on the Apple II+ with 32k RAM via peeks, pokes and calls.)

http://www.cs.cmu.edu/~awb/
- has some links for emotional speech
- flite: a lightweight version of Festival
- Bard: a storyteller program for ebook reading

I think flite is fast, easy to use, and tiny as well. It just doesn’t sound great if you want it to sound realistic, but for a robot you could slap a vocoder on there to add some effects, and that might work just fine for most TTS.
If we could add a GTX 1080 Ti to Misty we could run Tacotron on it. That actually has great-sounding TTS, and it’s fast enough on just one GPU… I wish :stuck_out_tongue:
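
To show how small the flite route is, here is a minimal sketch that shells out to the `flite` command line tool with its `-t` (text) and `-o` (output WAV) options. It assumes `flite` is installed and on the PATH; `flite_command` and `synthesize` are hypothetical helper names, and the vocoder step is not shown.

```python
import shutil
import subprocess
from typing import List


def flite_command(text: str, wav_path: str) -> List[str]:
    """Return the argv for synthesizing `text` into a WAV file with flite."""
    return ["flite", "-t", text, "-o", wav_path]


def synthesize(text: str, wav_path: str) -> bool:
    """Run flite if it is installed; return False when it is not available."""
    if shutil.which("flite") is None:
        return False
    subprocess.run(flite_command(text, wav_path), check=True)
    return True
```

The resulting WAV could then be run through whatever effect you like before it is played on the robot.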

Thanks! I am just thinking of the STT/NLP parts of one of these, so it is more of a fast, private
dictation. (Think journaling or similar where I would not want it uploaded to Amazon or Google)

But that is one of the reasons I liked the idea of Jasper: the STT could be one engine (private), and the TTS could be another, like Google or Amazon, with the better voice quality.
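
The mix-and-match idea above can be sketched as a tiny pipeline where each stage is just a function you plug in. This is not Jasper's actual API; `VoicePipeline` and its field names are hypothetical and only illustrate keeping the STT engine separate from the TTS engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains three swappable stages: speech -> text -> reply -> speech."""

    stt: Callable[[bytes], str]   # e.g. a private, on-device recognizer
    handle: Callable[[str], str]  # NLU / dialogue logic working on text
    tts: Callable[[str], bytes]   # e.g. a cloud voice with better quality

    def respond(self, audio: bytes) -> bytes:
        text = self.stt(audio)     # audio never has to leave the device here
        reply = self.handle(text)
        return self.tts(reply)     # only the reply text reaches the TTS engine
```

With this shape, only the reply text would ever be sent to a cloud TTS service, while the recorded audio stays with whichever STT engine you trust.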

I just have not looked at a lot of these others (I have Google and Alexa at home), but I do not ‘share’ information with them, just normal things. And I am not a very ‘private’ person; I put way too much info online on Google. But I am thinking of others who are extremely concerned. (At this point in my life, my attitude is: if it is true, get used to it; anyone can find out nearly anything.)