Thank you! Likewise
I am interested. Shall we create a GitHub repo and begin to collect references on its wiki?
My handle on GitHub is slivingston.
I sat across the table from you on that Wednesday evening.
I’m writing a book which has a few chapters about doing speech recognition and text-to-speech (TTS) on inexpensive hardware such as the Raspberry Pi. I’ve got some code written in Python that uses the Google cloud for speech recognition and TTS. There are various other speech APIs that are also available. The current Misty local API does not seem to support speech yet.
What do you have in mind?
- Mike Seiler, MSEE
Yes, Mike, I remember! Excited to see your book!
I know what you mean. It looks like this is coming in the future: “Voice Integration – You can bring a voice to Misty by integrating with your choice of third party NLP or TTS provider. (2019 Pre-built NLP Integration w/Alexa)”.
I thought of maybe writing an Alexa skill, just for Misty?
Do you have other ideas?
I was there with you the other night too, and I am interested in learning more about your plans. I know some Python and used the Snack Toolkit a long time ago. I am sure things have progressed since then.
Awesome! That would be great! I will create a repo in the next couple of days, and then we can all work on it.
In terms of basic architecture, we can choose at least one of the following:
- entirely within Misty skills (so, running onboard, internal to Misty robot)
- some combination of Misty skills and a backpack Raspberry Pi or Arduino
- JS skill making HTTP calls to some remote provider that we make
- same as previous, but with calls to a paid or otherwise opaque service like Google Cloud Speech-to-Text
The main disadvantage of #3 and #4 is the dependence on Internet connection quality.
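To make the connection-quality concern concrete, here is a minimal sketch of a client that calls a remote provider with a timeout and falls back to a local answer when the call fails. It is in Python for brevity (an onboard Misty skill would be JavaScript), and the provider URL, the `{"text": ...}` request body, and the `"reply"` field are all assumptions for illustration, not an existing API.

```python
import json
import urllib.request


def ask_remote(url, text, timeout_s=2.0):
    """POST text to a (hypothetical) remote provider.

    Returns the decoded JSON reply, or None if the call fails for any
    network or parsing reason.
    """
    try:
        data = json.dumps({"text": text}).encode("utf-8")
        request = urllib.request.Request(
            url, data=data, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP errors;
        # ValueError covers malformed URLs and replies that are not JSON.
        return None


def respond(url, text):
    """Use the remote reply when available, else a local fallback."""
    reply = ask_remote(url, text)
    if not isinstance(reply, dict):
        return "?"  # local fallback when the provider is unreachable
    return str(reply.get("reply", "?"))
```

The short timeout is the main design choice: a robot that waits many seconds for a slow network feels broken, so it may be better to give up early and respond locally.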
Besides the Snack Toolkit (referenced by @baghaii), some other software that might be useful without depending on opaque services:
- CMUSphinx Tutorial For Developers – CMUSphinx Open Source Speech Recognition
- GitHub - mozilla/DeepSpeech: A TensorFlow implementation of Baidu's DeepSpeech architecture
- the celebrated Web scraper in Python: Beautiful Soup: We called him Tortoise because he taught us.
- Natural Language Toolkit (http://www.nltk.org/)
Also, some research results and tools for NLP are listed at https://nlpprogress.com/
I lean more towards #1 or #2. But having remote option #3 is great too (I could build a front-end, if we were to host it on a page, possibly).
As for topic, I thought of teaching Misty greetings in different foreign languages.
Once we decide, let’s start a repo.
how about we try #1 first because it requires the least additional materials besides an off-the-shelf Misty robot?
for the first skill, how about the following?
Misty waits to hear the name of a natural language, spoken in that language. if Misty hears and recognizes it, then it responds with a greeting in the requested language. otherwise, it displays a question mark on its screen.
- hears “English”, responds “hello”
- hears “español”, responds “hola”
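The decision logic of this first skill can be sketched independently of the speech recognizer. The snippet below assumes that something else has already turned audio into a text string; the function names and the `"?"` placeholder are made up for illustration, and only the two table entries come from the examples above.

```python
import unicodedata

# Language name as spoken in that language -> greeting in that language.
# Only these two entries come from the thread; extend as needed.
GREETINGS = {
    "english": "hello",
    "español": "hola",
}

UNKNOWN = "?"  # stands in for the question mark shown on Misty's screen


def normalize(heard):
    """Apply Unicode NFC, trim whitespace, and fold case."""
    return unicodedata.normalize("NFC", heard).strip().casefold()


def greeting_for(heard):
    """Return the greeting for a recognized language name, else UNKNOWN."""
    return GREETINGS.get(normalize(heard), UNKNOWN)
```

For example, `greeting_for("English")` gives `"hello"`, and an unrecognized name gives `"?"`. The Unicode normalization is there because a recognizer might return "ñ" either as one code point or as "n" plus a combining tilde.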
Yes, that sounds like a plan
cool. how about we call the project Misty speech library (abbreviated: MSL), or to avoid trademark conflict, Speech Skills Library (abbreviated: Speech SL)? the latter has the advantage that it can be a pattern for naming other coordinated collections of Misty skills, e.g., “Telepresence SL” for skills that facilitate using Misty robots for telepresence.
addendum: as an example of naming, SciPy has the pattern of separate packages being named with the prefix scikit-, as described at SciKits - about. scikit-learn is a well-known example of one of these.
I started a repository “Misty speech library” here:
I coded a simple conversational bot using Python with regex and random. This is just a start, of course. Check it out.
Please add more stuff. We can add license info (there was info from Misty team), Wiki, code, libraries, and of course add voice features!
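For anyone who has not seen the pattern, a regex-plus-random bot can be as small as the sketch below. This is not the code in the repository; the rules and replies are hypothetical and only illustrate the general idea of matching a pattern and picking one of several canned responses.

```python
import random
import re

# (pattern, candidate replies). Hypothetical rules for illustration only.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE), ["Hello!", "Hi there!"]),
    (re.compile(r"\byour name\b", re.IGNORECASE), ["I am Misty."]),
    (re.compile(r"\b(bye|goodbye)\b", re.IGNORECASE), ["Goodbye!", "See you soon!"]),
]
FALLBACK = ["Sorry, I did not understand that."]


def reply(message, rng=random):
    """Return a random reply from the first rule whose pattern matches."""
    for pattern, answers in RULES:
        if pattern.search(message):
            return rng.choice(answers)
    return rng.choice(FALLBACK)
```

Passing a seeded `random.Random` instance as `rng` makes the replies repeatable, which helps when testing.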
This is great to work with:
Thank you so much! That’s very helpful. I was just thinking about how to keep the momentum and keep developing this.
thanks for creating the repo. to keep the momentum going, here are pull requests (PRs) that I can start:
- documentation, including notes from earlier messages in this thread (e.g., New skills for Misty_voice/speech)
- a minimal skill to play an audio recording from a file onboard the robot
- a minimal skill to record audio and save it somewhere
after doing the above, we might be ready to organize the repo a little more in terms of subdirectories and code style.
what do you think?
Yes, absolutely, thank you.
Regarding #2: I already saved 2 audio files on Misty at the last event, and I can record more.
today I learned about another NLP project that might be useful here: spaCy (https://spacy.io/)
they also have a “universe” page that lists “resources developed with or for spaCy”, https://spacy.io/universe
some of which might be useful with Misty robots, e.g., mordecai (GitHub - openeventdata/mordecai: Full text geoparsing as a Python library), which can extract the place names from a piece of text, resolve them to the correct place, and return their coordinates and structured geographic information.
Hi, great to hear from you! I have been thinking about this skill too. I am building chatbots right now full-time, but I always think about Misty.
Anyway, I will check that out! Thank you! Will be in touch soon.
Hey Olena and Scott,
I’m one of the developers here at Misty and just stumbled across this conversation. Just wanted to pop in and address a few things as this is a portion of the system that I’m actively working on. I can tell you that we do plan on having quite a few things available on Misty’s local APIs.
- Wake Word - likely “Hey Misty” (unless you guys have other ideas?)
- Source Localization - this will tell you the direction of the predominant speaker, relative to Misty
- Command Capture - after hearing the wake word, will put the voice command into a wav file and notify your skill when it is ready (so you can upload it to whatever service you want)
I would also like to integrate #3 at a lower level in the robot so that the command/response latency isn’t as perceptible. Right now when you make an external request in a skill, that audio data gets routed through several portions of our system rather inefficiently which adds considerable latency in remote command/response skills.
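As a rough illustration of the command-capture flow (wav file is ready, then the skill uploads it somewhere), here is a sketch of what a handler might do off-robot in Python. None of this is Misty's API: the handler name, the `upload` callback, and the duration limit are assumptions, and only the standard-library `wave` module is real.

```python
import wave


def wav_summary(path):
    """Read basic properties of a captured WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


def on_command_captured(path, upload, max_duration_s=10.0):
    """Hypothetical handler for a 'command WAV is ready' notification.

    `upload` is any callable that takes the file's bytes and returns the
    service's answer; it stands in for whichever provider is chosen.
    Returns None when the capture is empty or longer than the limit.
    """
    info = wav_summary(path)
    if info["duration_s"] == 0 or info["duration_s"] > max_duration_s:
        return None  # skip empty or overly long captures
    with open(path, "rb") as wav_file:
        return upload(wav_file.read())
```

Checking the duration before uploading is one cheap way to avoid paying the round-trip latency for captures that are obviously not commands.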
If you guys have other ideas on what you would like to see on Misty, let me know either here on the forums or you can DM me on the community site. Eventually I would like to have some sort of speech-to-intent service available on Misty that you could either configure or maybe even replace with your own implementation (high hopes for modularity). We’re still working on tuning the microphones and speakers but once that is complete, then the fun really begins.
If you have a few minutes, your votes on our Robot Roadmap really do count and it’s how we prioritize features that we work on.