Plans, or what we need to do to get direct voice response/speech-to-text

Misty I (and eventually Misty II)
I would like to bypass having to use the BackPack for voice response
and avoid having to stream audio/video to an external computer
just to do basic detection of a ‘Wake Word’.

So far we have a way on the computer to listen, do voice response,
and then send a command to Misty or have something repeated in
Misty’s chosen voice. Misty has microphones, so what is missing is
a daemon/API which can:
- Customize the wake word
- Listen for the ‘Wake word’ and notify subscriber(s)
- Or, at a minimum, pick a service (Amazon, Google, etc.) and either
make that the default or allow customization.
- Train on voice for user recognition/identity (wish list, for
security and profiles).
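
To make the wish list above a bit more concrete, here is a minimal sketch of the "customizable wake word plus subscribers" idea. It is not a Misty API: the class `WakeWordNotifier` and its methods are hypothetical names, and it assumes some speech-to-text step has already turned the audio into a text transcript.

```python
import re
from typing import Callable, List


def _normalize(text: str) -> str:
    """Lowercase the text and drop punctuation so matching is forgiving."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


class WakeWordNotifier:
    """Hypothetical sketch of a wake-word daemon's core logic (not a Misty API)."""

    def __init__(self, wake_word: str) -> None:
        self.wake_word = _normalize(wake_word)
        self._subscribers: List[Callable[[str], None]] = []

    def set_wake_word(self, wake_word: str) -> None:
        """Customize the wake word at runtime."""
        self.wake_word = _normalize(wake_word)

    def subscribe(self, callback: Callable[[str], None]) -> None:
        """Register a callback that receives the text spoken after the wake word."""
        self._subscribers.append(callback)

    def feed(self, transcript: str) -> bool:
        """Check one transcript; notify subscribers if the wake word is present."""
        padded = f" {_normalize(transcript)} "
        target = f" {self.wake_word} "
        index = padded.find(target)
        if index == -1:
            return False
        command = padded[index + len(target):].strip()
        for callback in self._subscribers:
            callback(command)
        return True
```

A subscriber could be anything that reacts to the command text, for example a function that forwards it to whichever service (Amazon, Google, etc.) is chosen as the default.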

  • The skills I want are based on interaction with Misty versus control
    via a web page.

I do not see any API to call, or any daemon on Misty listening for voice,
or even audio streaming so that I could listen to Misty’s microphone on my computer.

For now my computer (Linux) will do the listening on its microphone and send the ‘actions’ to Misty over the APIs (play sound/voice, movement, eyes, colors, control add-on arm/tail, control Arduino LED strip cyborg eye). To generate skills I need monitored input that Misty will respond to (someone arriving at the door, a knock at the door, a scheduled wake-up time).
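
A rough sketch of that computer-side setup: monitored events are mapped to lists of actions, and a pluggable `send` function delivers each action to the robot. Everything here is an assumption for illustration. The event names, action names, parameters, and file names are hypothetical and are not taken from the Misty API; in practice `send` would wrap whatever API call Misty actually exposes.

```python
from typing import Callable, Dict, List, Tuple

# An action is a (name, parameters) pair. All names below are hypothetical.
Action = Tuple[str, dict]

RESPONSES: Dict[str, List[Action]] = {
    "knock_at_door": [
        ("play_sound", {"file": "who_is_there.wav"}),
        ("set_color", {"red": 0, "green": 0, "blue": 255}),
    ],
    "someone_arrived": [
        ("play_sound", {"file": "welcome.wav"}),
        ("move_arm", {"position": "up"}),
    ],
    "wake_up_time": [
        ("set_eyes", {"expression": "awake"}),
    ],
}


def dispatch(event: str, send: Callable[[str, dict], None]) -> int:
    """Send every action registered for the event; return how many were sent."""
    actions = RESPONSES.get(event, [])
    for name, params in actions:
        send(name, params)
    return len(actions)


def print_sender(name: str, params: dict) -> None:
    """Stand-in sender that only prints; replace with a real API call."""
    print(f"would send {name} with {params}")
```

Keeping the sender separate from the event table means the listening side can be tested on the computer without a robot attached, e.g. `dispatch("knock_at_door", print_sender)`.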

  • Right now in my Deep Learning course we are doing object recognition.

The other ‘skill’ I am thinking of is a ‘pop-up’ (hidden slide-up LCD), but
what I really want to do is remove the wheels and build a track with outdoor-level clearance, then swap that out for legs.

Quick note: We plan to have a Misty wake word prior to the launch of Misty II.

Excellent! Any hints on how I could get to this now? I think it is hidden somewhere in the Windows part of Misty. Almost every ‘behavior’ (stimulus-response) is based on voice commands/interaction, and it is a major part of learning personality.

I did find someone who has already done voice recognition on a Raspberry Pi running Windows 10 IoT, and the source is available.
And speech synthesis (a somewhat machine-like voice):