
Misty Community Forum

Speech To Text Using Google Cloud API

Hi all! I'm currently doing my graduation project with the Misty robot. My objective is to have Misty recognise intent and handle Q&A using an NLP model. I've been stuck on implementing speech-to-text on Misty for quite some time. So far I have used voskjs, running the model as a server and calling it with HTTP GET, but it requires the audio file to be on the local file system.

Has anyone had experience doing speech-to-text with Google Cloud without needing the file to be on the local file system or in cloud storage? Or simply by taking the audio file recorded on Misty and transcribing it into text? I hope you can suggest a better solution to my problem. Thanks.

Hi @believeitcode

If it’d be helpful, here’s an example of STT implementation using CaptureGoogleSpeech.

Regarding file storage, if you’d like to gain some experience sending file content to the cloud, then you can definitely do it manually. Today, that’s transactional (i.e. record file, download file to skill, upload file to the cloud, receive a response, delete the file). If you’re using the inbox implementation then you’ll have the option to create a file or not, in which case the mechanical aspect goes away.
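
To illustrate the "upload file to the cloud" step without any cloud storage, here is a minimal sketch of a speech:recognize request body that carries the audio inline as base64. It assumes a Node.js environment; the function name buildRecognizeRequest and the "en-US" default are placeholders of mine, not something from this thread.

```javascript
// Sketch: build a speech:recognize request body with the audio sent inline.
// Assumptions (not from the thread): the function name, the "en-US" default,
// and a Node.js runtime (Buffer is a Node global).
function buildRecognizeRequest(audioBytes, languageCode = "en-US") {
    return {
        config: { languageCode: languageCode },
        // Inline audio goes in audio.content as a base64 string,
        // so nothing has to be stored in a bucket first.
        audio: { content: Buffer.from(audioBytes).toString("base64") }
    };
}

// Placeholder bytes standing in for a real recording (ASCII "RIFF").
const body = buildRecognizeRequest([82, 73, 70, 70]);
console.log(JSON.stringify(body));
```

With a real recording, you would pass the bytes of the file you pulled from Misty and POST the JSON to the recognize endpoint with your own API key.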

Does this help to answer your question? If not, would you mind providing a bit more detail about what you’re looking to accomplish and where you’re stuck? Additionally, a code sample would help if you’d be willing to share that information. Thanks!

Just to add on, here are some code snippets (in JS) that demonstrate how to send external requests to Google STT and process them. Depending on your use case, this may allow for more control and granularity.

function _GoogleSTT(data) {
    let base64 = data.Result.Base64;
    let requestBody = {
        "config": {
            // Assumed value: the config fields were cut off in the original post
            "languageCode": "en-US"
        },
        "audio": {
            "content": base64
        }
    };
    misty.SendExternalRequest("POST", "https://speech.googleapis.com/v1p1beta1/speech:recognize?key=*ADD YOUR KEY HERE*", null, null, JSON.stringify(requestBody), false, false, null, "application/json", "_speechToTextResponse");
}

function _speechToTextResponse(data) {
    // Skip the empty "{}" responses that carry no results
    if (data.Result.ResponseObject.Data != "{}\n") {
        let parsed = JSON.parse(data.Result.ResponseObject.Data);
        let response = parsed.results[0].alternatives[0].transcript;
        misty.Debug(response); // assumed: the rest of the handler was cut off in the post
    }
}
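
If the response handling needs to be a bit more defensive, something along these lines might help. It is only a sketch in plain JavaScript: extractTranscript is a hypothetical helper name, and the sample response is made up for illustration. It returns an empty string when the body is not valid JSON or has no results, and joins the top alternative of every result instead of reading only results[0].

```javascript
// Sketch: pull the transcript out of a raw speech:recognize response string.
// extractTranscript is a hypothetical helper name, not part of any SDK.
function extractTranscript(rawData) {
    let parsed;
    try {
        parsed = JSON.parse(rawData);
    } catch (err) {
        return ""; // body was not valid JSON
    }
    if (!parsed || !Array.isArray(parsed.results)) {
        return ""; // e.g. an empty "{}" response with no results
    }
    // Take the top alternative of each result and join them with spaces.
    return parsed.results
        .filter(r => r.alternatives && r.alternatives.length > 0)
        .map(r => r.alternatives[0].transcript)
        .join(" ")
        .trim();
}

// Made-up sample response for illustration only.
const sample = JSON.stringify({
    results: [{ alternatives: [{ transcript: "hello misty", confidence: 0.9 }] }]
});
console.log(extractTranscript(sample));
```

Returning an empty string keeps the caller simple: it can check for "" and re-prompt the user rather than wrapping every access in its own checks.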

Thanks @scott.bobbitt @Jackson, I managed to make it work using CP codes.