In this lab, you will use Node.js to send streaming and non-streaming requests to the Cloud Speech API to transcribe recorded audio.

What you need

To complete this lab, you need:

What you learn

In this lab, you:

Step 1

If necessary, install the following software:

If you don't have a Mac or Linux laptop with admin privileges, create a Compute Engine instance on Google Cloud Platform and install the above software there (the gcloud SDK will already be present). In this case, you will not be able to test out the microphone aspect of this lab.

Step 2

Install brew with the following command at the command prompt:

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Step 3

Install sox (utility to play and record audio and convert audio formats):

brew install sox

Step 4

Clone the Node.js samples repository:

git clone https://github.com/GoogleCloudPlatform/nodejs-docs-samples.git
cd nodejs-docs-samples

Step 5

Set your Cloud Console project ID:

export GCLOUD_PROJECT=<YOUR-PROJECT-ID>

Step 6

Obtain authentication credentials

Set the GOOGLE_APPLICATION_CREDENTIALS environment variable:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account_file.json
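
Before moving on, it can help to confirm that both environment variables from Steps 5 and 6 are visible to Node.js. The following is a minimal sketch, not part of the samples repository; the helper name missingEnvVars is hypothetical, and it checks only that the variables are set, not that the credentials file is valid.

```javascript
// Hypothetical helper (not in the samples repository): returns the names of
// the required environment variables that are missing or blank.
function missingEnvVars(env) {
  const required = ['GCLOUD_PROJECT', 'GOOGLE_APPLICATION_CREDENTIALS'];
  return required.filter((name) => !env[name] || env[name].trim() === '');
}

const missing = missingEnvVars(process.env);
if (missing.length > 0) {
  console.error('Missing environment variables:', missing.join(', '));
} else {
  console.log('Environment looks ready.');
}
```

Save it to a scratch file and run it with node in the same shell where you ran the export commands.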

Step 1

Change to the speech samples directory:

cd speech

Step 2

Install dependencies (listed in package.json):

npm install

Step 3

List the command options and run a simple (synchronous) speech recognition request:

node recognize --help
node recognize sync ./resources/audio.raw

Examine the code for recognize.js in a text editor such as nano:

nano recognize.js

Look for the function syncRecognize that performs the above. Get familiar with the options used, like encoding and sampleRate.
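
To see why encoding and sampleRate matter together, consider how they determine the length of a raw audio file. LINEAR16 is uncompressed 16-bit PCM, so each sample takes 2 bytes. The sketch below assumes single-channel audio; the function name rawDurationSeconds and the example numbers are illustrative only and are not taken from recognize.js.

```javascript
// Assumption: LINEAR16 means 2 bytes per sample, and the audio is mono.
const BYTES_PER_SAMPLE = 2;

// Estimate the duration of headerless (raw) LINEAR16 audio.
function rawDurationSeconds(byteLength, sampleRate) {
  if (sampleRate <= 0) {
    throw new RangeError('sampleRate must be positive');
  }
  return byteLength / (BYTES_PER_SAMPLE * sampleRate);
}

// Example: 32000 bytes at 16000 samples per second is 1 second of audio.
console.log(rawDurationSeconds(32000, 16000));
```

If the sampleRate option does not match the file, the same bytes are interpreted as a different duration and pitch, which is one reason a mismatched value tends to produce poor transcriptions.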

In this section, you will modify the code to use custom hints.

Step 1

Download a new audio file that has a 44100 Hz sample rate and uses a relatively new word, typogram, that is not in the speech database yet:

gsutil cp gs://speechapi-demo/typograms_entities.wav ./resources

Step 2

Change the sample rate to 44100 in the syncRecognize function in recognize.js:

nano recognize.js
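
If you want to double-check the sample rate of a WAV file yourself, one option is to read it from the file header. This is a sketch under the assumption that the file uses the canonical RIFF/WAVE layout, with the fmt chunk immediately after the RIFF header and the sample rate stored as a little-endian 32-bit integer at byte offset 24. Files with extra chunks before fmt will be rejected. The helper wavSampleRate is hypothetical and not part of the samples.

```javascript
const fs = require('fs');

// Read the sample rate from a canonical RIFF/WAVE header buffer.
function wavSampleRate(header) {
  if (header.length < 28 ||
      header.toString('ascii', 0, 4) !== 'RIFF' ||
      header.toString('ascii', 8, 12) !== 'WAVE' ||
      header.toString('ascii', 12, 16) !== 'fmt ') {
    throw new Error('Not a canonical RIFF/WAVE header');
  }
  return header.readUInt32LE(24);
}

// Usage: inspect the downloaded file if it is present.
const file = './resources/typograms_entities.wav';
if (fs.existsSync(file)) {
  const fd = fs.openSync(file, 'r');
  const header = Buffer.alloc(28);
  fs.readSync(fd, header, 0, 28, 0);
  fs.closeSync(fd);
  console.log('Sample rate:', wavSampleRate(header));
}
```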

Step 3

Repeat the command with the new audio file:

node recognize sync ./resources/typograms_entities.wav

Note that the result transcribes typograms as telegrams.

Step 4

Add a custom hint to your syncRecognize function so it does a better job with the audio file by editing the function as follows:

function syncRecognize (filename, callback) {
  // Detect speech in the audio file, e.g. "./resources/audio.raw"
  speech.recognize(filename, {
    encoding: 'LINEAR16',
    // Hint phrases for words the recognizer may not know
    speechContext: {
      phrases: ['typograms']
    },
    sampleRate: 44100
  }, (err, results) => {
    if (err) {
      callback(err);
      return;
    }

    console.log('Results:', results);
    callback();
  });
}

Step 5

Run the sync command again and see how the result has changed:

node recognize sync ./resources/typograms_entities.wav

Step 1

Try out the asynchronous version of the recognize function:

node recognize async ./resources/audio.raw

Step 2

When you make an asynchronous (non-blocking) call, other operations that you add after the function's callback might run before the results arrive.

Let's test this in our code. Add a print statement like

console.log('your speech is being processed.....');

after the callback in both the syncRecognize and asyncRecognize functions and run them again. Notice that the statement appears after the results in the sync case and before the results in the async case.
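
The ordering difference can be reproduced without calling the Speech API at all. In the sketch below, blockingTask and nonBlockingTask are hypothetical stand-ins for a call that delivers its results before returning and a call that delivers them on a later turn of the event loop. They are not the functions from recognize.js.

```javascript
// Stand-ins for illustration only; neither function calls the Speech API.
function blockingTask(callback) {
  callback('results'); // callback runs before the function returns
}

function nonBlockingTask(callback) {
  setImmediate(() => callback('results')); // callback runs on a later turn
}

// Runs a task, records a message right after the call, then reports the order.
function runDemo(task, done) {
  const order = [];
  task((value) => order.push(value));
  order.push('your speech is being processed.....');
  // Scheduled after the task's own setImmediate, so it fires after it.
  setImmediate(() => done(order));
}

runDemo(blockingTask, (order) => console.log('blocking:', order));
runDemo(nonBlockingTask, (order) => console.log('non-blocking:', order));
```

In the blocking run the message is recorded after the results, and in the non-blocking run it is recorded before them, which mirrors what you observe with the sync and async commands.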

The Cloud Speech API also allows you to stream audio via RPC for real-time speech-to-text, for example for a live news feed or a speech-enabled dictation system.

Run the stream and listen versions of the command. The stream version sends a recorded file as a streaming request, and the listen version takes input from your microphone, sends it to the Cloud Speech API, and transcribes it:

node recognize stream ./resources/audio.raw

node recognize listen

Look at the streamingRecognize and streamingMicRecognize functions to understand how the above commands work.
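
The general idea behind a streaming request is that audio is sent in small pieces rather than as one file. The sketch below illustrates that idea using only Node.js built-in streams. It does not use the Speech client library: createByteCounter is a hypothetical stand-in for a recognition stream, and the 3200-byte chunk size is an arbitrary choice for the example.

```javascript
const { Readable, Writable } = require('stream');

// Split a buffer of audio bytes into fixed-size chunks (the last may be shorter).
function splitIntoChunks(audio, chunkBytes) {
  if (!Number.isInteger(chunkBytes) || chunkBytes <= 0) {
    throw new RangeError('chunkBytes must be a positive integer');
  }
  const chunks = [];
  for (let offset = 0; offset < audio.length; offset += chunkBytes) {
    chunks.push(audio.subarray(offset, offset + chunkBytes));
  }
  return chunks;
}

// Hypothetical stand-in for a recognition stream: it only counts bytes received.
function createByteCounter(onFinish) {
  let total = 0;
  return new Writable({
    write(chunk, encoding, next) {
      total += chunk.length;
      next();
    },
    final(next) {
      onFinish(total);
      next();
    }
  });
}

// Stream 10000 bytes of silent placeholder audio in 3200-byte chunks.
const fakeAudio = Buffer.alloc(10000);
Readable.from(splitIntoChunks(fakeAudio, 3200))
  .pipe(createByteCounter((total) => console.log('Bytes streamed:', total)));
```

When you read streamingRecognize and streamingMicRecognize, look for the same shape: a readable source of audio piped into a writable stream that emits results as they arrive.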

In this lab, you used Node.js to send streaming and non-streaming requests to the Cloud Speech API to transcribe recorded audio.