About this codelab
1. Overview
The Google Cloud Speech-to-Text API enables developers to convert audio to text in 120 languages and variants by applying powerful neural network models in an easy-to-use API.
In this codelab, you will focus on using the Speech-to-Text API with Node.js. You will learn how to send an audio file in English and other languages to the Cloud Speech-to-Text API for transcription.
What you'll learn
- How to enable the Speech-to-Text API
- How to authenticate API requests
- How to install the Google Cloud client library for Node.js
- How to transcribe audio files in English
- How to transcribe audio files with word timestamps
- How to transcribe audio files in different languages
What you'll need
- A Google Cloud Platform Project
- A browser, such as Chrome or Firefox
- Familiarity with JavaScript/Node.js
2. Setup and Requirements
Self-paced environment setup
- Sign in to Cloud Console and create a new project or reuse an existing one. (If you don't already have a Gmail or G Suite account, you must create one.)
Remember the project ID, a name that is unique across all Google Cloud projects. It will be referred to later in this codelab as PROJECT_ID.
- Next, you'll need to enable billing in Cloud Console in order to use Google Cloud resources.
Running through this codelab shouldn't cost much, if anything at all. Be sure to follow the instructions in the "Clean up" section, which explains how to shut down resources so you don't incur billing beyond this tutorial. New users of Google Cloud are eligible for the $300 USD Free Trial program.
Start Cloud Shell
While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Google Cloud Shell, a command line environment running in the Cloud.
Activate Cloud Shell
- From the Cloud Console, click Activate Cloud Shell.
If you've never started Cloud Shell before, you'll be presented with an intermediate screen describing what it is. If that's the case, click Continue (and you won't ever see it again).
It should only take a few moments to provision and connect to Cloud Shell.
This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this codelab can be done with simply a browser or your Chromebook.
Once connected to Cloud Shell, you should see that you are already authenticated and that the project is already set to your project ID.
- Run the following command in Cloud Shell to confirm that you are authenticated:
gcloud auth list
Command output
Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`
- Run the following command in Cloud Shell to confirm that gcloud is set to your project:
gcloud config list project
Command output
[core]
project = <PROJECT_ID>
If it is not, you can set it with this command:
gcloud config set project <PROJECT_ID>
Command output
Updated property [core/project].
3. Enable the Speech-to-Text API
Before you can begin using the Speech-to-Text API, you must enable it. Run the following command in Cloud Shell:
gcloud services enable speech.googleapis.com
4. Authenticate API requests
To make requests to the Speech-to-Text API, you need to use a service account. A service account belongs to your project and is used by the Google Cloud client library for Node.js to make Speech-to-Text API requests. Like any other user account, a service account is represented by an email address. In this section, you will use the Cloud SDK to create a service account and then create the credentials you need to authenticate as that service account.
First, set an environment variable with your PROJECT_ID, which you will use throughout this codelab. If you are using Cloud Shell, this variable is already set for you:
export GOOGLE_CLOUD_PROJECT=$(gcloud config get-value core/project)
Next, create a new service account to access the Speech-to-Text API by using:
gcloud iam service-accounts create my-speech-to-text-sa \
--display-name "my speech-to-text codelab service account"
Next, create credentials that your Node.js code will use to log in as your new service account. Create these credentials and save them as a JSON file, ~/key.json, by using the following command:
gcloud iam service-accounts keys create ~/key.json \
--iam-account my-speech-to-text-sa@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com
Finally, set the GOOGLE_APPLICATION_CREDENTIALS environment variable, which the Speech-to-Text API Node.js library (covered in the next step) uses to find your credentials. Set the environment variable to the full path of the credentials JSON file you created:
export GOOGLE_APPLICATION_CREDENTIALS="/home/${USER}/key.json"
You can read more about authenticating the Speech-to-Text API.
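Before moving on, it can help to confirm that the environment variable points at a usable key file. The following is a minimal sketch, not part of the client library: the helper name checkCredentials is hypothetical, and it only inspects the local JSON file (its type and client_email fields) without contacting Google Cloud.

```javascript
const fs = require('fs');

/**
 * Checks that a credentials path points to a readable service account key.
 * Hypothetical helper for illustration; it performs no network calls.
 * Returns an object of the form {ok, reason}.
 */
function checkCredentials(keyPath) {
  if (!keyPath) {
    return {ok: false, reason: 'GOOGLE_APPLICATION_CREDENTIALS is not set'};
  }
  let key;
  try {
    key = JSON.parse(fs.readFileSync(keyPath, 'utf8'));
  } catch (err) {
    return {ok: false, reason: `cannot read or parse ${keyPath}: ${err.message}`};
  }
  // Service account key files declare their type and the account email.
  if (key.type !== 'service_account' || !key.client_email) {
    return {ok: false, reason: 'file is not a service account key'};
  }
  return {ok: true, reason: `key for ${key.client_email}`};
}

console.log(checkCredentials(process.env.GOOGLE_APPLICATION_CREDENTIALS));
```

If ok is false, re-run the export command above in the same shell session before running any of the later examples.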
5. Install the Google Cloud Speech-to-Text API client library for Node.js
First, create a project that you will use to run this Speech-to-Text API lab. Initialize a new Node.js package in a folder of your choice:
npm init
npm asks several questions about the project configuration, such as name and version. For each question, press ENTER to accept the default values. The default entry point is a file named index.js.
Next, install the Google Cloud Speech library to the project:
npm install --save @google-cloud/speech
For more instructions on how to set up a Node.js development environment for Google Cloud, see the Setup Guide.
Now you're ready to use the Speech-to-Text API!
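To double-check the installation from inside the project folder, a small script can ask Node.js whether it can resolve the package. This is only a sketch: isInstalled is a hypothetical helper name, and it relies on the standard require.resolve function, which throws when a module cannot be found.

```javascript
/**
 * Returns true when Node.js can resolve the given module from this folder.
 * Hypothetical helper for illustration.
 */
function isInstalled(moduleName) {
  try {
    require.resolve(moduleName);
    return true;
  } catch (err) {
    return false;
  }
}

const packageName = '@google-cloud/speech';
console.log(`${packageName} installed: ${isInstalled(packageName)}`);
```

If the script reports false, run the npm install command again from the folder that contains package.json.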
6. Transcribe Audio Files
In this section, you will transcribe a pre-recorded audio file in English. The audio file is available on Google Cloud Storage.
Open the index.js file and replace the code with the following:
// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');

// Creates a client
const client = new speech.SpeechClient();

/**
 * Calls the Speech-to-Text API on a demo audio file.
 */
async function quickstart() {
  // The path to the remote LINEAR16 file stored in Google Cloud Storage
  const gcsUri = 'gs://cloud-samples-data/speech/brooklyn_bridge.raw';

  const audio = {
    uri: gcsUri,
  };

  // The audio file's encoding, sample rate in hertz, and BCP-47 language code
  const config = {
    encoding: 'LINEAR16',
    sampleRateHertz: 16000,
    languageCode: 'en-US',
  };

  const request = {
    audio: audio,
    config: config,
  };

  // Detects speech in the audio file
  const [response] = await client.recognize(request);
  const transcription = response.results
    .map(result => result.alternatives[0].transcript)
    .join('\n');
  console.log(`Transcription: ${transcription}`);
}

quickstart();
Take a minute or two to study the code and see how it is used to transcribe an audio file.
The encoding parameter tells the API which type of audio encoding you're using for the audio file. LINEAR16 is the encoding type used for this .raw file (see the doc for encoding type for more details).
In the RecognitionAudio object, you can pass the API either the uri of the audio file in Cloud Storage or the content (the bytes) of a local audio file. Here, we're using a Cloud Storage uri.
Run the program:
node .
You should see the following output:
Transcription: how old is the Brooklyn Bridge
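To transcribe a local file instead of a Cloud Storage object, the audio bytes can be sent inline in the content field of the audio object rather than referenced through uri. The sketch below only builds such a request; buildLocalRequest is a hypothetical helper name, and the config values assume the local file has the same format as the sample above (raw LINEAR16 at 16000 Hz, US English).

```javascript
const fs = require('fs');

/**
 * Builds a recognize() request whose audio is sent inline as base64
 * instead of being referenced by a Cloud Storage uri.
 * Hypothetical helper; the config assumes raw LINEAR16 audio at 16 kHz.
 */
function buildLocalRequest(filePath) {
  return {
    audio: {
      // Read the whole file and base64-encode its bytes.
      content: fs.readFileSync(filePath).toString('base64'),
    },
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 16000,
      languageCode: 'en-US',
    },
  };
}
```

Inside quickstart(), the result would be passed to client.recognize() in place of the request object, for example with a path to your own local copy of the audio. Inline content is best suited to short clips; longer recordings are usually better referenced from Cloud Storage.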
7. Transcribe with word timestamps
Speech-to-Text can detect time offsets (timestamps) for the transcribed audio. Time offsets show the beginning and end of each spoken word in the supplied audio. A time offset value represents the amount of time that has elapsed from the beginning of the audio, in increments of 100ms.
Open the index.js file and replace the code with the following:
// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');

// Creates a client
const client = new speech.SpeechClient();

/**
 * Calls the Speech-to-Text API on a demo audio file.
 */
async function quickstart() {
  // The path to the remote LINEAR16 file stored in Google Cloud Storage
  const gcsUri = 'gs://cloud-samples-data/speech/brooklyn_bridge.raw';

  const audio = {
    uri: gcsUri,
  };

  // The audio file's encoding, sample rate in hertz, and BCP-47 language code,
  // plus the flag that requests a start and end time for every word
  const config = {
    encoding: 'LINEAR16',
    sampleRateHertz: 16000,
    languageCode: 'en-US',
    enableWordTimeOffsets: true,
  };

  const request = {
    audio: audio,
    config: config,
  };

  // Detects speech in the audio file
  const [response] = await client.recognize(request);
  response.results.forEach((result) => {
    result.alternatives.forEach((alternative) => {
      console.log(`Transcript: ${alternative.transcript}`);
      console.log(`Word details:`);
      console.log(` Word count ${alternative.words.length}`);
      alternative.words.forEach((item) => {
        console.log(` ${item.word}`);
        // Each offset has whole seconds plus nanoseconds; combine them
        const s = parseInt(item.startTime.seconds, 10) +
          item.startTime.nanos / 1000000000;
        console.log(` WordStartTime: ${s}s`);
        const e = parseInt(item.endTime.seconds, 10) +
          item.endTime.nanos / 1000000000;
        console.log(` WordEndTime: ${e}s`);
      });
    });
  });
}

quickstart();
Take a minute or two to study the code and see how it is used to transcribe an audio file with word timestamps. The enableWordTimeOffsets parameter tells the API to enable time offsets (see the doc for more details).
Run your program again:
node .
You should see the following output:
Transcript: how old is the Brooklyn Bridge
Word details:
Word count 6
how
WordStartTime: 0s
WordEndTime: 0.3s
old
WordStartTime: 0.3s
WordEndTime: 0.6s
is
WordStartTime: 0.6s
WordEndTime: 0.8s
the
WordStartTime: 0.8s
WordEndTime: 0.9s
Brooklyn
WordStartTime: 0.9s
WordEndTime: 1.1s
Bridge
WordStartTime: 1.1s
WordEndTime: 1.4s
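The seconds-plus-nanos arithmetic used for WordStartTime and WordEndTime can be factored into one small function. The following is a sketch under a few assumptions: durationToSeconds is a hypothetical helper name, and it treats a missing seconds or nanos field as zero and accepts seconds as a number, a numeric string, or an object whose toString() yields a decimal number.

```javascript
/**
 * Converts a {seconds, nanos} time offset to fractional seconds.
 * Hypothetical helper: seconds is normalized through String() and Number()
 * so that numbers, numeric strings, and Long-like objects are all accepted.
 * Missing fields are treated as zero.
 */
function durationToSeconds(duration) {
  const seconds = Number(String(duration.seconds || 0));
  const nanos = Number(duration.nanos || 0);
  return seconds + nanos / 1e9;
}

console.log(durationToSeconds({seconds: '1', nanos: 500000000})); // prints 1.5
```

With this helper, the loop body could log durationToSeconds(item.startTime) and durationToSeconds(item.endTime) directly.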
8. Transcribe different languages
The Speech-to-Text API supports transcription in over 100 languages! You can find a list of supported languages here.
In this section, you will transcribe a pre-recorded audio file in French. The audio file is available on Google Cloud Storage.
Open the index.js file and replace the code with the following:
// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');

// Creates a client
const client = new speech.SpeechClient();

/**
 * Calls the Speech-to-Text API on a demo audio file.
 */
async function quickstart() {
  // The path to the remote FLAC file stored in Google Cloud Storage
  const gcsUri = 'gs://cloud-samples-data/speech/corbeau_renard.flac';

  const audio = {
    uri: gcsUri,
  };

  // The audio file's encoding and BCP-47 language code
  // (a FLAC file carries its sample rate in its header)
  const config = {
    encoding: 'FLAC',
    languageCode: 'fr-FR',
  };

  const request = {
    audio: audio,
    config: config,
  };

  // Detects speech in the audio file
  const [response] = await client.recognize(request);
  const transcription = response.results
    .map((result) => result.alternatives[0].transcript)
    .join('\n');
  console.log(`Transcription: ${transcription}`);
}

quickstart();
Run your program again and you should see the following output:
Transcription: maître corbeau sur un arbre perché tenait en son bec un fromage
This is a sentence from a popular French children's tale.
For the full list of supported languages and language codes, see the documentation here.
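The English and French examples differ only in the audio uri and in the config object. One way to keep those differences in one place is a small config builder. This is a sketch, not part of the client library: buildConfig and HEADER_ENCODINGS are hypothetical names, and the rule it enforces (FLAC may omit sampleRateHertz, other encodings must supply it) is an assumption that mirrors the two configurations used in this codelab.

```javascript
// Encodings assumed to carry their sample rate in the file header.
// Hypothetical list for illustration, based on the FLAC example above.
const HEADER_ENCODINGS = new Set(['FLAC']);

/**
 * Builds a recognition config for the given encoding and language.
 * Throws when a sample rate is needed but was not supplied.
 */
function buildConfig({encoding, languageCode, sampleRateHertz}) {
  const config = {encoding, languageCode};
  if (sampleRateHertz !== undefined) {
    config.sampleRateHertz = sampleRateHertz;
  } else if (!HEADER_ENCODINGS.has(encoding)) {
    throw new Error(`sampleRateHertz is required for ${encoding} audio`);
  }
  return config;
}

console.log(buildConfig({encoding: 'FLAC', languageCode: 'fr-FR'}));
console.log(buildConfig({
  encoding: 'LINEAR16',
  languageCode: 'en-US',
  sampleRateHertz: 16000,
}));
```

Failing early on a missing sample rate keeps the mistake local instead of surfacing as an API error after the request has been sent.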
9. Congratulations!
You learned how to use the Speech-to-Text API using Node.js to perform different kinds of transcription on audio files!
Clean up
To avoid incurring charges to your Google Cloud Platform account for the resources used in this codelab:
- Go to the Cloud Platform Console.
- Select the project you want to shut down, then click "Delete" at the top. This schedules the project for deletion.
Learn More
- Google Cloud Speech-to-Text API: https://cloud.google.com/speech-to-text/docs
- Node.js on Google Cloud Platform: https://cloud.google.com/nodejs/
- Google Cloud Node.js client: https://googlecloudplatform.github.io/google-cloud-node/
License
This work is licensed under a Creative Commons Attribution 2.0 Generic License.