Google Cloud Speech-to-Text API enables developers to convert audio to text in 120 languages and variants, by applying powerful neural network models in an easy to use API.

In this codelab, you will focus on using the Speech-to-Text API with C#. You will learn how to send an audio file in English and other languages to the Cloud Speech-to-Text API for transcription.

What you'll learn

What you'll need

Survey

How will you use this tutorial?

Read it through only Read it and complete the exercises

How would you rate your experience with C#?

Novice Intermediate Proficient

How would you rate your experience with using Google Cloud Platform services?

Novice Intermediate Proficient

Self-paced environment setup

If you don't already have a Google Account (Gmail or Google Apps), you must create one. Sign-in to Google Cloud Platform console (console.cloud.google.com) and create a new project:

Remember the project ID, a unique name across all Google Cloud projects (the name above has already been taken and will not work for you, sorry!). It will be referred to later in this codelab as PROJECT_ID.

Next, you'll need to enable billing in the Cloud Console in order to use Google Cloud resources.

Running through this codelab shouldn't cost you more than a few dollars, but it could be more if you decide to use more resources or if you leave them running (see "cleanup" section at the end of this document).

New users of Google Cloud Platform are eligible for a $300 free trial.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Google Cloud Shell, a command line environment running in the Cloud.

Activate Google Cloud Shell

From the GCP Console click the Cloud Shell icon on the top right toolbar:

Then click "Start Cloud Shell":

It should only take a few moments to provision and connect to the environment:

This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory, and runs on the Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this lab can be done with simply a browser or your Google Chromebook.

Once connected to the cloud shell, you should see that you are already authenticated and that the project is already set to your PROJECT_ID.

Run the following command in the cloud shell to confirm that you are authenticated:

gcloud auth list

Command output

Credentialed accounts:
 - <myaccount>@<mydomain>.com (active)
gcloud config list project

Command output

[core]
project = <PROJECT_ID>

If it is not, you can set it with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].

Before you can begin using the Speech-to-Text API, you must enable the API. You can enable the API by using the following command in the Cloud Shell:

gcloud services enable speech.googleapis.com

In order to make requests to the Speech-to-Text API, you need to use a Service Account. A Service Account belongs to your project and it is used by the Google Client C# library to make Speech-to-Text API requests. Like any other user account, a service account is represented by an email address. In this section, you will use the Cloud SDK to create a service account and then create credentials you will need to authenticate as the service account.

First, set an environment variable with your PROJECT_ID which you will use throughout this codelab:

export GOOGLE_CLOUD_PROJECT=$(gcloud config get-value core/project)

Next, create a new service account to access the Speech-to-Text API by using:

gcloud iam service-accounts create my-speech-to-text-sa \
  --display-name "my speech-to-text codelab service account"

Next, create credentials that your C# code will use to login as your new service account. Create these credentials and save it as a JSON file "~/key.json" by using the following command:

gcloud iam service-accounts keys create ~/key.json \
  --iam-account  my-speech-to-text-sa@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com

Finally, set the GOOGLE_APPLICATION_CREDENTIALS environment variable, which is used by the Speech-to-Text API C# library, covered in the next step, to find your credentials. The environment variable should be set to the full path of the credentials JSON file you created, by using:

export GOOGLE_APPLICATION_CREDENTIALS="/home/${USER}/key.json"

You can read more about authenticating the Speech-to-Text API.

First, create a simple C# console application that you will use to run Speech-to-Text API samples:

dotnet new console -n SpeechToTextApiDemo

You should see the application created and dependencies resolved:

The template "Console Application" was created successfully.
Processing post-creation actions...
...
Restore succeeded.

Next, navigate to SpeechToTextApiDemo folder:

cd SpeechToTextApiDemo/

And add Google.Cloud.Language.V1 NuGet package to the project:

dotnet add package Google.Cloud.Speech.V1
info : Adding PackageReference for package 'Google.Cloud.Speech.V1' into project '/home/atameldev/SpeechToTextApiDemo/SpeechToTextApiDemo.csproj'.
log  : Restoring packages for /home/atameldev/SpeechToTextApiDemo/SpeechToTextApiDemo.csproj...
...
info : PackageReference for package 'Google.Cloud.Speech.V1' version '1.0.1' added to file '/home/atameldev/SpeechToTextApiDemo/SpeechToTextApiDemo.csproj'.

Now, you're ready to use Speech-to-Text API!

In this section, you will transcribe a pre-recorded audio file in English. The audio file is available on Google Cloud Storage.

To transcribe an audio file, open the code editor from the top right side of the Cloud Shell:

Navigate to the Program.cs file inside the SpeechToTextApiDemo folder and replace the code with the following:

using Google.Cloud.Speech.V1;
using System;

namespace SpeechToTextApiDemo
{
    public class Program
    {
        public static void Main(string[] args)
        {
            var speech = SpeechClient.Create();
            var config = new RecognitionConfig
            {
                Encoding = RecognitionConfig.Types.AudioEncoding.Flac,
                SampleRateHertz = 16000,
                LanguageCode = LanguageCodes.English.UnitedStates
            };
            var audio = RecognitionAudio.FromStorageUri("gs://cloud-samples-tests/speech/brooklyn.flac");         
            
            var response = speech.Recognize(config, audio);

            foreach (var result in response.Results)
            {
                foreach (var alternative in result.Alternatives)
                {
                    Console.WriteLine(alternative.Transcript);
                }
            }
        }
    }
}

Take a minute or two to study the code and see it is used to transcribe an audio file.

The Encoding parameter tells the API which type of audio encoding you're using for the audio file. Flac is the encoding type for .raw files (see the doc for encoding type for more details).

In the RecognitionAudio object, you can pass the API either the uri of our audio file in Cloud Storage or the local file path for the audio file. Here, we're using a Cloud Storage uri.

Back in Cloud Shell, run the app:

dotnet run

You should see the following output:

how old is the Brooklyn Bridge

Summary

In this step, you were able to transcribe an audio file in English and print out the result. Read more about Transcribing.

Speech-to-Text can detect time offset (timestamp) for the transcribed audio. Time offsets show the beginning and end of each spoken word in the supplied audio. A time offset value represents the amount of time that has elapsed from the beginning of the audio, in increments of 100ms.

To transcribe an audio file with time offsets, navigate to the Program.cs file inside the SpeechToTextApiDemo folder and replace the code with the following:

using Google.Cloud.Speech.V1;
using System;

namespace SpeechToTextApiDemo
{
    public class Program
    {
        public static void Main(string[] args)
        {
            var speech = SpeechClient.Create();
            var config = new RecognitionConfig
            {
                Encoding = RecognitionConfig.Types.AudioEncoding.Flac,
                SampleRateHertz = 16000,
                LanguageCode = LanguageCodes.English.UnitedStates,
                EnableWordTimeOffsets = true
            };
            var audio = RecognitionAudio.FromStorageUri("gs://cloud-samples-tests/speech/brooklyn.flac");
     
            var response = speech.Recognize(config, audio);

            foreach (var result in response.Results)
            {
                foreach (var alternative in result.Alternatives)
                {
                    Console.WriteLine($"Transcript: { alternative.Transcript}");
                    Console.WriteLine("Word details:");
                    Console.WriteLine($" Word count:{alternative.Words.Count}");
                    foreach (var item in alternative.Words)
                    {
                        Console.WriteLine($"  {item.Word}");
                        Console.WriteLine($"    WordStartTime: {item.StartTime}");
                        Console.WriteLine($"    WordEndTime: {item.EndTime}");
                    }
                }
            }
        }
    }
}

Take a minute or two to study the code and see it is used to transcribe an audio file with word timestamps. The EnableWordTimeOffsets parameter tells the API to enable time offsets (see the doc for more details).

Back in Cloud Shell, run the app:

dotnet run

You should see the following output:

dotnet run

Transcript: how old is the Brooklyn Bridge
Word details:
 Word count:6
  how
    WordStartTime: "0s"
    WordEndTime: "0.300s"
  old
    WordStartTime: "0.300s"
    WordEndTime: "0.600s"
  is
    WordStartTime: "0.600s"
    WordEndTime: "0.800s"
  the
    WordStartTime: "0.800s"
    WordEndTime: "0.900s"
  Brooklyn
    WordStartTime: "0.900s"
    WordEndTime: "1.100s"
  Bridge
    WordStartTime: "1.100s"
    WordEndTime: "1.500s"

Summary

In this step, you were able to transcribe an audio file in English with word timestamps and print out the result. Read more about Transcribing with word offsets.

Speech-to-Text API supports transcription in over 100 languages! You can find a list of supported languages here.

In this section, you will transcribe a pre-recorded audio file in French. The audio file is available on Google Cloud Storage.

To transcribe the French audio file, navigate to the Program.cs file inside the SpeechToTextApiDemo folder and replace the code with the following:

using Google.Cloud.Speech.V1;
using System;

namespace SpeechToTextApiDemo
{
    public class Program
    {
        public static void Main(string[] args)
        {
            var speech = SpeechClient.Create();
            var config = new RecognitionConfig
            {
                Encoding = RecognitionConfig.Types.AudioEncoding.Flac,
                LanguageCode = LanguageCodes.French.France
            };
            var audio = RecognitionAudio.FromStorageUri("gs://speech-language-samples/fr-sample.flac");

            var response = speech.Recognize(config, audio);

            foreach (var result in response.Results)
            {
                foreach (var alternative in result.Alternatives)
                {
                    Console.WriteLine(alternative.Transcript);
                }
            }
        }
    }
}

Take a minute or two to study the code and see how it is used to transcribe an audio file. The LanguageCode parameter tells the API what language the audio recording is in.

Back in Cloud Shell, run the app:

dotnet run

You should see the following output:

maître corbeau sur un arbre perché tenait en son bec un fromage

This is a sentence from a popular French children's tale.

Summary

In this step, you were able to transcribe an audio file in French and print out the result. Read more about supported languages.

You learned how to use the Speech-to-Text API using C# to perform different kinds of transcription on audio files!

Clean up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this quickstart:

Learn More

License

This work is licensed under a Creative Commons Attribution 2.0 Generic License.