1. Overview
Google Cloud Speech-to-Text API enables developers to convert audio to text in 120 languages and variants, by applying powerful neural network models in an easy to use API.
In this codelab, you will focus on using the Speech-to-Text API with C#. You will learn how to send an audio file in English and other languages to the Cloud Speech-to-Text API for transcription.
What you'll learn
- How to use the Cloud Shell
- How to enable the Speech-to-Text API
- How to Authenticate API requests
- How to install the Google Cloud client library for C#
- How to transcribe audio files in English
- How to transcribe audio files with word timestamps
- How to transcribe audio files in different languages
What you'll need
Survey
How will you use this tutorial?
How would you rate your experience with C#?
How would you rate your experience with using Google Cloud Platform services?
2. Setup and Requirements
Self-paced environment setup
- Sign-in to the Google Cloud Console and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must create one.
- The Project name is the display name for this project's participants. It is a character string not used by Google APIs. You can always update it.
- The Project ID is unique across all Google Cloud projects and is immutable (cannot be changed after it has been set). The Cloud Console auto-generates a unique string; usually you don't care what it is. In most codelabs, you'll need to reference your Project ID (typically identified as
PROJECT_ID
). If you don't like the generated ID, you might generate another random one. Alternatively, you can try your own, and see if it's available. It can't be changed after this step and remains for the duration of the project. - For your information, there is a third value, a Project Number, which some APIs use. Learn more about all three of these values in the documentation.
- Next, you'll need to enable billing in the Cloud Console to use Cloud resources/APIs. Running through this codelab won't cost much, if anything at all. To shut down resources to avoid incurring billing beyond this tutorial, you can delete the resources you created or delete the project. New Google Cloud users are eligible for the $300 USD Free Trial program.
Start Cloud Shell
While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Google Cloud Shell, a command line environment running in the Cloud.
Activate Cloud Shell
- From the Cloud Console, click Activate Cloud Shell .
If this is your first time starting Cloud Shell, you're presented with an intermediate screen describing what it is. If you were presented with an intermediate screen, click Continue.
It should only take a few moments to provision and connect to Cloud Shell.
This virtual machine is loaded with all the development tools needed. It offers a persistent 5 GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this codelab can be done with a browser.
Once connected to Cloud Shell, you should see that you are authenticated and that the project is set to your project ID.
- Run the following command in Cloud Shell to confirm that you are authenticated:
gcloud auth list
Command output
Credentialed Accounts ACTIVE ACCOUNT * <my_account>@<my_domain.com> To set the active account, run: $ gcloud config set account `ACCOUNT`
- Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:
gcloud config list project
Command output
[core] project = <PROJECT_ID>
If it is not, you can set it with this command:
gcloud config set project <PROJECT_ID>
Command output
Updated property [core/project].
3. Enable the Speech-to-Text API
Before you can begin using the Speech-to-Text API, you must enable the API. You can enable the API by using the following command in the Cloud Shell:
gcloud services enable speech.googleapis.com
4. Install the Google Cloud Speech-to-Text API client library for C#
First, create a simple C# console application that you will use to run Speech-to-Text API samples:
dotnet new console -n SpeechToTextApiDemo
You should see the application created and dependencies resolved:
The template "Console Application" was created successfully.
Processing post-creation actions...
...
Restore succeeded.
Next, navigate to SpeechToTextApiDemo
folder:
cd SpeechToTextApiDemo/
And add Google.Cloud.Speech.V1
NuGet package to the project:
dotnet add package Google.Cloud.Speech.V1
info : Adding PackageReference for package 'Google.Cloud.Speech.V1' into project '/home/atameldev/SpeechToTextApiDemo/SpeechToTextApiDemo.csproj'.
log : Restoring packages for /home/atameldev/SpeechToTextApiDemo/SpeechToTextApiDemo.csproj...
...
info : PackageReference for package 'Google.Cloud.Speech.V1' version '1.0.1' added to file '/home/atameldev/SpeechToTextApiDemo/SpeechToTextApiDemo.csproj'.
Now, you're ready to use Speech-to-Text API!
5. Transcribe Audio Files
In this section, you will transcribe a pre-recorded audio file in English. The audio file is available on Google Cloud Storage.
To transcribe an audio file, open the code editor from the top right side of the Cloud Shell:
Navigate to the Program.cs
file inside the SpeechToTextApiDemo
folder and replace the code with the following:
using Google.Cloud.Speech.V1;
using System;
namespace SpeechToTextApiDemo
{
public class Program
{
public static void Main(string[] args)
{
var speech = SpeechClient.Create();
var config = new RecognitionConfig
{
Encoding = RecognitionConfig.Types.AudioEncoding.Flac,
SampleRateHertz = 16000,
LanguageCode = LanguageCodes.English.UnitedStates
};
var audio = RecognitionAudio.FromStorageUri("gs://cloud-samples-tests/speech/brooklyn.flac");
var response = speech.Recognize(config, audio);
foreach (var result in response.Results)
{
foreach (var alternative in result.Alternatives)
{
Console.WriteLine(alternative.Transcript);
}
}
}
}
}
Take a minute or two to study the code and see it is used to transcribe an audio file*.*
The Encoding
parameter tells the API which type of audio encoding you're using for the audio file. Flac
is the encoding type for .raw files (see the doc for encoding type for more details).
In the RecognitionAudio
object, you can pass the API either the uri of our audio file in Cloud Storage or the local file path for the audio file. Here, we're using a Cloud Storage uri.
Back in Cloud Shell, run the app:
dotnet run
You should see the following output:
how old is the Brooklyn Bridge
Summary
In this step, you were able to transcribe an audio file in English and print out the result. Read more about Transcribing.
6. Transcribe with word timestamps
Speech-to-Text can detect time offset (timestamp) for the transcribed audio. Time offsets show the beginning and end of each spoken word in the supplied audio. A time offset value represents the amount of time that has elapsed from the beginning of the audio, in increments of 100ms.
To transcribe an audio file with time offsets, navigate to the Program.cs
file inside the SpeechToTextApiDemo
folder and replace the code with the following:
using Google.Cloud.Speech.V1;
using System;
namespace SpeechToTextApiDemo
{
public class Program
{
public static void Main(string[] args)
{
var speech = SpeechClient.Create();
var config = new RecognitionConfig
{
Encoding = RecognitionConfig.Types.AudioEncoding.Flac,
SampleRateHertz = 16000,
LanguageCode = LanguageCodes.English.UnitedStates,
EnableWordTimeOffsets = true
};
var audio = RecognitionAudio.FromStorageUri("gs://cloud-samples-tests/speech/brooklyn.flac");
var response = speech.Recognize(config, audio);
foreach (var result in response.Results)
{
foreach (var alternative in result.Alternatives)
{
Console.WriteLine($"Transcript: { alternative.Transcript}");
Console.WriteLine("Word details:");
Console.WriteLine($" Word count:{alternative.Words.Count}");
foreach (var item in alternative.Words)
{
Console.WriteLine($" {item.Word}");
Console.WriteLine($" WordStartTime: {item.StartTime}");
Console.WriteLine($" WordEndTime: {item.EndTime}");
}
}
}
}
}
}
Take a minute or two to study the code and see it is used to transcribe an audio file with word timestamps*.* The EnableWordTimeOffsets
parameter tells the API to enable time offsets (see the doc for more details).
Back in Cloud Shell, run the app:
dotnet run
You should see the following output:
dotnet run
Transcript: how old is the Brooklyn Bridge
Word details:
Word count:6
how
WordStartTime: "0s"
WordEndTime: "0.300s"
old
WordStartTime: "0.300s"
WordEndTime: "0.600s"
is
WordStartTime: "0.600s"
WordEndTime: "0.800s"
the
WordStartTime: "0.800s"
WordEndTime: "0.900s"
Brooklyn
WordStartTime: "0.900s"
WordEndTime: "1.100s"
Bridge
WordStartTime: "1.100s"
WordEndTime: "1.500s"
Summary
In this step, you were able to transcribe an audio file in English with word timestamps and print out the result. Read more about Transcribing with word offsets.
7. Transcribe different languages
Speech-to-Text API supports transcription in over 100 languages! You can find a list of supported languages here.
In this section, you will transcribe a pre-recorded audio file in French. The audio file is available on Google Cloud Storage.
To transcribe the French audio file, navigate to the Program.cs
file inside the SpeechToTextApiDemo
folder and replace the code with the following:
using Google.Cloud.Speech.V1;
using System;
namespace SpeechToTextApiDemo
{
public class Program
{
public static void Main(string[] args)
{
var speech = SpeechClient.Create();
var config = new RecognitionConfig
{
Encoding = RecognitionConfig.Types.AudioEncoding.Flac,
LanguageCode = LanguageCodes.French.France
};
var audio = RecognitionAudio.FromStorageUri("gs://cloud-samples-data/speech/corbeau_renard.flac");
var response = speech.Recognize(config, audio);
foreach (var result in response.Results)
{
foreach (var alternative in result.Alternatives)
{
Console.WriteLine(alternative.Transcript);
}
}
}
}
}
Take a minute or two to study the code and see how it is used to transcribe an audio file*.* The LanguageCode
parameter tells the API what language the audio recording is in.
Back in Cloud Shell, run the app:
dotnet run
You should see the following output:
maître corbeau sur un arbre perché tenait en son bec un fromage
This is a sentence from a popular French children's tale.
Summary
In this step, you were able to transcribe an audio file in French and print out the result. Read more about supported languages.
8. Congratulations!
You learned how to use the Speech-to-Text API using C# to perform different kinds of transcription on audio files!
Clean up
To avoid incurring charges to your Google Cloud Platform account for the resources used in this quickstart:
- Go to the Cloud Platform Console.
- Select the project you want to shut down, then click ‘Delete' at the top: this schedules the project for deletion.
Learn More
- Google Cloud Speech-to-Text API: https://cloud.google.com/speech-to-text/docs
- C#/.NET on Google Cloud Platform: https://cloud.google.com/dotnet/
- Google Cloud .NET client: https://googlecloudplatform.github.io/google-cloud-dotnet/
License
This work is licensed under a Creative Commons Attribution 2.0 Generic License.