Google Cloud Text-to-Speech API (Beta) allows developers to include natural-sounding, synthetic human speech as playable audio in their applications. The Text-to-Speech API converts text or Speech Synthesis Markup Language (SSML) input into audio data like MP3 or LINEAR16 (the encoding used in WAV files).

In this codelab, you will focus on using the Text-to-Speech API with C#. You will learn how to list available voices and also synthesize audio from text.

What you'll learn

What you'll need

Survey

How will you use this tutorial?

Read it through only Read it and complete the exercises

How would you rate your experience with C#?

Novice Intermediate Proficient

How would you rate your experience with using Google Cloud Platform services?

Novice Intermediate Proficient

Self-paced environment setup

If you don't already have a Google Account (Gmail or Google Apps), you must create one. Sign-in to Google Cloud Platform console (console.cloud.google.com) and create a new project:

Remember the project ID, a unique name across all Google Cloud projects (the name above has already been taken and will not work for you, sorry!). It will be referred to later in this codelab as PROJECT_ID.

Next, you'll need to enable billing in the Cloud Console in order to use Google Cloud resources.

Running through this codelab shouldn't cost you more than a few dollars, but it could be more if you decide to use more resources or if you leave them running (see "cleanup" section at the end of this document).

New users of Google Cloud Platform are eligible for a $300 free trial.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Google Cloud Shell, a command line environment running in the Cloud.

Activate Google Cloud Shell

From the GCP Console click the Cloud Shell icon on the top right toolbar:

Then click "Start Cloud Shell":

It should only take a few moments to provision and connect to the environment:

This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory, and runs on the Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this lab can be done with simply a browser or your Google Chromebook.

Once connected to the cloud shell, you should see that you are already authenticated and that the project is already set to your PROJECT_ID.

Run the following command in the cloud shell to confirm that you are authenticated:

gcloud auth list

Command output

Credentialed accounts:
 - <myaccount>@<mydomain>.com (active)
gcloud config list project

Command output

[core]
project = <PROJECT_ID>

If it is not, you can set it with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].

Before you can begin using the Text-to-Speech API, you must enable the API. You can enable the API by using the following command in the Cloud Shell:

gcloud services enable texttospeech.googleapis.com

In order to make requests to the Text-to-Speech API, you need to use a Service Account. A Service Account belongs to your project and it is used by the Google Client C# library to make Text-to-Speech API requests. Like any other user account, a service account is represented by an email address. In this section, you will use the Cloud SDK to create a service account and then create credentials you will need to authenticate as the service account.

First, set an environment variable with your PROJECT_ID which you will use throughout this codelab:

export GOOGLE_CLOUD_PROJECT=$(gcloud config get-value core/project)

Next, create a new service account to access the Text-to-Speech API by using:

gcloud iam service-accounts create my-text-to-speech-sa \
  --display-name "my text-to-speech codelab service account"

Next, create credentials that your C# code will use to login as your new service account. Create these credentials and save it as a JSON file "~/key.json" by using the following command:

gcloud iam service-accounts keys create ~/key.json \
  --iam-account  my-text-to-speech-sa@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com

Finally, set the GOOGLE_APPLICATION_CREDENTIALS environment variable, which is used by the Text-to-Speech API C# library, covered in the next step, to find your credentials. The environment variable should be set to the full path of the credentials JSON file you created, by using:

export GOOGLE_APPLICATION_CREDENTIALS="/home/${USER}/key.json"

You can read more about authenticating the Text-to-Speech API.

First, create a simple C# console application that you will use to run Text-to-Speech API samples:

dotnet new console -n TextToSpeechApiDemo

You should see the application created and dependencies resolved:

The template "Console Application" was created successfully.
Processing post-creation actions...
...
Restore succeeded.

Next, navigate to TextToSpeechApiDemo folder:

cd TextToSpeechApiDemo/

And add Google.Cloud.Language.V1 NuGet package to the project:

dotnet add package Google.Cloud.TextToSpeech.V1 -v 1.0.0-beta01
info : Adding PackageReference for package 'Google.Cloud.TextToSpeech.V1' into project '/home/atameldev/TextToSpeechDemo/TextToSpeechDemo.csproj'.
log  : Restoring packages for /home/atameldev/TextToSpeechDemo/TextToSpeechDemo.csproj...
...
info : PackageReference for package 'Google.Cloud.TextToSpeech.V1' version '1.0.0-beta01' added to file '/home/atameldev/TextToSpeechDemo/TextToSpeechDemo.csproj'.

Now, you're ready to use Text-to-Speech API!

In this section, you will first list all available voices in English for audio synthesis.

First, open the code editor from the top right side of the Cloud Shell:

Navigate to the Program.cs file inside the TextToSpeechApiDemo folder and replace the code with the following:

using Google.Cloud.TextToSpeech.V1;
using System;

namespace TextToSpeechApiDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            var client = TextToSpeechClient.Create();
            var response = client.ListVoices("en");
            foreach (var voice in response.Voices)
            {
                Console.WriteLine($"{voice.Name} ({voice.SsmlGender}); Language codes: {string.Join(", ", voice.LanguageCodes)}");
            }
        }
    }
}

Take a minute or two to study the code. Back in Cloud Shell, run the app:

dotnet run

You should see the following output:

en-US-Wavenet-D (Male); Language codes: en-US
en-AU-Wavenet-A (Female); Language codes: en-AU
en-AU-Wavenet-B (Male); Language codes: en-AU
en-AU-Wavenet-C (Female); Language codes: en-AU
en-AU-Wavenet-D (Male); Language codes: en-AU
en-GB-Wavenet-A (Female); Language codes: en-GB
en-GB-Wavenet-B (Male); Language codes: en-GB
en-GB-Wavenet-C (Female); Language codes: en-GB
...
en-GB-Standard-A (Female); Language codes: en-GB
en-GB-Standard-B (Male); Language codes: en-GB
en-AU-Standard-D (Male); Language codes: en-AU

Summary

In this step, you were able to list all available voices in English for audio synthesis. You can also find the complete list of voices available on the Supported Voices page.

You can use Text-to-Speech API to convert a string into audio data. You can configure the output of speech synthesis in a variety of ways, including selecting a unique voice or modulating the output in pitch, volumn, speaking rate, and sample rate.

To synthesize an audio file from text, navigate to the Program.cs file inside the TextToSpeechApiDemo folder and replace the code with the following:

using Google.Cloud.TextToSpeech.V1;
using System;
using System.IO;

namespace TextToSpeechApiDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            var client = TextToSpeechClient.Create();

            // The input to be synthesized, can be provided as text or SSML.
            var input = new SynthesisInput
            {
                Text = "This is a demonstration of the Google Cloud Text-to-Speech API"
            };

            // Build the voice request.
            var voiceSelection = new VoiceSelectionParams
            {
                LanguageCode = "en-US",
                SsmlGender = SsmlVoiceGender.Female
            };

            // Specify the type of audio file.
            var audioConfig = new AudioConfig
            {
                AudioEncoding = AudioEncoding.Mp3
            };

            // Perform the text-to-speech request.
            var response = client.SynthesizeSpeech(input, voiceSelection, audioConfig);
            
            // Write the response to the output file.
            using (var output = File.Create("output.mp3"))
            {
                response.AudioContent.WriteTo(output);
            }
            Console.WriteLine("Audio content written to file \"output.mp3\"");
        }
    }
}

Take a minute or two to study the code and see how it is used to create an audio file from text.

Back in Cloud Shell, run the app:

dotnet run

You should see the following output:

Audio content written to file "output.mp3"

Inside code editor, you can download the mp3 file and play it locally on your machine.

Summary

In this step, you were able to use Text-to-Speech API to convert a string into an audio mp3 file. Read more about Creating voice audio files.

You learned how to use the Text-to-Speech API using C# to perform different kinds of transcription on audio files!

Clean up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this quickstart:

Learn More

License

This work is licensed under a Creative Commons Attribution 2.0 Generic License.