Using the Text-to-Speech API with C#

1. Overview

Google Cloud Text-to-Speech API (Beta) allows developers to include natural-sounding, synthetic human speech as playable audio in their applications. The Text-to-Speech API converts text or Speech Synthesis Markup Language (SSML) input into audio data like MP3 or LINEAR16 (the encoding used in WAV files).

In this codelab, you will focus on using the Text-to-Speech API with C#. You will learn how to list available voices and also synthesize audio from text.

What you'll learn

How to use the Cloud Shell
How to enable the Text-to-Speech API
How to authenticate API requests
How to install the Google Cloud client library for C#
How to list available voices
How to synthesize audio from text

What you'll need

A Google Cloud Platform Project
A browser, such as Chrome or Firefox
Familiarity with C#

2. Setup and Requirements

Self-paced environment setup

Sign in to the Google Cloud Console and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must create one.

The Project name is the display name for this project's participants. It is a character string not used by Google APIs. You can always update it.
The Project ID is unique across all Google Cloud projects and is immutable (it cannot be changed after it has been set). The Cloud Console auto-generates a unique string; usually you don't care what it is. In most codelabs, you'll need to reference your Project ID (typically identified as PROJECT_ID). If you don't like the generated ID, you can generate another random one. Alternatively, you can try your own and see if it's available. Once you complete this step, the ID stays fixed for the lifetime of the project.
For your information, there is a third value, a Project Number, which some APIs use. Learn more about all three of these values in the documentation.

Next, you'll need to enable billing in the Cloud Console to use Cloud resources/APIs. Running through this codelab won't cost much, if anything at all. To shut down resources to avoid incurring billing beyond this tutorial, you can delete the resources you created or delete the project. New Google Cloud users are eligible for the $300 USD Free Trial program.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Google Cloud Shell, a command line environment running in the Cloud.

Activate Cloud Shell

From the Cloud Console, click Activate Cloud Shell.

If this is your first time starting Cloud Shell, you're presented with an intermediate screen describing what it is. If so, click Continue.

It should only take a few moments to provision and connect to Cloud Shell.

This virtual machine is loaded with all the development tools needed. It offers a persistent 5 GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this codelab can be done with a browser.

Once connected to Cloud Shell, you should see that you are authenticated and that the project is set to your project ID.

Run the following command in Cloud Shell to confirm that you are authenticated:

gcloud auth list

Command output

 Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:

gcloud config list project

Command output

[core]
project = <PROJECT_ID>

If it is not, you can set it with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].

3. Enable the Text-to-Speech API

Before you can begin using the Text-to-Speech API, you must enable it. Run the following command in Cloud Shell:

gcloud services enable texttospeech.googleapis.com

4. Install the Google Cloud Text-to-Speech API client library for C#

First, create a simple C# console application that you will use to run Text-to-Speech API samples:

dotnet new console -n TextToSpeechApiDemo

You should see the application created and dependencies resolved:

The template "Console Application" was created successfully.
Processing post-creation actions...
...
Restore succeeded.

Next, navigate to the TextToSpeechApiDemo folder:

cd TextToSpeechApiDemo/

Then add the Google.Cloud.TextToSpeech.V1 NuGet package to the project:

dotnet add package Google.Cloud.TextToSpeech.V1

info : Adding PackageReference for package 'Google.Cloud.TextToSpeech.V1' into project '/home/atameldev/TextToSpeechDemo/TextToSpeechDemo.csproj'.
log  : Restoring packages for /home/atameldev/TextToSpeechDemo/TextToSpeechDemo.csproj...
...
info : PackageReference for package 'Google.Cloud.TextToSpeech.V1' version '1.0.0-beta01' added to file '/home/atameldev/TextToSpeechDemo/TextToSpeechDemo.csproj'.

Now you're ready to use the Text-to-Speech API!

5. List Available Voices

In this section, you will first list all available voices in English for audio synthesis.

First, open the code editor from the top right side of Cloud Shell.

Navigate to the Program.cs file inside the TextToSpeechApiDemo folder and replace the code with the following:

using Google.Cloud.TextToSpeech.V1;
using System;

namespace TextToSpeechApiDemo
{
    class Program
    {
        static void Main(string[] args)
        {
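            // Create a client; it authenticates with Application Default Credentials.
            var client = TextToSpeechClient.Create();
            // Request only the voices that support English ("en").
            var response = client.ListVoices("en");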
            foreach (var voice in response.Voices)
            {
                Console.WriteLine($"{voice.Name} ({voice.SsmlGender}); Language codes: {string.Join(", ", voice.LanguageCodes)}");
            }
        }
    }
}

Take a minute or two to study the code. Back in Cloud Shell, run the app:

dotnet run

You should see output similar to the following:

en-US-Wavenet-D (Male); Language codes: en-US
en-AU-Wavenet-A (Female); Language codes: en-AU
en-AU-Wavenet-B (Male); Language codes: en-AU
en-AU-Wavenet-C (Female); Language codes: en-AU
en-AU-Wavenet-D (Male); Language codes: en-AU
en-GB-Wavenet-A (Female); Language codes: en-GB
en-GB-Wavenet-B (Male); Language codes: en-GB
en-GB-Wavenet-C (Female); Language codes: en-GB
...
en-GB-Standard-A (Female); Language codes: en-GB
en-GB-Standard-B (Male); Language codes: en-GB
en-AU-Standard-D (Male); Language codes: en-AU

Summary

In this step, you were able to list all available voices in English for audio synthesis. You can also find the complete list of voices available on the Supported Voices page.
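If you want to narrow the list further, the following is a minimal sketch rather than part of the original codelab. It assumes you want female en-GB voices and introduces a hypothetical helper, VoiceFilters.Matching, that is not part of the client library. It filters the response on the client side and prints each voice's natural sample rate:

```csharp
using Google.Cloud.TextToSpeech.V1;
using System;
using System.Collections.Generic;
using System.Linq;

namespace TextToSpeechApiDemo
{
    // Hypothetical helper for illustration; not part of the client library.
    static class VoiceFilters
    {
        // Keeps only the voices that support the given language code and gender.
        public static IEnumerable<Voice> Matching(
            IEnumerable<Voice> voices, string languageCode, SsmlVoiceGender gender)
        {
            return voices.Where(v =>
                v.SsmlGender == gender && v.LanguageCodes.Contains(languageCode));
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var client = TextToSpeechClient.Create();

            // Same request as in this step: all voices that support English.
            var response = client.ListVoices("en");

            // Assumption: en-GB and Female are example criteria; change them as needed.
            var selected = VoiceFilters.Matching(
                response.Voices, "en-GB", SsmlVoiceGender.Female);

            foreach (var voice in selected)
            {
                Console.WriteLine($"{voice.Name}: {voice.NaturalSampleRateHertz} Hz");
            }
        }
    }
}
```

The filter runs locally on the voices already returned, so it does not send an extra request. The voice names it prints depend on what the API offers when you run it.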

6. Synthesize audio from text

You can use the Text-to-Speech API to convert a string into audio data. You can configure the output of speech synthesis in a variety of ways, including selecting a specific voice or adjusting the pitch, volume, speaking rate, and sample rate of the output.

To synthesize an audio file from text, navigate to the Program.cs file inside the TextToSpeechApiDemo folder and replace the code with the following:

using Google.Cloud.TextToSpeech.V1;
using System;
using System.IO;

namespace TextToSpeechApiDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            var client = TextToSpeechClient.Create();

            // The input to be synthesized, can be provided as text or SSML.
            var input = new SynthesisInput
            {
                Text = "This is a demonstration of the Google Cloud Text-to-Speech API"
            };

            // Build the voice request.
            var voiceSelection = new VoiceSelectionParams
            {
                LanguageCode = "en-US",
                SsmlGender = SsmlVoiceGender.Female
            };

            // Specify the type of audio file.
            var audioConfig = new AudioConfig
            {
                AudioEncoding = AudioEncoding.Mp3
            };

            // Perform the text-to-speech request.
            var response = client.SynthesizeSpeech(input, voiceSelection, audioConfig);
            
            // Write the response to the output file.
            using (var output = File.Create("output.mp3"))
            {
                response.AudioContent.WriteTo(output);
            }
            Console.WriteLine("Audio content written to file \"output.mp3\"");
        }
    }
}

Take a minute or two to study the code and see how it is used to create an audio file from text.

Back in Cloud Shell, run the app:

dotnet run

You should see the following output:

Audio content written to file "output.mp3"

Inside the code editor, you can download the MP3 file and play it locally on your machine.
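The configuration options mentioned at the start of this step (a specific voice, pitch, volume, speaking rate, and sample rate) can be sketched as follows. This is an illustrative variation, not part of the original codelab. The helper TunedAudio.CreateConfig is hypothetical, the numeric values are arbitrary examples, and the voice name en-US-Wavenet-D is taken from the voice list in the previous step, so it may not be available in your run. Some voices may not support every setting.

```csharp
using Google.Cloud.TextToSpeech.V1;
using System;
using System.IO;

namespace TextToSpeechApiDemo
{
    // Hypothetical helper for illustration; the values are arbitrary examples.
    static class TunedAudio
    {
        public static AudioConfig CreateConfig()
        {
            return new AudioConfig
            {
                // LINEAR16 is the uncompressed encoding used in WAV files.
                AudioEncoding = AudioEncoding.Linear16,
                SpeakingRate = 0.9,      // 1.0 is the normal speed
                Pitch = -2.0,            // semitones relative to the default pitch
                VolumeGainDb = 3.0,      // dB relative to the normal volume
                SampleRateHertz = 16000  // output sample rate in Hz
            };
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var client = TextToSpeechClient.Create();

            var input = new SynthesisInput
            {
                Text = "This is a demonstration of the Google Cloud Text-to-Speech API"
            };

            // Request one specific voice by name instead of only a gender.
            var voiceSelection = new VoiceSelectionParams
            {
                LanguageCode = "en-US",
                Name = "en-US-Wavenet-D"
            };

            var response = client.SynthesizeSpeech(
                input, voiceSelection, TunedAudio.CreateConfig());

            // Write the returned audio bytes to a WAV file.
            File.WriteAllBytes("output.wav", response.AudioContent.ToByteArray());
            Console.WriteLine("Audio content written to file \"output.wav\"");
        }
    }
}
```

Changing one value at a time and comparing the resulting files is an easy way to hear what each setting does.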

Summary

In this step, you were able to use the Text-to-Speech API to convert a string into an MP3 audio file. Read more about Creating voice audio files.
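The API also accepts SSML instead of plain text, as noted in the overview. The following is a minimal sketch under stated assumptions rather than part of the original codelab. The helper SsmlBuilder.WithPause is hypothetical, and the sentences and the 500ms pause are example values. It builds the markup with XElement so that characters such as & are escaped, then passes it through the Ssml property of SynthesisInput:

```csharp
using Google.Cloud.TextToSpeech.V1;
using System;
using System.IO;
using System.Xml.Linq;

namespace TextToSpeechApiDemo
{
    // Hypothetical helper for illustration; not part of the client library.
    static class SsmlBuilder
    {
        // Wraps two sentences in a <speak> element with a <break> between them.
        // XElement escapes special characters in the text for us.
        public static string WithPause(string first, string second, string pause)
        {
            var speak = new XElement("speak",
                first,
                new XElement("break", new XAttribute("time", pause)),
                second);
            return speak.ToString(SaveOptions.DisableFormatting);
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var client = TextToSpeechClient.Create();

            // Provide SSML instead of plain text.
            var input = new SynthesisInput
            {
                Ssml = SsmlBuilder.WithPause(
                    "Welcome to the Text-to-Speech API.", "Let's get started.", "500ms")
            };

            var voiceSelection = new VoiceSelectionParams
            {
                LanguageCode = "en-US",
                SsmlGender = SsmlVoiceGender.Female
            };

            var audioConfig = new AudioConfig
            {
                AudioEncoding = AudioEncoding.Mp3
            };

            var response = client.SynthesizeSpeech(input, voiceSelection, audioConfig);

            using (var output = File.Create("output-ssml.mp3"))
            {
                response.AudioContent.WriteTo(output);
            }
            Console.WriteLine("Audio content written to file \"output-ssml.mp3\"");
        }
    }
}
```

Set either Text or Ssml on a SynthesisInput, not both; the two properties are alternatives for the same input.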

7. Congratulations!

You learned how to use the Text-to-Speech API with C# to list available voices and synthesize audio from text!

Clean up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this codelab:

Go to the Cloud Platform Console.
Select the project you want to shut down, then click Delete at the top. This schedules the project for deletion.

Learn More

Google Cloud Text-to-Speech API: https://cloud.google.com/text-to-speech/docs
C#/.NET on Google Cloud Platform: https://cloud.google.com/dotnet/
Google Cloud .NET client: https://googlecloudplatform.github.io/google-cloud-dotnet/

License

This work is licensed under a Creative Commons Attribution 2.0 Generic License.