Multi-region failover using Cloud DNS Routing Policies and Health Checks for Internal TCP/UDP Load Balancer

1. Introduction

Last Updated: 2022-09-22

What is a DNS routing policy

Cloud DNS routing policies enable you to steer DNS traffic based on specific criteria such as weight, geolocation, or health checks.

Cloud DNS supports the following routing policies:

  • Weighted round robin routing policy
  • Geolocation routing policy
  • Geofenced routing policy
  • Failover routing policy

In this lab you will configure and test the failover routing policy.

Failover routing policy

Cloud DNS supports health checks for Internal TCP/UDP Load Balancers that have global access enabled. With a failover routing policy, you can configure primary and backup IPs for a resource record. In normal operation, Cloud DNS will respond to queries with the IP addresses provisioned in the primary set. When all IP addresses in the primary set fail (health status changes to unhealthy), Cloud DNS starts serving the IP addresses in the backup set.

Health checks

DNS routing policies rely on the Internal Load Balancer's native unified health checks (UHC). An Internal Load Balancer is considered healthy if 20% or more of its backends are healthy. Health checks for internal TCP/UDP and internal HTTP(S) load balancers provide different information: for an internal HTTP(S) load balancer, UHC reports the health status of the Envoy proxies, whereas for an internal TCP/UDP load balancer, Cloud DNS gets direct health signals from the individual backend instances. See the health checks documentation for details.

What you'll build

In this Codelab, you're going to build a website running in two regions and associate a failover DNS routing policy with it. The setup will have:

Active resources -

  • L4 Internal Load Balancer in REGION_1
  • A VM running Apache web server in REGION_1

Backup resources -

  • L4 Internal Load Balancer in REGION_2
  • A VM running Apache web server in REGION_2

The setup is shown below -

d0a91d3d3698f544.png

What you'll learn

  • How to create a failover routing policy
  • How to trigger a DNS failover
  • How to trickle traffic to the backup set

What you'll need

  • Basic knowledge of DNS
  • Basic knowledge of Google Compute Engine
  • Basic knowledge of L4 Internal Load Balancer

2. Setup and Requirements

  1. Sign in to the Google Cloud Console and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must create one.

b35bf95b8bf3d5d8.png

a99b7ace416376c4.png

bd84a6d3004737c5.png

  • The Project name is the display name for this project's participants. It is a character string not used by Google APIs. You can update it at any time.
  • The Project ID must be unique across all Google Cloud projects and is immutable (it cannot be changed after it has been set). The Cloud Console auto-generates a unique string; usually you don't care what it is. In most codelabs, you'll need to reference the Project ID (it is typically identified as PROJECT_ID). If you don't like the generated ID, you may generate another random one, or you can try your own and see if it's available.
  • For your information, there is a third value, a Project Number which some APIs use. Learn more about all three of these values in the documentation.
  2. Next, you'll need to enable billing in the Cloud Console to use Cloud resources/APIs. Running through this codelab shouldn't cost much, if anything at all. To avoid incurring billing beyond this tutorial, you can delete the resources you created or delete the whole project. New users of Google Cloud are eligible for the $300 USD Free Trial program.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Google Cloud Shell, a command line environment running in the Cloud.

From the Google Cloud Console, click the Cloud Shell icon on the top right toolbar:

55efc1aaa7a4d3ad.png

It should only take a few moments to provision and connect to the environment. When it is finished, you should see something like this:

7ffe5cbb04455448.png

This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory, and runs on Google Cloud, greatly enhancing network performance and authentication. All of your work in this codelab can be done within a browser. You do not need to install anything.

3. Google Cloud SDK version

At the time of writing, 401.0.0 is the latest Google Cloud SDK version. All the commands in this lab were tested using this version. Before proceeding, please make sure that Cloud Shell is using the latest version of the SDK.

Checking the SDK version

Use the gcloud version command to check the SDK version. Run the following command in Cloud Shell

Command

gcloud version | grep "Google Cloud SDK"

Output Example

Google Cloud SDK 401.0.0

Next Steps

  1. If the SDK version is 401.0.0 or higher, then skip to the next section.
  2. If the SDK version is lower than 401.0.0, then run the command listed below to update the SDK.

Optional Command

sudo apt-get update && sudo apt-get install google-cloud-sdk

4. Before you begin

Before deploying the architecture described above, let's make sure that Cloud Shell is configured correctly and all the required APIs are enabled.

Set up Project id

Inside Cloud Shell, make sure that your project ID is set. If your Cloud Shell prompt looks like the output below and you don't plan to change the project ID, you can skip to the next step (Set Environment Variables).

USER@cloudshell:~ (PROJECT_ID)$

If you want to change the project ID, use the command below; the Cloud Shell prompt will change from (PROJECT_ID) to (YOUR-PROJECT-ID)

Optional Command

gcloud config set project [YOUR-PROJECT-ID]

Output Example

Updated property [core/project].
USER@cloudshell:~ (YOUR-PROJECT-ID)$

Set the Environment Variables

We will use the export command to set the environment variables. Run the following commands in Cloud Shell

Commands

export REGION_1=us-west1
export REGION_1_ZONE=us-west1-a
export REGION_2=us-east4
export REGION_2_ZONE=us-east4-a

Verify

Now that the environment variables are set, let's verify using the echo command. The output for each command should be the value that we configured above using the export command. Run the following commands in Cloud Shell

Commands

echo $REGION_1
echo $REGION_1_ZONE
echo $REGION_2
echo $REGION_2_ZONE

Enable all necessary services

Use the gcloud services enable command to enable the Compute and DNS APIs. Run the following commands in Cloud Shell

Enable the Compute API

Command

gcloud services enable compute.googleapis.com

Enable the DNS API

Command

gcloud services enable dns.googleapis.com

Verify

Now that the services are enabled, let's verify using the gcloud services list command to list all the enabled APIs.

Command

gcloud services list | grep -E 'compute|dns'

Output Example

NAME: compute.googleapis.com
NAME: dns.googleapis.com

5. Create VPC Network, Subnets and Firewall rules

In this section, we will create the VPC network, two subnets (one in each region) and the required firewall rules.

Create VPC Network

Use the gcloud compute networks create command to create the VPC network. We are setting the subnet mode as custom because we will create our own subnets in the next step. Run the following commands in Cloud Shell.

Command

gcloud compute networks create my-vpc --subnet-mode custom

Create Subnets

Use the gcloud compute networks subnets create command to create two subnets, one in the REGION_1 and one in the REGION_2. Run the following commands in Cloud Shell

REGION_1 subnet

Command

gcloud compute networks subnets create ${REGION_1}-subnet \
--network my-vpc \
--range 10.1.0.0/24 \
--region $REGION_1

REGION_2 subnet

Command

gcloud compute networks subnets create ${REGION_2}-subnet \
--network my-vpc \
--range 10.2.0.0/24 \
--region $REGION_2
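Optionally, you can confirm that both subnets landed in the right regions before moving on. This is a quick check using the subnets list command filtered to our network:

```shell
# List the subnets attached to my-vpc; expect one per region
# with the 10.1.0.0/24 and 10.2.0.0/24 ranges
gcloud compute networks subnets list --network=my-vpc
```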

Create firewall rules

You need to allow traffic on port 80 from the VPC subnets and from the load balancer health check IP ranges (35.191.0.0/16 and 130.211.0.0/22).

In addition, you need to create a firewall rule to allow SSH traffic to the client VM.

Use the gcloud compute firewall-rules create command to create the firewall rules. Run the following commands in Cloud Shell

Allow Traffic on Port 80

Command

gcloud compute firewall-rules create allow-http-lb-hc \
--allow tcp:80 --network my-vpc \
--source-ranges 10.1.0.0/24,10.2.0.0/24,35.191.0.0/16,130.211.0.0/22 \
--target-tags=allow-http

Allow SSH Traffic on the Client VM

Command

gcloud compute firewall-rules create allow-ssh \
--allow tcp:22 --network my-vpc \
--source-ranges 0.0.0.0/0 \
--target-tags=allow-ssh

6. Create Cloud NAT

You need Cloud NAT gateways in both regions for the private VMs to be able to download and install packages from the internet.

  • Our web server VMs will need to download and install Apache web server.
  • The client VM will need to download and install the dnsutils package which we will use for our testing.

Each Cloud NAT gateway is associated with a single VPC network, region, and Cloud Router. So before we create the NAT gateways, we need to create Cloud Routers in each region.

Create Cloud Routers

Use the gcloud compute routers create command to create Cloud Routers in us-west1 and us-east4 regions. Run the following commands in Cloud Shell.

Region_1 Cloud Router

Commands

gcloud compute routers create "${REGION_1}-cloudrouter" \
--region $REGION_1 --network=my-vpc --asn=65501

Region_2 Cloud Router

Commands

gcloud compute routers create "${REGION_2}-cloudrouter" \
--region $REGION_2 --network=my-vpc --asn=65501

Create the NAT Gateways

Use the gcloud compute routers nat create command to create the NAT gateways in us-west1 and us-east4 regions. Run the following commands in Cloud Shell.

Region_1 NAT Gateway

Commands

gcloud compute routers nats create "${REGION_1}-nat-gw" \
--router="${REGION_1}-cloudrouter" \
--router-region=$REGION_1 \
--nat-all-subnet-ip-ranges --auto-allocate-nat-external-ips

Region_2 NAT Gateway

Commands

gcloud compute routers nats create "${REGION_2}-nat-gw" \
--router="${REGION_2}-cloudrouter" \
--router-region=$REGION_2 \
--nat-all-subnet-ip-ranges --auto-allocate-nat-external-ips
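Optionally, verify that each Cloud Router now carries its NAT gateway. This sketch uses the nats list command with the same router flags as the create commands above:

```shell
# Confirm each Cloud Router has its NAT gateway configured
gcloud compute routers nats list \
    --router="${REGION_1}-cloudrouter" --router-region=$REGION_1
gcloud compute routers nats list \
    --router="${REGION_2}-cloudrouter" --router-region=$REGION_2
```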

7. Create Compute Engine VMs

In this section, you will create the web servers, unmanaged instance groups for the web servers and the client VM.

Create Web server VMs

Use the gcloud compute instances create command to create the web servers. We need to create two web servers, one in REGION_1 and another in REGION_2. We are using startup scripts to install and configure Apache on the web servers.

REGION_1 Web Server

Run the following command in Cloud Shell

Command

gcloud compute instances create "${REGION_1}-instance" \
--image-family=debian-11 --image-project=debian-cloud \
--zone=$REGION_1_ZONE \
--network-interface=network=my-vpc,subnet=${REGION_1}-subnet,no-address \
--tags=allow-http \
--metadata=startup-script='#! /bin/bash
    apt-get update
    apt-get install apache2 -y
    a2ensite default-ssl
    a2enmod ssl
    vm_hostname="$(curl -H "Metadata-Flavor:Google" \
    http://169.254.169.254/computeMetadata/v1/instance/name)"
    echo "Page served from: $vm_hostname" | \
    tee /var/www/html/index.html
    systemctl restart apache2'

REGION_2 Web Server

Run the following command in Cloud Shell

Command

gcloud compute instances create "${REGION_2}-instance" \
--image-family=debian-11 --image-project=debian-cloud \
--zone=$REGION_2_ZONE \
--network-interface=network=my-vpc,subnet=${REGION_2}-subnet,no-address \
--tags=allow-http \
--metadata=startup-script='#! /bin/bash
    apt-get update
    apt-get install apache2 -y
    a2ensite default-ssl
    a2enmod ssl
    vm_hostname="$(curl -H "Metadata-Flavor:Google" \
    http://169.254.169.254/computeMetadata/v1/instance/name)"
    echo "Page served from: $vm_hostname" | \
    tee /var/www/html/index.html
    systemctl restart apache2'

Create unmanaged instance groups

In this section, we create two unmanaged instance groups. We will use these instance groups in the next section to configure the ILB backend services. Once the instance groups are created, we will add the web server VMs to these instance groups.

Create the Unmanaged Instance Groups

Use the gcloud compute instance-groups unmanaged create command to create two unmanaged instance groups, one for the us-west1 web server and one for the us-east4 web server.

Region_1 Instance Group

Commands

gcloud compute instance-groups unmanaged create \
"${REGION_1}-instance-group" --zone=$REGION_1_ZONE

Region_2 Instance Group

Commands

gcloud compute instance-groups unmanaged create \
"${REGION_2}-instance-group" --zone=$REGION_2_ZONE

Add VMs to the Instance Groups

Use the gcloud compute instance-groups unmanaged add-instances command to add the instances to the instance groups that we just created. Add the REGION_1 web server to the REGION_1 instance group and the REGION_2 web server to the REGION_2 instance group.

Region_1 Instance Group

Commands

gcloud compute instance-groups unmanaged add-instances \
"${REGION_1}-instance-group" --instances $REGION_1-instance \
--zone=$REGION_1_ZONE

Region_2 Instance Group

Commands

gcloud compute instance-groups unmanaged add-instances \
"${REGION_2}-instance-group" --instances $REGION_2-instance \
--zone=$REGION_2_ZONE

Create a client VM

We will use this VM to run tests and verify our DNS configuration. We are using a startup script to install the dnsutils package. Run the following commands in Cloud Shell.

Command

gcloud compute instances create client-instance --image-family=debian-11 \
--image-project=debian-cloud \
--zone=$REGION_1_ZONE \
--network-interface=network=my-vpc,subnet=${REGION_1}-subnet,no-address \
--tags=allow-ssh \
--metadata=startup-script='#! /bin/bash
    apt-get update
    apt-get install dnsutils -y'

8. Create L4 Internal Load Balancers

To create the L4 ILB, we need to create a health check, a backend service and a forwarding rule.

Create health check

Use the gcloud compute health-checks create command to create the health check. We are creating a basic HTTP health check targeting port 80. Run the following command in Cloud Shell

Command

gcloud compute health-checks create http http-hc --port 80

Configure backend services

Use the gcloud compute backend-services create command to create the backend service. Once the backend services are created, we will add the unmanaged instance groups to the backend services using the gcloud compute backend-services add-backend command. Run the following commands in Cloud Shell.

Create Backend Service

Commands

gcloud compute backend-services create $REGION_1-backend-service \
--load-balancing-scheme=INTERNAL --protocol=TCP \
--health-checks=http-hc --region=$REGION_1
gcloud compute backend-services create $REGION_2-backend-service \
--load-balancing-scheme=INTERNAL --protocol=TCP \
--health-checks=http-hc --region=$REGION_2

Add Backend

Command

gcloud compute backend-services add-backend $REGION_1-backend-service \
--instance-group=$REGION_1-instance-group \
--region=$REGION_1 \
--instance-group-zone=$REGION_1_ZONE
gcloud compute backend-services add-backend $REGION_2-backend-service \
--instance-group=$REGION_2-instance-group \
--region=$REGION_2 \
--instance-group-zone=$REGION_2_ZONE

Create forwarding rules

Use the gcloud compute forwarding-rules create command to create the forwarding rules in both regions. Run the following commands in Cloud Shell

REGION_1 forwarding rule

Commands

gcloud compute forwarding-rules create $REGION_1-ilb \
    --region=$REGION_1 \
    --load-balancing-scheme=internal \
    --network=my-vpc \
    --subnet=$REGION_1-subnet \
    --ip-protocol=TCP \
    --ports=80 \
    --backend-service=$REGION_1-backend-service \
    --backend-service-region=$REGION_1 \
    --allow-global-access

REGION_2 forwarding rule

Commands

gcloud compute forwarding-rules create $REGION_2-ilb \
    --region=$REGION_2 \
    --load-balancing-scheme=internal \
    --network=my-vpc \
    --subnet=$REGION_2-subnet \
    --ip-protocol=TCP \
    --ports=80 \
    --backend-service=$REGION_2-backend-service \
    --backend-service-region=$REGION_2 \
    --allow-global-access

9. Configure DNS

In this section, we will create the private zone and a DNS record set with the failover routing policy.

Create a private DNS zone

Use the gcloud dns managed-zones create command to create a private zone for example.com. We will use this zone to create a resource record set with failover routing policy. Run the following command in Cloud Shell

Commands

gcloud dns managed-zones create example-com \
--dns-name example.com. --description="My private zone" \
--visibility=private --networks my-vpc 

Create a DNS record with failover routing policy

Use the gcloud dns record-sets create command to create a DNS record with the failover routing policy. The primary target is the load balancer in REGION_1. Because Cloud DNS only supports geo-based backup targets, the backup set is a geolocation policy with the REGION_2 load balancer as the target for both REGION_1 and REGION_2. Run the following commands in Cloud Shell

Command

gcloud dns record-sets create failover.example.com --ttl 5 --type A \
--routing-policy-type=FAILOVER \
--routing-policy-primary-data=$REGION_1-ilb \
--routing-policy-backup-data="${REGION_1}=${REGION_2}-ilb;${REGION_2}=${REGION_2}-ilb" \
--routing-policy-backup-data-type=GEO \
--zone=example-com \
--enable-health-checking

Output Example

NAME: failover.example.com.
TYPE: A
TTL: 5
DATA: Primary: "10.1.0.4, 80, tcp, https://www.googleapis.com/compute/v1/projects/my-clouddns-codelab/global/networks/my-vpc, my-clouddns-codelab, us-west1, regionalL4ilb" Backup: us-west1: "10.2.0.3, 80, tcp, https://www.googleapis.com/compute/v1/projects/my-clouddns-codelab/global/networks/my-vpc, my-clouddns-codelab, us-east4, regionalL4ilb";us-east4: "10.2.0.3, 80, tcp, https://www.googleapis.com/compute/v1/projects/my-clouddns-codelab/global/networks/my-vpc, my-clouddns-codelab, us-east4, regionalL4ilb"
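You can confirm the record and its routing policy by listing the record sets in the zone. A quick check (the trailing dot is how Cloud DNS stores fully qualified names):

```shell
# List the failover record in the example-com zone; the DATA column
# should show the primary and backup targets configured above
gcloud dns record-sets list --zone=example-com \
    --name=failover.example.com.
```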

10. Test DNS resolution

Before testing our failover setup, let's make a note of the IP addresses for both Internal Load Balancers. Run the following commands in Cloud Shell.

Command

gcloud compute forwarding-rules list --filter="name:($REGION_1-ilb $REGION_2-ilb)"

Output Example

In this example, the us-west1-ilb has an IP address of 10.1.0.4 and the us-east4-ilb has an IP address of 10.2.0.3.

NAME: us-west1-ilb
REGION: us-west1
IP_ADDRESS: 10.1.0.4
IP_PROTOCOL: TCP
TARGET: us-west1/backendServices/us-west1-backend-service

NAME: us-east4-ilb
REGION: us-east4
IP_ADDRESS: 10.2.0.3
IP_PROTOCOL: TCP
TARGET: us-east4/backendServices/us-east4-backend-service

Now we will log in to the client-instance and test DNS resolution. In the web console, navigate to "Compute Engine | VM Instances"

5c824940bf414501.png

Click on the SSH button to log in to the client-instance from the console.

b916eb32c60a4156.png

Now that we are in the client VM, use the dig command to resolve the failover.example.com domain name.

The loop is configured to run the command ten times with a sleep timer of 6 seconds.

Command

for i in {1..10}; do echo $i; dig failover.example.com +short; sleep 6; done

Since the TTL on the DNS record is set to 5 seconds, a sleep timer of 6 seconds has been added. The sleep timer will make sure that you get an uncached DNS response for each DNS request. This command will take approximately one minute to execute.

In the output you will see the IP address of the load balancer in the primary set of the resource record. In our setup this will be the IP of the load balancer in the us-west1 region.

11. Test failover

We will simulate a failover by removing the network tag from the REGION_1 VM. This will block access to port 80, and as a result, the health checks will start failing.

Remove the Network Tag

Use the gcloud compute instances remove-tags command to remove the network tag from the VM. Run the following command in Cloud Shell

Command

gcloud compute instances remove-tags $REGION_1-instance \
--zone=$REGION_1_ZONE --tags=allow-http

The health check will start failing within about 10 seconds. Then run the DNS resolution test again.
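While you wait, you can optionally watch the primary backend drain from Cloud Shell (not the client VM). This sketch uses the get-health command, which reports the health state of each backend instance:

```shell
# The REGION_1 instance should transition from HEALTHY to UNHEALTHY
# once port 80 is blocked by the tag removal
gcloud compute backend-services get-health $REGION_1-backend-service \
    --region=$REGION_1
```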

DNS Resolution

From the client-instance run the following command

Command

for i in {1..10}; do echo $i; dig failover.example.com +short; sleep 6; done

In the output you will see the IP address of the load balancer in the backup set of the resource record. In our setup this will be the IP of the load balancer in the us-east4 region.

12. Test traffic trickling

By default, the failover policy returns the primary endpoint IPs for all DNS requests and returns the backup IPs only when the primary fails health checks. Cloud DNS also lets you configure a trickle ratio, which sends a portion of the traffic to the backup targets even when the primary targets are healthy. The ratio must be a value between 0 and 1; the default value is 0.

To test this out, let's add the network tag back to the REGION_1 web server.

Add Network Tag

Add the tag back to the Web Server VM to allow http traffic to the primary region VM. Run the following command in Cloud Shell.

Command

gcloud compute instances add-tags $REGION_1-instance \
--zone $REGION_1_ZONE --tags allow-http

The health checks will pass within about 10 seconds.

Verify that the DNS resolution points to the primary load balancer. In our setup this will be the IP address of the load balancer in the us-west1 region.

From the client-instance run the following command

Command

dig +short failover.example.com

Update the DNS Record

Now, we will modify the DNS record for failover.example.com to trickle 30% of the traffic to the backup set even when the primary is healthy. Run the following command in Cloud Shell

Command

gcloud dns record-sets update failover.example.com --ttl 30 --type A \
--routing-policy-type=FAILOVER \
--routing-policy-primary-data=$REGION_1-ilb \
--routing-policy-backup-data="${REGION_1}=${REGION_2}-ilb;${REGION_2}=${REGION_2}-ilb" \
--routing-policy-backup-data-type=GEO \
--zone=example-com --enable-health-checking \
--backup-data-trickle-ratio=0.3

DNS Resolution

Run the following command from the client VM. You will observe that failover.example.com resolves to the primary load balancer IP approximately 70% of the time and to the backup load balancer IP approximately 30% of the time. Note that the updated record's TTL is 30 seconds, so cached answers may repeat within the loop; run it a few times for a representative sample.

for i in {1..10}; do echo $i; dig failover.example.com +short; sleep 6; done
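To quantify the split, you can tally the distinct answers over a sample of uncached queries. A sketch: with a 0.3 trickle ratio, roughly 30% of the answers should be the backup IP, though small samples will vary.

```shell
# Tally distinct answers; sleep past the 30s TTL so each query is uncached
for i in {1..10}; do
  dig failover.example.com +short
  sleep 31
done | sort | uniq -c
```

Because of the longer TTL, this loop takes about five minutes to complete.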

13. Cleanup steps

To clean up the resources used in this lab, run the following commands from Cloud Shell

gcloud dns record-sets delete failover.example.com --type=A \
--zone=example-com --quiet

gcloud dns managed-zones delete example-com --quiet

gcloud compute forwarding-rules delete $REGION_1-ilb \
--region=$REGION_1 --quiet

gcloud compute forwarding-rules delete $REGION_2-ilb \
--region=$REGION_2 --quiet

gcloud compute backend-services delete $REGION_1-backend-service \
--region=$REGION_1 --quiet

gcloud compute backend-services delete $REGION_2-backend-service \
--region=$REGION_2 --quiet

gcloud compute health-checks delete http-hc --quiet

gcloud compute instances delete client-instance --zone=$REGION_1_ZONE --quiet

gcloud compute instance-groups unmanaged delete $REGION_1-instance-group \
--zone=$REGION_1_ZONE --quiet

gcloud compute instance-groups unmanaged delete $REGION_2-instance-group \
--zone=$REGION_2_ZONE --quiet

gcloud compute instances delete $REGION_1-instance \
--zone=$REGION_1_ZONE --quiet

gcloud compute instances delete $REGION_2-instance \
--zone=$REGION_2_ZONE --quiet

gcloud compute routers nats delete $REGION_1-nat-gw \
--router=$REGION_1-cloudrouter --region=$REGION_1 --quiet

gcloud compute routers nats delete $REGION_2-nat-gw \
--router=$REGION_2-cloudrouter --region=$REGION_2 --quiet

gcloud compute routers delete $REGION_1-cloudrouter \
--region=$REGION_1 --quiet

gcloud compute routers delete $REGION_2-cloudrouter \
--region=$REGION_2 --quiet

gcloud compute firewall-rules delete allow-ssh allow-http-lb-hc --quiet

gcloud compute networks subnets delete $REGION_1-subnet \
--region=$REGION_1 --quiet

gcloud compute networks subnets delete $REGION_2-subnet \
--region=$REGION_2 --quiet

gcloud compute networks delete my-vpc --quiet

14. Congratulations

Congratulations, you've successfully deployed and tested a Cloud DNS failover routing policy.

What we've covered

  • How to configure a Cloud DNS failover routing policy
  • How to test DNS failover
  • How to trickle traffic to the backup set

What's next?

  • Try to set up multiple IPs for active and backup sets
  • Try adding multiple backend VMs to your unmanaged instance groups
  • Try to set up multiple load balancers in different regions for the geolocation policy in the backup set.

Learn more

https://cloud.google.com/dns/docs/zones/manage-routing-policies