Uptime Checks is a service of Cloud Monitoring. You configure the service to check your system's health by sending requests to your applications, services, or URLs from various locations around the world. You can use the results of the checks as conditions in your alert policies, so you will be notified if system health is degraded.
An Alert Policy is a set of rules that determine whether your resources or groups are operating normally. The rules are logical conditions involving metric thresholds and uptime checks. For example, you can create a rule that your web site's average response latency must not exceed five seconds over a period of two minutes.
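As a rough illustration of the example rule above, the following sketch evaluates "average response latency must not exceed five seconds over a period of two minutes" against a list of samples. The function name, the sample format, and the choice to treat an empty window as "not violated" are assumptions made for this sketch; it is not how Cloud Monitoring evaluates conditions internally.

```python
from statistics import mean
from typing import Iterable, Tuple


def latency_rule_violated(
    samples: Iterable[Tuple[float, float]],
    now: float,
    window_seconds: float = 120.0,
    max_average_seconds: float = 5.0,
) -> bool:
    """Return True if the mean latency in the trailing window exceeds the limit.

    `samples` holds (timestamp_seconds, latency_seconds) pairs.
    """
    # Keep only the samples that fall inside the trailing two-minute window.
    recent = [
        latency
        for timestamp, latency in samples
        if now - window_seconds <= timestamp <= now
    ]
    if not recent:
        # Assumption for this sketch: no data means the rule is not violated.
        return False
    return mean(recent) > max_average_seconds
```

A policy would typically combine one or more conditions like this one; here a single boolean result is enough to show the idea.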
An alert occurs when an alert policy's conditions are met, causing an Incident to appear in the Incidents section of the Cloud Monitoring Console. Incidents remain open until the alert policy rules are no longer in violation or until the incident is manually closed.
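The incident lifecycle described above can be sketched as a small state tracker. The class and attribute names are hypothetical, and the sketch assumes that a new incident opens if the condition is still met after a manual close; the real service may behave differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Incident:
    policy_name: str
    is_open: bool = True
    closed_reason: Optional[str] = None


class PolicyIncidents:
    """Tracks the incidents raised by a single (hypothetical) alert policy."""

    def __init__(self, policy_name: str) -> None:
        self.policy_name = policy_name
        self.incidents: List[Incident] = []

    def _current(self) -> Optional[Incident]:
        # The most recent incident, if it is still open.
        if self.incidents and self.incidents[-1].is_open:
            return self.incidents[-1]
        return None

    def observe(self, condition_met: bool) -> None:
        """Record one evaluation of the policy's conditions."""
        current = self._current()
        if condition_met and current is None:
            # Conditions met and nothing open: open a new incident.
            self.incidents.append(Incident(self.policy_name))
        elif not condition_met and current is not None:
            # Rules are no longer in violation: close the open incident.
            current.is_open = False
            current.closed_reason = "condition cleared"

    def close_manually(self) -> None:
        """Close the open incident, if any, at an operator's request."""
        current = self._current()
        if current is not None:
            current.is_open = False
            current.closed_reason = "closed manually"
```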
You can associate notifications with alert policies. For example, alerts can send email or SMS notifications to people or services.
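To make the idea of an email notification concrete, here is a minimal sketch that builds (but does not send) a message for an incident using Python's standard library. The function name, subject format, and the `alerts@example.com` sender are placeholders invented for this example; Cloud Monitoring generates its own notification content.

```python
from email.message import EmailMessage


def build_alert_email(
    policy_name: str,
    resource: str,
    recipient: str,
    sender: str = "alerts@example.com",  # placeholder address
) -> EmailMessage:
    """Build an email describing an incident raised by an alert policy."""
    message = EmailMessage()
    message["Subject"] = f"[ALERT] {policy_name}: {resource}"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(
        f"The alert policy '{policy_name}' has opened an incident "
        f"for resource '{resource}'."
    )
    return message
```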
In this codelab, you'll learn how to create an Uptime Check on a Compute Engine instance and attach an alerting policy to it, so that an incident is created and you are notified when the machine goes down.
If you don't already have a Google Account (Gmail or Google Apps), you must create one. Sign in to the Google Cloud Platform console (console.cloud.google.com) and create a new project:
Remember the project ID, a unique name across all Google Cloud projects (the name above has already been taken and will not work for you, sorry!). It will be referred to later in this codelab as
Next, you'll need to enable billing in the Cloud Console in order to use Google Cloud resources.
Running through this codelab shouldn't cost you more than a few dollars, but it could be more if you decide to use more resources or if you leave them running (see "cleanup" section at the end of this document).
New users of Google Cloud Platform are eligible for a $300 free trial.
Before we can enable monitoring, we will need some kind of infrastructure within this Google Cloud Platform project to actually monitor, so let us create that now.
We will create a Compute Engine instance with NGINX through Cloud Launcher, so that we have a URL we can hit with an HTTP request to see if our resource is up and running.
To create the virtual machine:
We now have a resource that we can monitor!
Before we can use Stackdriver Monitoring, it must first be enabled for your project.
To use Stackdriver Monitoring with one of your projects, do the following:
You are now looking at the Stackdriver Monitoring Console. The information shown will vary depending on the Google (and AWS) services you are using and the monitoring features you have set up, but it will look something like the following:
Now that monitoring is enabled, we want to create an Uptime Check. An uptime check periodically sends a request to a given resource to verify that it is up and responding. Uptime checks can be made over several protocols, including HTTP, HTTPS, UDP and TCP.
For the purposes of this codelab, we will create an HTTP uptime check to monitor our recently created NGINX web server.
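Conceptually, a single HTTP uptime check boils down to "send a request and see whether a healthy response comes back in time". The sketch below shows that idea with Python's standard library. It is only an illustration: the function name and the 10-second default timeout are assumptions, and the managed service additionally runs its probes on a schedule from multiple locations.

```python
import http.client
import urllib.request


def http_uptime_check(url: str, timeout_seconds: float = 10.0) -> bool:
    """Return True if `url` answers an HTTP GET with a 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx responses), timeouts and refused
        # connections are all OSError subclasses; any of them, or a
        # malformed HTTP response, counts as a failed check.
        return False
```

You could call it as `http_uptime_check("http://<your-instance-external-ip>/")`, substituting the external IP address of the NGINX instance created earlier.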
To create the Uptime Check, click the "Create Check" button you will find on the monitoring dashboard:
From there, select the following options, so that the form looks like the screenshot below:
Congratulations! You have now successfully created an Uptime Check.
Creating an Uptime Check is only half the battle. You also need something to notify you when an Uptime Check fails. This is where an Alerting Policy comes in.
There are multiple ways to create an Alerting Policy (as we saw earlier), but to create an Alerting Policy directly from your Uptime Check:
This sets the conditions under which the alert fires. In this case, we want it to fire when the HTTP check we configured earlier fails.
This should already be set with the Uptime Check we configured in the last step:
Now we need to configure how we want to be notified. There are many options, including PagerDuty integration, SMS, Slack, and HipChat, but the easiest option for now is Email, so let's configure that:
You have now successfully created an Alerting Policy for whenever the NGINX server goes down for five minutes. An email will be sent to you when it goes down, and an Incident will be created in the Google Cloud Monitoring Console.
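The "down for five minutes" condition can be thought of as "the most recent run of failed checks has lasted at least 300 seconds". The following sketch expresses that logic over a list of check results. The function name and the result format are assumptions for illustration, not the service's actual evaluation code.

```python
from typing import Iterable, Optional, Tuple


def down_for_duration(
    results: Iterable[Tuple[float, bool]],
    now: float,
    duration_seconds: float = 300.0,
) -> bool:
    """Return True if checks have failed continuously for the full duration.

    `results` holds (timestamp_seconds, passed) pairs in ascending time order.
    """
    failing_since: Optional[float] = None
    for timestamp, passed in results:
        if passed:
            # Any successful check resets the failure streak.
            failing_since = None
        elif failing_since is None:
            # First failure of a new streak.
            failing_since = timestamp
    return failing_since is not None and now - failing_since >= duration_seconds
```

Requiring a sustained failure rather than a single failed check helps avoid notifications for brief, self-correcting blips.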
It is often useful to include documentation with your alerts, outlining what the alert is for and listing possible fixes or troubleshooting steps. For this codelab we will not add any documentation, but it is something you should consider for production systems.
This gives a convenient name to the Alerting Policy, so that it is recognisable when it creates an Incident.
You now have a Compute Engine instance whose uptime state is monitored by an Uptime Check and an Alerting Policy.