Uptime Checks is a service of Cloud Monitoring. You configure the service to check your system's health by sending requests to your applications, services, or URLs from various locations around the world. You can use the results of the checks as conditions in your alert policies, so you will be notified if system health is degraded.
An Alert Policy is a set of rules that determine whether your resources or groups are operating normally. The rules are logical conditions involving metric thresholds and uptime checks. For example, you can create a rule that your web site's average response latency must not exceed five seconds over a period of two minutes.
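To make the latency example concrete, here is a minimal sketch of how such a threshold condition could be evaluated. It is not how Cloud Monitoring implements conditions internally; the function name `violates_latency_rule`, its parameters, and the `(timestamp, latency)` sample format are assumptions made up for illustration.

```python
from statistics import mean


def violates_latency_rule(samples, now, threshold_s=5.0, window_s=120.0):
    """Return True if the average latency in the last window exceeds the threshold.

    samples: iterable of (timestamp_seconds, latency_seconds) tuples (hypothetical format).
    now: the current time, in the same units as the timestamps.
    """
    # Keep only the samples that fall inside the two-minute window ending at `now`.
    recent = [latency for ts, latency in samples if now - window_s < ts <= now]
    if not recent:
        # No data in the window: this sketch treats that as "not in violation".
        return False
    return mean(recent) > threshold_s


# Latency samples taken one minute apart (made-up numbers).
slow = [(0, 1.0), (60, 6.0), (120, 7.0)]
print(violates_latency_rule(slow, now=120))
```

In this sketch, only the samples at 60 and 120 seconds fall inside the window, and their average of 6.5 seconds is above the five-second threshold, so the rule is reported as violated.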
An alert occurs when an alert policy's conditions are met, causing an Incident to appear in the Incidents section of the Cloud Monitoring Console. Incidents remain open until the alert policy rules are no longer in violation or until the incident is manually closed.
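The incident lifecycle described above can be sketched as a small state machine. This is only an illustration of the open/close behaviour; the `Incident` and `IncidentTracker` classes are hypothetical and do not correspond to any Cloud Monitoring API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Incident:
    policy_name: str
    is_open: bool = True
    closed_by: Optional[str] = None  # "auto" or "manual" once closed


class IncidentTracker:
    """Tracks at most one open incident for a single (hypothetical) alert policy."""

    def __init__(self, policy_name: str) -> None:
        self.policy_name = policy_name
        self.current: Optional[Incident] = None
        self.history: List[Incident] = []

    def observe(self, condition_met: bool) -> None:
        if condition_met and self.current is None:
            # The policy's conditions are met: open a new incident.
            self.current = Incident(self.policy_name)
            self.history.append(self.current)
        elif not condition_met and self.current is not None:
            # The rules are no longer in violation: close automatically.
            self._close("auto")

    def close_manually(self) -> None:
        if self.current is not None:
            self._close("manual")

    def _close(self, how: str) -> None:
        self.current.is_open = False
        self.current.closed_by = how
        self.current = None


tracker = IncidentTracker("nginx-uptime")  # policy name is made up
tracker.observe(True)    # opens an incident
tracker.observe(False)   # closes it automatically
print(tracker.history)
```

Note that repeated violations while an incident is already open do not create a second incident in this sketch; a new one is only opened after the previous one has been closed.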
You can associate notifications with alert policies. For example, alerts can send email or SMS notifications to people or services.
In this codelab, you'll learn how to create an Uptime Check on a Compute Engine instance and attach an Alerting Policy to it, so that an incident is created and you are notified when the machine goes down.
This walkthrough shouldn't cost you more than a few dollars, but it could be more if you decide to use more resources or if you leave them running - make sure to clean up the project afterwards.
New users of Google Cloud Platform are eligible for a $300 free trial.
Before we can enable monitoring, we will need some kind of infrastructure within this Google Cloud Platform project to actually monitor, so let us create that now.
We will create a Compute Engine instance with NGINX through the GCP Marketplace, so that we have a URL we can hit with an HTTP request to see if our resource is up and running.
Note: The first time you access Compute Engine, it will need to be enabled. This can take a minute or two, so please be patient.
To create the virtual machine:
We now have a resource that we can monitor!
Before we can use Stackdriver Monitoring, it must first be enabled for your project.
To use Stackdriver Monitoring with one of your projects, do the following:
You are now looking at the Stackdriver Monitoring Console. The information shown will vary depending on the Google (and AWS) services you are using and the monitoring features you have set up.
Now that monitoring is enabled, we want to create an Uptime Check. An uptime check is a process to make sure that a given resource is up and running all the time. There are a variety of ways that uptime checks can be made, including: HTTP, HTTPS, UDP and TCP.
For the purposes of this codelab, we will create an HTTP uptime check to monitor our recently created NGINX web server.
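Conceptually, an HTTP uptime check boils down to sending a request and looking at the response status. The sketch below shows that idea using only the Python standard library. It is not the probe Cloud Monitoring runs; the function name `http_uptime_check`, the timeout value, and the placeholder address are assumptions for illustration.

```python
import urllib.error
import urllib.request


def http_uptime_check(url, timeout_s=10.0):
    """Send a GET request and report (is_up, status_code).

    status_code is None when no HTTP response was received at all
    (connection refused, DNS failure, timeout, and so on).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        status = err.code
    except (urllib.error.URLError, OSError):
        # No HTTP response: treat the resource as down.
        return False, None
    # This sketch counts any 2xx status as "up".
    return 200 <= status < 300, status


if __name__ == "__main__":
    # Replace with the external IP of your own instance; this address is a placeholder.
    print(http_uptime_check("http://203.0.113.10/", timeout_s=3.0))
```

A real uptime check additionally runs from several locations and on a schedule, which this single-shot sketch leaves out.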
To create the Uptime Check, click the "Create Check" button you will find on the monitoring dashboard.
From there, select the following options:
Click Test to make sure that your Uptime Check works correctly. You should get back a message containing 'responded with 200(OK)'.
Click Save to save your Uptime Check.
Click No Thanks on the Alerting Policy question - we will do this in the next section.
Congratulations! You have now successfully created an Uptime Check!
Creating an Uptime Check is only half the battle. You will need something to notify you when an Uptime Check fails. This is where an Alerting Policy comes in.
There are multiple ways to create an Alerting Policy (as we saw earlier), but to create an Alerting Policy directly from your Uptime Check:
This sets the conditions under which we want the alert to fire. In this case, we want it to fire when the HTTP check we configured earlier fails.
This should already be set with the Uptime Check we configured in the last step:
Now we need to configure how we want to be notified. There are lots of options, including PagerDuty integration, SMS, Slack, and HipChat, but the easiest option for now is Email, so let's configure that:
You have now successfully created an Alerting Policy for whenever the NGINX server goes down for five minutes. An email will be sent to you when it goes down, and an Incident will be created in the Google Cloud Monitoring Console.
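The "down for five minutes" behaviour can be pictured as a duration test over the stream of check results. The following is a rough sketch under that assumption; the function name `failing_for` and the `(timestamp, passed)` result format are hypothetical, and Cloud Monitoring's actual evaluation (for example, across multiple checker locations) may differ.

```python
def failing_for(results, duration_s=300.0):
    """Return True if the check has failed continuously for at least duration_s.

    results: chronologically ordered (timestamp_seconds, passed) tuples
    (a made-up format for this sketch).
    """
    failing_since = None
    for ts, passed in results:
        if passed:
            # Any successful check resets the failure streak.
            failing_since = None
        elif failing_since is None:
            # First failure of a new streak.
            failing_since = ts
    if failing_since is None:
        return False
    latest_ts = results[-1][0]
    return latest_ts - failing_since >= duration_s


# One check per minute; the server goes down at t=60 and stays down.
history = [(0, True)] + [(t, False) for t in range(60, 361, 60)]
print(failing_for(history))
```

Here the failure streak starts at 60 seconds and the latest result is at 360 seconds, so the streak has lasted 300 seconds and the condition is met. A single successful check in between would reset the streak.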
It is often useful to include documentation with your alerts, outlining what the alert is for, and possible fixes or troubleshooting steps. For this codelab we will not add any documentation, but it is something you should consider for production systems.
This gives a convenient name to the Alerting Policy, so it can be recognisable when it creates an Incident.
You now have a Compute Engine instance whose uptime state is monitored by an Uptime Check and an Alerting Policy.