In this lab, you'll learn how to build time-series forecasting models using AutoML and TensorFlow, and then how to deploy those models with Google Cloud AI Platform.

What you learn

You'll learn how to:

- Explore and visualize time-series data
- Build forecasting models with AutoML and TensorFlow
- Train and deploy those models with Google Cloud AI Platform

The focus of this codelab is on how to apply time-series forecasting techniques using Google Cloud Platform. It isn't a general time-series forecasting course, but a brief tour of the key concepts follows, which may be helpful before diving in.

Time Series Data

First, what is a time series? It's a dataset with data recorded at regular time intervals. A time-series dataset contains both time and at least one variable that is dependent on time.
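For example, in pandas a simple time series could be represented as a dataframe with a date column and one time-dependent value column (an illustrative sketch, not part of the lab materials; the 'ds'/'y' column names match the queries used later in this codelab):

import pandas as pd

# A date column ('ds') plus one variable ('y') that depends on time
ts = pd.DataFrame({
    'ds': pd.date_range('2020-01-01', periods=4, freq='D'),
    'y': [112, 118, 132, 129],
})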

Components

A time series can be decomposed into components:

- Trend: the long-term direction of the series (increasing or decreasing)
- Seasonality: a repeating pattern at a fixed period (daily, weekly, yearly, etc.)
- Residual: the variation that remains after trend and seasonality are removed

There can be multiple layers of seasonality. For example, a call center might see a pattern in call volume on certain days of the week as well as in given months. The residual might be explained by variables other than time.
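As a quick illustration (not part of the lab materials), libraries such as statsmodels can split a series into these components. The sketch below assumes a synthetic monthly series with yearly seasonality:

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Hypothetical monthly series: upward trend plus yearly seasonality
idx = pd.date_range('2015-01-01', periods=60, freq='MS')
ts = pd.Series(
    np.arange(60) * 0.5 + 10 * np.sin(2 * np.pi * idx.month.to_numpy() / 12) + 100,
    index=idx)

# period=12 tells the decomposition to look for a yearly pattern in monthly data
result = seasonal_decompose(ts, model='additive', period=12)
trend, seasonal, residual = result.trend, result.seasonal, result.resid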

Stationarity

For best results in forecasting, time-series data should be made stationary, meaning statistical properties such as mean and variance are constant over time. Techniques such as differencing and detrending can be applied to raw data to make it more stationary.

For example, the plot below of CO2 concentration shows a repeating yearly pattern with an upward trend. (Source)

After removing the linear trend, the data is more suitable for forecasting, as it now has a constant mean.
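Here is a minimal sketch of both techniques in pandas/NumPy. The data below is synthetic (shaped loosely like the CO2 example, but not the actual dataset):

import numpy as np
import pandas as pd

# Synthetic monthly series with a linear upward trend and yearly seasonality
idx = pd.date_range('2000-01-01', periods=120, freq='MS')
ts = pd.Series(
    np.linspace(350, 380, 120) + 3 * np.sin(2 * np.pi * idx.month.to_numpy() / 12),
    index=idx)

# Detrending: fit a linear trend and subtract it
x = np.arange(len(ts))
slope, intercept = np.polyfit(x, ts.values, 1)
detrended = ts - (slope * x + intercept)

# Differencing: subtract each previous value, which also removes the trend
differenced = ts.diff().dropna()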

Using Time Series Data for Machine Learning

To use time-series data in a machine learning problem, it needs to be transformed so that previous values can be used to predict future values. This table shows an example of how lagged variables are created to help predict the target.
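The same idea can be sketched in pandas with shift(); the data and column names here are illustrative, not the lab's:

import pandas as pd

# Hypothetical daily values
df = pd.DataFrame({'y': [10, 12, 11, 13, 15]},
                  index=pd.date_range('2020-01-01', periods=5, freq='D'))

# Lagged variables: the previous one and two values become features for target 'y'
df['lag_1'] = df['y'].shift(1)
df['lag_2'] = df['y'].shift(2)
df = df.dropna()  # the first rows have no full lookback window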

Now that we've covered some fundamentals, let's get started with exploring data and forecasting!

You'll need a Google Cloud Platform project with billing enabled to run this codelab. To create a project, follow the instructions here.

Step 1: Access the BigQuery public dataset

Follow this link to access the Iowa Public Liquor Sales dataset from the BigQuery Public Datasets. Review the material about the dataset so that you are familiar with the problem.

Now that we've had a brief introduction to the data, let's set up our model development environment.

Step 1: Enable APIs

The BigQuery connector uses the BigQuery Storage API. Search for the BigQuery Storage API in the console and enable the API if it is currently disabled.
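If you prefer the command line, the same API can be enabled with gcloud (assuming the Cloud SDK is installed and authenticated):

gcloud services enable bigquerystorage.googleapis.com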

Step 2: Create an AI Platform Notebooks instance

Navigate to the AI Platform Notebooks section of your Cloud Console and click New Instance. Then select the latest TensorFlow Enterprise 2.x instance type without GPUs:

Use the default options and then click Create. Once the instance has been created, select Open JupyterLab:

Then, create a Python 3 notebook from JupyterLab:

Step 3: Download lab materials


Create a new Terminal window from the JupyterLab interface: File -> New -> Terminal.

From there, clone the source material with this command:

git clone https://github.com/GoogleCloudPlatform/training-data-analyst

In this lab, you will:

- Query the Iowa liquor sales data from BigQuery
- Explore and visualize the time-series data

Step 1

In AI Platform Notebooks, navigate to training-data-analyst/courses/ai-for-time-series/notebooks and open 01-explore.ipynb.

Step 2

Clear all the cells in the notebook (Edit > Clear All Outputs), change the region, project, and bucket settings in one of the first few cells, and then run the cells one by one.
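The exact variable names may differ in the notebook, but the settings you update will look something like this (the values below are placeholders):

REGION = 'us-central1'       # the region to use for training and deployment
PROJECT = 'your-project-id'  # your GCP project ID
BUCKET = 'your-bucket-name'  # a Cloud Storage bucket for model assets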

In this lab, you will:

- Prepare the time-series data for modeling
- Build and train a forecasting model with TensorFlow

Step 1

In AI Platform Notebooks, navigate to training-data-analyst/courses/ai-for-time-series/notebooks and open 02-model.ipynb.

Step 2

Clear all the cells in the notebook (Edit > Clear All Outputs), change the region, project, and bucket settings in one of the first few cells, and then run the cells one by one.

In this lab, you will:

- Train the model at scale with AI Platform
- Deploy the model for predictions

Step 1

In AI Platform Notebooks, navigate to training-data-analyst/courses/ai-for-time-series/notebooks and open 03-cloud-training.ipynb.

Step 2

Clear all the cells in the notebook (Edit > Clear All Outputs), change the region, project, and bucket settings in one of the first few cells, and then run the cells one by one.

In this section, you will try applying the concepts you learned to a new dataset!

We won't provide detailed instructions, just some hints (if you want them!).

The goal is to predict 311 service requests from the City of New York. These non-emergency requests include noise complaints, street light issues, etc.

Step 1

Let's start by understanding the dataset.

First, access the City of New York 311 Service Requests dataset.

To get to know the data better, try out a couple of the sample queries listed in the dataset description.

In the BigQuery UI, select Create Query to see how to access the dataset. Note that the SELECT statement queries bigquery-public-data.new_york_311.311_service_requests.
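If you'd like to run queries from a notebook instead, one option is the BigQuery client library (a sketch; the lab notebooks may use a different approach, such as the %%bigquery cell magic):

from google.cloud import bigquery

client = bigquery.Client()
sql = 'SELECT * FROM `bigquery-public-data.new_york_311.311_service_requests` LIMIT 5'
df = client.query(sql).to_dataframe()  # run the query and load results into a dataframe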

Step 2

We're ready to get started. In this section, you'll modify the Explore and Visualize notebook to work with this data.

In the liquor sales dataset, we explored categories of liquor. When you reach that section, instead of using liquor categories, try using the complaint_type field of the 311 service requests. There are many, many complaint types, so to keep the visualization understandable, it's recommended to visualize only the most frequent ones (see the hints below).


Hints

-- Preview the raw data
SELECT * FROM `bigquery-public-data.new_york_311.311_service_requests` LIMIT 5

-- Monthly request counts for the most frequent complaint types
SELECT
  complaint_type,
  COUNT(unique_key) AS y,
  DATE_TRUNC(DATE(created_date), MONTH) AS ds
FROM `bigquery-public-data.new_york_311.311_service_requests`
WHERE complaint_type IN ('Noise - Residential','HEAT/HOT WATER','Street Condition','Illegal Parking','Blocked Driveway','Street Light Condition')
GROUP BY complaint_type, ds
ORDER BY ds ASC, complaint_type ASC

# Total requests per complaint type, sorted by volume
df_complaint_type = df_monthly_by_complaint_type.groupby('complaint_type').sum().sort_values(by=target_col, ascending=False)

# Plot one line per complaint type
_ = sns.lineplot(x=ts_col, y=target_col, hue='complaint_type', data=df_monthly_by_complaint_type)

-- Daily request counts across all complaint types
SELECT COUNT(unique_key) AS y, DATE_TRUNC(DATE(created_date), DAY) AS ds
FROM `bigquery-public-data.new_york_311.311_service_requests`
GROUP BY ds
ORDER BY ds

Step 3

Let's now create a time-series model with the daily data.

Hints
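The original hints aren't reproduced here, but by analogy with the monthly parameters in Step 4 below, daily settings might look like this (an assumption, not the notebook's exact values):

import pandas as pd

n_input_steps = 30   # assumed lookback window of 30 days
n_output_steps = 1   # predict one day ahead
n_seasons = 7        # weekly periodicity for the statistical model
freq = 'D'           # daily frequency for the dataframe index
interval = pd.DateOffset(days=1)  # used for the statistical model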

Step 4

For a final challenge, let's predict with monthly data, which will require several changes to the parameters:

n_features = 1  # Holidays aren't included in the monthly data set we created
n_input_steps = 12  # Lookback window of 12 months
n_output_steps = 1  # Predict one month ahead
n_seasons = 12  # For the statistical model, use yearly periodicity (12 months)
freq = 'MS'  # Set the dataframe index frequency to month start
interval = pd.DateOffset(months=1)  # Used for the statistical model
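For example, the frequency might be applied to the dataframe index like this (assuming df has a DatetimeIndex):

df = df.asfreq(freq)  # reindex df at month-start ('MS') frequency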

Hints

If you'd like to continue using this notebook, it's recommended that you stop it when not in use. From the Notebooks UI in your Cloud Console, select the notebook and then select Stop:

If you'd like to delete all the resources you've created in this lab, simply Delete the notebook instance instead of stopping it.

Using the Navigation menu in your Cloud Console, browse to Storage and delete both buckets you created to store your model assets.