This Codelab covers streaming analytics on events coming from an app in Firebase, with the use of several services such as Cloud Firestore, Cloud Functions, Cloud Pub/Sub, Cloud Dataflow and BigQuery.

What you'll learn:

What you'll need:

Self-paced environment setup

If you don't already have a Google Account (Gmail or Google Apps), you must create one. Sign in to the Google Cloud Platform console (console.cloud.google.com) and create a new project:

Remember the project ID, a unique name across all Google Cloud projects (the name above has already been taken and will not work for you, sorry!). It will be referred to later in this codelab as PROJECT_ID.

Next, you'll need to enable billing in the Cloud Console in order to use Google Cloud resources.

Running through this codelab shouldn't cost you more than a few dollars, but it could be more if you decide to use more resources or if you leave them running (see "cleanup" section at the end of this document).

New users of Google Cloud Platform are eligible for a $300 free trial.

IAM Permission Needed - Owner

This lab requires the provisioning and enablement of multiple Google Cloud Platform and Firebase services. You will need Owner permission on the project being used for this lab.

To confirm you have the Owner permission on the project, under the Google Cloud Platform console (console.cloud.google.com), navigate to "IAM & admin", "IAM", and confirm that "Owner" role is next to your email address (you may see other service accounts in your project):
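If you prefer the command line, a quick check like the following (run from Cloud Shell, which is set up in the next step) should list the project owners; your account should appear in the output. The GOOGLE_CLOUD_PROJECT variable is pre-set in Cloud Shell.

gcloud projects get-iam-policy ${GOOGLE_CLOUD_PROJECT} \
  --flatten="bindings[].members" \
  --filter="bindings.role:roles/owner" \
  --format="value(bindings.members)"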

Cloud Shell

Activate Google Cloud Shell

From the GCP Console click the Cloud Shell icon on the top right toolbar:

Then click "Start Cloud Shell":

It should only take a few moments to provision and connect to the environment:

This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory, and runs on the Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this lab can be done with simply a browser or your Google Chromebook.

Once connected to the cloud shell, you should see that you are already authenticated and that the project is already set to your PROJECT_ID.

Run the following command in the cloud shell to confirm that you are authenticated:

gcloud auth list

Command output

Credentialed accounts:
 - <myaccount>@<mydomain>.com (active)
Next, run the following command to confirm that gcloud knows about your project:

gcloud config list project

Command output

[core]
project = <PROJECT_ID>

If it is not, you can set it with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].

In your Cloud Shell, execute the following to enable the Dataflow API (if not enabled already):

gcloud services enable dataflow.googleapis.com

Enabling the Dataflow API should also enable a few other APIs. You can check which ones have been enabled by running the following:

gcloud services list --format 'value(config.name)' | sort

This command displays the list of APIs enabled in your project, including the key ones used in this lab: bigquery-json.googleapis.com, dataflow.googleapis.com and pubsub.googleapis.com.
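To narrow the output down to just those APIs, one option is to filter the list, for example:

gcloud services list --format 'value(config.name)' | grep -E 'bigquery|dataflow|pubsub'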

Google Cloud Storage allows object storage and retrieval in various regions. You can use Cloud Storage for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users via direct download.

Buckets are the basic containers of Google Cloud Storage that hold your data. Everything that you store in Cloud Storage must be contained in a bucket. In this lab, the bucket will be used in various ways:

Create the GCS Bucket using gsutil under Cloud Shell

Choose a region for your bucket. This region will also be used for Dataflow later.

For this lab, let's use us-central1. Copy and paste this onto Cloud Shell:

REGION=us-central1

Choose a name for your bucket (without brackets or the gs:// prefix). You are welcome to choose any name for the lab, although the bucket name must be unique across all of Cloud Storage. If you choose an obvious name such as "test", you will probably find that someone else has already created a bucket with that name, and you will receive an error.

For the purpose of this lab, we will use the project ID with "-mdp-lab-gcs" as the suffix:

GCS_BUCKET=${GOOGLE_CLOUD_PROJECT}-mdp-lab-gcs && echo $GCS_BUCKET

There are also some rules regarding which characters are allowed in bucket names. If you start and end your bucket name with a letter or number, and only use dashes in the middle, you'll be fine. If you try to use special characters, or start or end your bucket name with something other than a letter or number, you will receive an error reminding you of the rules.

Run this command to create it.

gsutil mb -c regional -l ${REGION} gs://${GCS_BUCKET}

If you encounter an error saying "ServiceException: 409 Bucket test already exists", you will need to go back to the GCS_BUCKET command, pick another name and try the "gsutil mb" command again.

You can confirm the bucket creation in GCP console by going to Storage -> Browser, and it should display the newly created bucket:
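You can also verify the bucket from Cloud Shell; for example:

gsutil ls -L -b gs://${GCS_BUCKET}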

Cloud Pub/Sub is a simple, reliable, scalable foundation for stream analytics and event-driven computing systems. As part of Google Cloud's stream analytics solution, the service ingests event streams and delivers them to Cloud Dataflow for processing and BigQuery for analysis as a data warehousing solution. Relying on the Cloud Pub/Sub service for delivery of event data allows you to implement use cases such as:

We will start with creating a Pub/Sub topic.

Let's go into Cloud Shell again. Create the Cloud Pub/Sub topic "triviagameevents":

gcloud pubsub topics create triviagameevents

In the GCP console, you can go to Pub/Sub -> Topics, and verify the newly created topic.
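Alternatively, you can verify the topic from Cloud Shell:

gcloud pubsub topics describe triviagameevents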

Firebase lets you build more powerful, secure and scalable apps. We are using several Firebase components in this lab:

Create a Firebase project

Navigate to the Firebase console and click Add Project.

Select the project name used in the previous section. Accept the terms and choose "Continue". In this lab, you are NOT required to enable "Use the default settings for sharing Google Analytics for Firebase data".

You can leave the next set of settings unchecked, and click "Add Firebase".

Confirm the Firebase billing plan, which is "Pay as you go". The charges will be made against the credits in your GCP account.

Enable Google Auth

To allow users to sign in to the web app, we'll use Google authentication, which needs to be enabled.

In the Firebase console, open the DEVELOP section > Authentication > SIGN-IN METHOD tab, enable the Google sign-in provider, and click SAVE. This allows users to sign in to the web app with their Google accounts.

Retrieve the code

Under Cloud Shell, retrieve the code containing the sample firebase code:

cd ~
gsutil cp gs://mdp-next18-lab/quizgame.tar.gz .
tar zxvf quizgame.tar.gz

Enter the newly extracted 📁 quizgame directory. This directory contains the code for the fully functional Firebase web app.

cd quizgame

Configure the Firebase Command Line Interface

Cloud Shell should come with the firebase Command Line Interface already installed. Make sure you are in the ~/quizgame directory, then set up the Firebase CLI to use your Firebase Project:

firebase use --add

Then select your Project ID and follow the instructions. When prompted for an alias for this project, you can enter a value like "staging".

Substitute configuration variables

You will need to replace some configuration variables in the source code. We will use the command line to retrieve and replace them.

Run this command to examine the variables. These variables are placed in the client-side app to enable connectivity to the Firebase services.

firebase setup:web

We will use a few quick shell commands to extract the values into the format the app's code expects.

FIREBASE_APIKEY=$(firebase setup:web | grep '^  "apiKey' | cut -d'"' -f4)
FIREBASE_DATABASEURL=$(firebase setup:web | grep '^  "databaseURL' | cut -d'"' -f4)
FIREBASE_STORAGEBUCKET=$(firebase setup:web | grep '^  "storageBucket' | cut -d'"' -f4)
FIREBASE_AUTHDOMAIN=$(firebase setup:web | grep '^  "authDomain' | cut -d'"' -f4)
FIREBASE_MESSAGINGSENDERID=$(firebase setup:web | grep '^  "messagingSenderId' | cut -d'"' -f4)
FIREBASE_PROJECTID=$(firebase setup:web | grep '^  "projectId' | cut -d'"' -f4)
echo FIREBASE_APIKEY=${FIREBASE_APIKEY}
echo FIREBASE_DATABASEURL=${FIREBASE_DATABASEURL}
echo FIREBASE_STORAGEBUCKET=${FIREBASE_STORAGEBUCKET}
echo FIREBASE_AUTHDOMAIN=${FIREBASE_AUTHDOMAIN}
echo FIREBASE_MESSAGINGSENDERID=${FIREBASE_MESSAGINGSENDERID}
echo FIREBASE_PROJECTID=${FIREBASE_PROJECTID}

The variables are to be substituted into the file src/main.js.

sed -i "s~^  apiKey:.*$~  apiKey: '${FIREBASE_APIKEY}',~g" src/main.js
sed -i "s~^  databaseURL:.*$~  databaseURL: '${FIREBASE_DATABASEURL}',~g" src/main.js
sed -i "s~^  storageBucket:.*$~  storageBucket: '${FIREBASE_STORAGEBUCKET}',~g" src/main.js
sed -i "s~^  authDomain:.*$~  authDomain: '${FIREBASE_AUTHDOMAIN}',~g" src/main.js
sed -i "s~^  messagingSenderId:.*$~  messagingSenderId: '${FIREBASE_MESSAGINGSENDERID}',~g" src/main.js
sed -i "s~^  projectId:.*$~  projectId: '${FIREBASE_PROJECTID}'~g" src/main.js
echo "Completed substitution"

Confirm the successful substitution by looking at src/main.js again.

sed -n '/^firebase.initializeApp/,/})/p' src/main.js

You should see something like this, with values specific to your project:

Examine firebase.json

Firebase comes with a hosting service that serves your static assets and web app. You deploy your files to Firebase Hosting using the Firebase CLI. Before deploying, you need to specify which files will be deployed in your firebase.json file. We have already done this for you, since it was required to serve the files for development throughout this lab. These settings are specified under the hosting attribute in the firebase.json file:

cat firebase.json

You should see output like this:

This tells the CLI that we want to deploy all files except those listed in the ignore array.

Enable Cloud Firestore

For this particular app, we are using Cloud Firestore, which you will need to enable in the Firebase Console. Under Firebase console, navigate to Database, and click "Create database".

Choose "Start in locked mode" when prompted, then click "Enable".

Build and deploy the app

Go back to Cloud Shell. Run these commands to build the files for deployment.

cd ~/quizgame/ && npm install && npm run build && (cd functions && npm install)

Now deploy your files to Firebase Hosting by running firebase deploy:

firebase deploy

This is the console output you should see:

The web app should now be served from your Hosting URL - which is of the form https://<hosting-id>.firebaseapp.com.

Load the questions and answers

Under the quizgame directory, there should be a file called questionLoader.js. This script contains code to load the questions into Cloud Firestore.

ls -l util/questionLoader.js

Execute the script to insert data for our app into Cloud Firestore. These are the questions and answers to be used by the app.

node util/questionLoader.js

You can check that they're loaded using the Firebase console. Navigate to Database, and you should be able to see the list of questions.

Run the app

Open the Hosting URL in a standard browser window (not incognito). You should be able to see your deployed app. Leave this tab open; we will come back to it later:

Now we are going to run a Dataflow pipeline that retrieves the messages from the Pub/Sub topic, and writes the data to BigQuery.

Cloud Dataflow is a fully-managed service for transforming and enriching data in stream (real time) and batch (historical) modes with equal reliability and expressiveness -- no more complex workarounds or compromises needed. And with its serverless approach to resource provisioning and management, you have access to virtually limitless capacity to solve your biggest data processing challenges, while paying only for what you use.

BigQuery is Google's serverless, highly scalable, low cost enterprise data warehouse. Because there is no infrastructure to manage, you can focus on analyzing data to find meaningful insights using familiar SQL and you don't need a database administrator.

Create the BigQuery dataset and table

In Cloud Shell, enter the following commands:

BQ_DATA_LOCATION=US
BQ_DATASET=raw
BQ_TABLE=events

# Now create the BigQuery dataset
bq mk --data_location=${BQ_DATA_LOCATION} ${BQ_DATASET}

Then create the BigQuery destination table:

bq mk --schema gameevents-bigquery-schema.json --table ${BQ_DATASET}.${BQ_TABLE}
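You can verify that the dataset and table were created; for example:

bq ls ${BQ_DATASET}
bq show --schema --format=prettyjson ${BQ_DATASET}.${BQ_TABLE}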

Execute the Template from the Console

Go to GCP Console. On the left navigation bar, choose "Pub/Sub" -> "Topics", then click into the Topic name ending with "triviagameevents". You should have a screen like the following. Click "Export To":

Choose "BigQuery" at the popup.

Click "Continue" at the prompt.

You should be brought to the "Create job from template" screen. Certain variables should have been pre-populated:

You will need to enter a few parameters:

Click the "Run Job" button. It should display a screen like this:

Examine the Cloud Pub/Sub subscription

In the GCP console, you can go to Pub/Sub -> Subscriptions, and verify a subscription has been automatically created by the Dataflow pipeline.

We will run a second pipeline to write the same data from Pub/Sub to a GCS (Google Cloud Storage) bucket in Avro format.

Dataflow has the ability to write to multiple destinations within a single pipeline. In this lab, however, we are creating two separate pipelines with separate subscriptions to the same topic, so that the pulling of messages is independent for each pipeline. This is a useful pattern when subscribers may consume messages at different paces (e.g. due to downtime in a particular downstream subscriber application) and you do not want them to slow each other down.

Execute the Template from the Console

Again, go to GCP Console. On the left navigation bar, choose "Pub/Sub" -> "Topics", then click into the Topic name ending with "triviagameevents". You should have a screen like the following. Click "Export To":

Choose "Cloud Storage Avro file" at the popup.

Click "Continue" at the prompt.

You should be brought to the "Create job from template" screen. Certain variables should have been pre-populated:

You will need to enter a few parameters:

Click the "Run Job" button. It should display a screen like this:

Examine the Cloud Pub/Sub subscription

In the GCP console, you can go to Pub/Sub -> Subscriptions, and verify another subscription has been automatically created by the Dataflow pipeline.
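You can also list the subscriptions attached to the topic from Cloud Shell; at this point there should be two, one per pipeline:

gcloud pubsub topics list-subscriptions triviagameevents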

Play the game!

Go back to your Firebase web app UI. Its URL is of the form https://<hosting-id>.firebaseapp.com

Click the "Sign in with Google", and you should be redirected to choose a Google account for signing in.

After signing in, you should be able to see the trivia questions! Play a few questions to generate data to be processed by the data pipelines. (The first question may take slightly longer to be processed.)

BigQuery

Open the BigQuery UI in the Google Cloud Platform Console. You may get an authentication prompt if this is your first time here.

On the left rail, you should be able to see your project name with the "raw" dataset underneath. Expand it and you should see the "events" table.

Click "events", which should bring up the schema on the right hand side. Click "Query Table".

You should see a query editor.

Click "Compose Query" and copy-and-paste the following query into the New Query box.

SELECT SUBSTR(userId, -6, 6) AS user, COUNT(isAnswerCorrect) AS score
FROM raw.events
WHERE isAnswerCorrect = true
GROUP BY user
ORDER BY score DESC

You should be able to see the user (as it appears in the top right-hand corner of the Firebase web app) with the count of correct answers next to it. Feel free to open another browser session and sign in with a different Google account to generate more data.
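If you prefer the command line, a roughly equivalent query can be run from Cloud Shell with the bq tool (assuming standard SQL):

bq query --use_legacy_sql=false '
SELECT SUBSTR(userId, -6, 6) AS user, COUNT(isAnswerCorrect) AS score
FROM raw.events
WHERE isAnswerCorrect = true
GROUP BY user
ORDER BY score DESC'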

Go back and answer more questions on the app, and observe how the data gets updated in BigQuery in near real-time.

Avro on GCS

Navigate to the GCS Storage Browser. Click on the bucket created earlier (ending with -mdp-lab-gcs), and you should see an "avro" sub-directory with the Avro files created underneath. This is a useful pattern for archiving the messages as separate copies for potential re-processing in the future.
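You can also list the files from Cloud Shell, assuming the template's output location was gs://${GCS_BUCKET}/avro as described above:

gsutil ls -r gs://${GCS_BUCKET}/avro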

You can clean up your resources from the Google Cloud Platform Console.

Stopping Dataflow pipelines

On the Dataflow screen, click into each of the pipelines, then "Stop job" on the console.

You will be prompted for "Cancel" or "Drain". In this case you can choose "Cancel". (To learn about the differences, click on the "Read more about stopping Dataflow jobs" for details.)

Perform these steps for each of the Dataflow pipelines you have started in the lab. The jobs should say "Canceled" in the Status column.
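If you prefer the command line, you can also list and cancel the jobs from Cloud Shell (re-set REGION=us-central1 if your session has reset); <JOB_ID> below is a placeholder for each job ID shown in the list output:

gcloud dataflow jobs list --region=${REGION} --status=active
gcloud dataflow jobs cancel <JOB_ID> --region=${REGION}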

Delete the BigQuery Dataset

Under the BigQuery UI, select the dropdown on the "raw" dataset, and choose Delete dataset.
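Alternatively, you can remove the dataset and its tables from Cloud Shell:

bq rm -r -f raw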

Removing the Cloud Storage Bucket if needed

If you have newly created the Cloud Storage bucket and no longer need it, you can navigate to the Cloud Storage browser screen, and remove the newly created bucket.
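Alternatively, you can remove the bucket and its contents from Cloud Shell (re-set GCS_BUCKET if your session has reset):

gsutil rm -r gs://${GCS_BUCKET}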

Delete the Cloud Pub/Sub Subscription

Under Pub/Sub -> Subscriptions, check the boxes next to the subscriptions created by the Dataflow pipelines and click "Delete". If you no longer need it, you can also delete the "triviagameevents" topic under Pub/Sub -> Topics.
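The same cleanup can be done from Cloud Shell; the subscription names were auto-generated by the Dataflow pipelines, so list them first, then delete each one (substitute your own names for <SUBSCRIPTION_NAME>):

gcloud pubsub subscriptions list --format='value(name)'
gcloud pubsub subscriptions delete <SUBSCRIPTION_NAME>
gcloud pubsub topics delete triviagameevents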

Delete the Cloud Functions

Under Cloud Functions, highlight both the "answerSubmit" and "publishMessageToTopic" functions, and delete them.
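Alternatively, you can delete the functions from Cloud Shell (you may be prompted to confirm each deletion):

gcloud functions delete answerSubmit
gcloud functions delete publishMessageToTopic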

Delete the Firebase Hosting Resources

Under Cloud Shell, run the following command to stop the Firebase hosting:

cd ~/quizgame && firebase hosting:disable

Then, in the Firebase console, select the project used in this lab, and choose "Hosting". There should be a Disabled status and a Deployed status. Hover over the right edge of the "Deployed" row, and delete the resources.

Delete the Firebase Cloud Firestore Data

In the Firebase console, under the Database tab, you should be able to see the data for the questions and users. To delete the data, highlight "questions" and then choose "Delete all documents" in the 2nd column. Repeat this for "users".

Revert the Firebase Authentication Settings

In the Firebase console, under the Authentication tab, you can delete the users by highlighting each row and choose "Delete Account".

Under the "Sign-in method" tab, you can revert the Google's Sign-in-provider status to "Disabled".

You learned how to build modern data pipelines on Google Cloud Platform, with the use of several services such as Cloud Firestore, Cloud Functions, Cloud Pub/Sub, Cloud Dataflow and BigQuery.