BigQuery UI Navigation and Data Exploration Codelab

1. Introduction

BigQuery is a serverless, highly scalable, and cost-effective data warehouse. Simply move your data into BigQuery and let us handle the hard work so you can focus on what truly matters: running your business. You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data.

In this lab, you will discover the analytical possibilities of BigQuery. You'll learn how to import a dataset from a Google Cloud Storage bucket and get a grasp of the BigQuery UI by working with a retail banking dataset. Additionally, this lab will teach you how to use key BigQuery features that make day-to-day analytics much easier, such as exporting query results to a spreadsheet, viewing and running queries from your query history, viewing query performance, and creating views for use by other teams and departments.

What you will learn

In this lab, you will learn how to perform the following tasks:

Load new data into BigQuery
Become familiar with the BigQuery UI
Run queries in BigQuery
View query performance
Create views in BigQuery
Securely share datasets with others

2. Introduction: Understanding BigQuery UI

In this section you will learn how to navigate the BigQuery UI, view available datasets and run a simple query.

Loading the BigQuery UI

Type "BigQuery" in the search bar located at the top of the Google Cloud Platform Console.
Select BigQuery from the option list. Be sure to select the option that has the BigQuery logo, the magnifying glass.

Viewing Datasets and Running Queries

In the Resources section of the left pane, click on your BigQuery project.
Click on bq_demo to view the tables in that dataset.
In the "Type to search" box, type "card" to see a list of tables and datasets that contain "card" in their name.
Select the "card_transactions" table from the search results list.

Click on the Details tab under the card_transactions pane to view the metadata for this table.
Click on the Preview tab to see a preview of the table.
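
The column metadata shown in the Details tab can also be read with SQL. The query below is a minimal sketch, assuming the bq_demo dataset from this lab lives in the currently selected project and that your account is allowed to read its INFORMATION_SCHEMA views:

```sql
-- List the columns of card_transactions with their types.
-- Assumes bq_demo is a dataset in the current default project.
SELECT
    column_name,
    data_type,
    is_nullable
FROM bq_demo.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'card_transactions'
ORDER BY ordinal_position;
```

This can be handy when you want to paste a schema into documentation instead of taking a screenshot of the UI.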

[Competitive Talking Point]: Integration with the Google Data Catalog means that BigQuery metadata can be managed along with other data sources, such as data lakes or operational data sources. This is one example showing that Google Cloud is not just a relational data warehouse; it is an entire analytical data platform.

Click the magnifying glass icon to query the "card_transactions" table. Auto-generated query text will populate the BigQuery query editor.
Enter the code below to list the distinct merchants in the card_transactions table:

SELECT DISTINCT merchant FROM bq_demo.card_transactions LIMIT 1000

Click the Run button to run the query.

3. Creating datasets and sharing views

Data sharing and governance are crucial, and both can be handled intuitively in the BigQuery UI. In this section you will learn how to create a new dataset, populate it with a view, and share that dataset.

Viewing Query History

Click "Query History" in the left pane of the GCP Console
Click refresh in the Query History pane
Click the download image/arrow on the far right of the query to view the results of the query.
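
The same history, along with basic performance figures, can be queried from INFORMATION_SCHEMA. This is a sketch rather than part of the lab: it assumes your jobs ran in the US multi-region (hence the `region-us` qualifier) and only returns jobs submitted by the current user. Adjust the region qualifier if your data lives elsewhere.

```sql
-- Ten most recent queries run by the current user,
-- with bytes scanned and slot time as rough performance indicators.
-- Assumption: jobs ran in the US multi-region.
SELECT
    job_id,
    creation_time,
    total_bytes_processed,
    total_slot_ms,
    query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE job_type = 'QUERY'
ORDER BY creation_time DESC
LIMIT 10;
```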

Creating a new dataset

Select [your project name] in the resources pane of the BigQuery UI.
Select "Create new Dataset" from the project information pane
For Dataset ID, enter:

bq_demo_shared

Leave all other fields as defaults
Click "Create Dataset"
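
If you prefer to script this step, the dataset can also be created with a DDL statement from the query editor. A minimal sketch, assuming the dataset should be created in the current project with default settings (the description text is only illustrative):

```sql
-- Create the dataset used for shared views.
-- IF NOT EXISTS makes the statement safe to re-run.
CREATE SCHEMA IF NOT EXISTS bq_demo_shared
OPTIONS (
    description = 'Views shared with other teams (illustrative description)'
);
```

BigQuery uses the keyword SCHEMA for what the UI calls a dataset.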

Creating Views

[Competitive Talking Point]: BigQuery is fully ANSI SQL compliant and supports both simple and complex multi-table joins and rich analytical functions. We have continuously released enhanced support for common SQL data types and functions used in traditional data warehouses to ease the migration process.

Select "Compose New Query" at the top of the Query Editor pane.
Insert the following code in the query editor

WITH revenue_by_month AS (
SELECT
    card.type AS card_type,
    FORMAT_DATE('%Y-%m', trans_date) as revenue_date,
    SUM(amount) as revenue
FROM bq_demo.card_transactions
JOIN bq_demo.card ON card_transactions.cc_number = card.card_number
WHERE trans_date > DATE_ADD(CURRENT_DATE, INTERVAL -1 YEAR)
GROUP BY card_type, revenue_date
)
SELECT
    card_type,
    revenue_date,
    revenue as monthly_rev,
    revenue - LAG(revenue) OVER (PARTITION BY card_type ORDER BY revenue_date ASC) as rev_change
FROM revenue_by_month
ORDER BY card_type, revenue_date ASC;

Click "Save View"
Select your current project for Project Name
Select the newly created Dataset:

bq_demo_shared

For Table Name:

rev_change_by_card_type

Click Save.
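
To see what a LAG-based month-over-month change looks like without touching the lab tables, the sketch below runs the same calculation on a few made-up rows. The card types and revenue figures are invented for illustration, and the window is partitioned by card_type so that each card type is compared only against its own previous month.

```sql
-- Toy data standing in for revenue_by_month; all values are made up.
WITH revenue_by_month AS (
    SELECT 'credit' AS card_type, '2023-01' AS revenue_date, 100 AS revenue UNION ALL
    SELECT 'credit', '2023-02', 130 UNION ALL
    SELECT 'debit',  '2023-01', 80  UNION ALL
    SELECT 'debit',  '2023-02', 60
)
SELECT
    card_type,
    revenue_date,
    revenue AS monthly_rev,
    -- LAG returns the previous month's revenue within the same card_type,
    -- or NULL for the first month of each card type.
    revenue - LAG(revenue) OVER (PARTITION BY card_type ORDER BY revenue_date) AS rev_change
FROM revenue_by_month
ORDER BY card_type, revenue_date;
```

With this toy data, rev_change is NULL for the first month of each card type, 30 for credit in 2023-02, and -20 for debit in 2023-02.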

Sharing the Dataset

Select the "bq_demo_shared" dataset from the left resource pane in the BigQuery UI.
Click "Share Dataset" in the dataset information pane.
Enter an email address.
Select "BigQuery Data Viewer" from the Role dropdown menu.
Click "Add".
Click Done.
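
The sharing steps above can also be expressed as a single access-control statement. This is a sketch under assumptions: analyst@example.com is a placeholder address, and the statement must be run by someone allowed to change access on the dataset.

```sql
-- Grant read-only access on the shared dataset to one user.
-- analyst@example.com is a hypothetical address; replace it with a real one.
GRANT `roles/bigquery.dataViewer`
ON SCHEMA bq_demo_shared
TO "user:analyst@example.com";
```

A matching REVOKE statement with the same role and principal removes the access again.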

Explore Data in Sheets

[Competitive Talking Point]: Another benefit of BigQuery compared to its competitors is BI Engine. BI Engine can make BI-type summary queries return in less than a second through an in-memory caching engine. This is currently supported by Google Data Studio but will soon be available to accelerate all queries in BigQuery.

For example:

Snowflake relies on third-party BI tools for dashboards and data visualization, while GCP offers a range of integrated BI tools, including Connected Sheets, Data Studio, and Looker.

Select the "rev_change_by_card_type" view from the left resource pane in the BigQuery UI.
Click on the magnifying glass to query the view
Type:

SELECT *
FROM bq_demo_shared.rev_change_by_card_type

Click Run
Click the "Export" icon in the results pane
Select "Explore Data with Sheets"

Click "Start Analyzing"
Select "Pivot Table"
Select "New Sheet"
Click "Create"
Add "revenue_date" under the Rows section of the Pivot Table Editor located on the right of the Sheets window
Add "card_type" under the Columns section of the Pivot Table Editor
Add "monthly_rev" under the Values section of the Pivot Table Editor
Click Apply

Navigate to the top ribbon of the Sheets UI and select Insert > Chart.
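
A similar pivot can be produced directly in BigQuery with the PIVOT operator. The sketch below assumes the view contains the card types 'credit' and 'debit'. These two values are hypothetical, so replace them with the card_type values that actually appear in your data.

```sql
-- One row per month, one column per card type.
-- 'credit' and 'debit' are assumed values of card_type.
SELECT *
FROM (
    SELECT revenue_date, card_type, monthly_rev
    FROM bq_demo_shared.rev_change_by_card_type
)
PIVOT (
    SUM(monthly_rev) FOR card_type IN ('credit', 'debit')
)
ORDER BY revenue_date;
```

Unlike the Sheets pivot table, PIVOT needs the column values listed up front, which is why Connected Sheets is often the quicker option for ad hoc exploration.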

4. Setup: Data Integration

In this section you will learn how to create a new table and perform a JOIN with one of the many public datasets that Google Cloud has available.

[Competitive Talking Point]:

BigQuery has supported shared datasets for years. Customers in any project can query both public datasets and datasets in other projects that have been shared with them.

BigQuery can support data lakes in GCS through the use of external tables. In addition to bulk loading, BigQuery supports the ability to stream data into the database at rates upwards of hundreds of MB per second. Snowflake has no support for streaming data.

Importing Data to a new table

In the resources pane select the bq_demo dataset
In the dataset information pane select "Create Table"
Select Google Cloud Storage for Source
In the file path text box:

gs://retail-banking-looker/district

Select CSV for File Format
Enter "district" for Table Name
Select the checkbox for Auto Detect schema
Click Create Table
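
The same load can be written as a SQL statement. This is a sketch, not a replacement for the UI steps: it relies on schema auto-detection because no column list is given, so check the resulting schema in the Details tab afterwards.

```sql
-- Load the CSV file(s) at the lab's Cloud Storage path into bq_demo.district.
-- LOAD DATA INTO appends to the table if it already exists;
-- use LOAD DATA OVERWRITE to replace its contents instead.
LOAD DATA INTO bq_demo.district
FROM FILES (
    format = 'CSV',
    uris = ['gs://retail-banking-looker/district']
);
```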

Querying Public Dataset

In the query editor enter the following query:

SELECT
    CAST(geo_id as STRING) AS zip_code,
    total_pop,
    median_age,
    households,
    income_per_capita,
    housing_units,
    vacant_housing_units_for_sale,
    ROUND(SAFE_DIVIDE(employed_pop, pop_16_over),4) AS rate_employment,
    ROUND(SAFE_DIVIDE(bachelors_degree_or_higher_25_64, pop_25_64),4) AS rate_bachelors_degree_or_higher_25_64
  FROM
    `bigquery-public-data.census_bureau_acs.zip_codes_2017_5yr`;

Click Run
View the Results
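
The rate columns in this query use SAFE_DIVIDE so that ZIP codes with a zero denominator return NULL rather than failing the whole query. The standalone example below, which uses literal values instead of census data, shows the behavior:

```sql
SELECT
    SAFE_DIVIDE(10, 4) AS normal_division,        -- 2.5
    SAFE_DIVIDE(10, 0) AS division_by_zero,       -- NULL instead of an error
    ROUND(SAFE_DIVIDE(2, 3), 4) AS rounded_rate;  -- 0.6667
```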

Now we will combine this public data with another query. Enter the following SQL code in the query editor:

WITH customer_counts AS (
    SELECT
        REGEXP_EXTRACT(address, "[0-9][0-9][0-9][0-9][0-9]") AS zip_code,
        COUNT(*) AS num_clients
    FROM bq_demo.client
    GROUP BY zip_code
    )
SELECT 
    CAST(geo_id as STRING) AS zip_code,
    total_pop,
    median_age,
    households,
    income_per_capita,
    ROUND(SAFE_DIVIDE(employed_pop, pop_16_over),4) AS rate_employment,
    num_clients
FROM
    `bigquery-public-data.census_bureau_acs.zip_codes_2017_5yr`
JOIN customer_counts ON customer_counts.zip_code = geo_id
ORDER BY num_clients DESC

Click Run
View the Results
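
The ZIP code extraction in customer_counts returns the first run of five digits in the address, which may be a street number instead of a ZIP code. The sketch below compares that pattern with a variant anchored to the end of the string. The addresses are made up, and the anchored pattern assumes the ZIP code is the last thing in the address, which may not hold for the lab's client table.

```sql
SELECT
    address,
    -- First five consecutive digits anywhere in the string.
    REGEXP_EXTRACT(address, r'[0-9]{5}') AS first_five_digits,
    -- Five digits at the end of the string, optionally followed by a ZIP+4 suffix.
    REGEXP_EXTRACT(address, r'([0-9]{5})(?:-[0-9]{4})?\s*$') AS trailing_zip
FROM UNNEST([
    '42 Elm St, Austin, TX 78701',
    '12345 Oak Ave, Dallas, TX 75201',
    'PO Box 9, Unknown'
]) AS address;
```

For the second address the two patterns disagree (12345 versus 75201), and for the third both return NULL because there is no five-digit run.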

5. Capacity Management

Working with slots and reservations

BigQuery offers multiple pricing models to meet your needs. Most large customers primarily use flat-rate pricing for predictable costs with reserved capacity. For bursting beyond that baseline capacity, BigQuery offers flex slots, which allow you to grow into additional capacity on the fly and then shrink back with no impact on running queries. BigQuery also has an on-demand model based on bytes scanned, which lets you pay only for the queries you run.

[Competitive Talking Point]: Some competitors work exclusively on a fixed capacity model where customers have to allocate a virtual warehouse for each workload in their organization. In addition to a low-cost per-query model that makes it easy to get started with BigQuery, we support a flat rate capacity pricing model where idle capacity can be shared among a set of workloads.

Go to the reservations tab.

Click on "Buy Slots"

Select "Flex" as duration.
Select 500 slots.
Confirm purchase.

Click View Slot Commitments.
Click "Create Reservation"
Use "demo" as reservation name
Select United States as location
Type 500 for slots (all available)
Click Assignments
Pick current project for organization project
Select "demo" for reservation ID
Click Create.
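
The purchase, reservation, and assignment steps above can be sketched with BigQuery's reservation DDL. Treat this as an outline under assumptions rather than a tested script: admin_project, my_project, my_commitment, and my_assignment are hypothetical names, the statements assume the US region, and the commitment plans available to you depend on your account and may no longer include Flex.

```sql
-- 1. Buy 500 slots (hypothetical admin project and commitment name).
CREATE CAPACITY `admin_project.region-us.my_commitment`
OPTIONS (
    slot_count = 500,
    plan = 'FLEX'
);

-- 2. Create the "demo" reservation using all 500 slots.
CREATE RESERVATION `admin_project.region-us.demo`
OPTIONS (
    slot_capacity = 500
);

-- 3. Assign a project's query jobs to the reservation.
CREATE ASSIGNMENT `admin_project.region-us.demo.my_assignment`
OPTIONS (
    assignee = 'projects/my_project',
    job_type = 'QUERY'
);
```

Remember to delete the assignment, reservation, and commitment after the demo so that you are not billed for idle slots.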