1. Introduction
BigQuery is a serverless, highly scalable, and cost-effective data warehouse. Simply move your data into BigQuery and let us handle the hard work so you can focus on what truly matters: running your business. You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data.
In this lab, you will discover the analytical possibilities of BigQuery. You'll learn how to import a dataset from a Google Cloud Storage bucket and get a grasp of the BigQuery UI by working with a retail banking dataset. Additionally, this lab will show you key BigQuery features that make day-to-day analytics much easier, such as exporting query results to a spreadsheet, viewing and running queries from your query history, viewing query performance, and creating views to be used by other teams and departments.
What you will learn
In this lab, you will learn how to perform the following tasks:
- Load new data into BigQuery
- Become familiar with the BigQuery UI
- Run queries in BigQuery
- View query performance
- Create views in BigQuery
- Securely share datasets with others
2. Introduction: Understanding BigQuery UI
In this section you will learn how to navigate the BigQuery UI, view available datasets and run a simple query.
Loading BQ UI
- Type "BigQuery" in the search bar located at the top of the Google Cloud Platform Console.
- Select BigQuery from the option list. Be sure to select the option that has the BigQuery logo, the magnifying glass.
Viewing Datasets and Running Queries
- In the left pane in the Resource section, click on your BigQuery project.
- Click on bq_demo to view the tables in that dataset.
- In the "Type to search" box, type "card" to see a list of tables and datasets that contain "card" in their name.
- Select the "card_transactions" table from the search results list.
- Click on the Details tab under the card_transactions pane to view the metadata for this table.
- Click on the Preview tab to see a preview of the table.
[Competitive Talking Point]: Integration with the Google Data Catalog means that BigQuery metadata can be managed along with other data sources, such as data lakes or operational data sources. This is one example that shows that Google Cloud is not just a relational data warehouse, it is an entire Analytical Data platform.
- Click the magnifying glass icon to query the "card_transactions" table. Auto-generated text will populate the BigQuery query editor.
- Replace it with the code below to list the distinct merchants in the card_transactions table:
SELECT DISTINCT merchant FROM bq_demo.card_transactions LIMIT 1000
- Click the Run button to run the query.
3. Creating datasets and sharing views
Sharing data and governing access to it are crucial, and both can be done intuitively in the BQ UI. In this section you will learn how to create a new dataset, populate it with a view, and share that dataset.
Viewing Query History
- Click "Query History" in the left pane of the GCP Console
- Click refresh in the Query History pane
- Click the download image/arrow on the far right of the query to view the results of the query.
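For readers who prefer SQL, a similar history can be pulled from BigQuery's INFORMATION_SCHEMA jobs views. The following is a minimal sketch, assuming the queries ran in the US multi-region (the `region-us` qualifier) and that you only want to see your own jobs; adjust the region to match your project.

```sql
-- Sketch: list your ten most recent queries with basic performance figures.
-- Assumption: jobs ran in the US multi-region, hence the `region-us` qualifier.
SELECT
  job_id,
  creation_time,
  query,
  total_bytes_processed,  -- bytes scanned by the query
  total_slot_ms           -- slot time consumed, in milliseconds
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE job_type = 'QUERY'
ORDER BY creation_time DESC
LIMIT 10;
```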
Creating a new dataset
- Select [your project name] in the resources pane of the BigQuery UI.
- Select "Create new Dataset" from the project information pane
- For Dataset Id:
bq_demo_shared
- Leave all other fields as defaults
- Click "Create Dataset"
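The same dataset can also be created with a DDL statement in the query editor. This is a minimal sketch that leaves the location and all other options at their defaults, as the steps above do; it assumes the query runs in the project that should own the dataset.

```sql
-- Sketch: create the shared dataset (a "schema" in BigQuery DDL terms).
-- IF NOT EXISTS makes the statement safe to re-run.
CREATE SCHEMA IF NOT EXISTS bq_demo_shared;
```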
Creating Views
[Competitive Talking Point]: BigQuery is fully ANSI SQL compliant and supports both simple and complex multi-table joins and rich analytical functions. We have continuously released enhanced support for common SQL data types and functions used in traditional data warehouses to ease the migration process.
- Select "Compose New Query" at the top of the Query Editor pane.
- Insert the following code in the query editor
WITH revenue_by_month AS (
SELECT
card.type AS card_type,
FORMAT_DATE('%Y-%m', trans_date) as revenue_date,
SUM(amount) as revenue
FROM bq_demo.card_transactions
JOIN bq_demo.card ON card_transactions.cc_number = card.card_number
WHERE trans_date > DATE_ADD(CURRENT_DATE, INTERVAL -1 YEAR)
GROUP BY card_type, revenue_date
)
SELECT
card_type,
revenue_date,
revenue as monthly_rev,
revenue - LAG(revenue) OVER (PARTITION BY card_type ORDER BY revenue_date ASC) as rev_change
FROM revenue_by_month
ORDER BY card_type, revenue_date ASC;
- Click "Save View"
- Select your current project for Project Name
- Select the newly created Dataset:
bq_demo_shared
- For Table Name:
rev_change_by_card_type
- Click Save.
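As an alternative to the "Save View" dialog, the view can be defined with a single DDL statement. The sketch below rests on two assumptions about the intent of the query: that the date filter keeps transactions from the last year (the `>` comparison), and that the month-over-month change should be computed within each card type (the `PARTITION BY card_type` clause).

```sql
-- Sketch: create the view with DDL instead of the "Save View" dialog.
CREATE OR REPLACE VIEW bq_demo_shared.rev_change_by_card_type AS
WITH revenue_by_month AS (
  SELECT
    card.type AS card_type,
    FORMAT_DATE('%Y-%m', trans_date) AS revenue_date,
    SUM(amount) AS revenue
  FROM bq_demo.card_transactions
  JOIN bq_demo.card ON card_transactions.cc_number = card.card_number
  -- Assumption: keep only transactions from the last year.
  WHERE trans_date > DATE_ADD(CURRENT_DATE, INTERVAL -1 YEAR)
  GROUP BY card_type, revenue_date
)
SELECT
  card_type,
  revenue_date,
  revenue AS monthly_rev,
  -- Assumption: compare each month with the previous month of the same card type.
  revenue - LAG(revenue) OVER (PARTITION BY card_type ORDER BY revenue_date ASC) AS rev_change
FROM revenue_by_month
ORDER BY card_type, revenue_date ASC;
```

The first month of each card type has no previous row in its partition, so `rev_change` is NULL there.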
Sharing Views and Datasets
- Select the "bq_demo_shared" dataset from the left resource pane in the BigQuery UI.
- Click "Share Dataset" in the dataset information pane
- Enter an email address
- Select "BigQuery Data Viewer" from the Role dropdown menu
- Click "Add"
- Click Done
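The same grant can be expressed in SQL with BigQuery's DCL statements. This is a sketch only: the email address is a hypothetical placeholder, and the statement assumes you have permission to change access on the dataset.

```sql
-- Sketch: give one user read access to everything in bq_demo_shared.
-- "analyst@example.com" is a hypothetical address; replace it with a real one.
GRANT `roles/bigquery.dataViewer`
ON SCHEMA bq_demo_shared
TO "user:analyst@example.com";
```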
Explore Data in Sheets
[Competitive Talking Point]: Another benefit of BigQuery compared to its competitors is BI Engine. BI Engine's in-memory caching can make BI-type summary queries return in less than a second. This is currently supported by Google Data Studio but will soon be available to accelerate all queries in BigQuery.
For example:
Snowflake relies on third-party BI tools for dashboards and data visualization, while GCP offers a range of integrated BI tools, including Connected Sheets, Data Studio, and Looker.
- Select the "rev_change_by_card_type" view from the left resource pane in the BigQuery UI.
- Click on the magnifying glass to query the view
- Type:
SELECT *
FROM bq_demo_shared.rev_change_by_card_type
- Click Run
- Click on the "Export" Icon from the Results Pane
- Select "Explore Data with Sheets"
- Click "Start Analyzing"
- Select "Pivot Table"
- Select "New Sheet"
- Click "Create"
- Add "revenue_date" under the Row section of the Pivot Table Editor located on the right of the Sheets window
- Add "card_type" under the Column section of the Pivot Table Editor
- Add "monthly_rev" under the Values section of the Pivot Table Editor
- Click Apply
- Navigate to the top ribbon of the Sheets UI and select Insert > Chart
4. Setup: Data Integration
In this section you will learn how to create a new table and perform a JOIN with one of the many public datasets that Google Cloud has available.
[Competitive Talking Point]:
BigQuery has supported shared data sets for years. Customers in any project can query both public data sets and data sets in other projects that have been shared with them.
BigQuery can support data lakes in GCS through the use of external tables. In addition to bulk loading, BigQuery supports the ability to stream data into the database at rates upwards of hundreds of MB per second. Snowflake has no support for streaming data.
Importing Data to a new table
- In the resources pane select the bq_demo dataset
- In the dataset information pane select "Create Table"
- Select Google Cloud Storage for Source
- In the file path text box:
gs://retail-banking-looker/district
- Select CSV for File Format
- Enter "district" for Table Name
- Select the checkbox for Auto Detect schema
- Click Create Table
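The load can also be run as a SQL statement. The sketch below mirrors the UI steps above under the assumption that, with no column list given, BigQuery infers the schema from the CSV file, which plays the role of the Auto Detect checkbox; no other load options are set.

```sql
-- Sketch: load the CSV from Cloud Storage into bq_demo.district.
-- No column list is given, so the schema is inferred from the file.
LOAD DATA INTO bq_demo.district
FROM FILES (
  format = 'CSV',
  uris = ['gs://retail-banking-looker/district']
);
```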
Querying Public Dataset
- In the query editor enter the following query:
SELECT
CAST(geo_id as STRING) AS zip_code,
total_pop,
median_age,
households,
income_per_capita,
housing_units,
vacant_housing_units_for_sale,
ROUND(SAFE_DIVIDE(employed_pop, pop_16_over),4) AS rate_employment,
ROUND(SAFE_DIVIDE(bachelors_degree_or_higher_25_64, pop_25_64),4) AS rate_bachelors_degree_or_higher_25_64
FROM
`bigquery-public-data.census_bureau_acs.zip_codes_2017_5yr`;
- Click Run
- View the Results
- Now we will combine this public data with another query. Enter the following SQL Code in the Query Editor:
WITH customer_counts AS (
SELECT
REGEXP_EXTRACT(address, "[0-9][0-9][0-9][0-9][0-9]") AS zip_code,
COUNT(*) AS num_clients
FROM bq_demo.client
GROUP BY zip_code
)
SELECT
CAST(geo_id as STRING) AS zip_code,
total_pop,
median_age,
households,
income_per_capita,
ROUND(SAFE_DIVIDE(employed_pop, pop_16_over),4) AS rate_employment,
num_clients
FROM
`bigquery-public-data.census_bureau_acs.zip_codes_2017_5yr`
JOIN customer_counts ON customer_counts.zip_code = geo_id
ORDER BY num_clients DESC
- Click Run
- View the Results
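Two functions carry most of the logic in the queries above: SAFE_DIVIDE, which returns NULL instead of failing when the denominator is zero, and REGEXP_EXTRACT, which returns the first substring that matches the pattern. The sketch below tries both on literal values; the two addresses are made-up examples, not rows from the client table.

```sql
-- Sketch: how SAFE_DIVIDE and REGEXP_EXTRACT behave on simple inputs.
SELECT
  SAFE_DIVIDE(10, 4) AS normal_division,            -- 2.5
  SAFE_DIVIDE(10, 0) AS division_by_zero,           -- NULL rather than an error
  ROUND(SAFE_DIVIDE(1, 3), 4) AS rounded_rate,      -- 0.3333
  REGEXP_EXTRACT('12 Main St, Springfield, IL 62704', '[0-9]{5}') AS zip_ok,    -- '62704'
  REGEXP_EXTRACT('10500 Oak Ave, Austin, TX 78701', '[0-9]{5}') AS zip_wrong;   -- '10500'
```

The last column shows a limitation of the five-digit pattern: because the first match wins, an address with a five-digit street number yields that number instead of the ZIP code. If the addresses in your data always end with the ZIP code, anchoring the pattern to the end of the string (for example `[0-9]{5}$`) is one way to avoid this.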
5. Capacity Management
Working with slots and reservations
BQ offers multiple pricing models to meet your needs. Most large customers primarily use flat-rate pricing, which gives predictable costs with reserved capacity. For bursting beyond that baseline capacity, BQ offers flex slots, which let you grow into additional capacity on the fly and then automatically shrink back with no impact on running queries. BQ also has an on-demand (bytes scanned) model in which you pay only for the queries you run.
[Competitive Talking Point]: Some competitors work exclusively on a fixed capacity model where customers have to allocate a virtual warehouse for each workload in their organization. In addition to a low-cost per-query model that makes it easy to get started with BigQuery, we support a flat rate capacity pricing model where idle capacity can be shared among a set of workloads.
- Go to the reservations tab.
- Click on "Buy Slots"
- Select "Flex" as duration.
- Select 500 slots.
- Confirm purchase.
- Click View Slot Commitments.
- Click "Create Reservation"
- Use "demo" as reservation name
- Select United States as location
- Type 500 for slots (all available)
- Click Assignments
- Pick current project for organization project
- Select "demo" for reservation ID
- Click Create.