BigQuery is Google's fully managed, NoOps, low-cost analytics database. With BigQuery, you can query terabytes of data without needing a database administrator or any infrastructure to manage. BigQuery uses familiar SQL and a pay-only-for-what-you-use charging model, letting you focus on analyzing data to find meaningful insights.

In this codelab, you'll use the bq command-line tool to load a local CSV file into a new BigQuery table.

What you'll learn

- How to use the bq command-line tool in Google Cloud Shell
- How to create a BigQuery dataset and load a CSV file into a table
- How to query the loaded table, including a join against a public dataset

What you'll need

- A Google Account (Gmail or Google Apps)
- A web browser, such as Chrome

Create a project

If you don't already have a Google Account (Gmail or Google Apps), you must create one. Sign in to the Google Cloud Platform Console (console.cloud.google.com) and create a new project.

If you don't have any projects, click the create project button:

Otherwise, create a new project from the project selection menu:

Choose a project name and make note of the project ID.

The project ID is a unique name across all Google Cloud projects. It will be referred to later in this codelab as PROJECT_ID.
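
If you lose track of it, you can list the IDs of all your projects once you have access to the gcloud tool (for example, in Cloud Shell, which is set up later in this codelab):

gcloud projects list

The PROJECT_ID column of the output contains the identifier to use.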

Enable BigQuery

Next, you'll need to enable billing in the Cloud Console so that you can use Google Cloud resources, and then enable the BigQuery API.

Running through this codelab should only use resources within the BigQuery free tier, but it could cost more if you decide to use more resources or if you leave them running (see "cleanup" section at the end of this document). See the Google BigQuery pricing documentation for details.

New users of Google Cloud Platform are eligible for a $300 free trial.

Google Cloud Shell

While this codelab can be operated from your computer, you will be using Google Cloud Shell, a command-line environment running in the cloud.

Activate Google Cloud Shell

From the GCP Console click the Cloud Shell icon on the top right toolbar:

Then click "Start Cloud Shell":

It should only take a few moments to provision and connect to the environment:

This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5 GB home directory and runs on Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this lab can be done with simply a browser or a Google Chromebook.

Once connected to Cloud Shell, you should see that you are already authenticated and that the project is already set to your PROJECT_ID.

Run the following command in Cloud Shell to confirm that you are authenticated:

gcloud auth list

Command output

Credentialed accounts:
 - <myaccount>@<mydomain>.com (active)

Run the following command to confirm that gcloud knows about your project:

gcloud config list project

Command output

[core]
project = <PROJECT_ID>

If the project is not set correctly, you can set it with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].

Create a dataset to contain your tables

What is a dataset?

A BigQuery dataset is a collection of tables. All tables in a dataset are stored in the same data location. You can also attach custom access controls to limit access to a dataset and its tables.

Creating a dataset

In the Cloud Shell, use the bq mk command to create a dataset called bq_load_codelab.

bq mk bq_load_codelab
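
With no extra flags, the dataset is created in the default location with no description. If you want to pin a dataset to a specific location or describe it, bq mk accepts optional flags. For example (the dataset name here is only an illustration; the rest of this codelab uses bq_load_codelab, created above):

# Example only: create a dataset pinned to the US location
bq --location=US mk --dataset --description "Dataset for the bq load codelab" bq_load_codelab_example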

Viewing dataset properties

Verify that you created the dataset by viewing the dataset's properties with the bq show command.

bq show bq_load_codelab

You should see output similar to:

Dataset my-project:bq_load_codelab

   Last modified           ACLs          Labels
 ----------------- -------------------- --------
  15 Jun 14:12:49   Owners:
                      projectOwners,
                      your-email@example.com
                    Writers:
                      projectWriters
                    Readers:
                      projectReaders

BigQuery can load data from several data formats, including newline-delimited JSON, Avro, and CSV. For simplicity, this codelab uses CSV.
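
As a point of comparison (not a step in this codelab), self-describing formats carry their own schema, so loading them is simpler. For example, if you had the same data in a hypothetical Avro file named transactions.avro, you could load it without spelling out a schema:

# Example only: Avro files embed their schema, so none is given here
bq load --source_format=AVRO bq_load_codelab.transactions_avro ./transactions.avro

CSV files have no embedded schema, so later in this codelab you'll pass one to bq load explicitly.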

Create a CSV file

In the Cloud Shell, create an empty CSV file.

touch customer_transactions.csv

Open the CSV in the Cloud Shell code editor by running the cloudshell edit command.

cloudshell edit customer_transactions.csv

It will open a new browser window with a code editor and a Cloud Shell panel.

In the editor, enter some comma-separated values to load into BigQuery.

ID,Zipcode,Timestamp,Amount,Feedback,SKU
c123,78757,2018-02-14 17:01:39Z,1.20,4.7,he4rt5
c456,10012,2018-03-14 15:09:26Z,53.60,3.1,ppiieee
c123,78741,2018-04-01 05:59:47Z,5.98,2.0,ch0c0

Save the CSV with the File -> Save menu.
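
Alternatively, if you prefer to stay in the terminal, you can write the same file with a shell here-document instead of using the editor:

cat > customer_transactions.csv << 'EOF'
ID,Zipcode,Timestamp,Amount,Feedback,SKU
c123,78757,2018-02-14 17:01:39Z,1.20,4.7,he4rt5
c456,10012,2018-03-14 15:09:26Z,53.60,3.1,ppiieee
c123,78741,2018-04-01 05:59:47Z,5.98,2.0,ch0c0
EOF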

Use the bq load command to load your CSV into a BigQuery table.

bq load \
    --source_format=CSV \
    --skip_leading_rows=1 \
    bq_load_codelab.customer_transactions \
    ./customer_transactions.csv \
    id:string,zip:string,ttime:timestamp,amount:numeric,fdbk:float,sku:string

You used the following options:

- --source_format=CSV: parse the source file as comma-separated values.
- --skip_leading_rows=1: skip the first line of the file, which is the header row.
- bq_load_codelab.customer_transactions: the table to load the data into, qualified by the dataset you created earlier.
- ./customer_transactions.csv: the local file to load.
- id:string,zip:string,...: an inline schema, given as a comma-separated list of column_name:data_type pairs.

The customer_transactions table uses the following schema:

- id (STRING): the customer identifier (the ID column of the CSV)
- zip (STRING): the U.S. postal zip code (Zipcode)
- ttime (TIMESTAMP): the date and time of the transaction (Timestamp)
- amount (NUMERIC): the transaction amount (Amount); NUMERIC is a good fit for financial data
- fdbk (FLOAT): the customer's feedback rating (Feedback)
- sku (STRING): the stock keeping unit of the purchased item (SKU)

Get the table details

Verify that the table loaded correctly by showing the table properties with the bq show command.

bq show bq_load_codelab.customer_transactions

Output:

Table my-project:bq_load_codelab.customer_transactions

   Last modified          Schema          Total Rows   Total Bytes
 ----------------- --------------------- ------------ -------------
  15 Jun 15:13:55   |- id: string         3            159
                    |- zip: string
                    |- ttime: timestamp
                    |- amount: numeric
                    |- fdbk: float
                    |- sku: string
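
You can also spot-check the data itself. The bq head command prints the first rows of a table (all three rows, in this case):

bq head bq_load_codelab.customer_transactions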

Now that your data is loaded, you can query it by using the BigQuery web UI, the bq command, or the API. Your queries can join your data against any dataset (or datasets, as long as they are all in the same location) that you have permission to read.

Run a standard SQL query that joins your dataset with the zipcode public dataset and sums up transactions by U.S. state. Use the bq query command to execute the query.

bq query --nouse_legacy_sql '
SELECT SUM(c.amount) AS amount_total, z.state_code AS state_code
FROM `bq_load_codelab.customer_transactions` c
JOIN `bigquery-public-data.utility_us.zipcode_area` z
ON c.zip = z.zipcode
GROUP BY state_code
'

This command should output something like:

Waiting on bqjob_r26...05a15b38_1 ... (1s) Current status: DONE   
+--------------+------------+
| amount_total | state_code |
+--------------+------------+
|         53.6 | NY         |
|         7.18 | TX         |
+--------------+------------+

The query you just ran used both a public dataset and your own private dataset. Learn more by reading this commented version of the same query:

#standardSQL
SELECT
  /* Total of all transactions in the state. */
  SUM(c.amount) AS amount_total,

  /* State corresponding to the transaction's zipcode. */
  z.state_code AS state_code

/* Query the table you just constructed.
 * Note: If you omit the project from the table ID,
 *       the dataset is read from your project. */
FROM `bq_load_codelab.customer_transactions` c

/* Join the table to the zipcode public dataset. */
JOIN `bigquery-public-data.utility_us.zipcode_area` z

/* Find the state corresponding to the transaction's zipcode. */
ON c.zip = z.zipcode

/* Group over all transactions by state. */
GROUP BY state_code
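
If you're curious how much data a query will scan before you run it, bq query supports a dry run; instead of executing the query, it reports the number of bytes it would process. For example, using the same query:

bq query --nouse_legacy_sql --dry_run '
SELECT SUM(c.amount) AS amount_total, z.state_code AS state_code
FROM `bq_load_codelab.customer_transactions` c
JOIN `bigquery-public-data.utility_us.zipcode_area` z
ON c.zip = z.zipcode
GROUP BY state_code
'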

Optionally, delete the dataset you created with the bq rm command. Use the -r flag to remove any tables it contains.

bq rm -r bq_load_codelab
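
The bq rm command asks for confirmation before deleting. If you're sure, you can add the -f (force) flag to skip the prompt:

bq rm -r -f bq_load_codelab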

You just uploaded a table to BigQuery and queried it!

What we've covered

- Using Cloud Shell and the bq command-line tool
- Creating a BigQuery dataset with bq mk
- Loading a local CSV file into a BigQuery table with bq load
- Querying the table and joining it against a public dataset with bq query
- Deleting the dataset with bq rm

Learn More

- BigQuery documentation: cloud.google.com/bigquery/docs
- bq command-line tool reference: cloud.google.com/bigquery/docs/bq-command-line-tool
- BigQuery pricing: cloud.google.com/bigquery/pricing