BigQuery is a fully-managed, petabyte-scale, low-cost enterprise data warehouse for analytics. BigQuery is serverless. You do not need to set up and manage clusters.
A BigQuery dataset resides in a project. A dataset contains one or more tables with data.
In this codelab, you will use the BigQuery web UI to query public datasets, load your own data, and export data to a Cloud Storage bucket.
For more information about BigQuery, see BigQuery documentation.
To complete this lab, you need:
Open the Cloud Platform Console. Click Select a project, and then click the + icon to create a project.
In the New Project dialog, for Project name, type BQProject. Click Create.
Open the Stackoverflow dataset. The BigQuery web UI opens in a new browser tab. The BigQuery web UI displays the tables in the Stackoverflow dataset. The navigation pane lists other public datasets. Each dataset comprises one or more tables.
For more information about all the public datasets available in BigQuery, see Google BigQuery Public Datasets.
Click Compose Query. In the New Query text area, copy and paste the following SQL query. The query uses the
#standardSQL prefix to specify that BigQuery should treat this query as standard SQL. The UI validates the query and displays a green checkmark below the text area to indicate that the syntax is valid.
#standardSQL SELECT badge_name AS First_Gold_Badge, COUNT(1) AS Num_Users, ROUND(AVG(tenure_in_days)) AS Avg_Num_Days FROM ( SELECT badges.user_id AS user_id, badges.name AS badge_name, TIMESTAMP_DIFF(badges.date, users.creation_date, DAY) AS tenure_in_days, ROW_NUMBER() OVER (PARTITION BY badges.user_id ORDER BY badges.date) AS row_number FROM `bigquery-public-data.stackoverflow.badges` badges JOIN `bigquery-public-data.stackoverflow.users` users ON badges.user_id = users.id WHERE badges.class = 1 ) WHERE row_number = 1 GROUP BY First_Gold_Badge ORDER BY Num_Users DESC LIMIT 10
Click Run Query. The query generates the top 10 gold badges, ranked by how many users got them as their first gold badges. The query also determines how many days it took for these gold badges to be obtained on average.
You can also query sets of tables using wildcard table names. For more information, see Querying sets of tables using wildcard tables.
Download the baby names zip file. The zip file, provided by the US Social Security Administration, contains approximately 7 MB of data about popular baby names.
Unzip the file in a local folder and note its location.
In the BigQuery web UI's navigation pane, next to the project name BQProject, click the down arrow icon, and click Create new dataset.
In the Create Dataset dialog, for Dataset ID, type babynames. Leave all of the other default settings in place and click OK.
In the navigation pane, hover over the babynames dataset ID that you just created. Click the down arrow icon next to the ID and click Create new table.
In the Source Data section, click the Choose file button. Navigate to the data you unzipped earlier, and select the yob2014.txt file.
In the Destination Table section, for the destination table name, type names_2014
In the Schema section, click Edit as Text.
Replace the contents of the Schema input area with the following schema:
Click Create Table.
Switch to the Cloud Platform Console browser tab. In the Cloud Platform Console's Products & Services menu, click Storage, and then click Create bucket.
In the Create a bucket page, specify a unique bucket name, Regional storage class, and us-central1 regional location. Click Create. Note the bucket name. You will use this bucket to store the data exported from BigQuery.
Switch to the BigQuery web UI browser tab. In the navigation pane, under the babynames dataset, hover over the names_yob2014 table. Click the down arrow icon, and then click Export Table.
In the Export to Google Cloud Storage dialog, select CSV as the export format.
For Google Cloud Storage URI, type <export bucket name>/babynames2014.csv. Replace the angle brackets and the text inside with the name of the bucket you created earlier.
After the export completes, switch to the Cloud Platform Console browser tab. Refresh the page to view the complete contents of the Cloud Storage bucket. The babynames2014.csv object in the bucket contains the exported data.
In the BigQuery web UI's navigation pane, hover over the babynames dataset ID. Click the down arrow icon next to the ID and click Delete dataset.
In the Delete Dataset dialog, for Dataset ID, type babynames, and click OK.
Switch to the Cloud Platform Console browser tab. In the navigation pane, under Storage, click Browser.
Select the bucket that contains your BigQuery export data, and click Delete.
In the Product & Services menu, click IAM & Admin.
Click Settings, then click the Delete.
In the Shut down project dialog, read the notes and then, type the project ID.
Click Shut down.
Close the Cloud Platform Console.
Congratulations! You have now learned how to use the BigQuery web UI to query public datasets, export data, and load your own data. Note that you did not have to set up or manage clusters to work with datasets.