In this lab, you learn how to build up a complex BigQuery using clauses, subqueries, built-in functions and joins.

What you need

To complete this lab, you need:

Access to a supported Internet browser:

A Google Cloud Platform project

What you learn

In this lab, you:

Step 1

Open the Google Cloud Console (in the incognito window) and using the menu, navigate into BigQuery web UI, click on the Compose Query button on top left, and then click on Show Options, and ensure you are using Standard SQL. You are using Standard SQL if the "Use Legacy SQL" checkbox is unchecked.

Step 2

Click Compose Query.

Step 3

In the New Query window, type (or copy-and-paste) the following query:

SELECT
  airline,
  date,
  departure_delay
FROM
  `bigquery-samples.airline_ontime_data.flights`
WHERE
  departure_delay > 0
  AND departure_airport = 'LGA'
LIMIT
  100

What does this query do? ______________________

Step 4

Click Run Query.

Step 1

To the previous query, add an additional clause to filter by date and group the results by airline. Because you are grouping the results, the SELECT statement will have to use an aggregate function. In the New Query window, type the following query:

SELECT
  airline,
  COUNT(departure_delay)
FROM
   `bigquery-samples.airline_ontime_data.flights`
WHERE
  departure_airport = 'LGA'
  AND date = '2008-05-13'
GROUP BY
  airline
ORDER BY airline

Step 2

Click Run Query. What does this query do? ______________________________________________________

What is the number you get for American Airlines (AA)?

______________________________________________________

Step 3

Now change the query slightly:

SELECT
  airline,
  COUNT(departure_delay)
FROM
   `bigquery-samples.airline_ontime_data.flights`
WHERE
  departure_delay > 0 AND
  departure_airport = 'LGA'
  AND date = '2008-05-13'
GROUP BY
  airline
ORDER BY airline

Step 4

Click Run Query. What does this query do? ______________________________________________________

What is the number you get for American Airlines (AA)?

______________________________________________________

Step 5

The first query returns the total number of flights by each airline from La Guardia, and the second query returns the total number of flights that departed late. (Do you see why?)

How would you get both the number delayed as well as the total number of flights?

______________________________________________________

______________________________________________________

Step 6

Run this query:

SELECT
  f.airline,
  COUNT(f.departure_delay) AS total_flights,
  SUM(IF(f.departure_delay > 0, 1, 0)) AS num_delayed
FROM
   `bigquery-samples.airline_ontime_data.flights` AS f
WHERE
  f.departure_airport = 'LGA' AND f.date = '2008-05-13'
GROUP BY
  f.airline

Step 1

In the New Query window, type the following query:

SELECT
  CONCAT(CAST(year AS STRING), '-', LPAD(CAST(month AS STRING),2,'0'), '-', LPAD(CAST(day AS STRING),2,'0')) AS rainyday
FROM
  `bigquery-samples.weather_geo.gsod`
WHERE
  station_number = 725030
  AND total_precipitation > 0

Step 2

Click Run Query.

Step 3

How would you do the airline query to aggregate over all these dates instead of just ‘2008-05-13'?

______________________________________________________

You could use a JOIN, as shown next.

Step 1

In the New Query window, type the following query:

SELECT
  f.airline,
  SUM(IF(f.arrival_delay > 0, 1, 0)) AS num_delayed,
  COUNT(f.arrival_delay) AS total_flights
FROM
  `bigquery-samples.airline_ontime_data.flights` AS f
JOIN (
  SELECT
    CONCAT(CAST(year AS STRING), '-', LPAD(CAST(month AS STRING),2,'0'), '-', LPAD(CAST(day AS STRING),2,'0')) AS rainyday
  FROM
    `bigquery-samples.weather_geo.gsod`
  WHERE
    station_number = 725030
    AND total_precipitation > 0) AS w
ON
  w.rainyday = f.date
WHERE f.arrival_airport = 'LGA'
GROUP BY f.airline

Step 2

Click Run Query. How would you get the fraction of flights delayed for each airline?

You could put the entire query above into a subquery and then select from the columns of this result

Step 1

In the New Query window, type the following query:

SELECT
  airline,
  num_delayed,
  total_flights,
  num_delayed / total_flights AS frac_delayed
FROM (
SELECT
  f.airline AS airline,
  SUM(IF(f.arrival_delay > 0, 1, 0)) AS num_delayed,
  COUNT(f.arrival_delay) AS total_flights
FROM
  `bigquery-samples.airline_ontime_data.flights` AS f
JOIN (
  SELECT
    CONCAT(CAST(year AS STRING), '-', LPAD(CAST(month AS STRING),2,'0'), '-', LPAD(CAST(day AS STRING),2,'0')) AS rainyday
  FROM
    `bigquery-samples.weather_geo.gsod`
  WHERE
    station_number = 725030
    AND total_precipitation > 0) AS w
ON
  w.rainyday = f.date
WHERE f.arrival_airport = 'LGA'
GROUP BY f.airline
  )
ORDER BY
  frac_delayed ASC

Step 2

Click Run Query

In this lab, you:

©Google, Inc. or its affiliates. All rights reserved. Do not distribute.