Last Updated: 2019-05-28

This codelab demonstrates a data ingestion pattern for ingesting FHIR STU3 formatted healthcare data (regular resources and bundles) into BigQuery using the Cloud Healthcare FHIR APIs. Synthetic but realistic healthcare test data has been generated and made available in a Google Cloud Storage bucket for you.

In this codelab you will learn:

  1. How to import FHIR STU3 resources and bundles from Google Cloud Storage into a Cloud Healthcare FHIR store.
  2. How to export data from a FHIR store to a BigQuery dataset.

What do you need to run this demo?

  1. Access to a GCP Project.
  2. The Owner role on the GCP Project.
  3. FHIR STU3 resources in NDJSON format (content-structure=RESOURCE), OR
  4. FHIR STU3 bundles in NDJSON format (content-structure=BUNDLE)

If you don't have a GCP Project, follow these steps to create a new GCP Project.

If you already have a GCP Project, make sure you have the Owner role on it.

FHIR STU3 resources and bundles in NDJSON format have been pre-loaded into the GCS bucket gs://hcls-public-data-fhir-subset/ at the following locations:

  1. fhir_stu3_ndjson/ - regular resources
  2. fhir_stu3_transaction_ndjson/ - transaction bundles
  3. fhir_stu3_collection_ndjson/ - collection bundles

All of the resources above use the newline-delimited JSON (NDJSON) file format but have different content structures:

  1. Regular resources: one complete FHIR resource per line (content-structure=RESOURCE)
  2. Transaction and collection bundles: one FHIR bundle per line (content-structure=BUNDLE)

If you need a new dataset, you can always generate it using Synthea™, as sketched below. Then upload it to GCS instead of copying it from the public bucket in the Copy input data step.
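
The following is a minimal sketch of that workflow. The flags and output directory are assumptions based on Synthea's defaults; check the Synthea README for the exact options, and note that you may need to enable Synthea's bulk-data export (or convert the output) to get NDJSON files.

# Hypothetical Synthea workflow; verify flags and paths against the Synthea README.
git clone https://github.com/synthetichealth/synthea.git
cd synthea
./run_synthea -p 100                           # generate 100 synthetic patients
gsutil -m cp output/fhir/* gs://$BUCKET_NAME/  # upload the generated files to your bucket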

Follow these steps to enable Healthcare API and grant required permissions:

Initialize shell variables for your environment

To find the PROJECT_NUMBER and PROJECT_ID, refer to Identifying projects.
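
If you prefer the shell, these values can also be looked up with gcloud (a quick sketch; it assumes the gcloud CLI is installed and authenticated):

# List all projects visible to your account, with IDs and numbers.
gcloud projects list

# Or fetch the project number for a known project ID.
gcloud projects describe <PROJECT_ID> --format='value(projectNumber)'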

# Initialize shell variables
PROJECT_ID=<PROJECT_ID>
PROJECT_NUMBER=<PROJECT_NUMBER>
BUCKET_NAME=<BUCKET_NAME>
DATASET_ID=<DATASET_ID>
FHIR_STORE=<FHIR_STORE>

Enable Healthcare API

Go to the GCP Console API Library.

  1. From the projects list, select your project.
  2. In the API Library, select the API you want to enable. If you need help finding the API, use the search field and/or the filters.
  3. On the API page, click ENABLE.

This will add the Healthcare API service account to the project.
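
Alternatively, the API can be enabled from the shell with gcloud, which has the same effect as the console steps above:

# Enable the Cloud Healthcare API for the current project.
gcloud services enable healthcare.googleapis.com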

Grant Permissions

Before importing FHIR resources from Cloud Storage and exporting to BigQuery, you must grant additional permissions to the Cloud Healthcare Service Agent service account. For more information, see FHIR store Cloud Storage and FHIR store BigQuery permissions.

Grant Storage Admin Permissions

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member=serviceAccount:service-$PROJECT_NUMBER@gcp-sa-healthcare.iam.gserviceaccount.com \
    --role=roles/storage.admin

Grant BigQuery Admin Permissions

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member=serviceAccount:service-$PROJECT_NUMBER@gcp-sa-healthcare.iam.gserviceaccount.com \
    --role=roles/bigquery.admin
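
To verify that both roles were granted, you can inspect the project's IAM policy. This is a sketch using gcloud's standard flatten/filter flags against the service agent address shown above:

# List the roles bound to the Cloud Healthcare Service Agent.
gcloud projects get-iam-policy $PROJECT_ID \
    --flatten="bindings[].members" \
    --format="table(bindings.role)" \
    --filter="bindings.members:service-$PROJECT_NUMBER@gcp-sa-healthcare.iam.gserviceaccount.com"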

Follow these steps to ingest data from NDJSON files into a healthcare dataset in BigQuery using the Cloud Healthcare FHIR APIs:

Create a storage bucket and copy input data

Create a GCS bucket to store input data and error logs using the gsutil tool

gsutil mb -l us gs://$BUCKET_NAME

Copy input data

gsutil -m cp -r gs://hcls-public-data-fhir-subset/fhir_stu3*ndjson gs://$BUCKET_NAME
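
You can quickly confirm the copy succeeded by listing the bucket contents; the three NDJSON input directories should appear:

# Verify the copied input data.
gsutil ls gs://$BUCKET_NAME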

Create Healthcare Dataset and FHIR Store

Create a healthcare dataset using the Cloud Healthcare APIs

gcloud alpha healthcare datasets create $DATASET_ID --location=us-central1

Create a FHIR store in the dataset using the Cloud Healthcare APIs

gcloud alpha healthcare fhir-stores create $FHIR_STORE \
  --dataset=$DATASET_ID --location=us-central1
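
To confirm that both were created, you can list them. This sketch assumes the corresponding list commands in the same gcloud alpha surface:

# List healthcare datasets in the region.
gcloud alpha healthcare datasets list --location=us-central1

# List FHIR stores in the dataset.
gcloud alpha healthcare fhir-stores list --dataset=$DATASET_ID --location=us-central1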

Import test data from Google Cloud Storage to the FHIR store

We will use the preloaded files from the GCS bucket. These files contain FHIR STU3 regular resources and bundles in NDJSON format. Use one or more of the imports below. If an import is not successful, the errors are written back to the GCS bucket at the location set with the --error-gcs-uri parameter. The response contains an OPERATION_NUMBER, which can be used in the validation step.

Regular Resources

gcloud alpha healthcare fhir-stores import $FHIR_STORE \
  --dataset=$DATASET_ID --async \
  --error-gcs-uri=gs://$BUCKET_NAME/fhir_stu3_ndjson/error \
  --gcs-uri=gs://$BUCKET_NAME/fhir_stu3_ndjson/**.ndjson \
  --location=us-central1 --content-structure=RESOURCE

Transaction Bundles

gcloud alpha healthcare fhir-stores import $FHIR_STORE \
  --dataset=$DATASET_ID --async \
  --error-gcs-uri=gs://$BUCKET_NAME/fhir_stu3_transaction_ndjson/error \
  --gcs-uri=gs://$BUCKET_NAME/fhir_stu3_transaction_ndjson/transaction.ndjson \
  --location=us-central1 --content-structure=BUNDLE

Collection Bundles

gcloud alpha healthcare fhir-stores import $FHIR_STORE \
  --dataset=$DATASET_ID --async \
  --error-gcs-uri=gs://$BUCKET_NAME/fhir_stu3_collection_ndjson/error \
  --gcs-uri=gs://$BUCKET_NAME/fhir_stu3_collection_ndjson/collection.ndjson  \
  --location=us-central1 --content-structure=BUNDLE

Validate

Validate that the operation finished successfully. The operation might take a few minutes to finish, so you might need to repeat this command a few times with some delay.

gcloud alpha healthcare operations describe OPERATION_NUMBER \
  --dataset=$DATASET_ID --location=us-central1
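
Instead of re-running the command by hand, you can poll until the long-running operation reports completion. This sketch assumes the operation resource exposes a boolean done field, which gcloud renders as True:

# Poll the import operation every 10 seconds until it completes.
while [[ $(gcloud alpha healthcare operations describe OPERATION_NUMBER \
    --dataset=$DATASET_ID --location=us-central1 \
    --format='value(done)') != "True" ]]; do
  echo "Waiting for operation to finish..."
  sleep 10
done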

Create a BigQuery Dataset

bq mk --location=us --dataset $PROJECT_ID:$DATASET_ID

Export healthcare data from FHIR Store to BigQuery Dataset

gcloud alpha healthcare fhir-stores export $FHIR_STORE \
  --dataset=$DATASET_ID --location=us-central1 --async \
  --bq-dataset=bq://$PROJECT_ID.$DATASET_ID \
  --schema-type=analytics

Validate

Validate that the operation finished successfully. The same polling approach shown earlier works here too.

gcloud alpha healthcare operations describe OPERATION_NUMBER \
  --dataset=$DATASET_ID --location=us-central1

Validate that the BigQuery dataset contains all 17 tables

bq ls $PROJECT_ID:$DATASET_ID
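
Once the tables are present, you can run a quick sanity-check query. The Patient table name below is an assumption based on the analytics schema, which creates one table per FHIR resource type:

# Count imported patients (table name Patient is assumed).
bq query --use_legacy_sql=false \
  "SELECT COUNT(*) AS patient_count FROM \`$PROJECT_ID.$DATASET_ID.Patient\`"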

Congratulations, you've successfully completed the codelab to ingest healthcare data into BigQuery using the Cloud Healthcare APIs.

You imported FHIR STU3 data from Google Cloud Storage into a Cloud Healthcare FHIR store.

You exported data from the Cloud Healthcare FHIR store to BigQuery.

You now know the key steps required to start your Healthcare Data Analytics journey with BigQuery on Google Cloud Platform.