Last Updated: 2019-10-25

This codelab demonstrates a data ingestion pattern for ingesting FHIR STU3-formatted healthcare data (regular resources and bundles) into BigQuery using the Cloud Healthcare FHIR API. Realistic healthcare test data has been generated and made available for you in the Google Cloud Storage bucket gs://hcls_testing_data_fhir_10_patients/.

In this codelab you will learn:

  1. How to enable the Healthcare API and grant the required permissions.
  2. How to create a Cloud Healthcare dataset and FHIR store.
  3. How to import FHIR STU3 resources and bundles from Google Cloud Storage into a FHIR store.
  4. How to export data from a FHIR store to BigQuery.

What do you need to run this demo?

If you don't have a GCP Project, follow these steps to create a new GCP Project.
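For example, a new project can be created from Cloud Shell (a minimal sketch; my-fhir-codelab is a placeholder project ID, and a billing account must be linked before APIs can be enabled):

# Hypothetical project ID; project IDs must be globally unique.
gcloud projects create my-fhir-codelab
gcloud config set project my-fhir-codelab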

FHIR STU3 resources and bundles in NDJSON format have been pre-loaded into a GCS bucket at the following locations:

  1. Regular resources: gs://hcls_testing_data_fhir_10_patients/fhir_stu3_ndjson/
  2. Transaction bundles: gs://hcls_testing_data_fhir_10_patients/fhir_stu3_transaction_ndjson/
  3. Collection bundles: gs://hcls_testing_data_fhir_10_patients/fhir_stu3_collection_ndjson/

All of the files above use the newline-delimited JSON (NDJSON) format but differ in content structure:

  1. Regular resources: one FHIR resource per line (imported with --content-structure=RESOURCE).
  2. Transaction and collection bundles: one FHIR Bundle per line (imported with --content-structure=BUNDLE).

If you need a new dataset, you can always generate it using Synthea™. Then upload it to GCS instead of copying it from the bucket at the Copy input data step, for example as sketched below.
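A sketch, assuming Synthea's default setup with FHIR STU3 export enabled in its configuration:

# Generate 10 synthetic patients with Synthea.
git clone https://github.com/synthetichealth/synthea.git
cd synthea
./run_synthea -p 10
# Upload the generated FHIR output; the exact output folder and file format
# depend on your Synthea exporter settings (the import step expects NDJSON).
gsutil -m cp -r output/ gs://$BUCKET_NAME/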

Follow these steps to enable the Healthcare API and grant the required permissions:

Initialize shell variables for your environment

To find the PROJECT_NUMBER and PROJECT_ID, refer to Identifying projects.

# Initialize shell variables
PROJECT_ID=<PROJECT_ID>
PROJECT_NUMBER=<PROJECT_NUMBER>
BUCKET_NAME=<BUCKET_NAME>
DATASET_ID=<DATASET_ID>
FHIR_STORE=<FHIR_STORE>
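If you don't know your project number, you can also look it up from the shell, for example:

# Look up the project number for a given project ID.
PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format='value(projectNumber)')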

Enable Healthcare API

The following steps will enable the Healthcare API in your GCP project and add the Healthcare API service account to the project.

  1. Go to the GCP Console API Library.
  2. From the projects list, select your project.
  3. In the API Library, select the API you want to enable. If you need help finding the API, use the search field and the filters.
  4. On the API page, click ENABLE.
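Alternatively, the API can be enabled directly from Cloud Shell:

gcloud services enable healthcare.googleapis.com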

Grant Permissions

Before importing FHIR resources from Cloud Storage and exporting to BigQuery, you must grant additional permissions to the Cloud Healthcare Service Agent service account. For more information, see FHIR store Cloud Storage and FHIR store BigQuery permissions.

Grant Storage Admin Permission

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member=serviceAccount:service-$PROJECT_NUMBER@gcp-sa-healthcare.iam.gserviceaccount.com \
    --role=roles/storage.admin

Grant BigQuery Admin Permissions

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member=serviceAccount:service-$PROJECT_NUMBER@gcp-sa-healthcare.iam.gserviceaccount.com \
    --role=roles/bigquery.admin
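To verify that both roles were granted, you can inspect the project's IAM policy, for example:

# List the roles bound to the Cloud Healthcare Service Agent.
gcloud projects get-iam-policy $PROJECT_ID \
  --flatten="bindings[].members" \
  --format="table(bindings.role)" \
  --filter="bindings.members:service-$PROJECT_NUMBER@gcp-sa-healthcare.iam.gserviceaccount.com"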

Follow these steps to ingest data from the NDJSON files into a healthcare dataset in BigQuery using the Cloud Healthcare FHIR API:

Create a storage bucket and copy input data

Create a GCS bucket to store input data and error logs using the gsutil tool

gsutil mb -l us gs://$BUCKET_NAME

Copy input data

gsutil -m cp -r gs://hcls_testing_data_fhir_10_patients/fhir_stu3*ndjson gs://$BUCKET_NAME
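You can confirm the copy by listing the bucket, for example:

# The fhir_stu3*ndjson folders should now appear in the bucket.
gsutil ls gs://$BUCKET_NAME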

Create Healthcare Dataset and FHIR Store

Create a healthcare dataset using the Cloud Healthcare API

gcloud beta healthcare datasets create $DATASET_ID --location=us-central1

Create a FHIR store in the dataset using the Cloud Healthcare API

gcloud beta healthcare fhir-stores create $FHIR_STORE \
  --dataset=$DATASET_ID --location=us-central1
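To confirm that both were created, you can list them:

gcloud beta healthcare datasets list --location=us-central1
gcloud beta healthcare fhir-stores list --dataset=$DATASET_ID --location=us-central1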

Import test data from Google Cloud Storage to the FHIR store

We will use the preloaded files from the GCS bucket. These files contain FHIR STU3 regular resources and bundles in NDJSON format. Run any or all of the imports below. Each response includes an OPERATION_NUMBER, which can be used in the validation step.

Regular Resources

gcloud beta healthcare fhir-stores import gcs $FHIR_STORE \
  --dataset=$DATASET_ID --async \
  --gcs-uri=gs://$BUCKET_NAME/fhir_stu3_ndjson/**.ndjson \
  --location=us-central1 --content-structure=RESOURCE

Transaction Bundles

gcloud beta healthcare fhir-stores import gcs $FHIR_STORE \
  --dataset=$DATASET_ID --async \
  --gcs-uri=gs://$BUCKET_NAME/fhir_stu3_transaction_ndjson/transaction.ndjson \
  --location=us-central1 --content-structure=BUNDLE

Collection Bundles

gcloud beta healthcare fhir-stores import gcs $FHIR_STORE \
  --dataset=$DATASET_ID --async \
  --gcs-uri=gs://$BUCKET_NAME/fhir_stu3_collection_ndjson/collection.ndjson  \
  --location=us-central1 --content-structure=BUNDLE
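If you ran the imports with --async and didn't capture the OPERATION_NUMBER from a response, you can list the dataset's long-running operations to find it, for example:

# Each import started above appears as a long-running operation.
gcloud beta healthcare operations list --dataset=$DATASET_ID --location=us-central1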

Validate

Validate that the import operation finished successfully. The operation might take a few minutes to complete, so you might need to repeat this command a few times with some delay.

gcloud beta healthcare operations describe OPERATION_NUMBER \
  --dataset=$DATASET_ID --location=us-central1
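Instead of re-running the command by hand, a small polling loop also works (a sketch, assuming the operation's done field reads True on completion):

# Poll every 30 seconds until the long-running operation completes.
while [[ "$(gcloud beta healthcare operations describe OPERATION_NUMBER \
    --dataset=$DATASET_ID --location=us-central1 \
    --format='value(done)')" != "True" ]]; do
  sleep 30
done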

Create a BigQuery Dataset

bq mk --location=us --dataset $PROJECT_ID:$DATASET_ID

Export healthcare data from FHIR Store to BigQuery Dataset

gcloud beta healthcare fhir-stores export bq $FHIR_STORE \
  --dataset=$DATASET_ID --location=us-central1 --async \
  --bq-dataset=bq://$PROJECT_ID.$DATASET_ID \
  --schema-type=analytics

Validate

Validate that the export operation finished successfully. As with the import, you might need to repeat this command until the operation completes.

gcloud beta healthcare operations describe OPERATION_NUMBER \
  --dataset=$DATASET_ID --location=us-central1

Validate that the BigQuery dataset has all 17 tables

bq ls $PROJECT_ID:$DATASET_ID
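As a final smoke test, you can query one of the exported tables (a sketch, assuming the test data produced a Patient table):

# Count the patient rows exported from the FHIR store.
bq query --nouse_legacy_sql \
  "SELECT COUNT(*) AS patient_count FROM \`$PROJECT_ID.$DATASET_ID.Patient\`"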

Congratulations, you've successfully completed the codelab to ingest healthcare data into BigQuery using the Cloud Healthcare API.

You imported FHIR STU3 data from Google Cloud Storage into a Cloud Healthcare FHIR store.

You exported data from the Cloud Healthcare FHIR store to BigQuery.

You now know the key steps required to start your Healthcare Data Analytics journey with BigQuery on Google Cloud Platform.