Using the Document AI API with Python

What is Document AI?

The Document AI API is a document understanding solution that takes unstructured data, such as documents, emails, and so on, and makes the data easier to understand, analyze, and consume by providing structure through content classification, entity extraction, advanced searching, and more.

In this tutorial, you will use the Document AI API with Python by working through two scenarios: extracting text from an invoice and extracting fields from a handwritten form.

What you'll learn

  • How to use Cloud Shell
  • How to enable the Document AI API
  • How to authenticate API requests
  • How to install the client library for Python
  • How to parse data from an invoice
  • How to parse data from a scanned form

What you'll need

  • A Google Cloud Project
  • A browser, such as Chrome or Firefox
  • Knowledge of Python 3

Self-paced environment setup

  1. Sign in to Cloud Console and create a new project or reuse an existing one. (If you don't already have a Gmail or G Suite account, you must create one.)

Remember the project ID, a name that is unique across all Google Cloud projects (the ID shown in the original screenshot has already been taken and will not work for you). It is referred to later in this codelab as PROJECT_ID.

  2. Next, you'll need to enable billing in Cloud Console in order to use Google Cloud resources.

Running through this codelab shouldn't cost much, if anything at all. Be sure to follow the instructions in the "Clean Up" section, which explains how to shut down resources so you don't incur billing beyond this tutorial. New users of Google Cloud are eligible for the $300 USD Free Trial program.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this tutorial you will be using Cloud Shell, a command line environment running in the Cloud.

From the Cloud Console, click the Cloud Shell icon on the top right toolbar.

It should only take a few moments to provision and connect to the environment. When it finishes, you should see a command prompt in the Cloud Shell terminal.

This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory, and runs on Google Cloud, greatly enhancing network performance and authentication. All of your work in this lab can be done with simply a browser.

Before using the Document AI API, you must enable it. Enter the following command in Cloud Shell:

gcloud services enable documentai.googleapis.com

To make requests to the Document AI API, you need to use a service account. A service account belongs to your project and allows the Google Cloud client library for Python to make Document AI API requests. Like any other user account, a service account is represented by an email address. In this section, you'll use the Cloud SDK to create and authenticate a service account.

First, set an environment variable with your PROJECT_ID which you'll use throughout this tutorial:

export PROJECT_ID=$(gcloud config get-value core/project)

Test that it was set correctly:

echo $PROJECT_ID

yourproject-XXXX
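
The Python functions later in this tutorial take a project_id argument. As a minimal sketch, and assuming you start Python from the same shell in which you exported PROJECT_ID, you could read the value from the environment instead of typing it by hand. The helper name get_project_id is hypothetical and not part of any Google library:

```python
import os


def get_project_id(environ=os.environ):
    """Return the project ID stored in the PROJECT_ID environment variable.

    get_project_id is a hypothetical helper for this tutorial. It raises a
    RuntimeError with a readable message if the variable is missing or
    empty, instead of failing later inside an API call.
    """
    project_id = environ.get("PROJECT_ID", "").strip()
    if not project_id:
        raise RuntimeError(
            "PROJECT_ID is not set; run the export command shown above first."
        )
    return project_id
```

Calling get_project_id() with no arguments reads the real environment; passing a dictionary makes the helper easy to try out without touching your shell.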

Create a new service account to access the Document AI API:

gcloud iam service-accounts create my-documentai-sa \
  --display-name "my document AI service account"

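Grant your service account the Owner role (roles/owner) on your project. This role is much broader than the Document AI API requires, so use it only in a throwaway tutorial project: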

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member serviceAccount:my-documentai-sa@${PROJECT_ID}.iam.gserviceaccount.com \
  --role roles/owner

Create credentials that your Python code will use to log in as your new service account. The credentials are saved as a JSON file ~/key.json:

gcloud iam service-accounts keys create ~/key.json \
  --iam-account my-documentai-sa@${PROJECT_ID}.iam.gserviceaccount.com

Finally, set the GOOGLE_APPLICATION_CREDENTIALS environment variable. This variable lets the Document AI API Python library find your credentials. Set it to the full path of the credentials JSON file you created:

export GOOGLE_APPLICATION_CREDENTIALS=~/key.json

For more information, see the Authentication overview page.
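
Before making any API calls, it can help to confirm that the variable points at a readable key file. The following is a minimal sketch using only the standard library. The helper name describe_credentials is hypothetical, and the sketch assumes the key file contains the "type", "client_email", and "project_id" fields that service account JSON keys normally carry:

```python
import json
import os


def describe_credentials(environ=os.environ):
    """Return (client_email, project_id) from the service account key file.

    describe_credentials is a hypothetical helper. It assumes the file
    named by GOOGLE_APPLICATION_CREDENTIALS is a service account JSON key.
    """
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")

    # Expand a leading "~" in case the variable was set with quotes.
    with open(os.path.expanduser(path), encoding="utf-8") as key_file:
        key = json.load(key_file)

    if key.get("type") != "service_account":
        raise ValueError("{} is not a service account key".format(path))
    return key["client_email"], key["project_id"]
```

If everything is set up as described in this section, describe_credentials() should return the my-documentai-sa email address together with your project ID.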

Install the client library:

pip3 install --upgrade google-cloud-documentai

You should see something like this:

...
Installing collected packages: google-cloud-documentai
Successfully installed google-cloud-documentai-0.2.0

Now, you're ready to use the Document AI API!

In this tutorial, you'll use an interactive Python interpreter called IPython. Start a session by running ipython in Cloud Shell. This command runs the Python interpreter in an interactive session.

ipython

You should see something like this:

Python 3.7.3 (default, Mar 31 2020, 14:50:17)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]:

Case Scenario: You have been emailed an invoice (shown below). Because you receive more than 100 invoices every day, typing each one into your database by hand is not practical, so you decide to use the Document AI API to extract the information you need programmatically.

[Image: the sample invoice]

In this section, you will process this invoice to see what text the Document AI API is able to extract.

Copy the following code into your IPython session, replacing YOUR_PROJECT_ID with your project ID:

from google.cloud import documentai_v1beta2 as documentai


def parse_invoice(project_id='YOUR_PROJECT_ID',
                  input_uri='gs://cloud-samples-data/documentai/invoice.pdf'):
    """Process a single document with the Document AI API, including
    text extraction and entity extraction."""

    client = documentai.DocumentUnderstandingServiceClient()

    gcs_source = documentai.types.GcsSource(uri=input_uri)

    # mime_type can be application/pdf, image/tiff,
    # image/gif, or application/json
    input_config = documentai.types.InputConfig(
        gcs_source=gcs_source, mime_type='application/pdf')

    # Location can be 'us' or 'eu'
    parent = 'projects/{}/locations/us'.format(project_id)
    request = documentai.types.ProcessDocumentRequest(
        parent=parent,
        input_config=input_config)

    document = client.process_document(request=request)

    # All text extracted from the document
    print('Document Text: {}'.format(document.text))



Take a moment to study the code and see how it uses mime_type to specify the format of the file (application/pdf, application/json, and so on). The project_id parameter identifies the project that the request is made against, and the input_uri parameter specifies where the file is hosted. In this case, the file is hosted in a Google Cloud Storage bucket.
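
To make the relationship between these three parameters concrete, here is a small sketch that derives the parent string and the MIME type from a project ID and a Cloud Storage URI. It uses only the standard library and does not call the API. The helper name build_request_fields and the extension table are assumptions for illustration; the MIME types are the ones listed in the code comment above:

```python
import posixpath

# MIME types named in the comment of the sample code. The mapping from
# file extensions is an assumption made for this sketch.
SUPPORTED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".gif": "image/gif",
    ".json": "application/json",
}


def build_request_fields(project_id, input_uri, location="us"):
    """Return the parent, input_uri, and mime_type for a request.

    build_request_fields is a hypothetical helper, not a library call.
    """
    if not input_uri.startswith("gs://"):
        raise ValueError("input_uri must be a Cloud Storage URI (gs://...)")

    # Pick the MIME type from the file extension of the object name.
    extension = posixpath.splitext(input_uri)[1].lower()
    if extension not in SUPPORTED_MIME_TYPES:
        raise ValueError("unsupported file type: {!r}".format(extension))

    return {
        # Same format string as in the sample code; location is 'us' or 'eu'.
        "parent": "projects/{}/locations/{}".format(project_id, location),
        "input_uri": input_uri,
        "mime_type": SUPPORTED_MIME_TYPES[extension],
    }


fields = build_request_fields(
    "yourproject-XXXX", "gs://cloud-samples-data/documentai/invoice.pdf")
print(fields)
```

Validating the URI and file type up front gives a clearer error than a failed API request would.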

Call the function:

parse_invoice()

You should see the following output:

Document Text: TERMS: 6 month contract
DUE: 01/01/2025
NOTES:
FROM: Company ABC
user@companyabc.com
ADDRESS: 111 Main Street
Anytown, USA
Item Description Quantity Price Amount
Tool A 500 $1.00 $500.00
Service B 1 $900.00 $900.00
Resource C 50 $12.00 $600.00
Supplies used for Project Q.
TO: John Doe
johndoe@email.com
ADDRESS: 222 Main Street
Anytown, USA
Subtotal $2000.00
Tax $140.00
BALANCE DUE $2140.00
DATE: 01/01/1970
INVOICE: NO. 001
Invoice

For the case scenario, this output is useful because it contains the exact details you need, such as the quantity and price of each item purchased. Because the details are now plain text, you can transfer them to a database of your choice programmatically.
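
As one possible illustration of that last step, the sketch below pulls the line items out of the extracted text with a regular expression and stores them in an in-memory SQLite database. The pattern, the table layout, and the helper names are assumptions tailored to this sample invoice, not part of the Document AI API, and real invoices would likely need a more robust approach:

```python
import re
import sqlite3
from decimal import Decimal

# Matches lines such as "Tool A 500 $1.00 $500.00" in the sample output.
LINE_ITEM = re.compile(
    r"^(?P<description>.+?) (?P<quantity>\d+) "
    r"\$(?P<price>[\d,]+\.\d{2}) \$(?P<amount>[\d,]+\.\d{2})$"
)


def to_cents(text):
    """Convert a string such as '1,900.00' to an integer number of cents."""
    return int(Decimal(text.replace(",", "")) * 100)


def extract_line_items(document_text):
    """Return (description, quantity, price_cents, amount_cents) tuples."""
    items = []
    for line in document_text.splitlines():
        match = LINE_ITEM.match(line.strip())
        if match:
            items.append((
                match["description"],
                int(match["quantity"]),
                to_cents(match["price"]),
                to_cents(match["amount"]),
            ))
    return items


def save_line_items(connection, items):
    """Create the (hypothetical) line_items table and insert the rows."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS line_items ("
        "description TEXT, quantity INTEGER, "
        "price_cents INTEGER, amount_cents INTEGER)"
    )
    connection.executemany(
        "INSERT INTO line_items VALUES (?, ?, ?, ?)", items)
    connection.commit()


# A fragment of the output shown above, used here in place of document.text.
sample_text = """Item Description Quantity Price Amount
Tool A 500 $1.00 $500.00
Service B 1 $900.00 $900.00
Resource C 50 $12.00 $600.00
Supplies used for Project Q.
Subtotal $2000.00
"""

connection = sqlite3.connect(":memory:")
items = extract_line_items(sample_text)
save_line_items(connection, items)
total_cents = connection.execute(
    "SELECT SUM(amount_cents) FROM line_items").fetchone()[0]
print(total_cents / 100)  # 2000.0, which matches the invoice subtotal
```

Storing amounts as integer cents avoids floating-point rounding when the values are summed in the database.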

Summary:

In this step, you were able to analyze an invoice by creating a request to the Document AI API and getting back a response of the text that was extracted from the invoice.

Case Scenario: You operate a medical clinic, and most forms are filled out by hand by your patients (example shown below). You want a better way to get this data into your database without typing out every form manually. You have decided to use the Document AI API, which can extract text from handwritten forms.

[Image: the sample handwritten medical form]

In this section, you will process this form to see what text the Document AI API is able to extract.

Copy the following code into your IPython session, replacing YOUR_PROJECT_ID with your project ID:

from google.cloud import documentai_v1beta2 as documentai

def parse_form(project_id='YOUR_PROJECT_ID',
               input_uri='gs://cloud-samples-data/documentai/form.pdf'):
    """Parse a form"""

    client = documentai.DocumentUnderstandingServiceClient()

    gcs_source = documentai.types.GcsSource(uri=input_uri)

    # mime_type can be application/pdf, image/tiff,
    # image/gif, or application/json
    input_config = documentai.types.InputConfig(
        gcs_source=gcs_source, mime_type='application/pdf')

    # Improve form parsing results by providing key-value pair hints.
    # For each key hint, key is text that is likely to appear in the
    # document as a form field name (e.g. "DOB").
    # Value types are optional, but can be one or more of:
    # ADDRESS, LOCATION, ORGANIZATION, PERSON, PHONE_NUMBER, ID,
    # NUMBER, EMAIL, PRICE, TERMS, DATE, NAME
    key_value_pair_hints = [
        documentai.types.KeyValuePairHint(key='Emergency Contact',
                                          value_types=['NAME']),
        documentai.types.KeyValuePairHint(
            key='Referred By')
    ]

    # Setting enabled=True enables form extraction
    form_extraction_params = documentai.types.FormExtractionParams(
        enabled=True, key_value_pair_hints=key_value_pair_hints)

    # Location can be 'us' or 'eu'
    parent = 'projects/{}/locations/us'.format(project_id)
    request = documentai.types.ProcessDocumentRequest(
        parent=parent,
        input_config=input_config,
        form_extraction_params=form_extraction_params)

    document = client.process_document(request=request)

    def _get_text(el):
        """Doc AI identifies form fields by their offsets
        in document text. This function converts offsets
        to text snippets.
        """
        response = ''
        # If a text segment spans several lines, it will
        # be stored in different text segments.
        for segment in el.text_anchor.text_segments:
            start_index = segment.start_index
            end_index = segment.end_index
            response += document.text[start_index:end_index]
        return response

    for page in document.pages:
        print('Page number: {}'.format(page.page_number))
        for form_field in page.form_fields:
            print('Field Name: {}\tConfidence: {}'.format(
                _get_text(form_field.field_name),
                form_field.field_name.confidence))
            print('Field Value: {}\tConfidence: {}'.format(
                _get_text(form_field.field_value),
                form_field.field_value.confidence))




Take a moment to study the code and see how it uses mime_type to specify the format of the file (application/pdf, application/json, and so on). The project_id parameter identifies the project that the request is made against, and the input_uri parameter specifies where the file is hosted. In this case, the file is hosted in a Google Cloud Storage bucket. In addition, note the key_value_pair_hints, which tell the API which field names are likely to appear in the form and so make the data easier to parse.
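
The _get_text helper is the least obvious part of the code, so here is a standalone sketch of the same offset-to-text idea. It uses hand-built stand-in objects instead of a real API response; the sample text, the segment_for helper, and the SimpleNamespace objects are all hypothetical and only mimic the text_anchor.text_segments structure used above:

```python
from types import SimpleNamespace


def text_for_anchor(document_text, text_anchor):
    """Join the slices of document_text named by each text segment."""
    return "".join(
        document_text[segment.start_index:segment.end_index]
        for segment in text_anchor.text_segments
    )


def segment_for(document_text, snippet):
    """Build a stand-in segment covering the first occurrence of snippet."""
    start = document_text.index(snippet)
    return SimpleNamespace(start_index=start, end_index=start + len(snippet))


# Stand-in for document.text: every field lives somewhere in one long string.
document_text = "DOB:\n09/04/1986\nSymptoms:\nweakness,\naches, chills\n"

# A field name stored as a single segment.
name_anchor = SimpleNamespace(
    text_segments=[segment_for(document_text, "DOB:")])

# A field value that spans two lines and is stored as two segments.
value_anchor = SimpleNamespace(text_segments=[
    segment_for(document_text, "weakness,\n"),
    segment_for(document_text, "aches, chills"),
])

print(repr(text_for_anchor(document_text, name_anchor)))
print(repr(text_for_anchor(document_text, value_anchor)))
```

Each segment is just a pair of offsets into the full document text, which is why a value that wraps across lines has to be reassembled from several slices.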

Call the function:

parse_form()

You should see an output similar to this:

Field Value: Software Engineer  Confidence: 0.9999136328697205
Field Name: Referred By:        Confidence: 0.9998615980148315
Field Value: None               Confidence: 0.9998615980148315
Field Name: Date:               Confidence: 0.9998577833175659
Field Value: 9/14/19            Confidence: 0.9998577833175659
Field Name: DOB:                Confidence: 0.9997154474258423
Field Value: 09/04/1986         Confidence: 0.9997154474258423
Field Name: Address:            Confidence: 0.999135434627533

......skip to end 

Field Name: Describe your medical concerns (symptoms, diagnoses, etc):
Confidence: 0.8794541358947754
Field Value: Ranny nose, mucas in thoat, weakness,
aches, chills, tired
Confidence: 0.8794541358947754

For the case scenario, this output is useful because it gives you the name and the value of each field on the form. In addition, the API reports a confidence score for each name and value, which helps you judge how reliable each result is. With this information, you can transfer the data to a database of your choice programmatically.
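
One way to act on the confidence scores is to accept high-confidence fields automatically and route the rest to a person for review. The sketch below shows that idea with values copied from the sample output above. The 0.9 threshold and the helper name split_by_confidence are assumptions for illustration, not recommendations from the API:

```python
def split_by_confidence(fields, threshold=0.9):
    """Split (name, value, confidence) tuples into two lists.

    Returns (accepted, needs_review). The default threshold is an
    arbitrary value chosen for this sketch.
    """
    accepted, needs_review = [], []
    for field in fields:
        confidence = field[2]
        if confidence >= threshold:
            accepted.append(field)
        else:
            needs_review.append(field)
    return accepted, needs_review


# Values copied from the sample output above.
fields = [
    ("Date:", "9/14/19", 0.9998577833175659),
    ("DOB:", "09/04/1986", 0.9997154474258423),
    ("Describe your medical concerns (symptoms, diagnoses, etc):",
     "Ranny nose, mucas in thoat, weakness, aches, chills, tired",
     0.8794541358947754),
]

accepted, needs_review = split_by_confidence(fields)
print("Accepted:", [name for name, _, _ in accepted])
print("Needs review:", [name for name, _, _ in needs_review])
```

With this threshold, the two date fields are accepted and the free-text symptoms field, which also contains recognition errors, is flagged for review.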

Congratulations, you've successfully used the Document AI API to extract data from an invoice and a medical form!

Clean Up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial:

  • In the Cloud Console, go to the Manage resources page.
  • In the project list, select your project then click Delete.
  • In the dialog, type the project ID and then click Shut down to delete the project.

License

This work is licensed under a Creative Commons Attribution 2.0 Generic License.