Use Procurement Document AI to Parse your Invoices using AI Platform Notebooks

1. Overview

What is Procurement Document AI?

Enterprises manage large procurement pipelines, including thousands of invoices, receipts, and other related documents every year. Use Procurement DocAI to intelligently process your "dark data", such as PDFs, images, and handwritten forms, to reduce the manual overhead of your procurement lifecycle. Automate procurement data capture at scale by turning unstructured documents like invoices and receipts into structured data to increase operational efficiency, improve customer experience, and inform decision-making.

In this codelab, we'll go over how to set up the Document AI Platform, process a sample invoice, and extract and visualize entities in an AI Platform Notebook.

What you'll learn

  • How to get started with the Document AI Platform
  • Extract schematized entities using the Procurement DocAI Solution
  • Create and customize an AI Platform Notebooks instance

What you'll need

  • A Google Cloud Project
  • A Browser, such as Chrome or Firefox
  • Knowledge of Python 3

2. Setup and Requirements

Self-paced environment setup

  1. Sign in to Cloud Console and create a new project or reuse an existing one. (If you don't already have a Gmail or G Suite account, you must create one.)

Remember the project ID, a name that must be unique across all Google Cloud projects. If the name you want is already taken, choose another one. You must provide this ID later on as PROJECT_ID.

  2. Next, you must enable billing in Cloud Console in order to use Google Cloud resources.

Be sure to follow any instructions in the "Clean Up" section, which explains how to shut down resources so you don't incur billing beyond this tutorial. New users of Google Cloud are eligible for the $300 USD Free Trial program.

3. Enable the Cloud Document AI API

Before you can begin using Document AI, you must enable the API. Open the Cloud Console in your browser.

  1. Click Navigation menu ☰ > APIs & Services > Library.
  2. Search for "Document AI API", then click Enable to use the API in your Google Cloud project.

4. Create and Test a Processor

You must first create an instance of the Invoice Parser processor to use in the Document AI Platform for this tutorial.

  1. In the console, navigate to the Document AI Platform Overview page.
  2. Click Create Processor and select Invoice Parser.
  3. Specify a processor name and select your region from the list.
  4. Click Create to create your processor.
  5. Copy your processor ID. You will need it in your code later.
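In code, the processor is addressed by a full resource name built from your project ID, the region you selected, and the processor ID. The sketch below assumes the `projects/{project}/locations/{location}/processors/{processor_id}` format of the Document AI v1 API and the regional `{location}-documentai.googleapis.com` endpoint; the function names and sample values are hypothetical placeholders.

```python
def processor_name(project_id: str, location: str, processor_id: str) -> str:
    """Build the full resource name of a Document AI processor."""
    for label, value in (
        ("project_id", project_id),
        ("location", location),
        ("processor_id", processor_id),
    ):
        if not value:
            raise ValueError(f"{label} must not be empty")
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


def api_endpoint(location: str) -> str:
    """Return the regional Document AI endpoint, e.g. 'us-documentai.googleapis.com'."""
    return f"{location}-documentai.googleapis.com"


if __name__ == "__main__":
    # Placeholder values: substitute your own project ID, region, and processor ID.
    print(processor_name("my-project", "us", "1234567890abcdef"))
    print(api_endpoint("us"))
```

The `google-cloud-documentai` client's `DocumentProcessorServiceClient.processor_path()` helper should build the same string, so you may prefer it in real code.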

(Optional) You can test your processor in the console by uploading a document. Click Upload Document and select an invoice to parse. If you don't have one available, you can download and use this sample invoice.

The console should then display the parsed invoice and its extracted fields.
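You can also send a document to the processor from Python rather than the console. The following is a minimal sketch, assuming the `google-cloud-documentai` client library (v1 API) is installed and your environment is authenticated; `guess_mime_type`, `process_invoice`, and the set of accepted MIME types are illustrative assumptions, not part of the codelab.

```python
import mimetypes
from pathlib import Path

# Assumed subset of formats Document AI accepts; check the processor docs for the full list.
SUPPORTED_MIME_TYPES = {"application/pdf", "image/jpeg", "image/png", "image/tiff", "image/gif"}


def guess_mime_type(path: str) -> str:
    """Guess the MIME type from the file extension and reject unsupported types."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported file type for {path!r}: {mime_type}")
    return mime_type


def process_invoice(project_id: str, location: str, processor_id: str, file_path: str):
    """Send a local file to the Invoice Parser and return the parsed Document."""
    # Imported lazily so the helper above works without the client library installed.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    mime_type = guess_mime_type(file_path)  # fail fast before any network call
    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    )
    name = client.processor_path(project_id, location, processor_id)
    raw_document = documentai.RawDocument(
        content=Path(file_path).read_bytes(),
        mime_type=mime_type,
    )
    result = client.process_document(
        request=documentai.ProcessRequest(name=name, raw_document=raw_document)
    )
    return result.document
```

For example, `process_invoice("my-project", "us", "1234567890abcdef", "invoice.pdf").entities` would hold the same fields the console shows.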

5. Create an AI Platform Notebook

Navigate to the AI Platform Notebooks section of your Cloud Console and click New Instance. Then select the latest Python instance type.

Use the default options and then click Create. Once the instance has been created, select Open JupyterLab.

6. Get the Sample Code

Import the sample code directly from the Document AI Notebooks GitHub repository. In your notebook, either navigate to Git > Clone a Repository in the top menu or click the Git icon.

Paste in the following repository URL:

https://github.com/GoogleCloudPlatform/documentai-notebooks.git

Once the repository is cloned, click through the documentai-notebooks/specialized/ directory and open the specialized_form_parser.ipynb notebook. Find the cell where the GCP Project and Document AI Processor IDs are declared.

Paste your GCP Project ID and Processor ID from step 4. Save your notebook.
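If you would rather not hard-code the IDs, a cell along these lines could read them from environment variables and fail early when a value is missing or still a placeholder. The names `PROJECT_ID`, `LOCATION`, and `PROCESSOR_ID`, the `YOUR_` placeholder prefix, and the `us` default are assumptions; match them to what the notebook actually uses.

```python
import os

PLACEHOLDER_PREFIX = "YOUR_"  # hypothetical placeholder convention


def load_config(env=None) -> dict:
    """Read project settings from the environment, rejecting missing or placeholder values."""
    env = os.environ if env is None else env
    config = {
        "PROJECT_ID": env.get("PROJECT_ID", ""),
        "LOCATION": env.get("LOCATION", "us"),  # assumed default region
        "PROCESSOR_ID": env.get("PROCESSOR_ID", ""),
    }
    for key, value in config.items():
        if not value or value.startswith(PLACEHOLDER_PREFIX):
            raise ValueError(f"Set {key} before running the notebook")
    return config
```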

7. Extract and Visualize the Entities

Now you can extract the schematized entities from the invoice, along with their confidence scores. The Document response object contains a list of entities. To learn more about the schematized entities, see the Invoice Parser quickstart.
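Each entity carries a type, the text it was extracted from, and a confidence score. As a minimal sketch, assuming the `type_`, `mention_text`, and `confidence` attributes exposed by the Python client's `Document.Entity`, the helper below keeps the highest-confidence mention of each entity type and drops anything below a threshold (the 0.5 default is an arbitrary assumption). The sample values are made up, and nested properties such as line-item fields are not handled here.

```python
from types import SimpleNamespace


def best_entities(entities, min_confidence: float = 0.5) -> dict:
    """Map each entity type to its highest-confidence (mention_text, confidence) pair."""
    best = {}
    for entity in entities:
        if entity.confidence < min_confidence:
            continue  # skip low-confidence extractions
        current = best.get(entity.type_)
        if current is None or entity.confidence > current[1]:
            best[entity.type_] = (entity.mention_text, entity.confidence)
    return best


if __name__ == "__main__":
    # Stand-in objects with the same attribute names as Document.Entity.
    sample = [
        SimpleNamespace(type_="invoice_id", mention_text="INV-001", confidence=0.98),
        SimpleNamespace(type_="invoice_id", mention_text="INV-00l", confidence=0.41),
        SimpleNamespace(type_="total_amount", mention_text="$1,234.00", confidence=0.93),
    ]
    print(best_entities(sample))
```

With a real response, you would call `best_entities(document.entities)`.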

Run all cells in your notebook and scroll down to the tabular output. The preceding code iterates through each entity and creates a pandas DataFrame with the results.
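The table-building step can be sketched as follows, assuming the same `Document.Entity` attributes (`type_`, `mention_text`, `confidence`, and the nested `properties` that typically hold line-item fields). The column names are hypothetical choices, and pandas is imported only when you actually build the DataFrame; the notebook's own code may be organized differently.

```python
from types import SimpleNamespace

COLUMNS = ["Type", "Text", "Confidence", "Parent"]  # hypothetical column names


def entities_to_rows(entities) -> list:
    """Flatten top-level entities and their nested properties into row dicts."""
    rows = []
    for entity in entities:
        rows.append({"Type": entity.type_, "Text": entity.mention_text,
                     "Confidence": entity.confidence, "Parent": None})
        # Line items usually arrive as a parent entity with child properties.
        for prop in getattr(entity, "properties", []):
            rows.append({"Type": prop.type_, "Text": prop.mention_text,
                         "Confidence": prop.confidence, "Parent": entity.type_})
    return rows


def entities_to_dataframe(entities):
    """Build a pandas DataFrame with one row per entity or nested property."""
    import pandas as pd  # lazy import: pandas is needed only for the table

    return pd.DataFrame(entities_to_rows(entities), columns=COLUMNS)


if __name__ == "__main__":
    item = SimpleNamespace(
        type_="line_item", mention_text="Widget x2 $20.00", confidence=0.9,
        properties=[SimpleNamespace(type_="line_item/amount", mention_text="$20.00", confidence=0.88)],
    )
    for row in entities_to_rows([item]):
        print(row)
```

In a notebook cell, `entities_to_dataframe(document.entities)` would render as a table.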

Now scroll down to the visualization component. The Document response object contains spatial layout information for each page in the document. The notebook uses the layout information for each form field to draw bounding boxes on the image. You can use this data to integrate Document AI into a frontend application.
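A minimal sketch of the bounding-box step, assuming each entity's location is available as `entity.page_anchor.page_refs[i].bounding_poly.normalized_vertices` (coordinates between 0 and 1) and that Pillow is used for drawing. The helper names are hypothetical, and the notebook's own implementation may differ.

```python
from types import SimpleNamespace


def to_pixel_polygon(normalized_vertices, width: int, height: int) -> list:
    """Convert normalized (0..1) vertices into integer pixel coordinates."""
    return [(round(v.x * width), round(v.y * height)) for v in normalized_vertices]


def entity_polygons(entities, width: int, height: int, page_index: int = 0) -> list:
    """Collect (entity type, pixel polygon) pairs for entities on one page."""
    polygons = []
    for entity in entities:
        for page_ref in entity.page_anchor.page_refs:
            if page_ref.page != page_index:
                continue  # entity (or part of it) lies on another page
            vertices = page_ref.bounding_poly.normalized_vertices
            if vertices:
                polygons.append((entity.type_, to_pixel_polygon(vertices, width, height)))
    return polygons


def draw_boxes(image_path: str, entities, out_path: str, page_index: int = 0) -> None:
    """Outline each entity on a rendered page image and save the result."""
    from PIL import Image, ImageDraw  # lazy import: Pillow is needed only for drawing

    image = Image.open(image_path).convert("RGB")
    width, height = image.size
    draw = ImageDraw.Draw(image)
    for _entity_type, polygon in entity_polygons(entities, width, height, page_index):
        draw.polygon(polygon, outline="red")
    image.save(out_path)
```

Because the vertices are normalized, the same entities can be drawn on a page image of any resolution, for example `draw_boxes("page1.png", document.entities, "page1_boxes.png")`.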

8. Congratulations

Congratulations, you've successfully used the Procurement Document AI Solution to extract data from an invoice. We encourage you to experiment with other document types.

Clean Up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, you can either shut down your notebook instance or delete the GCP project.

Shutting down AI Platform Notebooks instance

Follow these instructions to shut down an AI Platform Notebooks instance.

Deleting the project

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

  1. In the GCP Console, go to the Projects page.
  2. In the project list, select the project you want to delete and click Delete.
  3. In the dialog, type the project ID, then click Shut down to delete the project.

Learn More

License

This work is licensed under a Creative Commons Attribution 2.0 Generic License.