Document AI: Human in the Loop

1. Introduction

The Document AI API is a document understanding solution that takes unstructured data, such as documents and emails, and makes that data easier to understand, analyze, and consume.

Human review helps you achieve higher document processing accuracy: reviewers evaluate the processor's predictions with purpose-built tools and correct them where needed. In this lab, you will configure and test an expense processor that uses human review, validating the processor's results with the human-in-the-loop configuration and management tools.

Prerequisites

This codelab builds on content presented in other Document AI Codelabs. We recommend completing those Codelabs before proceeding.

What you'll learn

  • Configure human review for a processor.
  • Create a human review specialist pool.
  • Create a test human review task.
  • Assign a human review task to a user.
  • Complete a human review of a document.

What you'll need

  • A Google Cloud project
  • A browser, such as Chrome or Firefox
  • Knowledge of Python 3

2. Getting set up

This codelab assumes you have completed the Document AI setup steps listed in the introductory Codelab. Please complete those steps before proceeding.

You will also need to enable the Vertex AI API.

  1. Using the search bar at the top of the console, search for "Vertex AI API", then click Enable to use the API in your Google Cloud project.
  2. Alternatively, enable the API with the following gcloud command:
gcloud services enable aiplatform.googleapis.com

3. Create a Processor

You must first create an instance of the Expense Processor to use for this lab.

  1. In the console, navigate to the Document AI Platform Overview.
  2. Click Create Processor, scroll down to Specialized, and select Expense Parser.
  3. Give it the name codelab-expense-parser (or something else you'll remember) and select the closest region from the list.
  4. Click Create to create your processor.
  5. Copy the processor ID. You will use it in your code later.
  6. In Cloud Shell, create a storage bucket named PROJECT_ID-hitl-results:
export PROJECT_ID=$(gcloud config get-value core/project)
gsutil mb gs://$PROJECT_ID-hitl-results
  7. Bind your user account to the Vertex AI Admin IAM role on your lab project:
export USER_ACCOUNT=$(gcloud config get-value core/account)
gcloud projects add-iam-policy-binding $PROJECT_ID --member=user:$USER_ACCOUNT --role=roles/aiplatform.admin
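The processor ID you copied, the results bucket, and the processor's human review configuration are each addressed by a name. The sketch below shows how they fit together, assuming the standard Document AI v1 resource-name formats; the function names and the example values (my-project, us, abc123) are hypothetical.

```python
def processor_name(project_id: str, location: str, processor_id: str) -> str:
    """Full resource name of a Document AI processor (assumed v1 format)."""
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


def human_review_config_name(project_id: str, location: str, processor_id: str) -> str:
    """Resource name of the processor's human review (HITL) configuration.

    Assumption: this matches the path the Python client builds with
    DocumentProcessorServiceClient.human_review_config_path().
    """
    return processor_name(project_id, location, processor_id) + "/humanReviewConfig"


def results_bucket_uri(project_id: str) -> str:
    """gs:// URI of the bucket created with `gsutil mb gs://$PROJECT_ID-hitl-results`."""
    return f"gs://{project_id}-hitl-results"


if __name__ == "__main__":
    # Hypothetical values: substitute your own project, region, and processor ID.
    print(processor_name("my-project", "us", "abc123"))
    print(human_review_config_name("my-project", "us", "abc123"))
    print(results_bucket_uri("my-project"))
```

Keeping these in one place avoids retyping the project ID and processor ID in every later command or script.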

4. Configure Human-in-the-loop

In this task, you will configure human review for the expense processor you created earlier.

  1. In the console, open the Navigation menu and select Document AI.
  2. Click Human-in-the-loop AI.
  3. Click codelab-expense-parser to open the Human Review page for the processor.
  4. Click Configure Human-in-the-Loop.

  1. Select Document Level Filter.
  2. Set the Confidence threshold % slider to 50%.
  3. Leave the Specialists option set to Use my own specialists.
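To make the 50% threshold concrete, here is a minimal sketch of one way to read a document-level filter, assuming that a single threshold applies to every field and that one low-confidence field is enough to route the whole document to human review. It illustrates the idea only and is not the service's internal logic; the Entity type, needs_human_review function, and sample confidences are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical stand-in for an extracted field and its confidence (0.0 to 1.0)."""
    type: str
    mention_text: str
    confidence: float


def needs_human_review(entities: list[Entity], threshold_percent: float = 50.0) -> bool:
    """Return True if any entity's confidence is below the document-level threshold.

    Assumption: one threshold covers all fields, and a single field below it
    sends the entire document to review.
    """
    threshold = threshold_percent / 100.0
    return any(entity.confidence < threshold for entity in entities)


if __name__ == "__main__":
    # Hypothetical confidences for two extracted fields.
    fields = [
        Entity("supplier_name", "Example Corp", 0.92),
        Entity("line_item", "Meeting with 4m", 0.41),
    ]
    print(needs_human_review(fields))  # 0.41 is below 0.50, so this prints True
```

Under this reading, raising the slider toward 100% sends more documents to reviewers, and lowering it sends fewer.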


  1. Click the Specialist pool drop-down list and click NEW SPECIALIST POOL.
  2. In the New specialist pool dialog, enter Codelab HITL Pool for Pool name.
  3. Enter your personal email address for both the Pool Managers and the Specialists.
  4. Click Create pool.


Pool creation takes a few minutes. You should receive an email from Vertex AI (noreply-vertex@google.com).

  1. Leave the Auto-assignment checkbox unchecked.
  2. Select the checkbox in the Confirm charges section.
  3. Click Instructions location and enter the following storage location. Do NOT include the gs:// prefix in the path:
cloud-samples-data/documentai/codelabs/hitl/hitl-instructions.pdf
  4. In Results location, click Browse and select the Cloud Storage bucket you created earlier.
  5. Click Select.
  6. Click Save Configuration.
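The Instructions location field expects a bucket-relative path without the gs:// prefix. If you keep storage locations as full gs:// URIs elsewhere, a small helper like the hypothetical to_console_path below can strip the prefix before you paste the value; it is a convenience sketch, not part of any Google Cloud library.

```python
GCS_PREFIX = "gs://"


def to_console_path(location: str) -> str:
    """Return a Cloud Storage location in "bucket/object" form, without gs://.

    Accepts either "gs://bucket/object" or "bucket/object".
    Raises ValueError if nothing remains after stripping the prefix.
    """
    path = location.strip()
    if path.startswith(GCS_PREFIX):
        path = path[len(GCS_PREFIX):]
    path = path.lstrip("/")  # tolerate accidental leading slashes
    if not path:
        raise ValueError("empty Cloud Storage path")
    return path
```

For example, passing gs://cloud-samples-data/documentai/codelabs/hitl/hitl-instructions.pdf returns the exact path given in step 3 above.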

The console now shows Configuring human-in-the-loop; configuration takes a few minutes to complete.


  1. When configuration is complete, the console prompts you to enable human-in-the-loop.
  • Click the toggle switch to enable it.
  • Then click ENABLE in the pop-up dialog.

Upload a Sample Expense Form

  1. A sample expense form is stored in Cloud Storage. You can download it with the following command:

gsutil cp gs://cloud-samples-data/documentai/codelabs/hitl/expense-claim.pdf .
  2. After enabling Human-in-the-loop, click the Upload Document button and browse for the sample document you just downloaded.
  3. Click Upload and wait for it to complete.
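The console upload can also be done from Python, which is one way to use the processor ID you copied earlier. The sketch below assumes the google-cloud-documentai (v1) client library is installed and that you are authenticated, for example in Cloud Shell. process_with_hitl and guess_mime_type are hypothetical helper names, and the project, location, and processor values are placeholders for your own. With human review enabled, the response's human_review_status should report whether the document was routed to review.

```python
import mimetypes
from pathlib import Path


def guess_mime_type(file_path: str) -> str:
    """Guess the MIME type Document AI needs (e.g. application/pdf) from the file name."""
    mime_type, _ = mimetypes.guess_type(file_path)
    if mime_type is None:
        raise ValueError(f"cannot determine MIME type for {file_path!r}")
    return mime_type


def process_with_hitl(project_id: str, location: str, processor_id: str, file_path: str):
    """Send a local file to the processor; return (document, human_review_status).

    Assumes google-cloud-documentai (v1) is installed and credentials are configured.
    The client library is imported inside the function so that guess_mime_type()
    can be used without it.
    """
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    # Document AI uses regional endpoints such as us-documentai.googleapis.com.
    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    )
    request = documentai.ProcessRequest(
        name=client.processor_path(project_id, location, processor_id),
        raw_document=documentai.RawDocument(
            content=Path(file_path).read_bytes(),
            mime_type=guess_mime_type(file_path),
        ),
    )
    result = client.process_document(request=request)
    # With human-in-the-loop enabled, this status describes whether review was triggered.
    return result.document, result.human_review_status
```

For example, process_with_hitl("PROJECT_ID", "us", "PROCESSOR_ID", "expense-claim.pdf") returns the parsed document together with its review status, whose state and state_message fields you can print.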

5. Assign an item for human review

  1. On the processor's Human-in-the-loop page, you should see links to the Pool Manager and Specialist consoles. These links also appear in the email from Vertex AI (noreply-vertex@google.com).
    • They should look like https://datacompute.google.com/cm/cloudml_data_specialists_us_central1_xxxxxxx/tasks
    • Click the link for the Manager console.

  2. In the Data Labeling console, click the Tasks tab to open the task assignment page.
  3. Select the Unassigned checkbox. A new entry should be listed for the codelab-expense-parser-P1 task queue.
  4. Select codelab-expense-parser-P1.
  5. Click Manage Assignment.
  6. Enter your personal email address in the Include specialists by email text box, then select it from the drop-down list.
  7. Click Apply.

The display now shows that the task is assigned to you. The assignment can take a few minutes to propagate and become visible.

  8. Select the new user and click the menu icon.
  9. Click Assign to all tasks in the pop-up menu.
  10. Click Commit changes.
  11. Click Commit.

6. Perform Human Review Task

  1. Go back to the Human-in-the-Loop configuration page in the Cloud Console.
  2. Click the link to the Specialist (Worker) console. It looks like https://datacompute.google.com/w/cloudml_data_specialists_us_central1_xxxxxxxxxxx.

The worker console should open and list your new task.

  1. Hover over the line item that contains Meeting with 4m and click the edit (pencil) icon.
  2. Change the text to Meeting with Adam. You may have to scroll down in the text box to see the text.
  3. Click Apply.
  4. Click the Confirm (green check mark) icon for the item you just edited.
  5. Click the Confirm icon for each of the other highlighted entities.
  6. Click Submit. The review task is then removed from your labeler queue.
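After you submit, the reviewed document should be written to the Results location you configured (the PROJECT_ID-hitl-results bucket). The exact object names and folder layout in the bucket are not covered by this lab. The sketch below assumes the output is a Document AI Document serialized as JSON, with each top-level field under entities carrying type and mentionText, and lists the reviewed values from a downloaded file's contents. The reviewed_fields function and the sample JSON are hypothetical.

```python
import json


def reviewed_fields(document_json: str) -> dict[str, list[str]]:
    """Map each top-level entity type to its mention texts in a Document JSON string.

    Assumption: entities use the camelCase JSON names "type" and "mentionText".
    Nested entity properties (such as line-item sub-fields) are not traversed,
    and entities missing either field are skipped.
    """
    document = json.loads(document_json)
    fields: dict[str, list[str]] = {}
    for entity in document.get("entities", []):
        entity_type = entity.get("type")
        text = entity.get("mentionText")
        if entity_type and text:
            fields.setdefault(entity_type, []).append(text)
    return fields


if __name__ == "__main__":
    # Hypothetical sample shaped like a reviewed expense document.
    sample = json.dumps({
        "entities": [
            {"type": "line_item", "mentionText": "Meeting with Adam"},
            {"type": "supplier_name", "mentionText": "Example Corp"},
        ]
    })
    print(reviewed_fields(sample))
```

Grouping by type keeps repeated fields, such as several line items, together in one list.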

7. View Completed Tasks

  1. Return to the Manager console.
  2. Click Tasks and select Ongoing.
  3. Click Specialists.
  4. Select your email address.
  5. Click Manage Assignment.
  6. Select codelab-expense-parser-P1 from the Select specialists working on specific tasks and Select tasks drop-down lists, clicking Apply after each selection. In the context menu for the codelab-expense-parser-P1 task that is assigned to you, select View Specialists.

Once the labeler submits the task, the number of answered tasks and the total time taken are updated, although the data in this view can take a few minutes to appear.

  1. Close the specialists pop-up and open the Specialists tab.
  2. Click the context menu for your user name and select View tasks.

This view lists the user's tasks, their completion numbers, and the time taken.

8. Congratulations

Congratulations, you've successfully used Document AI Human-in-the-Loop to configure human review for documents processed using a Document AI expense processor.

Cleanup

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial:

  • In the Cloud Console, go to the Manage resources page.
  • In the project list, select your project, then click Delete.
  • In the dialog, type the project ID, then click Shut down to delete the project.

License

This work is licensed under a Creative Commons Attribution 2.0 Generic License.