1. Introduction
Document AI API is a document understanding solution that takes unstructured data, such as documents, emails, and so on, and makes the data easier to understand, analyze, and consume.
Human review adds a verification step on top of the processor's predictions, which can increase document processing accuracy. Purpose-built tools help businesses evaluate predictions and carry out those reviews. In this lab, you will configure an expense processor with human review and validate the processor's results using the human-in-the-loop configuration and management tools.
Prerequisites
This codelab builds upon content presented in other Document AI Codelabs.
It is recommended that you complete the following Codelabs before proceeding.
- Optical Character Recognition (OCR) with Document AI (Python)
- Form Parsing with Document AI (Python)
- Specialized Processors with Document AI (Python)
What you'll learn
- Configure human review for a processor.
- Create a human review user resource pool.
- Create a test human review task.
- Assign a human review task to a user.
- Complete a human review of a document.
What you'll need
2. Getting set up
This codelab assumes you have completed the Document AI Setup steps listed in the Introductory Codelab.
Please complete the following steps before proceeding:
You will also need to enable the Vertex AI API.
- Using the Search Bar at the top of the console, search for "Vertex AI API", then click Enable to use the API in your Google Cloud project
- Alternatively, enable the API with the following gcloud command:
gcloud services enable aiplatform.googleapis.com
3. Create a Processor
You must first create an instance of the Expense Parser processor to use for this lab.
- In the console, navigate to the Document AI Platform Overview
- Click Create Processor, scroll down to Specialized and select Expense Parser.
- Give it the name codelab-expense-parser (or something else you'll remember) and select the closest region on the list.
- Click Create to create your processor.
- Copy the processor ID. You must use this in your code later.
- In Cloud Shell, create a storage bucket named PROJECT_ID-hitl-results:
export PROJECT_ID=$(gcloud config get-value core/project)
gsutil mb gs://$PROJECT_ID-hitl-results
- Bind your user account to the Vertex AI Admin IAM role on your lab project:
export USER_ACCOUNT=$(gcloud config get-value core/account)
gcloud projects add-iam-policy-binding $PROJECT_ID --member=user:$USER_ACCOUNT --role=roles/aiplatform.admin
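The processor ID you copied and the bucket you created are the two identifiers that later code needs. As a minimal sketch, the helpers below assemble them into full names. The function names are hypothetical, and the project, location, and processor ID values are placeholders rather than values from this lab.

```python
def processor_resource_name(project_id: str, location: str, processor_id: str) -> str:
    """Return the full resource name used to address a Document AI processor."""
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


def results_bucket_uri(project_id: str) -> str:
    """Return the gs:// URI of the bucket named PROJECT_ID-hitl-results."""
    return f"gs://{project_id}-hitl-results"


if __name__ == "__main__":
    # Placeholder values for illustration only; substitute your own.
    print(processor_resource_name("my-project", "us", "abc123"))
    print(results_bucket_uri("my-project"))
```

Keeping these in one place avoids retyping the processor ID by hand each time a script needs it.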
4. Configure Human-in-the-loop
In this task, you will configure human review for the expense processor you created earlier.
- In the console, open the Navigation menu and select Document AI.
- Click Human-in-the-loop AI.
- Click codelab-expense-parser to open the Human Review page for the processor.
- Click Configure Human-in-the-Loop.
- Select Document Level Filter.
- Set the Confidence threshold % slider to 50%.
- Leave the Specialists option set to Use my own specialists.
- Click the Specialist pool drop-down box and click NEW SPECIALIST POOL.
- In the New specialist pool dialog, enter Codelab HITL Pool for Pool name.
- Enter your personal email address for the Pool Managers and Specialists.
- Click Create pool.
This will take a few minutes to complete. You should receive an email from Vertex AI (noreply-vertex@google.com).
- Leave the Auto-assignment Checkbox unchecked.
- Click the checkbox in the Confirm charges section.
- Click Instructions location and enter this storage location. Do NOT include the gs:// prefix in the path:
cloud-samples-data/documentai/codelabs/hitl/hitl-instructions.pdf
- In Results location, click Browse and select the Cloud Storage bucket you created earlier.
- Click Select.
- Click Save Configuration.
The Console will now say Configuring human-in-the-loop and will take a few minutes to complete.
- When configuration is complete, the console will prompt you to Enable Human-in-the-loop.
- Click the Switch Button to enable.
- Then click ENABLE in the pop-up dialog.
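The Confidence threshold setting decides which documents are routed to a reviewer. As a rough sketch of the idea, and assuming (this lab does not confirm it) that a document is sent for review when any extracted entity's confidence falls below the threshold, the check could look like the following. The function name `needs_human_review` is hypothetical and is not part of any Document AI library.

```python
from typing import Iterable


def needs_human_review(confidences: Iterable[float], threshold_percent: float = 50.0) -> bool:
    """Return True when at least one confidence score is below the threshold.

    ASSUMPTION: "any entity below the threshold" is an illustrative rule only;
    the rule the service actually applies may differ.
    """
    threshold = threshold_percent / 100.0  # the console slider is in percent
    return any(score < threshold for score in confidences)


if __name__ == "__main__":
    # Made-up confidence scores, for illustration only.
    print(needs_human_review([0.93, 0.42, 0.88]))
    print(needs_human_review([0.93, 0.75]))
```

Under this assumption, a lower threshold sends fewer documents to reviewers, and a higher one sends more.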
Upload a Sample Expense Form
- A sample form is stored in Google Cloud Storage. You can download it with the command below:
gsutil cp gs://cloud-samples-data/documentai/codelabs/hitl/expense-claim.pdf .
- After enabling Human-in-the-loop, click the Upload Document button and browse for the sample document you just downloaded.
- Click Upload and wait for it to complete.
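The console upload is the path this lab follows. For comparison, here is a hedged sketch of sending the same PDF through the Document AI Python client library and reading back the human review status. It assumes that the `google-cloud-documentai` package is installed, that the processor applies the same human review configuration to API requests, and that you supply your own project, location, and processor ID. The function names are hypothetical.

```python
from pathlib import Path


def api_endpoint(location: str) -> str:
    """Return the regional Document AI endpoint for a location such as "us" or "eu"."""
    return f"{location}-documentai.googleapis.com"


def process_and_report_review(project_id: str, location: str, processor_id: str, pdf_path: str):
    """Process a local PDF and return (review state name, review operation name)."""
    # Imported lazily so the helper above works without the client library installed.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=api_endpoint(location))
    )
    request = documentai.ProcessRequest(
        name=client.processor_path(project_id, location, processor_id),
        raw_document=documentai.RawDocument(
            content=Path(pdf_path).read_bytes(),
            mime_type="application/pdf",
        ),
        skip_human_review=False,  # leave the routing decision to the configured filter
    )
    response = client.process_document(request=request)
    status = response.human_review_status
    return status.state.name, status.human_review_operation
```

A call such as `process_and_report_review("my-project", "us", "abc123", "expense-claim.pdf")` uses placeholder identifiers; replace them with the values from your own project.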
5. Assign an item for human review
- On this page, you should see links for the Pool Manager and Specialist Consoles. These links will also show up in an email from Vertex AI (noreply-vertex@google.com). They should look like https://datacompute.google.com/cm/cloudml_data_specialists_us_central1_xxxxxxx/tasks
- Click on the link for the Manager console.
- Once in the Data Labeling Console, click the Tasks tab title to open the task assignment page.
- Click the Unassigned check box. You should see that a new entry is listed against the codelab-expense-parser-P1 task queue.
- Select codelab-expense-parser-P1.
- Click Manage Assignment.
- Enter your own personal email in the Include specialists by email text box and then select it from the dropdown list.
- Click Apply.
The display now shows that the task is assigned to you. It may take a few minutes for the assignment to propagate and become visible.
- Select the new user and click the menu icon.
- Click Assign to all tasks from the pop-up menu that appears.
- Click Commit changes.
- Click Commit.
6. Perform Human Review Task
- Go back to the Human-in-the-Loop configuration page in the Cloud Console.
- Click the link to visit the Specialist (Worker) console. The link will look like https://datacompute.google.com/w/cloudml_data_specialists_us_central1_xxxxxxxxxxx
The worker console should open and list your new task.
- Hover over the line item that contains Meeting with 4m and click the edit (pencil) icon.
- Edit the value to change the text to say Meeting with Adam. You may have to scroll down in the text box to see the text.
- Click Apply.
- Click the Confirm (green tick) icon for the item below.
- Click the Confirm icon for the other highlighted entities.
- Click Submit. The review task has now been removed from your labeler queue.
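Once a review is submitted, you may want to check the corrected values outside the console. The sketch below assumes that the reviewed output lands in the results bucket as a Document JSON file, that you have copied it locally, and that entities use the `type`, `mentionText`, and `confidence` fields. The `list_entities` helper and the sample data are made up for illustration.

```python
import json


def list_entities(document_json: str):
    """Return (type, text, confidence) tuples from a Document JSON string."""
    document = json.loads(document_json)
    rows = []
    for entity in document.get("entities", []):
        rows.append(
            (
                entity.get("type", ""),
                entity.get("mentionText", ""),
                entity.get("confidence"),
            )
        )
    return rows


# Made-up sample, not real output from this lab; the entity type is a placeholder.
SAMPLE = json.dumps(
    {
        "entities": [
            {"type": "line_item/description", "mentionText": "Meeting with Adam", "confidence": 1.0}
        ]
    }
)

if __name__ == "__main__":
    for entity_type, text, confidence in list_entities(SAMPLE):
        print(f"{entity_type}: {text} (confidence={confidence})")
```

To use it on a real file, read the file's contents into a string and pass that string to `list_entities`.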
7. View Completed Tasks
- Return to the Manager console.
- Click Tasks and select Ongoing
- Click Specialists.
- Select your email address.
- Click Manage Assignment.
- Select codelab-expense-parser-P1 from the Select specialists working on specific tasks and Select tasks dropdowns. Click Apply for each selection.
- In the context menu for the codelab-expense-parser-P1 task that has been assigned to you, select View Specialists.
Once the labeler has submitted the labeling task, the number of answered tasks and the total time taken are updated, but the data in this view can take a few minutes to appear.
- Close the specialists pop-up and see the Specialists tab.
- Click the context menu for your user name and select View tasks.
This view shows the list of tasks for the user, their completion numbers, and the amount of time taken.
8. Congratulations
Congratulations, you've successfully used Document AI Human-in-the-Loop to configure human review for documents processed using a Document AI expense processor.
Cleanup
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial:
- In the Cloud Console, go to the Manage resources page.
- In the project list, select your project then click Delete.
- In the dialog, type the project ID and then click Shut down to delete the project.
Learn More
Continue learning about Document AI with these follow-up Codelabs.
Resources
- The Future of Documents - YouTube Playlist
- Document AI Documentation
- Document AI Python Client Library
- Document AI Samples
License
This work is licensed under a Creative Commons Attribution 2.0 Generic License.