1. Overview
What is Document AI?
The Document AI API is a document understanding solution that takes unstructured data, such as documents, emails, and so on, and makes the data easier to understand, analyze, and consume. The API provides structure through content classification, entity extraction, advanced searching, and more.
In this tutorial, you focus on using the Document AI API with Node.js. The tutorial demonstrates how to parse a simple medical intake form.
What you'll learn
- How to enable the Document AI API
- How to authenticate API requests
- How to install the client library for Node.js
- How to parse data from a scanned form
What you'll need
Survey
How will you use this tutorial?
How would you rate your experience with Node.js?
How would you rate your experience with using Google Cloud services?
2. Setup and Requirements
Self-paced environment setup
- Sign in to Cloud Console and create a new project or reuse an existing one. (If you don't already have a Gmail or G Suite account, you must create one.)
Remember the project ID, a unique name across all Google Cloud projects. (Your name above has already been taken and will not work for you, sorry!). You must provide this ID later on as PROJECT_ID
.
- Next, you must enable billing in Cloud Console in order to use Google Cloud resources.
Be sure to to follow any instructions in the "Cleaning up" section. The section advises you how to shut down resources so you don't incur billing beyond this tutorial. New users of Google Cloud are eligible for the $300USD Free Trial program.
Start Cloud Shell
While Google Cloud you can operate Google Cloud remotely from your laptop, this codelab uses Google Cloud Shell, a command line environment running in the Cloud.
Activate Cloud Shell
- From the Cloud Console, click Activate Cloud Shell .
If you've never started Cloud Shell before, you are presented with an intermediate screen (below the fold) describing what it is. If that's the case, click Continue (and you won't ever see it again). Here's what that one-time screen looks like:
It should only take a few moments to provision and connect to Cloud Shell.
Cloud Shell provides you with terminal access to a virtual machine hosted in the cloud. The virtual machine includes all the development tools that you'll need. It offers a persistent 5GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this codelab can be done with simply a browser or your Chromebook.
Once connected to Cloud Shell, you should see that you are already authenticated and that the project is already set to your project ID.
- Run the following command in Cloud Shell to confirm that you are authenticated:
gcloud auth list
Command output
Credentialed Accounts ACTIVE ACCOUNT * <my_account>@<my_domain.com> To set the active account, run: $ gcloud config set account `ACCOUNT`
gcloud config list project
Command output
[core] project = <PROJECT_ID>
If it is not, you can set it with this command:
gcloud config set project <PROJECT_ID>
Command output
Updated property [core/project].
3. Enable the Cloud Document AI API
Before you can begin using Document AI, you must enable the API. Open the Cloud Console in your browser.
- Click Navigation menu ☰ > APIs & Services > Library.
- Search for "Document AI API," then click Enable to use the API in your Google Cloud project
4. Create and Test a Processor
You must first create an instance of the Form Parser processor to use in the Document AI Platform for this tutorial.
- In the console, navigate to the Document AI Platform Overview
- Click Create Processor and select Form Parser
- Specify a processor name and select your region from the list.
- Click Create to create your processor
- Copy your processor ID. You must use this in your code later.
(Optional) You can test out your processor in the console by uploading a document. Click Upload Document and select a form to parse. You can download and use this sample form if you do not have one available to use.
Your output should look this:
5. Authenticate API requests
In order to make requests to the Document AI API, you must use a Service Account. A Service Account belongs to your project and it is used by the Google Client Node.js library to make API requests. Like any other user account, a service account is represented by an email address. In this section, you will use the Cloud SDK to create a service account and then create credentials you need to authenticate as the service account.
First, set an environment variable with your PROJECT_ID
which you will use throughout this codelab:
export GOOGLE_CLOUD_PROJECT=$(gcloud config get-value core/project)
Next, create a new service account to access the Document AI API by using:
gcloud iam service-accounts create my-docai-sa \
--display-name "my-docai-service-account"
Next, create credentials that your Node.js code uses to login as your new service account. Create these credentials and save it as a JSON file "~/key.json" by using the following command:
gcloud iam service-accounts keys create ~/key.json \
--iam-account my-docai-sa@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com
Finally, set the GOOGLE_APPLICATION_CREDENTIALS environment variable, which is used by the library to find your credentials. To read more about this form authentication, see the guide. The environment variable should be set to the full path of the credentials JSON file you created, by using:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json"
6. Download the Sample Form
We have a sample form to use stored in our public Google Cloud Storage samples bucket. Use the following command to download it to your working directory.
gsutil cp gs://cloud-samples-data/documentai/form.pdf .
Confirm the file is downloaded to your cloudshell using the below command:
ls -ltr form.pdf
7. Install the Client Library
Next set up your code in your working directory.
Initialize a new Node.js package:
npm init
Install the Document AI client library:
npm install @google-cloud/documentai
8. Make a Synchronous Process Document Request
In this step, you make a process document call using the synchronous endpoint. For processing large amounts of documents at a time you can also use the asynchronous API, to learn more about using the Form Parser APIs, read the guide here.
Create an index.js
file and paste the following code. Fill in the applicable variables with your processor's information.
const { DocumentProcessorServiceClient } = require('@google-cloud/documentai').v1;
const fs = require('fs');
/**
* Runs the sample document through Document AI to get key/value pairs and
* confidence scores.
*/
async function processDocument(projectId, location, processorId, filePath, mimeType) {
// Instantiates a client
const documentaiClient = new DocumentProcessorServiceClient();
// The full resource name of the processor, e.g.:
// projects/project-id/locations/location/processor/processor-id
// You must create new processors in the Cloud Console first
const resourceName = documentaiClient.processorPath(projectId, location, processorId);
// Read the file into memory.
const imageFile = fs.readFileSync(filePath);
// Convert the image data to a Buffer and base64 encode it.
const encodedImage = Buffer.from(imageFile).toString('base64');
// Load Binary Data into Document AI RawDocument Object
const rawDocument = {
content: encodedImage,
mimeType: mimeType,
};
// Configure ProcessRequest Object
const request = {
name: resourceName,
rawDocument: rawDocument
};
// Use the Document AI client to process the sample form
const [result] = await documentaiClient.processDocument(request);
return result.document;
}
/**
* Run the codelab.
*/
async function main() {
const projectId = 'YOUR_PROJECT_ID';
const location = 'YOUR_PROJECT_LOCATION'; // Format is 'us' or 'eu'
const processorId = 'YOUR_PROCESSOR_ID'; // Should be a Hexadecimal string
// Supported File Types
// https://cloud.google.com/document-ai/docs/processors-list#processor_form-parser
filePath = 'form.pdf'; // The local file in your current working directory
mimeType = 'application/pdf';
const document = await processDocument(projectId, location, processorId, filePath, mimeType);
console.log("Document Processing Complete");
// Print the document text as one big string
console.log(`Text: ${document.text}`);
}
main(...process.argv.slice(2)).catch(err => {
console.error(err);
process.exitCode = 1;
});
Run your code now and you should see the following text printed in your console.
Text: FakeDoc M.D. HEALTH INTAKE FORM Please fill out the questionnaire carefully. The information you provide will be used to complete your health profile and will be kept confidential. Name: Date: Sally Walker DOB: 09/04/1986 Address: 24 Barney Lane City: Towalo State: NJ Zip: 07082 Email: Sally, waller@cmail.com Phone #: (906) 917-3486 Gender: Marital Status: Single Occupation: Software Engineer Referred By: None Emergency Contact: Eva Walker Emergency Contact Phone: (906) 334-8926 Describe your medical concerns (symptoms, diagnoses, etc): Runny nose, mucas in throat, weakness, aches, chills, tired Are you currently taking any medication? (If yes, please describe): Vyvanse (25mg) daily for attention
In the next steps, you extract structured data that can more easily stored in databases or used in other applications.
9. Extract the Form Key/Value Pairs
Now you can extract the key-value pairs from the form and their corresponding confidence scores. The Document response object contains a list of pages from the input document. Each page
object contains a list of form fields and their locations in the text.
The following code iterates through each page and extracts each key, value and confidence score.
Add the following function to your code.
/**
* Extract form data and confidence from processed document.
*/
function extractFormData(document) {
// Extract shards from the text field
function getText(textAnchor, document) {
if (!textAnchor.textSegments || textAnchor.textSegments.length === 0) {
return '';
}
// First shard in document doesn't have startIndex property
const startIndex = textAnchor.textSegments[0].startIndex || 0;
const endIndex = textAnchor.textSegments[0].endIndex;
return document.text.substring(startIndex, endIndex);
}
var formData = [];
const pages = document.pages;
pages.forEach((page) => {
const formFields = page.formFields;
formFields.forEach((field) => {
// Get the extracted field names and remove extra space from text
const fieldName = getText(field.fieldName.textAnchor, document);
// Confidence - How "sure" the API is that the text is correct
const nameConfidence = field.fieldName.confidence.toFixed(4);
const fieldValue = getText(field.fieldValue.textAnchor, document);
const valueConfidence = field.fieldValue.confidence.toFixed(4);
formData.push({
fieldName: fieldName,
fieldValue: fieldValue,
nameConfidence: nameConfidence,
valueConfidence: valueConfidence
});
});
});
return formData;
}
Add a call to the extractFormData()
function from inside the main function and print the resulting object as a table.
/**
* Run the codelab.
*/
async function main() {
const projectId = 'YOUR_PROJECT_ID';
const location = 'YOUR_PROJECT_LOCATION'; // Format is 'us' or 'eu'
const processorId = 'YOUR_PROCESSOR_ID'; // Should be a Hexadecimal string
// Supported File Types
// https://cloud.google.com/document-ai/docs/processors-list#processor_form-parser
filePath = 'form.pdf'; // The local file in your current working directory
mimeType = 'application/pdf';
const document = await processDocument(projectId, location, processorId, filePath, mimeType);
const formData = extractFormData(document);
console.log('\nThe following form key/value pairs were detected:');
console.table(formData);
}
Now run your code. You should see the following output if using our sample document:
The following form key/value pairs were detected: ┌─────────┬────────────────────────────────────────────────────────────────┬──────────────────────────────────────────────────────────────────┬────────────────┬─────────────────┐ │ (index) │ fieldName │ fieldValue │ nameConfidence │ valueConfidence │ ├─────────┼────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────┼────────────────┼─────────────────┤ │ 0 │ 'Marital Status: ' │ 'Single ' │ '1.0000' │ '1.0000' │ │ 1 │ 'DOB: ' │ '09/04/1986\n' │ '0.9999' │ '0.9999' │ │ 2 │ 'City: ' │ 'Towalo ' │ '0.9996' │ '0.9996' │ │ 3 │ 'Address: ' │ '24 Barney Lane ' │ '0.9994' │ '0.9994' │ │ 4 │ 'Referred By: ' │ 'None\n' │ '0.9968' │ '0.9968' │ │ 5 │ 'Phone #: ' │ '(906) 917-3486\n' │ '0.9961' │ '0.9961' │ │ 6 │ 'State: ' │ 'NJ ' │ '0.9960' │ '0.9960' │ │ 7 │ 'Emergency Contact Phone: ' │ '(906) 334-8926\n' │ '0.9925' │ '0.9925' │ │ 8 │ 'Name:\n' │ 'Sally\nWalker\n' │ '0.9922' │ '0.9922' │ │ 9 │ 'Occupation: ' │ 'Software Engineer\n' │ '0.9914' │ '0.9914' │ │ 10 │ 'Zip: ' │ '07082\n' │ '0.9904' │ '0.9904' │ │ 11 │ 'Email: ' │ 'Sally, waller@cmail.com ' │ '0.9681' │ '0.9681' │ │ 12 │ 'Emergency Contact: ' │ 'Eva Walker ' │ '0.9430' │ '0.9430' │ │ 13 │ 'Describe your medical concerns (symptoms, diagnoses, etc):\n' │ 'Runny nose, mucas in throat, weakness,\naches, chills, tired\n' │ '0.7817' │ '0.7817' │ └─────────┴────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────┴────────────────┴─────────────────┘
10. Congratulations!
Congratulations, you've successfully used the Document AI API to extract data from a handwritten form. We encourage you to experiment with other form images.
Clean Up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial:
- In the Cloud Console, go to the Manage resources page.
- In the project list, select your project then click Delete.
- In the dialog, type the project ID and then click Shut down to delete the project.
Learn More
- The Future of Documents - YouTube Playlist
- Document AI Documentation
- Document AI Node.js Client Library Reference
- Document AI Node.js Samples