Artificial intelligence has the potential to unlock real value from documentation. In this second part of his series on applied AI, TEKenable’s Mohammad Zeeshan Khan explains how Azure AI Document Intelligence can augment search and automate document processing.
Azure Cognitive Services Form Recognizer, now known as Azure AI Document Intelligence¹, is a cloud-based Azure AI service that uses machine-learning models to automate your data processing in applications and workflows⁴. It applies advanced machine learning to extract text, key-value pairs, tables, and structures from documents automatically and accurately². This service is essential for enhancing data-driven strategies and enriching document search capabilities⁴.
Key features of Azure AI Document Intelligence
Azure AI Document Intelligence offers three types of models¹:
1. Document Analysis Models: These models extract text from forms and documents and return structured, business-ready content that your organization can act on¹. Depending on the model, they extract printed and handwritten text; text and document structure; or text, structure, and key-value pairs¹.
2. Prebuilt Models: These models enable you to add intelligent document processing to your apps and flows without having to train and build your own models¹. They can extract customer and vendor details from invoices, sales transaction details from receipts, identification and verification details from identity cards, health insurance details from health insurance cards, business contact details from business cards, agreement and party details from contracts, taxable compensation details from US Tax W-2 forms, student loan interest details from US Tax 1098-E forms, mortgage interest details from US Tax 1098 forms, and qualified tuition details from US Tax 1098-T forms¹.
3. Custom Models: These models are trained using your labelled datasets to extract distinct data from forms and documents, specific to your use cases¹. Standalone custom models can be combined to create composed models¹. Custom extraction models are trained to extract labelled fields from documents¹.
An example of solution architecture
Azure AI Document Intelligence can be used to build an automated document processing pipeline⁶. Here’s an example of how it can be integrated into a typical business process:
1. Data Ingestion and Extraction: Documents are ingested through a browser at the front end of a web application⁶. The back-end application posts a request to a Form Recognizer REST API endpoint that uses one of the models mentioned above⁶. The response from Form Recognizer contains raw OCR data and structured extractions⁶. The App Service back-end application uses the confidence values to check the extraction quality⁶. When the extraction quality meets requirements, the data enters Azure Cosmos DB for downstream application consumption⁶.
2. Data Enrichment: The pipeline used for data enrichment depends on the use case⁶. Data enrichment can include named entity recognition (NER), the extraction of personal information, key phrases, health information, and other domain-dependent entities⁶.
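The confidence check in step 1 can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the function names, the 0.8 threshold, and the "review" route are assumptions made for the example, and the input is simplified to a dictionary of field names mapped to objects that carry a `confidence` value.

```python
from typing import Any

# Hypothetical threshold; tune it against your own documents.
MIN_CONFIDENCE = 0.8


def low_confidence_fields(fields: dict[str, dict[str, Any]],
                          threshold: float = MIN_CONFIDENCE) -> list[str]:
    """Return the names of fields whose confidence is missing or below the threshold."""
    return sorted(
        name for name, field in fields.items()
        if field.get("confidence", 0.0) < threshold
    )


def route_extraction(fields: dict[str, dict[str, Any]],
                     threshold: float = MIN_CONFIDENCE) -> str:
    """Return "store" when every field passes the check, otherwise "review"."""
    return "review" if low_confidence_fields(fields, threshold) else "store"


# Made-up values shaped like extracted invoice fields.
sample = {
    "VendorName": {"content": "Contoso", "confidence": 0.97},
    "InvoiceTotal": {"content": "1,250.00", "confidence": 0.62},
}
print(route_extraction(sample))  # review
```

Only extractions routed to "store" would go on to Azure Cosmos DB. What happens to the rest (manual review, a retry with a different model, rejection) is a design decision that depends on the business process.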
How to use Azure AI to extract text from images in SharePoint
Let’s look at how Azure AI Document Intelligence fits into a larger solution architecture to solve a real-world business use case.
Have you ever wanted to search for text that’s embedded in images, such as diagrams, charts, or shapes? If you have a lot of documents that contain such images, you might find it hard to manually scan them for relevant information. Fortunately, there is a solution that can help you automate this process and make your documents more searchable and accessible.
We can use Azure AI to extract text from images stored in SharePoint. By using AI Builder and Azure Form Recognizer, you can configure a Power Automate workflow to use a trained model to extract text from an image. Once you’ve configured a workflow, you can quickly search documents for meaningful text that’s embedded in shapes and objects.
The following diagram shows the architecture of the solution:
The solution consists of the following components:
- AI Builder: A Power Platform capability that lets you train models to recognise objects in images. You can also use prebuilt models for object detection.
- Form Recognizer: An Azure Cognitive Service that uses machine-learning models to extract and analyse form fields, text, and tables from your documents.
- Power Automate: An online workflow service that automates actions across apps and services.
- Azure Functions: An event-driven serverless compute platform that runs on demand and at scale in the cloud.
- PnP Modern Search: A set of SharePoint Online modern web parts that let you create highly flexible and personalised search-based experiences.
The solution works as follows:
- An object detection model is trained in AI Builder to recognise objects that you specify, such as pumps, valves, and switches.
- A new document enters a SharePoint document library, OneDrive, or Teams.
- Power Automate runs the AI Builder model on the document and returns a JSON file that contains the pixel coordinates of any detected objects.
- Power Automate sends the document to Form Recognizer for a full optical character recognition (OCR) scan and returns a JSON file that contains scanned-in text and pixel coordinates of the text.
- Power Automate runs a function in Azure Functions that analyses the pixel coordinates in the AI Builder and Form Recognizer output files. If detected objects intersect with scanned-in text, the function returns the matched data in a JSON file.
- Power Automate enters the metadata, or the text from detected objects, into a document library.
- Users search for the metadata by using PnP Modern Search web parts.
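The matching step that runs in Azure Functions comes down to a rectangle intersection test. The sketch below shows one way to do it in Python, under stated assumptions: both outputs have already been converted to axis-aligned boxes in the same pixel coordinate space, and the `Box` type, the function names, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (origin at the top-left)."""
    left: float
    top: float
    right: float
    bottom: float


def intersects(a: Box, b: Box) -> bool:
    """True when the two rectangles share a region of non-zero area."""
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


def match_text_to_objects(objects, words):
    """Pair each detected object with the OCR words whose boxes overlap it.

    objects: list of (label, Box) from the object detection step
    words:   list of (text, Box) from the OCR step
    """
    matches = []
    for label, obj_box in objects:
        texts = [text for text, word_box in words if intersects(obj_box, word_box)]
        if texts:
            matches.append({"object": label, "text": " ".join(texts)})
    return matches


objects = [("pump", Box(100, 100, 200, 180)), ("valve", Box(400, 50, 460, 110))]
words = [("P-101", Box(120, 130, 180, 150)), ("NOTE", Box(10, 10, 60, 30))]
print(match_text_to_objects(objects, words))
# [{'object': 'pump', 'text': 'P-101'}]
```

A simple overlap test like this treats any partial overlap as a match. If labels sit close to several components, a stricter rule, such as requiring the centre of the word box to fall inside the object box, may give cleaner metadata.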
By using this solution, you can:
- Save time and effort by automating the extraction of text from images in your documents.
- Improve the searchability and accessibility of your documents by adding metadata that reflects the content of the images.
- Enhance your document management and analysis by using AI to identify and extract relevant information from complex diagrams.
The use cases for this approach
This solution can be applied to various types of documents that contain images with embedded text, such as:
- Complicated engineering schematic diagrams that show various types of components. By using this solution, you can quickly search for specific components on a diagram. This can help you with investigations, exposing shortages, or looking for recall and failure notices.
- Industrial diagrams that show the components in a manufacturing assembly. This solution can help you identify pumps, valves, automated switches, and other components. This can help you with preventative maintenance, isolating hazardous components, and increasing the visibility of risk management in your organization.
The steps to implement
To implement this solution, you need to follow these steps:
- Train an object detection model in AI Builder on your own images, or use a prebuilt model.
- Create a Power Automate workflow that triggers when a new document is added to a document library, OneDrive, or Teams.
- Add an action to run the AI Builder model on the document and store the output JSON file in a variable.
- Add an action to send the document to Form Recognizer for an OCR scan and store the output JSON file in another variable.
- Add an action to call an Azure Function that takes the two JSON files as input and returns the matched data as output.
- Add an action to update the document properties with the metadata from the Azure Function output.
- Configure PnP Modern Search web parts to display the metadata in SharePoint.
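In Power Automate the OCR step is configured as an action, but it can be useful to try the same call from a script while testing. The sketch below uses only the Python standard library and assumes the v3.1 REST API (`api-version=2023-07-31`) with the `prebuilt-read` model. The endpoint, key, and document URL are placeholders you would supply, and the function names are hypothetical.

```python
import json
import time
import urllib.request

API_VERSION = "2023-07-31"  # assumed: the v3.1 API version
MODEL_ID = "prebuilt-read"  # assumed: the prebuilt OCR (read) model


def build_analyze_request(endpoint: str, key: str, document_url: str) -> urllib.request.Request:
    """Build the POST request that starts analysis of a document at a URL."""
    url = (f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
           f"{MODEL_ID}:analyze?api-version={API_VERSION}")
    body = json.dumps({"urlSource": document_url}).encode("utf-8")
    headers = {"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def analyze(endpoint: str, key: str, document_url: str,
            poll_seconds: float = 2.0, max_polls: int = 30) -> dict:
    """Start the analysis, then poll the operation URL until it finishes."""
    request = build_analyze_request(endpoint, key, document_url)
    with urllib.request.urlopen(request) as resp:
        # The service answers asynchronously and returns a URL to poll.
        operation_url = resp.headers["Operation-Location"]
    poll = urllib.request.Request(operation_url,
                                  headers={"Ocp-Apim-Subscription-Key": key})
    for _ in range(max_polls):
        with urllib.request.urlopen(poll) as resp:
            result = json.load(resp)
        if result.get("status") in ("succeeded", "failed"):
            return result
        time.sleep(poll_seconds)
    raise TimeoutError("analysis did not finish in time")
```

Calling `analyze(endpoint, key, document_url)` with real values would return the JSON that the later steps consume. In production code the key belongs in a secret store rather than in the script.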
Conclusion: AI-driven efficiency in document processing
Azure AI Document Intelligence is a game-changer for businesses looking to automate their document processing workflows. It not only reduces manual labour but also increases efficiency by providing accurate and structured data extraction. By integrating this service into their business processes, organisations can focus more on acting on information rather than compiling it².
I showed you how to use Azure AI to extract text from images in SharePoint. This solution can help you make your documents more searchable and accessible by using AI Builder and Azure Form Recognizer to identify and extract relevant information from complex diagrams.
I hope you found this useful and interesting. If you have any questions or feedback, please leave a comment below. Thanks for reading!
- What is Azure AI Document Intelligence (formerly Form Recognizer …. https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/overview?view=doc-intel-3.1.0.
- Azure AI Document Intelligence documentation. https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/?view=doc-intel-3.1.0.
- Form Recognizer – Automated Data Processing Systems | Microsoft Azure. https://azure.microsoft.com/en-in/products/form-recognizer/.
- Automate document processing with Azure Form Recognizer – Azure …. https://learn.microsoft.com/en-us/azure/architecture/ai-ml/architecture/automate-document-processing-azure-form-recognizer.
- azure – ai-form-recognizer vs. cognitiveservices-computervision – Stack …. https://stackoverflow.com/questions/71071309/ai-form-recognizer-vs-cognitiveservices-computervision.