Below is a step-by-step guide to using bounding boxes to train custom document models in Azure Vision + Document AI:
Step 1: Prepare Your Document Samples
Collect at least five sample documents representative of the type you want the model to learn; five is the minimum the service accepts for training a custom model, and ten or more generally improves accuracy. Ensure the documents contain the fields or visual elements you want to extract (e.g., invoice numbers, tables, checkboxes).
Step 2: Upload Documents to Azure Document Intelligence Studio or AI Foundry Portal
Navigate to the Azure Document Intelligence Studio or the AI Foundry portal. Create a new custom model project and upload your sample documents; the project stores its training data in an Azure Blob Storage container that you connect during project setup.
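If you prefer to script the upload rather than drag files into the portal, a minimal sketch using the azure-storage-blob Python package might look like this (the container SAS URL and the local "samples" folder are placeholders):

```python
from pathlib import Path

from azure.storage.blob import ContainerClient

# SAS URL of the blob container your project is connected to (placeholder).
CONTAINER_SAS_URL = "https://<account>.blob.core.windows.net/<container>?<sas-token>"

container = ContainerClient.from_container_url(CONTAINER_SAS_URL)

# Upload every PDF in a local "samples" folder into the training container.
for pdf in Path("samples").glob("*.pdf"):
    with pdf.open("rb") as data:
        container.upload_blob(name=pdf.name, data=data, overwrite=True)
        print(f"Uploaded {pdf.name}")
```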
Step 3: Annotate the Documents with Bounding Boxes
Open each document in the annotation tool. Use the interface to draw bounding boxes around each field or element you want your model to detect. For example, draw a rectangle around the "Invoice Number" field or the table area. Assign a meaningful label/tag to each bounding box (e.g., "InvoiceNumber," "TotalAmount," "Table").
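Under the hood, the labeling tool saves each document's annotations to a `<document>.labels.json` file alongside the document. The sketch below mirrors the general shape of that file as a Python dict; the document name, field values, and coordinates are illustrative, and each bounding box is stored as four (x, y) corner pairs normalized to the page dimensions:

```python
# Simplified, illustrative sketch of a <document>.labels.json file.
label_file = {
    "document": "invoice_001.pdf",
    "labels": [
        {
            "label": "InvoiceNumber",
            "value": [
                {
                    "page": 1,
                    "text": "INV-1001",
                    # 8 numbers = 4 (x, y) corners, normalized to page size.
                    "boundingBoxes": [
                        [0.12, 0.08, 0.31, 0.08, 0.31, 0.11, 0.12, 0.11]
                    ],
                }
            ],
        },
        {
            "label": "TotalAmount",
            "value": [
                {
                    "page": 1,
                    "text": "$1,250.00",
                    "boundingBoxes": [
                        [0.70, 0.85, 0.88, 0.85, 0.88, 0.88, 0.70, 0.88]
                    ],
                }
            ],
        },
    ],
}
```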
Step 4: Review and Adjust Annotations
Carefully review each bounding box for accuracy and completeness. Adjust sizes and positions as needed so each box tightly encloses the relevant text or visual element.
Step 5: Train the Custom Model
Once all documents are annotated, start the training process. The AI will learn to recognize visually similar regions and extract text or data associated with each labeled bounding box.
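Training can also be started programmatically. A minimal sketch using the azure-ai-formrecognizer Python SDK (v3.2+), assuming your labeled data sits in the blob container from Step 2; the endpoint, key, SAS URL, and model ID are placeholders:

```python
from azure.ai.formrecognizer import (
    DocumentModelAdministrationClient,
    ModelBuildMode,
)
from azure.core.credentials import AzureKeyCredential

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com/"  # placeholder
KEY = "<your-key>"  # placeholder

admin_client = DocumentModelAdministrationClient(ENDPOINT, AzureKeyCredential(KEY))

# Build a custom template model from the labeled documents in the container.
poller = admin_client.begin_build_document_model(
    ModelBuildMode.TEMPLATE,
    blob_container_url="<sas-url-of-training-container>",  # placeholder
    model_id="invoice-model-v1",  # placeholder
    description="Custom invoice extraction model",
)
model = poller.result()  # blocks until training finishes
print(f"Trained model {model.model_id}, created {model.created_on}")
```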
Step 6: Evaluate the Model
Test the model using a set of new, unseen documents. Review the extracted fields to check accuracy and completeness. If necessary, add more labeled documents or refine annotations and retrain.
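A quick way to spot-check the trained model is to analyze a held-out document and print each extracted field with its confidence score, so weak spots stand out. A sketch, again with azure-ai-formrecognizer and placeholder names:

```python
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

client = DocumentAnalysisClient(
    "https://<your-resource>.cognitiveservices.azure.com/",  # placeholder
    AzureKeyCredential("<your-key>"),  # placeholder
)

# Analyze an unseen document with the custom model trained in Step 5.
with open("test_invoice.pdf", "rb") as f:  # placeholder file
    poller = client.begin_analyze_document("invoice-model-v1", document=f)
result = poller.result()

# Print every extracted field and its confidence score.
for doc in result.documents:
    for name, field in doc.fields.items():
        print(f"{name}: {field.value!r} (confidence {field.confidence:.2f})")
```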
Step 7: Deploy and Use the Model
When satisfied with the model’s performance, put it into production. A trained custom model is callable by its model ID against your resource’s endpoint, so you can integrate it through the REST API or SDKs to automate document processing in your applications.
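In an application, documents often arrive as URLs (e.g., blobs dropped into a container). A minimal integration sketch analyzing a document by URL with the same custom model; the URL, model ID, and field name are placeholders:

```python
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

client = DocumentAnalysisClient(
    "https://<your-resource>.cognitiveservices.azure.com/",  # placeholder
    AzureKeyCredential("<your-key>"),  # placeholder
)

# Point the custom model at a document reachable by URL.
poller = client.begin_analyze_document_from_url(
    "invoice-model-v1",  # placeholder model ID from Step 5
    document_url="https://<account>.blob.core.windows.net/incoming/invoice.pdf",
)
invoice = poller.result().documents[0]
print(invoice.fields["InvoiceNumber"].value)  # "InvoiceNumber" is a label from Step 3
```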
This bounding-box annotation process is crucial for training effective custom document AI models in Azure Vision + Document AI, ensuring the system understands exactly where and what information to extract from documents.
Azure Vision + Document AI supports two main types of custom models:
- Custom Template Model (formerly Custom Form Model): Best for documents with a consistent and static layout or visual template (e.g., questionnaires, structured forms, applications). Extracts labeled key-value pairs, selection marks (checkboxes), tables, signature fields, and regions from documents with little variation in structure.
- Custom Neural Model (also called Custom Document Model): Designed for documents with more layout variation, including structured, semi-structured, or unstructured document types (e.g., invoices, receipts, purchase orders). Uses deep learning trained on a base of diverse document types and fine-tuned on your labeled dataset. Recommended for higher accuracy and advanced extraction scenarios when documents vary in layout or complexity. In the SDK, the template/neural choice surfaces as the build mode passed at training time (see the sketch below).
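Both model types use the same training call shown in Step 5; only the build mode argument changes. A short sketch with placeholder names:

```python
from azure.ai.formrecognizer import (
    DocumentModelAdministrationClient,
    ModelBuildMode,
)
from azure.core.credentials import AzureKeyCredential

admin_client = DocumentModelAdministrationClient(
    "https://<your-resource>.cognitiveservices.azure.com/",  # placeholder
    AzureKeyCredential("<your-key>"),  # placeholder
)

# Fixed-layout forms -> ModelBuildMode.TEMPLATE (as in Step 5).
# Variable layouts (invoices, receipts, ...) -> ModelBuildMode.NEURAL:
poller = admin_client.begin_build_document_model(
    ModelBuildMode.NEURAL,
    blob_container_url="<sas-url-of-training-container>",  # placeholder
    model_id="invoice-neural-v1",  # placeholder
)
print(poller.result().model_id)
```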