Product news

AI-Powered PDF Field Detection and Template Automation

By Mang-Git Ng


Published on April 29, 2026

Updated on May 21, 2026

Anvil Document AI automatically detects and labels the fields in an uploaded PDF, whether structured, flat, or scanned. It generates clean JSON aliases like firstName in place of cryptic strings like topmostSubform[0].Page1[0].f1_1[0]. The setup phase between “I have a PDF” and “I can fill this PDF programmatically” collapses from weeks of manual mapping to seconds. UI-based template creation is on the free tier (20-page cap per PDF). Programmatic Document AI, which the API workflows in this post use, is on the AI Pack ($99/month) and above. Generated webforms, AI signer detection, and aliases that stay consistent across forms come along for the ride.


topmostSubform[0].Page1[0].f1_1[0] is a real field name on the IRS W-4. It means “First Name.” The next field, f1_2[0], means “Last Name.” Nothing in the field name tells you that. You find out by opening the PDF, clicking into the field, reading the label, and guessing from context. Now do that for every field on every form your integration needs to support. A library of state-specific insurance forms can burn weeks of engineering time before you write a single line of production code.

Anvil Document AI eliminates that entire phase. It’s the setup layer between “I have a PDF” and “I can fill this PDF programmatically.”

Summary

Anvil Document AI is an AI-powered feature that automatically detects the fields in an uploaded PDF, assigns correct types and human-readable labels, and generates clean, JSON-friendly aliases like firstName and dateOfBirth. It works on structured PDFs with embedded form fields, on flat PDFs without metadata, and on scanned documents. Document AI runs automatically on every PDF uploaded to Anvil, with no configuration. UI-based template creation is available on the free tier (with a 20-page cap per PDF), and programmatic Document AI for the API workflows in this post is on the AI Pack ($99/month) and above. Any PDF becomes an API-fillable template without manual field mapping, ready for programmatic filling, webform generation, and e-signatures within seconds of upload.

The Problem with PDF Templates at Scale

PDFs were designed for visual fidelity, not developer ergonomics. When a PDF has embedded form fields at all, those fields carry names assigned by whatever tool originally created the document. A W-4 exported from a government authoring system labels its first field topmostSubform[0].Page1[0].f1_1[0]. Nothing in that name tells you what data the field expects.

The developer workflow: open the PDF, click into every field, read the cryptic name, figure out its purpose by visual context, create a mapping in code. A single multi-page form can take an hour. A library of 50 state-specific forms takes weeks.

Flat PDFs and scanned documents have no embedded field metadata whatsoever. Every text input, checkbox, and signature block has to be manually drawn and configured. An ops team doing this work for hundreds of forms is doing data entry to enable data entry.

Manual mapping is also fragile. One mistyped field name, one overlooked checkbox, and the API fills the wrong value into the wrong place. The field names themselves give no semantic clue about what went wrong, so debugging takes longer than the original mapping.
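To make the fragility concrete, here is a minimal sketch of the hand-written mapping the manual workflow produces, plus the kind of sanity check teams end up bolting on. The mapping dictionary, the set of PDF field names, and the unknown_fields helper are all hypothetical; the raw W-4 names are the ones quoted in this post.

```python
# Hypothetical hand-written mapping of the kind the manual workflow produces.
# Keys are what the integration means; values are the W-4's raw field names.
W4_MAPPING = {
    "first_name": "topmostSubform[0].Page1[0].f1_1[0]",
    "last_name": "topmostSubform[0].Page1[0].f1_2[0]",
    "employer_ein": "topmostSubform[0].Page1[0].f1_13[0]",
}

# Raw field names present in the PDF. Hard-coded here for illustration; in
# practice you would read them from the PDF's form fields with a PDF library.
PDF_FIELDS = {
    "topmostSubform[0].Page1[0].f1_1[0]",
    "topmostSubform[0].Page1[0].f1_2[0]",
    "topmostSubform[0].Page1[0].f1_13[0]",
}


def unknown_fields(mapping: dict, pdf_fields: set) -> dict:
    """Return mapping entries whose raw field name does not exist in the PDF."""
    return {key: raw for key, raw in mapping.items() if raw not in pdf_fields}


# A one-character typo ("f1_l" with a lowercase L instead of "f1_1") is easy
# to miss by eye, but it is caught here because no such field exists.
typo_mapping = dict(W4_MAPPING, first_name="topmostSubform[0].Page1[0].f1_l[0]")

print(unknown_fields(W4_MAPPING, PDF_FIELDS))    # {}
print(unknown_fields(typo_mapping, PDF_FIELDS))  # {'first_name': 'topmostSubform[0].Page1[0].f1_l[0]'}
```

A check like this only catches names that don't exist. A mapping that points at a real but wrong field, such as first name wired to f1_2[0], passes silently, and that is the harder failure the paragraph above describes.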

What Document AI Does

Document AI runs automatically on every PDF uploaded to Anvil, with no configuration required. UI-based template creation with Document AI is available on the free tier (20-page cap per PDF). Programmatic Document AI, including the API workflows described in this post, requires the AI Pack ($99/month) or above.

Field Detection

Computer vision scans the uploaded PDF and identifies all fillable regions: text inputs, checkboxes, date fields, signature blocks. Works on structured PDFs with embedded form fields and on flat or scanned PDFs with no metadata. For flat documents, Document AI creates the form layer from scratch by reading the visual layout and recognizing where inputs, checkboxes, and signature blocks should exist.

Field Labeling

Each detected field gets a type (text, checkbox, date, signature) and a human-readable label based on its visual context. A field positioned next to “First Name” on the PDF gets labeled accordingly. Date fields accept dates, checkboxes accept booleans, and signature fields route to signers. In most cases, field types need no manual correction after upload.
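Typed fields make it possible to validate a payload before sending it. The sketch below assumes you keep a small alias-to-type schema on your side (hypothetical; this is not Anvil's template format) and checks values against the three data-bearing types named above. Signature fields are left out because they route to signers rather than taking data.

```python
from datetime import date

# Hypothetical schema: alias -> detected field type. Illustrative only; this
# is not Anvil's template format.
FIELD_TYPES = {
    "firstName": "text",
    "dateOfBirth": "date",
    "isExempt": "checkbox",
}


def type_errors(payload: dict, field_types: dict) -> list:
    """List values in a fill payload that don't match their field's type."""
    errors = []
    for alias, value in payload.items():
        kind = field_types.get(alias)
        if kind is None:
            errors.append(f"{alias}: unknown field")
        elif kind == "text" and not isinstance(value, str):
            errors.append(f"{alias}: expected text")
        elif kind == "checkbox" and not isinstance(value, bool):
            errors.append(f"{alias}: expected boolean")
        elif kind == "date":
            try:
                date.fromisoformat(value)  # ISO 8601 date string, e.g. 1990-04-01
            except (TypeError, ValueError):
                errors.append(f"{alias}: expected ISO date (YYYY-MM-DD)")
    return errors


print(type_errors({"firstName": "Jane", "dateOfBirth": "1990-04-01", "isExempt": False}, FIELD_TYPES))
# []
print(type_errors({"dateOfBirth": "04/01/1990", "isExempt": "yes"}, FIELD_TYPES))
# ['dateOfBirth: expected ISO date (YYYY-MM-DD)', 'isExempt: expected boolean']
```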

AI-Generated Field Aliases

Instead of topmostSubform[0].Page1[0].f1_1[0], Document AI generates firstName. Instead of topmostSubform[0].Page1[0].f1_13[0], you get employerEIN. These aliases become the keys in your JSON payload when calling the PDF Filling API.

Your fill request goes from unreadable:

{
  "topmostSubform[0].Page1[0].f1_1[0]": "Jane",
  "topmostSubform[0].Page1[0].f1_2[0]": "Smith"
}

to self-documenting:

{
  "firstName": "Jane",
  "lastName": "Smith"
}

Cleaner code, faster code reviews, fewer bugs from misidentified fields.
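To make the relationship between the two payloads concrete, the sketch below models an alias as a readable key pointing at a raw field name, and translates the self-documenting payload into the unreadable one. This is a conceptual model, not Anvil's implementation: with Document AI you send the aliased payload and never handle the raw names. The W4_ALIASES dictionary and to_raw_fields helper are hypothetical.

```python
# Conceptual model only: each alias is a readable key pointing at the PDF's
# raw field name. This dictionary is hypothetical, not Anvil's storage format.
W4_ALIASES = {
    "firstName": "topmostSubform[0].Page1[0].f1_1[0]",
    "lastName": "topmostSubform[0].Page1[0].f1_2[0]",
    "employerEIN": "topmostSubform[0].Page1[0].f1_13[0]",
}


def to_raw_fields(payload: dict, aliases: dict) -> dict:
    """Translate an alias-keyed payload into raw PDF field names."""
    unknown = sorted(set(payload) - set(aliases))
    if unknown:
        # A misspelled alias fails loudly, and the error names the bad key.
        raise KeyError(f"unknown aliases: {unknown}")
    return {aliases[alias]: value for alias, value in payload.items()}


print(to_raw_fields({"firstName": "Jane", "lastName": "Smith"}, W4_ALIASES))
# {'topmostSubform[0].Page1[0].f1_1[0]': 'Jane', 'topmostSubform[0].Page1[0].f1_2[0]': 'Smith'}
```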

Webform Generation

One click converts any PDF template into an embeddable Webform with field names, types, and groupings already mapped. Non-technical users fill it out in a browser. No frontend development required.

Webform Translation

Generated webforms translate into any language instantly. Create once, translate without rebuilding.

AI Signer Detection

Long contracts with multiple signers (buyer, seller, notary, witness) get their signer roles detected and mapped to the correct signature fields automatically. Signer assignments feed directly into Etch E-sign packet creation.

From Upload to API-Ready: The Full Workflow

Upload a PDF. Any PDF. Structured with embedded fields, flat, scanned, or exported from a government form builder that produces incomprehensible field names.
Document AI processes the file automatically. Fields are detected, labeled, typed, and given clean aliases. No button to press, no feature to enable. Anvil’s infrastructure is SOC 2 certified and compliant with HIPAA, GDPR, and eIDAS, and uploaded documents are handled under the same security posture Anvil applies to signed documents, which are secured with PKI digital certificates and HSM-stored keys.
The PDF is now an API-fillable template. Your integration code references aliases like dateOfBirth and socialSecurityNumber instead of machine-generated strings.
Fill the template via API. Send a JSON payload keyed by your aliases, and Anvil returns a completed PDF. The API documentation covers authentication, client examples (Node.js, Python, curl), and payload structure.
Generate a Webform. If end users need to provide the data themselves, one click produces an embeddable webform that collects data and fills the PDF on submission.
Route for signatures. AI-detected signers flow into Etch E-sign, so the completed document moves to signature collection without additional configuration.
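The fill step can be sketched with Python's standard library alone. The endpoint path, the Basic-auth scheme (API key as username, empty password), and the {"data": ...} body shape reflect our reading of Anvil's public REST docs and should be treated as assumptions to verify against the current API documentation; the key and template EID are placeholders. The function builds the request without sending it, so the payload can be inspected or logged first.

```python
import base64
import json
import urllib.request

# Assumed endpoint (verify against Anvil's current API docs):
# POST https://app.useanvil.com/api/v1/fill/{templateEid}.pdf
FILL_URL = "https://app.useanvil.com/api/v1/fill/{eid}.pdf"


def build_fill_request(api_key: str, template_eid: str, data: dict,
                       title: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST that fills a template with aliased data."""
    body = {"data": data}
    if title:
        body["title"] = title
    # Assumed auth scheme: HTTP Basic with the API key as username, no password.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    return urllib.request.Request(
        FILL_URL.format(eid=template_eid),
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Usage (placeholders; uncomment to send and save the filled PDF):
# req = build_fill_request("YOUR_API_KEY", "YOUR_TEMPLATE_EID",
#                          {"firstName": "Jane", "lastName": "Smith"})
# with urllib.request.urlopen(req) as resp, open("filled.pdf", "wb") as out:
#     out.write(resp.read())
```

Keeping request construction separate from sending makes the payload easy to unit-test without network access or a real key.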

Structured data collected through this workflow is reusable across documents. Capture a new employee’s name, address, SSN, and tax status through a W-4, and that data pre-fills their I-9 and state withholding form without asking them to re-enter anything. One data capture populates every subsequent document that needs it, whether that’s an insurance application feeding into endorsements or a loan application feeding into closing documents.
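Reuse works because documents share aliases. The sketch below takes one captured record and derives each template's payload, reporting the aliases that still need data. The record values, the alias lists for the W-4, I-9, and state withholding form, and the filingStatus alias are all hypothetical, chosen only to illustrate the pattern.

```python
# One captured record, keyed by aliases (hypothetical placeholder values).
employee = {
    "firstName": "Jane",
    "lastName": "Smith",
    "address": "123 Main St",
    "socialSecurityNumber": "000-00-0000",
    "dateOfBirth": "1990-04-01",
}

# Hypothetical alias lists per template. Shared aliases are what let one
# record fill all three documents.
TEMPLATES = {
    "w4": ["firstName", "lastName", "address", "socialSecurityNumber"],
    "i9": ["firstName", "lastName", "address", "dateOfBirth", "socialSecurityNumber"],
    "stateWithholding": ["firstName", "lastName", "socialSecurityNumber", "filingStatus"],
}


def prefill(record: dict, aliases: list) -> tuple:
    """Return (payload this record can fill, aliases still missing a value)."""
    payload = {a: record[a] for a in aliases if a in record}
    missing = [a for a in aliases if a not in record]
    return payload, missing


for name, aliases in TEMPLATES.items():
    payload, missing = prefill(employee, aliases)
    print(f"{name}: {len(payload)} fields prefilled, missing {missing}")
```

Only the missing aliases need to be asked of the employee, which is the "without asking them to re-enter anything" property in practice.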

Who Benefits Most

Insurance

Consider a mid-size carrier writing commercial lines across 30 states. That carrier might maintain 400+ active PDF templates: ACORD forms, surplus lines applications, state-specific endorsements, disclosure forms. Each state publishes its own versions and revises them on its own schedule, sometimes with 30 days’ notice.

When Texas updates its surplus lines stamping form, someone on the ops team opens the new PDF, clicks through every field, maps it, and tests the integration. Mapping plus testing can take a day for a single form. A full product line refresh across all active states puts an ops analyst in a chair for three to four weeks doing nothing but field mapping. And when the next revision cycle hits six months later, the work resets.

Document AI compresses that cycle. Batch upload the revised forms, and every field gets detected, typed, and aliased automatically. A form that consumed an hour of manual mapping is ready for integration in under a minute. The multi-week staffing problem that returns with every revision cycle becomes an afternoon task.

The compounding value shows up in template maintenance. Because Document AI applies consistent aliasing conventions across all uploaded forms, the insured’s name gets the alias insuredName both on an ACORD 125 and on the corresponding field of a state-specific endorsement. Your integration code references one key per data point, not a different cryptic string for every form variant. When a state releases a new revision, re-upload it and the aliases regenerate, with no reverse-engineering of which field maps to what.
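Because aliases carry meaning, comparing two revisions of a form becomes a set operation. The sketch assumes you can list the aliases of each template revision (how you obtain that list is left open here), and the alias names, including stampingFee, are hypothetical.

```python
def diff_aliases(old: set, new: set) -> dict:
    """Compare the alias sets of two template revisions."""
    return {
        "added": sorted(new - old),
        "removed": sorted(old - new),
        "unchanged": sorted(old & new),
    }


# Hypothetical alias sets for two revisions of a state endorsement.
rev_2025 = {"insuredName", "policyNumber", "effectiveDate"}
rev_2026 = {"insuredName", "policyNumber", "effectiveDate", "stampingFee"}

print(diff_aliases(rev_2025, rev_2026))
# {'added': ['stampingFee'], 'removed': [], 'unchanged': ['effectiveDate', 'insuredName', 'policyNumber']}
```

If aliases regenerate consistently as described above, only the added and removed keys should need attention in integration code.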

HR

The same W-4, W-9, I-9, state withholding form, direct deposit authorization, and benefits enrollment packet, processed for every single hire. A company onboarding ten people a quarter sets up those templates once. A staffing agency placing 200 workers a month across 15 states, each with its own withholding form variant, never stops maintaining templates. Someone leaves the ops team, and their replacement has to reverse-engineer which field maps to what, because topmostSubform[0].Page1[0].f1_1[0] doesn’t explain itself.

Document AI turns f1_1[0] into firstName without anyone opening the PDF in an editor. Every form uses the same aliasing conventions, so a single data payload fills the W-4, the I-9, and the state withholding form without field-by-field remapping.

Real Estate

A brokerage closing transactions across three California counties uses different disclosure forms for each one. Cross state lines and the variation multiplies: purchase agreements differ by state, lease addenda differ by municipality, seller disclosure requirements change county by county. The same data points (buyer name, property address, purchase price, closing date) appear on nearly every document in a transaction, but the field names on each PDF are completely different. Document AI produces consistent aliases across all of them. Your integration code references buyerName and purchasePrice regardless of which template those fields came from, which is the difference between automating a multi-document closing and hand-wiring each form individually.

Finance

A mapping error on a Truth in Lending disclosure isn’t a bug ticket. It’s a compliance violation. An APR populated where the interest rate should go, a co-borrower’s SSN swapped with the primary applicant’s: these trigger regulatory scrutiny, delay closings, and can require re-disclosure with reset waiting periods. Manual field mapping under these conditions is slow because it has to be slow, with every mapping double-checked or triple-checked. Document AI gives finance teams API-ready templates where a field labeled annualPercentageRate is unambiguous in a way that field_47 never will be. That clarity reduces the surface area for mapping errors and makes compliance review of the integration itself far more straightforward.

Edge Cases and Limitations

Document AI handles the vast majority of standard PDF documents accurately, but some inputs require attention. Very low-resolution scans (below 150 DPI) can reduce field detection accuracy, particularly for small checkboxes and fine-print fields. Handwritten annotations on scanned documents are not detected as fillable fields, since Document AI looks for printed form structures rather than freeform handwriting.
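A quick pre-upload check can catch scans below the 150 DPI threshold. The arithmetic: PDF pages are measured in points (72 per inch), so a US Letter page is 612 points, or 8.5 inches, wide, and a scan 1,275 pixels across works out to 1275 / 8.5 = 150 DPI. The effective_dpi helper below is hypothetical, something you would run yourself rather than part of Anvil, and it assumes the scanned image spans the full page width.

```python
POINTS_PER_INCH = 72  # PDF user-space unit is 1/72 inch
MIN_DPI = 150         # threshold cited above for reliable detection


def effective_dpi(pixel_width: int, page_width_pt: float = 612.0) -> float:
    """Scan resolution implied by an image's pixel width and the page width.

    The default of 612 pt is US Letter width (8.5 in).
    """
    return pixel_width / (page_width_pt / POINTS_PER_INCH)


for px in (2550, 1275, 1000):
    dpi = effective_dpi(px)
    status = "ok" if dpi >= MIN_DPI else "consider rescanning"
    print(f"{px} px wide -> {dpi:.0f} DPI ({status})")
```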

Ambiguous labels on the original PDF (a field next to text that just says “Date” when there are six date fields on the page) can produce aliases that need manual refinement. Anvil’s template editor lets you rename any alias in seconds, so a quick review pass after upload catches these cases before they reach production code.
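Part of that review pass can be scripted. The sketch below flags aliases that look generic so a person can rename them. It assumes ambiguous fields come back with generic, possibly numbered aliases such as date or date2; that naming is an assumption, not documented Document AI behavior, so adjust the pattern to whatever your templates actually produce.

```python
import re

# Assumption: ambiguous fields get generic, possibly numbered aliases such as
# "date" or "date2". Actual naming may differ; tune this pattern accordingly.
GENERIC_ALIAS = re.compile(r"(date|text|field|checkbox|signature)\d*", re.IGNORECASE)


def aliases_to_review(aliases: list) -> list:
    """Return aliases that look generic and deserve a manual rename."""
    return [a for a in aliases if GENERIC_ALIAS.fullmatch(a)]


print(aliases_to_review(["firstName", "date", "date2", "dateOfBirth", "Text7"]))
# ['date', 'date2', 'Text7']
```

Using fullmatch means descriptive aliases that merely start with a generic word, like dateOfBirth, are not flagged.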

Document AI eliminates most of the manual field mapping work in PDF template setup, and the remaining edge cases are fast to fix.

Frequently Asked Questions

Does Document AI work on scanned or flat PDFs?

Yes. Document AI uses computer vision to detect fields on flat PDFs and scanned documents that have no embedded form metadata. It reads the visual layout of the page and creates the form layer from scratch, identifying text inputs, checkboxes, signature blocks, and other fillable regions based on the document’s printed structure.

What happens if field detection misses a field or assigns the wrong alias?

You can manually add, remove, or rename any field in Anvil’s template editor after Document AI processes the file. Missed fields can be drawn in directly, and aliases can be renamed with a single click. The AI handles the bulk of the mapping; the editor handles exceptions.

Is Document AI available on the free tier?

Partially. UI-based template creation with Document AI, including AI-assisted box finding and field labeling, is on the free tier with a 20-page cap per PDF. The features that power the API workflows in this post (programmatic Document AI, AI schema mapping, and a 50-page cap per AI-tagged PDF) are on the AI Pack ($99/month) and above. No credit card is required to try the free tier.

Can I use Document AI with the API directly?

Document AI processes PDFs when they are uploaded as templates through Anvil. Once processed, the resulting template with its detected fields and aliases is fully accessible through the PDF Filling API, Etch E-sign API, and workflow APIs. The field detection itself runs on upload, and all downstream API interactions use the generated aliases.

Get Started

UI-based template creation with Document AI is on Anvil’s free tier, with no credit card required and a 20-page cap per PDF. The programmatic Document AI features described in this post (API field aliases, programmatic fill, AI schema mapping, 50-page cap per AI-tagged PDF) are on the AI Pack ($99/month) and above.

Start for free and upload your first PDF template
Read the API documentation for fill, e-sign, and workflow integration details
Request a demo if you want to see Document AI process your specific PDFs before committing engineering time
