Document Intake Agent
Supplier invoices end-to-end with an agentic pipeline
At a glance
| Field | Detail |
|---|---|
| Client | Demo build (template for our AP engagements) |
| Industry | Accounts payable / mid-market distribution |
| Engagement | Demo / template; full client builds typically take 4–6 weeks |
| Stack | Claude vision, Zod, Firestore, Cloud Functions, NetSuite integration |
| Status | Template, ready to instantiate for client engagements |
The challenge
Mid-market finance teams routinely spend dozens of hours per week keying supplier invoices into their ERP. The work is high-volume, low-margin, and prone to errors that surface downstream when accounts don't reconcile.
Existing tools have problems:
- OCR-only tools miss line-item logic. They return characters, not decisions. They get confused by varied layouts and never improve.
- Full ERP AP modules (NetSuite AP, Dynamics 365 AP) are heavyweight, require dedicated implementations, and still need a human to key in the fields OCR couldn't read.
- Off-the-shelf "AI invoice processing" SaaS products ship impressive demos against their own clean test invoices, then disappoint when they meet your real-world supplier mix.
The demo target: a pipeline that actually works on real-world supplier invoices, with honest measurement of error rates and a human-in-the-loop review step that catches the edge cases gracefully.
What we built
This is the reference pipeline we now use as the starting point for every AP automation engagement we ship.
1. Ingestion
Supplier invoices arrive via three channels: email-to-AP inbox, supplier portal upload, and the occasional SFTP drop. A Resend webhook captures email attachments, a portal route handles uploads, and a scheduled job watches the SFTP folder. All three converge on a single Firestore queue with content-hash deduplication.
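The content-hash deduplication step can be sketched as follows. This is a minimal illustration, not the production code: the in-memory `Map` stands in for the Firestore queue, SHA-256 is an assumed choice of hash, and the names `QueueItem`, `contentHash`, and `enqueue` are hypothetical.

```typescript
import { createHash } from "node:crypto";

type Channel = "email" | "portal" | "sftp";

interface QueueItem {
  contentHash: string;
  channel: Channel;
  filename: string;
  receivedAt: string;
}

// In-memory stand-in for the Firestore queue, keyed by content hash (assumption).
const queue = new Map<string, QueueItem>();

// Hash the raw file bytes so the same file hashes identically
// regardless of its filename or the channel it arrived on.
function contentHash(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Returns true if the document was enqueued, false if it was a duplicate.
function enqueue(bytes: Uint8Array, channel: Channel, filename: string): boolean {
  const hash = contentHash(bytes);
  if (queue.has(hash)) return false;
  queue.set(hash, {
    contentHash: hash,
    channel,
    filename,
    receivedAt: new Date().toISOString(),
  });
  return true;
}
```

Keying the queue by the hash means an invoice emailed and then re-uploaded through the portal lands once. A byte-level hash only catches identical files, so a rescanned copy of the same invoice would still need the duplicate check in the routing step.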
2. Vision extraction
The agent picks up a queue item and runs Claude Sonnet vision with a typed Zod schema for the invoice structure:
    import { z } from "zod";

    // Structured shape the vision model must return for every invoice.
    const InvoiceSchema = z.object({
      vendor: z.object({
        name: z.string(),
        taxId: z.string().optional(),
        addressLines: z.array(z.string()),
      }),
      invoice: z.object({
        number: z.string(),
        date: z.string(),
        dueDate: z.string().optional(),
        currency: z.string(),
      }),
      // One entry per invoice line; poRef is only present when the supplier printed it.
      lineItems: z.array(z.object({
        description: z.string(),
        quantity: z.number(),
        unitPrice: z.number(),
        taxRate: z.number(),
        lineTotal: z.number(),
        poRef: z.string().optional(),
      })),
      totals: z.object({
        subtotal: z.number(),
        tax: z.number(),
        total: z.number(),
      }),
      paymentTerms: z.string().optional(),
    });
Schema-validated output means downstream code can rely on structure. Validation failures route the document to a human review queue with the structured error.
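One way to turn a validation failure into the structured error that reaches the review queue is sketched below. It assumes the validation result has already been reduced to a list of issues carrying a `path` and a `message`, which are fields Zod reports for each failed check. The `ValidationIssue`, `ReviewItem`, and `toReviewItem` names are hypothetical.

```typescript
// Assumed shape of one validation issue: where it occurred and what went wrong.
interface ValidationIssue {
  path: (string | number)[];
  message: string;
}

// Hypothetical review-queue entry carrying the structured error.
interface ReviewItem {
  documentId: string;
  reason: "schema_validation_failed";
  errors: { field: string; message: string }[];
}

// Flatten each issue path into a dotted field name a reviewer can scan quickly.
function toReviewItem(documentId: string, issues: ValidationIssue[]): ReviewItem {
  return {
    documentId,
    reason: "schema_validation_failed",
    errors: issues.map((issue) => ({
      field: issue.path.join("."),
      message: issue.message,
    })),
  };
}
```

With this shape, a bad quantity on the first line item surfaces under the field name `lineItems.0.quantity`, which the reviewer UI could use to highlight the offending field.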
3. PO matching with tolerance
For each extracted line item, the agent queries NetSuite for candidate POs from the same vendor in a reasonable date window. Matching logic applies tolerance rules:
- Line total within €5 or 1% (whichever is greater) is a match.
- Description fuzzy-match against PO line description.
- Tax rate must match exactly.
- Currency must match.
Matched line items get a PO reference attached. Unmatched ones are flagged for human review.
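The tolerance rules above can be sketched as a small matcher. Several details here are assumptions rather than the shipped logic: the 1% is taken relative to the PO line total, the fuzzy match is a token-overlap (Jaccard) score with a 0.5 threshold, and currency is flattened onto each line for brevity. All type and function names are hypothetical.

```typescript
interface InvoiceLine {
  description: string;
  lineTotal: number;
  taxRate: number;
  currency: string;
}

interface PoLine extends InvoiceLine {
  poRef: string;
}

// Amount rule: within €5 or 1% of the PO line total, whichever is greater.
function withinAmountTolerance(invoiceTotal: number, poTotal: number): boolean {
  const tolerance = Math.max(5, 0.01 * Math.abs(poTotal));
  return Math.abs(invoiceTotal - poTotal) <= tolerance;
}

// Lowercase, split on non-alphanumerics, drop empties and repeated words.
function tokenize(text: string): string[] {
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
  return Array.from(new Set(words));
}

// Jaccard similarity: shared tokens divided by all distinct tokens.
function descriptionSimilarity(a: string, b: string): number {
  const tokensA = tokenize(a);
  const tokensB = new Set(tokenize(b));
  if (tokensA.length === 0 || tokensB.size === 0) return 0;
  const shared = tokensA.filter((t) => tokensB.has(t)).length;
  return shared / (tokensA.length + tokensB.size - shared);
}

// Returns the first candidate PO line that passes all four rules, if any.
function findPoMatch(
  line: InvoiceLine,
  candidates: PoLine[],
  minSimilarity = 0.5,
): PoLine | undefined {
  return candidates.find(
    (po) =>
      po.currency === line.currency &&
      po.taxRate === line.taxRate &&
      withinAmountTolerance(line.lineTotal, po.lineTotal) &&
      descriptionSimilarity(line.description, po.description) >= minSimilarity,
  );
}
```

On a €1,000 PO line the 1% rule governs, so an invoice line of €1,004 passes. On a €100 PO line the €5 floor governs, so €106 does not.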
4. Confidence-tier routing
The agent assigns each document one of three tiers:
| Tier | Criteria | Action |
|---|---|---|
| High | All line items matched to PO, all schema-valid, vendor on whitelist, no duplicate | Auto-post to NetSuite |
| Medium | Some matches, minor mismatches, or new vendor | Route to human review queue |
| Low | Schema validation failed, no PO matches, suspicious patterns | Reject with structured reason; notify supplier |
The tier thresholds are tunable per client.
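The tier table can be read as a small decision function. The sketch below is one interpretation under stated assumptions: the `DocumentSignals` fields are hypothetical names for the criteria in the table, and a detected duplicate is sent to review rather than rejected, which the table does not specify.

```typescript
type Tier = "high" | "medium" | "low";
type Action = "auto_post" | "review" | "reject";

// Hypothetical per-document signals gathered by the earlier pipeline steps.
interface DocumentSignals {
  schemaValid: boolean;
  lineItemCount: number;
  matchedLineItems: number;
  vendorWhitelisted: boolean;
  isDuplicate: boolean;
  suspicious: boolean;
}

function assignTier(s: DocumentSignals): Tier {
  // Low: schema validation failed, no PO matches, or suspicious patterns.
  if (!s.schemaValid || s.matchedLineItems === 0 || s.suspicious) return "low";
  // High: every line matched, vendor on the whitelist, and not a duplicate.
  const allMatched = s.matchedLineItems === s.lineItemCount;
  if (allMatched && s.vendorWhitelisted && !s.isDuplicate) return "high";
  // Medium: everything in between (partial matches, new vendor, duplicate).
  return "medium";
}

const actionForTier: Record<Tier, Action> = {
  high: "auto_post",
  medium: "review",
  low: "reject",
};
```

Per-client tuning would then amount to changing the conditions in `assignTier`, for example allowing auto-post when a configurable share of lines match instead of all of them.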
5. NetSuite posting + audit
For documents that auto-post or pass review:
- NetSuite bill record created with line items.
- Original PDF attached to the bill record.
- Audit log entry written: document ID, extraction trace, decision history, reviewer (if applicable), NetSuite record reference.
- Email marked processed in the source mailbox.
For documents that reject:
- Reject reason logged.
- Supplier auto-notified with the structured reason and a request to resubmit.
- Document held in the queue for resubmission tracking.
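The audit log entry written for posted documents could be modelled roughly as below. The field names follow the list above, but the exact types and the helper functions (`newAuditEntry`, `appendDecision`, `markPosted`) are assumptions made for illustration.

```typescript
interface Decision {
  at: string; // ISO 8601 timestamp
  actor: string; // e.g. "agent" or a reviewer ID
  decision: string; // e.g. "tier:high" or "approved"
}

interface AuditEntry {
  documentId: string;
  extractionTrace: string[];
  decisionHistory: Decision[];
  reviewer?: string; // only set when a human reviewed the document
  netsuiteRecordRef?: string; // only set once the bill record exists
}

function newAuditEntry(documentId: string, extractionTrace: string[]): AuditEntry {
  return { documentId, extractionTrace, decisionHistory: [] };
}

// Returns a new entry instead of mutating, so earlier states stay intact.
function appendDecision(entry: AuditEntry, decision: Decision): AuditEntry {
  return { ...entry, decisionHistory: [...entry.decisionHistory, decision] };
}

// Attach the NetSuite reference, plus the reviewer when there was one.
function markPosted(entry: AuditEntry, netsuiteRecordRef: string, reviewer?: string): AuditEntry {
  return { ...entry, netsuiteRecordRef, ...(reviewer ? { reviewer } : {}) };
}
```

Building the entry through functions that return a new value keeps the decision history append-only in code, which suits a record that auditors may later query per document, per supplier, or per period.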
Architecture
[Email inbox]   [Portal upload]   [SFTP]
       \               |            /
[Firestore ingestion queue with content-hash dedup]
                       ↓
           [Claude vision extraction]
                       ↓
            [Zod schema validation]
                       ↓
          [PO match against NetSuite]
                       ↓
              [Confidence tier]
                ├─ high → NetSuite post
                ├─ med  → Review queue (Next.js admin UI)
                └─ low  → Reject + supplier notify
                       ↓
           [Audit log + dashboard]
The reviewer UI
A single-page Next.js app shows the document beside the extracted fields. A reviewer corrects a document in about 30 seconds (type, click, confirm), and each correction is stored as a future training/eval example so the pipeline can be improved and tested against it.
Per-document review actions:
- Approve (auto-post on approval)
- Correct field and approve
- Reject (with reason)
- Escalate to manager
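The four review actions map naturally onto a discriminated union. The sketch below is an assumed design, not the shipped UI code: the `nextStep` values, the `EvalExample` shape, and the choice to notify the supplier on a reviewer reject are all hypothetical.

```typescript
type ReviewAction =
  | { kind: "approve" }
  | { kind: "correct_and_approve"; field: string; extracted: string; corrected: string }
  | { kind: "reject"; reason: string }
  | { kind: "escalate" };

// A stored correction: what the agent extracted versus what the reviewer entered.
interface EvalExample {
  documentId: string;
  field: string;
  extracted: string;
  expected: string;
}

interface ReviewOutcome {
  nextStep: "post_to_netsuite" | "notify_supplier" | "manager_queue";
  evalExample?: EvalExample;
  rejectReason?: string;
}

function applyReviewAction(documentId: string, action: ReviewAction): ReviewOutcome {
  switch (action.kind) {
    case "approve":
      return { nextStep: "post_to_netsuite" };
    case "correct_and_approve":
      // The correction doubles as a future training/eval example.
      return {
        nextStep: "post_to_netsuite",
        evalExample: {
          documentId,
          field: action.field,
          extracted: action.extracted,
          expected: action.corrected,
        },
      };
    case "reject":
      return { nextStep: "notify_supplier", rejectReason: action.reason };
    case "escalate":
      return { nextStep: "manager_queue" };
  }
}
```

Because the union is closed, adding a fifth action makes the compiler flag the `switch` until the new case is handled.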
Key features shipped
- Multi-channel ingestion (email, portal, SFTP) with deduplication.
- Claude vision extraction with typed Zod schema.
- PO matching with configurable tolerance rules.
- Confidence-tier routing (auto-post / review / reject).
- NetSuite integration with custom field mapping.
- Reviewer UI with a fast, keyboard-driven correction workflow.
- Audit trail queryable per document, per supplier, per period.
- Eval suite with sample documents per supplier.
- Cost-per-document attribution on the dashboard.
Target outcomes for a typical engagement
| Metric | Before | After (week 8) |
|---|---|---|
| Human time per invoice | 4–6 minutes | ~30 seconds (reviewer only) |
| % keyed manually | 100% | ~13% |
| Cycle time (receipt → posted) | 2–4 days | ~4 hours |
| Visible error rate | ~3% (estimated) | <1% |
| Cost per invoice (loaded) | ~€3.80 | ~€0.18 |
What we learned
The schema sprint is the highest-leverage step. Spending the first week sitting with the AP team and defining what "structured invoice" means for their business, rather than adopting the generic ANSI X12 810 invoice standard, is what makes the rest of the pipeline work.
General extraction beats per-vendor templates 80% of the time. Vision LLMs handle layout variation better than template-based OCR ever did. We reserve per-vendor templates for the highest-volume suppliers where they shave cost and latency.
The reviewer UI matters more than the model. Reviewer ergonomics dictate whether the system actually saves time. We spent more engineering on the review UX than on the extraction prompt.
Shadow mode is non-negotiable. Two weeks of running the agent in parallel with the human team before cutover catches a class of bugs that no eval suite can find.
Where to go next
For the full technical walkthrough of how this pipeline works, see our "How AI invoice processing works" post and our "AI agents for accounts payable" post.
If you have a real-world AP volume problem, drop us a note. We'll come back within a business day with a feasibility take and a discovery-phase quote.