AI Document Processing

Invoices, contracts, receipts, forms — extracted, validated, and pushed straight into your system of record.

What document processing actually is

You receive PDFs, scans, photos, or emails containing documents: invoices, contracts, receipts, forms. Someone reads each one, types the data into your system, and files the original. We replace the "reads and types" part with code that does it faster and more accurately at volume, while flagging the documents that need a human's attention.

Modern document processing is not OCR. OCR returns characters; document agents return decisions.

A modern document agent:

  1. Receives the document.
  2. Runs vision-capable LLM extraction against a typed schema (Zod / Pydantic).
  3. Applies business rules (PO matching, duplicate detection, vendor whitelist, tax validation).
  4. Routes by confidence tier — auto-post, review queue, or reject.
  5. Writes the structured data to your ERP / CRM / Drive with full audit trail and the original file attached.
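
Step 2 can be illustrated with a minimal sketch of schema validation. It uses a hand-written validator in plain TypeScript as a stand-in for a Zod or Pydantic schema so that it runs without dependencies. The `Invoice` shape, its field names, and `parseInvoice` are hypothetical and chosen for illustration only; a real schema comes out of the schema definition phase.

```typescript
// Hypothetical invoice shape. Field names are illustrative only.
interface LineItem {
  description: string;
  quantity: number;
  unitPrice: number;
}

interface Invoice {
  vendorName: string;
  invoiceNumber: string;
  issueDate: string; // expected as YYYY-MM-DD
  currency: string; // three-letter code such as "EUR"
  lineItems: LineItem[];
  total: number;
}

type ParseResult =
  | { status: "valid"; invoice: Invoice }
  | { status: "invalid"; errors: string[] };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Checks JSON-parsed model output against the Invoice shape and collects
// every problem instead of stopping at the first one.
function parseInvoice(raw: unknown): ParseResult {
  if (!isRecord(raw)) {
    return { status: "invalid", errors: ["payload: expected an object"] };
  }
  const obj: Record<string, unknown> = raw;
  const errors: string[] = [];

  const readString = (key: string, pattern?: RegExp): string => {
    const value = obj[key];
    if (typeof value !== "string" || value.trim() === "") {
      errors.push(`${key}: expected a non-empty string`);
      return "";
    }
    if (pattern && !pattern.test(value)) {
      errors.push(`${key}: unexpected format "${value}"`);
    }
    return value;
  };

  const vendorName = readString("vendorName");
  const invoiceNumber = readString("invoiceNumber");
  const issueDate = readString("issueDate", /^\d{4}-\d{2}-\d{2}$/);
  const currency = readString("currency", /^[A-Z]{3}$/);

  const rawTotal = obj["total"];
  let total = 0;
  if (typeof rawTotal === "number" && isFinite(rawTotal)) {
    total = rawTotal;
  } else {
    errors.push("total: expected a finite number");
  }

  const lineItems: LineItem[] = [];
  const rawItems = obj["lineItems"];
  if (!Array.isArray(rawItems) || rawItems.length === 0) {
    errors.push("lineItems: expected a non-empty array");
  } else {
    rawItems.forEach((item: unknown, index: number) => {
      const { description, quantity, unitPrice } = isRecord(item) ? item : {};
      if (
        typeof description === "string" &&
        typeof quantity === "number" &&
        typeof unitPrice === "number"
      ) {
        lineItems.push({ description, quantity, unitPrice });
      } else {
        errors.push(`lineItems[${index}]: malformed line item`);
      }
    });
  }

  if (errors.length > 0) return { status: "invalid", errors };
  return {
    status: "valid",
    invoice: { vendorName, invoiceNumber, issueDate, currency, lineItems, total },
  };
}
```

An `invalid` result carries the list of field-level errors, which is the kind of structured reason a review queue can show next to the document.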

For a deep walkthrough, see our post "How AI invoice processing works".

When it pays off

Document processing pays back quickly when:

  • Volume is non-trivial. At least a few hundred documents per month, ideally a few thousand.
  • Documents are reasonably standardised. Invoices, receipts, POs, claims — yes. Free-form contracts — possible but harder.
  • There's a system of record. ERP, CRM, finance system. We need somewhere to write.
  • The cost of errors is recoverable. A wrong line item is fixable; a wrong wire transfer is not. We build in human review proportional to the cost of error.

It doesn't pay off when volume is low (under ~100 docs/month) or when documents are too varied for a useful schema. We will tell you which side of the line you're on after a discovery sprint.

The pipeline

A typical AP invoice pipeline looks like this:

[Email inbox / portal / scanner]
        ↓
[Ingestion queue with deduplication]
        ↓
[Vision extraction (Claude / Gemini / GPT-4o)] → typed schema
        ↓
[Business rules: PO match, vendor check, tax validation, duplicate check]
        ↓
[Confidence routing]
   ├─ High confidence + matched PO → auto-post to ERP
   ├─ Medium confidence → review queue (human, 60 seconds per doc)
   └─ Low confidence / unknown → reject with structured reason
        ↓
[ERP write + original file attached + audit log]

Every step is observable. Every transition is logged. The reviewer UI is a single-page app that shows the document side-by-side with the extracted fields so the reviewer can correct in seconds, not minutes.
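
The confidence routing branch in the diagram above can be sketched as a small pure function. The thresholds (0.95 and 0.6), the `ExtractionOutcome` fields, and the decision to send a high-confidence document without a matched PO to review are assumptions made for this illustration; the diagram does not specify them.

```typescript
type Route =
  | { action: "auto_post" }
  | { action: "review"; reason: string }
  | { action: "reject"; reason: string };

interface ExtractionOutcome {
  confidence: number; // 0..1; how the score is produced is out of scope here
  poMatched: boolean;
  documentTypeKnown: boolean;
}

// Hypothetical thresholds; real values would be tuned against real documents.
const HIGH_CONFIDENCE = 0.95;
const LOW_CONFIDENCE = 0.6;

function routeDocument(outcome: ExtractionOutcome): Route {
  if (!outcome.documentTypeKnown) {
    return { action: "reject", reason: "unknown document type" };
  }
  if (outcome.confidence < LOW_CONFIDENCE) {
    return {
      action: "reject",
      reason: `confidence ${outcome.confidence.toFixed(2)} below ${LOW_CONFIDENCE}`,
    };
  }
  if (outcome.confidence >= HIGH_CONFIDENCE && outcome.poMatched) {
    return { action: "auto_post" };
  }
  // Everything else goes to a human, with the reason attached.
  const reason =
    outcome.confidence >= HIGH_CONFIDENCE
      ? "no matching purchase order"
      : `confidence ${outcome.confidence.toFixed(2)} below ${HIGH_CONFIDENCE}`;
  return { action: "review", reason };
}
```

Keeping the routing decision in one function with no side effects makes it easy to log every transition and to replay past documents when thresholds change.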

Where the wins come from

Each of the five layers contributes:

| Layer | Without it | With it |
|---|---|---|
| Vision extraction | Manual keying, ~3 min/doc | LLM extraction, ~5 sec/doc, €0.02/doc |
| Schema validation | Garbage in, garbage to ERP | Invalid → review queue with structured reason |
| Business rules | Errors discovered downstream | Errors caught at the boundary |
| Confidence routing | All-or-nothing automation | Reviewer only sees the 10–15% that need attention |
| Audit trail | Compliance pain | Full provenance per posted record |
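
As an illustration of the business rules layer catching errors at the boundary, here is a minimal sketch of three checks: vendor whitelist, duplicate detection, and PO matching. All type names, field names, and the tolerance parameter are hypothetical; amounts are integer cents to avoid floating-point drift.

```typescript
interface CandidateInvoice {
  vendorId: string;
  invoiceNumber: string;
  poNumber?: string;
  totalCents: number;
}

interface PurchaseOrder {
  vendorId: string;
  amountCents: number;
}

interface RuleContext {
  approvedVendorIds: string[];
  postedInvoiceKeys: string[]; // keys of invoices already in the ERP
  purchaseOrders: Record<string, PurchaseOrder | undefined>;
  toleranceCents: number;
}

interface RuleFailure {
  rule: "vendor_whitelist" | "duplicate" | "po_match";
  detail: string;
}

// "INV-001", "inv 001" and "INV001" should all count as the same invoice.
function invoiceKey(vendorId: string, invoiceNumber: string): string {
  return vendorId + ":" + invoiceNumber.replace(/[^a-z0-9]/gi, "").toUpperCase();
}

function checkBusinessRules(inv: CandidateInvoice, ctx: RuleContext): RuleFailure[] {
  const failures: RuleFailure[] = [];

  if (ctx.approvedVendorIds.indexOf(inv.vendorId) === -1) {
    failures.push({ rule: "vendor_whitelist", detail: `vendor ${inv.vendorId} is not approved` });
  }

  const key = invoiceKey(inv.vendorId, inv.invoiceNumber);
  if (ctx.postedInvoiceKeys.indexOf(key) !== -1) {
    failures.push({ rule: "duplicate", detail: `invoice ${key} was already posted` });
  }

  const po = inv.poNumber === undefined ? undefined : ctx.purchaseOrders[inv.poNumber];
  if (po === undefined) {
    failures.push({ rule: "po_match", detail: "no purchase order found" });
  } else if (po.vendorId !== inv.vendorId) {
    failures.push({ rule: "po_match", detail: "purchase order belongs to another vendor" });
  } else if (Math.abs(po.amountCents - inv.totalCents) > ctx.toleranceCents) {
    failures.push({ rule: "po_match", detail: "invoice total differs from purchase order amount" });
  }

  return failures;
}
```

An empty result means the invoice passed every rule; each failure names the rule that fired, so a reviewer sees why the document was held back.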

Real numbers from a real engagement

From a mid-market distributor we worked with (anonymised), processing ~400 invoices per week before automation:

| Metric | Before | After (week 8) |
|---|---|---|
| Human time per invoice | 4–6 min | 30 sec (reviewer only) |
| % invoices keyed manually | 100% | 13% |
| Cycle time (receipt → posted) | 2–4 days | 4 hours |
| Visible error rate | 3.2% (estimated) | 0.8% |
| Cost per invoice | ~€3.80 | ~€0.18 |

Payback hit at month 4. Full details in the Document Intake Agent case study.
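
To show how such a payback estimate can be reasoned about, here is a small worked calculation using the volume (~400 invoices per week) and per-invoice costs (~€3.80 before, ~€0.18 after) quoted above. The build cost of €25,000 is an assumption: it is the low end of the single doc-type price range on this page, used as a stand-in because the actual engagement fee is not stated here. The function names are illustrative.

```typescript
// Saving per week from the drop in per-invoice handling cost.
function weeklySaving(invoicesPerWeek: number, costBefore: number, costAfter: number): number {
  return invoicesPerWeek * (costBefore - costAfter);
}

// Weeks until cumulative savings cover a one-off build cost.
function paybackWeeks(buildCost: number, savingPerWeek: number): number {
  if (savingPerWeek <= 0) return Infinity; // never pays back
  return buildCost / savingPerWeek;
}

const saving = weeklySaving(400, 3.8, 0.18);
const weeks = paybackWeeks(25000, saving); // 25000 is a hypothetical build cost
console.log(`saving per week: EUR ${saving.toFixed(0)}, payback: ${weeks.toFixed(1)} weeks`);
```

With these inputs the saving is 400 × €3.62 = €1,448 per week, and €25,000 / €1,448 is about 17.3 weeks. The sketch ignores retainer fees, pass-through LLM costs, and the ramp-up during rollout, all of which would lengthen the estimate.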

How we build it

Phase 1 — Schema definition (1 week)

We sit with the AP / ops team and define what "structured invoice" means for your business: not the generic ANSI X12 810 invoice standard, but the actual fields you care about, the tax codes you use, the cost centre mappings, the approval thresholds. The output is a Zod / Pydantic schema and a set of 50–100 sample documents to test against.

Phase 2 — Prototype (2 weeks)

We build the extraction + validation + simple review UI. We run the prototype against the sample documents and against a fresh week of real documents in shadow mode (the human team still does the actual work; we compare). We tune.
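
The shadow-mode comparison can be sketched as a field-by-field agreement report between what the human team keyed and what the prototype extracted. The types, the function name `compareShadowRun`, and the matching rules (case-insensitive trimmed strings, numbers within half a cent) are assumptions for illustration, not a description of the actual tooling.

```typescript
type FieldValue = string | number;
type Fields = Record<string, FieldValue | undefined>;

interface ShadowPair {
  docId: string;
  human: Fields; // what the team keyed
  model: Fields; // what the prototype extracted
}

interface Mismatch {
  docId: string;
  field: string;
  human: FieldValue;
  model: FieldValue | undefined;
}

function sameValue(a: FieldValue, b: FieldValue | undefined): boolean {
  if (typeof a === "number" && typeof b === "number") return Math.abs(a - b) < 0.005;
  if (typeof a === "string" && typeof b === "string") {
    return a.trim().toLowerCase() === b.trim().toLowerCase();
  }
  return false;
}

// Returns the share of documents on which each field agreed, plus every
// individual disagreement for manual inspection.
function compareShadowRun(pairs: ShadowPair[]): {
  agreement: Record<string, number>;
  mismatches: Mismatch[];
} {
  const counts: Record<string, { total: number; matched: number } | undefined> = {};
  const mismatches: Mismatch[] = [];

  for (const pair of pairs) {
    for (const field of Object.keys(pair.human)) {
      const humanValue = pair.human[field];
      if (humanValue === undefined) continue;
      const entry = counts[field] ?? { total: 0, matched: 0 };
      counts[field] = entry;
      entry.total += 1;
      const modelValue = pair.model[field];
      if (sameValue(humanValue, modelValue)) {
        entry.matched += 1;
      } else {
        mismatches.push({ docId: pair.docId, field, human: humanValue, model: modelValue });
      }
    }
  }

  const agreement: Record<string, number> = {};
  for (const field of Object.keys(counts)) {
    const entry = counts[field];
    if (entry) agreement[field] = entry.matched / entry.total;
  }
  return { agreement, mismatches };
}
```

A disagreement is not automatically a model error: the human entry can be the wrong one, which is why the sketch keeps each mismatch for inspection instead of only reporting a rate.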

Phase 3 — Production build (2–4 weeks)

Confidence routing, ERP integration, audit trail, observability. Vendor-specific templates for high-volume vendors where we know the layouts. Eval suite that runs on every prompt change.

Phase 4 — Phased rollout (1–2 weeks)

10% of inbound → 50% → 100%. Reviewer queue staffed at typical volume from day one to make sure the workflow is real. Daily standup with your team during rollout.
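
One way to implement the 10% → 50% → 100% split is deterministic bucketing: each document ID maps to a stable bucket from 0 to 99, and a document is automated when its bucket falls below the current percentage. The sketch below assumes documents have a stable string ID and uses a simple multiply-by-31 string hash; both the function names and the hash choice are illustrative, and a production system would likely use a stronger hash.

```typescript
// Maps a document ID to a stable bucket in 0..99 using a simple
// multiply-by-31 string hash kept within 32 bits.
function bucketOf(docId: string): number {
  let hash = 0;
  for (let i = 0; i < docId.length; i++) {
    hash = (hash * 31 + docId.charCodeAt(i)) >>> 0;
  }
  return hash % 100;
}

// True when the document falls inside the current rollout percentage.
function inRollout(docId: string, percent: number): boolean {
  return bucketOf(docId) < percent;
}

// Usage: the same document gets the same answer on every retry, and any
// document included at 10% stays included at 50% and 100%.
const rolloutPercent = 10; // raised to 50, then 100, as the rollout proceeds
const handledByAgent = inRollout("invoice-2024-000123", rolloutPercent);
console.log(handledByAgent ? "route to agent pipeline" : "route to manual process");
```

Bucketing by ID instead of drawing a random number per request means a reprocessed document never flips between the automated and manual paths mid-rollout.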

Phase 5 — Iterate (ongoing)

Documents drift. Vendors change layouts. New document types arrive. We keep the eval cadence on retainer and add new schemas as needed.

The stack

| Layer | Default |
|---|---|
| Ingestion | Email (Resend webhook) / SFTP / portal upload |
| Storage | Firestore + Cloud Storage |
| Vision extraction | Claude Sonnet 4.6 / Gemini 2.0 Flash for cost-sensitive |
| Schema validation | Zod (TypeScript) end-to-end |
| Business rules | TypeScript on Cloud Functions |
| Reviewer UI | Next.js 16 + shadcn/ui + Firebase Auth |
| ERP integration | Custom, depends on your ERP (we have shipped NetSuite, Dynamics 365, Xero, Sage) |
| Observability | Langfuse + Sentry + custom dashboard |
| Cost attribution | Per-document trace |

Pricing — the honest version

| Engagement | Scope | Investment |
|---|---|---|
| Discovery + schema spec | 1 week | €4,000–7,000 |
| Single doc-type pipeline (e.g. invoices) | 4–6 weeks | €25,000–45,000 |
| Multi-doc pipeline | 8–14 weeks | €60,000–120,000 |
| Multi-ERP / multi-tenant | 12–20 weeks | €100,000–200,000 |
| Retainer (operations, evals, new schemas) | Monthly | from €2,000/month |

Pass-through LLM costs typically run €0.01–€0.05 per document depending on length and vision usage.

What we won't do

  • Promise a specific accuracy number before the prototype phase. Accuracy depends on your documents, and we don't bluff.
  • Skip the reviewer queue. Document automation without human review is how clients end up with €50,000 of duplicate payments. We always build the queue, even if the eventual auto-post rate is 95%+.
  • Hide failures. Every document the model couldn't confidently process gets routed to a human with the reason for the routing. Silent failures are the failure mode that breaks trust.

If you have a document workflow that costs your team hours per week, send a note. We respond within one business day.

Ready to scope AI document processing?

A discovery call is the fastest way to know if there's a fit.