Does the 6-week timeline really hold?

For single-purpose production agents with one or two integrations, yes — we hit it consistently. Bigger builds (multi-channel, multi-integration, enterprise compliance) take 10-16 weeks honestly. The 6-week target is the focused single-purpose case. We tell clients up front if their scope can't hit it.

Is this just because Claude / Cursor / Claude Code are so good now?

Partly. AI-assisted development genuinely shifted our throughput in the last 18 months — we ship ~2× the surface area at the same headcount. But the process patterns (paid discovery, prototype on real data, evals from day one, idempotent tools) were already what made our work different. The AI tooling just made it faster.

What's the biggest difference between this and a typical agency engagement?

Probably the discovery sprint as a paid first engagement. Most agencies do free pitches and then learn the actual scope during the build. We charge for discovery, do real work in it, and the resulting build estimate is grounded in reality. Clients consistently say the discovery alone was worth what they paid.

Does the 6-week timeline really hold?

For single-purpose production agents with one or two integrations, yes — we hit it consistently. Bigger builds (multi-channel, multi-integration, enterprise compliance) take 10-16 weeks honestly. The 6-week target is the focused single-purpose case. We tell clients up front if their scope can't hit it.

Is this just because Claude / Cursor / Claude Code are so good now?

Partly. AI-assisted development genuinely shifted our throughput in the last 18 months — we ship ~2× the surface area at the same headcount. But the process patterns (paid discovery, prototype on real data, evals from day one, idempotent tools) were already what made our work different. The AI tooling just made it faster.

What's the biggest difference between this and a typical agency engagement?

Probably the discovery sprint as a paid first engagement. Most agencies do free pitches and then learn the actual scope during the build. We charge for discovery, do real work in it, and the resulting build estimate is grounded in reality. Clients consistently say the discovery alone was worth what they paid.

All resources

Process

The AI Development playbook: how we ship agents in 6 weeks

May 16, 2026· updated May 21, 20265 min read

What the 6 weeks actually look like

For a typical single-purpose production agent — say, a document intake pipeline or a voice booking agent — our 6-week shape:

Week	What we ship
1	Discovery + spec + eval set + go/no-go
2-3	Working prototype on real data; tune until evals pass
4-5	Production hardening (auth, retries, idempotency, observability, integration polish, reviewer UI if needed)
6	Phased rollout, runbooks, handover

Bigger builds extend each phase proportionally. Smaller ones compress. The shape is consistent.

What we will not do

The hard part isn't what we do — it's what we refuse to do.

No free pitch for a multi-month build. A one-hour intro call doesn't surface real scope. We charge for discovery and the scope is real after.

No agent without evals. Shipping AI-touched workflows without evals is the difference between something that works in a demo and something that breaks in production.

No skip on the reviewer UI. For document agents, the reviewer ergonomics dictate whether the system actually saves time. We treat reviewer UX as a first-class component, not an afterthought.

No approval gates skipped on irreversible actions. Money moves, contracts sign, emails send to customers. All gated. Always.

No lock-in to a single LLM provider. Every agent sits behind a thin provider interface. You can swap in hours.

No "we'll figure out the data flows during implementation." We map them in discovery. Surprises in production cost 10× what discovery costs.

The opinionated stack

We try not to be cargo-cult about tools. We do have defaults that earn their keep.

Layer	Default
Reasoning model	Claude Sonnet 4.6 / 4.7
Voice realtime	OpenAI GPT-4o Realtime
Cost-sensitive volume	Gemini 2.0 Flash / Claude Haiku
Application framework	Next.js 16 + React Server Components
Type safety	TypeScript end-to-end with Zod schemas
Styling	Tailwind + shadcn/ui
Database (relational)	Postgres (Supabase or Cloud SQL)
Database (NoSQL / docs)	Firestore
Vector	pgvector if Postgres exists, Pinecone otherwise
Auth	NextAuth / Firebase Auth / Microsoft Entra
Observability	Langfuse for AI traces, Sentry for errors
Telephony	Twilio
Deploy	Firebase App Hosting / Vercel / Cloud Run
CI	GitHub Actions
Email	Resend

This stack is boring on purpose. It's the one we and our clients can maintain in 2 years.

How AI-assisted development changed our process

In 2024, a 6-week production agent build needed a 3-person team. In 2026, the same build needs a 2-person team with AI accelerators.

What changed:

Code agents write the boring engineering. Typed API clients, schema mappings, migration scripts, test fixtures, glue logic. 5-10× faster than humans.
Eval-driven development is easier. LLMs grade eval outputs at scale. Manual review for edge cases only.
Documentation generates itself. Reasonable first drafts of READMEs, runbooks, API docs.
Pair programming with AI replaces solo work. Two humans + AI is roughly as productive as 3-4 humans without.

What didn't change:

Product judgement still human-driven. Novel architecture, ambiguous requirements, edge case design.
Discovery still in-person (or video). Talking to the people who do the work.
Reviewer UX, system design, integration boundaries all still human work.

See Code & Integration Agent for how we use code agents internally.

The discovery sprint pattern

Week 1 is where the engagement is won or lost. The pattern:

Day 1: kick-off call with sponsors + the operational owner. Map the current workflow. Identify success metrics in finance-grade language.

Day 2-3: shadow the people doing the actual work. Screen-record (with consent). Note the failure modes the documentation misses.

Day 4: write the eval set. 50-100 representative cases with expected behaviors.

Day 5: write the spec. Architecture, technology choices, cost & timeline estimate, risk register, go/no-go criteria.

Day 6-7: review with the client. Iterate. Sign off on the build engagement or stop here.

Cost: €4,000-8,000 depending on complexity. Value: clients consistently say the discovery alone was worth what they paid. Even when we decide together not to do the build, the client walks with a clear understanding of their workflow and a documented spec.

The prototype phase pattern

Weeks 2-3: real end-to-end pipeline on real data. Thin scaffolding around production-grade core.

Production-grade in the prototype:

Real auth.
Real database writes.
Real LLM calls (not mocked).
Real integration with at least one downstream system.

Thin scaffolding (deferred to build phase):

Retry / idempotency polish.
Full reviewer UI (basic version only).
Observability beyond basic logging.
Edge case handling.

The prototype runs in shadow mode — alongside the human process — for a week. We compare outcomes. We tune. The go/no-go decision for full build is made on evidence, not promises.

The build phase pattern

Weeks 4-5: production engineering.

Auth + authorization: real role-based access, never "everyone is admin."
Idempotency: every external action gets a key.
Retry: exponential backoff with jitter; dead-letter queue.
Observability: per-trace logging, dashboards, alerting.
Eval suite in CI: every PR runs against the eval set; regressions block merge.
Cost guardrails: per-run ceilings, per-day budgets, automated alerts.
Approval gates: irreversible actions pause for human approval.
Reviewer UI (for document agents): keyboard-driven, side-by-side document view, fast.
Runbooks: what to do when X breaks.

By end of week 5, the system is ready for phased rollout.

The rollout phase pattern

Week 6: 10% → 50% → 100% over the week. Daily standup with the operational team. Aggressive monitoring.

By end of week 6:

Production traffic is live.
Eval cadence is set (monthly typical).
On-call rota is established.
Handover docs delivered.
Optionally: retainer agreement signed for ongoing operations.

Where to go next

For our full services menu, for the agent types we routinely build, or drop us a note with a workflow in mind. We respond within one business day with an honest take on whether the 6-week shape fits.

For our take on the broader industry context see The state of AI development in 2026. For the buyer's checklist when evaluating any agency see 12 questions to ask.

Frequently asked questions

Keep reading

Article

How much does an AI agent cost? Real numbers from real builds

AI agent builds in 2026 typically cost €4-8k for discovery, €15-30k for a working prototype, €25-80k for production, €2-5k/month for retainer. Per-call infrastructure cost runs €0.01-€0.40 depending on shape. Honest numbers from real builds, with the trade-offs explained.

Article

Hiring an AI development agency: 12 questions to ask

Twelve questions that separate serious AI dev shops from demoware vendors. Asks about evals, observability, code ownership, provider lock-in, references, and what they'll refuse to do. If a vendor can't answer cleanly, walk.

Article

The state of AI development in 2026

In 2026, AI development is shipping production agents that earn their keep — document processing, voice, workflow orchestration — backed by Claude / GPT / Gemini and engineered with evals, observability, and guardrails. What's underrated: well-engineered automation with one or two LLM-judgment steps. What's overrated: 'autonomous AGI' marketing.

Service

AI Agents Development

Custom agents that read documents, hold conversations, take phone calls, and execute multi-step workflows — wired into the systems you already run.

Service

Custom Development

Web apps, mobile apps, dashboards, internal tools. React, Next.js, React Native, Power Apps — picked for the job, not the hype.

Agent type

Code & Integration Agent

API plumbing, schema mapping, OpenAPI client generation, internal tooling

Want this delivered in your stack?

If the article describes a workflow you'd like to ship, drop us a note. We reply within one business day.

Get a proposal