Question 1

How well does the agent handle accents, background noise, or callers who interrupt?

Accepted Answer

Modern speech models (GPT-4o Realtime for speech-to-speech, Whisper-large-v3 for transcription) handle accents and noise far better than the IVRs you remember from 2020. Interruption handling depends on the model: GPT-4o Realtime supports barge-in and turn-taking. We test against real call recordings from your business before declaring an agent production-ready, and we tune the voice and pacing to your customer base.

Question 2

What does the cost per call look like in production?

Accepted Answer

Roughly €0.10–€0.40 per call depending on call length, model used, and integrations called. A typical 90-second booking call costs around €0.12 (Twilio voice + GPT-4o Realtime + a few tool calls). We surface cost-per-call on the dashboard so you can correlate spend to outcomes.
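
As a rough illustration of how a per-call figure like this is assembled, the sketch below adds per-minute telephony and model charges to a flat charge per tool call. The function name and every rate are placeholder assumptions, chosen only so the arithmetic lands near the €0.12 example above; they are not Twilio or OpenAI list prices.

```python
def estimate_call_cost_eur(
    duration_seconds: float,
    telephony_eur_per_min: float,
    model_eur_per_min: float,
    tool_calls: int,
    eur_per_tool_call: float,
) -> float:
    """Estimate the cost of one call in euros (illustrative model only)."""
    if duration_seconds < 0 or tool_calls < 0:
        raise ValueError("duration_seconds and tool_calls must be non-negative")
    minutes = duration_seconds / 60.0
    # Telephony and the realtime model are both assumed to bill by the minute.
    usage_cost = minutes * (telephony_eur_per_min + model_eur_per_min)
    # Each integration call is assumed to carry a small flat cost.
    tool_cost = tool_calls * eur_per_tool_call
    return round(usage_cost + tool_cost, 4)


# Placeholder rates, NOT real vendor pricing.
example_cost = estimate_call_cost_eur(
    duration_seconds=90,
    telephony_eur_per_min=0.01,
    model_eur_per_min=0.06,
    tool_calls=3,
    eur_per_tool_call=0.005,
)
print(f"Estimated cost: EUR {example_cost:.2f}")
```

With these placeholder inputs the model charge dominates, which is why call length and model choice move the per-call figure more than the number of integrations does.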

Question 3

What happens if the agent can't handle a call?

Accepted Answer

Warm transfer to a human during business hours, or a call-back ticket logged in your CRM for follow-up outside business hours. The agent is configured with explicit fallback intents — 'I'd like to speak to a person', 'this is an emergency', repeated misunderstandings — that trigger transfer or escalation. Every call ends with one of three states: handled, transferred, or queued for follow-up.
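
The routing rule described above can be sketched as a small decision function. This is a minimal illustration, not the production logic: the phrase list, the misunderstanding threshold, and all names are hypothetical, and a real agent would rely on the model's intent detection rather than substring matching.

```python
from enum import Enum


class Outcome(Enum):
    """The three terminal states a call can end in."""
    HANDLED = "handled"
    TRANSFERRED = "transferred"
    QUEUED = "queued_for_follow_up"


# Hypothetical trigger phrases and threshold, for illustration only.
FALLBACK_PHRASES = ("speak to a person", "this is an emergency")
MAX_MISUNDERSTANDINGS = 2


def needs_escalation(utterance: str, misunderstandings: int) -> bool:
    """True if the caller asked for a human or the agent keeps misunderstanding."""
    text = utterance.lower()
    asked_for_human = any(phrase in text for phrase in FALLBACK_PHRASES)
    return asked_for_human or misunderstandings >= MAX_MISUNDERSTANDINGS


def resolve_outcome(
    utterance: str, misunderstandings: int, within_business_hours: bool
) -> Outcome:
    """Map the escalation check and the time of day to a terminal state."""
    if not needs_escalation(utterance, misunderstandings):
        # Simplification: no escalation is treated as the agent handling the call.
        return Outcome.HANDLED
    # Warm transfer during business hours, call-back ticket otherwise.
    return Outcome.TRANSFERRED if within_business_hours else Outcome.QUEUED
```

For example, `resolve_outcome("I'd like to speak to a person", 0, True)` yields `Outcome.TRANSFERRED`, while the same request outside business hours yields `Outcome.QUEUED`.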

Question 4

Can the agent book appointments directly into our calendar?

Accepted Answer

Yes. We integrate with Google Calendar, Microsoft 365 / Outlook, Calendly, Cal.com, Acuity, and most major scheduling systems. The agent finds availability, confirms with the caller, books the slot, and emails / texts the confirmation. If your scheduling tool is unusual, we build a custom connector.
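
The "finds availability" step can be illustrated without any particular calendar API. The sketch below assumes the connector has already fetched busy intervals as `(start, end)` pairs; the function name and data shape are hypothetical, and real connectors also have to deal with time zones, buffers, and working hours.

```python
from datetime import datetime, timedelta


def first_free_slot(busy, window_start, window_end, duration):
    """Return the start of the earliest gap of at least `duration`, or None.

    `busy` is an iterable of (start, end) datetime pairs; overlaps are allowed.
    """
    cursor = window_start
    for start, end in sorted(busy):
        # Gap between the cursor and the next busy block, clipped to the window.
        if min(start, window_end) - cursor >= duration:
            return cursor
        cursor = max(cursor, end)
    # Remaining time after the last busy block.
    if window_end - cursor >= duration:
        return cursor
    return None


day = datetime(2025, 1, 6)
busy = [
    (day.replace(hour=9), day.replace(hour=10)),
    (day.replace(hour=10, minute=15), day.replace(hour=11)),
]
slot = first_free_slot(
    busy, day.replace(hour=9), day.replace(hour=12), timedelta(minutes=30)
)
print(slot)  # earliest 30-minute opening in the 09:00 to 12:00 window
```

The agent would read the returned slot back to the caller for confirmation before writing the booking, so a stale result costs a retry and never a double booking on its own.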

Question 5

Do you record calls? Where are the recordings stored?

Accepted Answer

Yes, by default, with caller consent disclosure at call start (legally required in most jurisdictions). Recordings and transcripts are stored in your cloud — your Firebase project, your S3 bucket, your data residency. We do not retain them on our side. We make it easy to set retention windows (e.g. delete after 90 days) and purge on request.
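
A retention window such as "delete after 90 days" comes down to a date comparison. The sketch below is an assumption-laden illustration: the function names and the in-memory dictionary of recordings are hypothetical, and in practice a storage lifecycle rule on the bucket may do this job instead of application code.

```python
from datetime import datetime, timedelta, timezone


def is_past_retention(recorded_at, now, retention_days=90):
    """True if the recording is older than the retention window."""
    return now - recorded_at > timedelta(days=retention_days)


def select_for_deletion(recordings, now, retention_days=90):
    """Return the ids of recordings that should be purged.

    `recordings` maps a recording id to its timezone-aware creation time.
    """
    return sorted(
        recording_id
        for recording_id, recorded_at in recordings.items()
        if is_past_retention(recorded_at, now, retention_days)
    )


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
recordings = {
    "call-001": datetime(2025, 1, 15, tzinfo=timezone.utc),  # 137 days old
    "call-002": datetime(2025, 5, 20, tzinfo=timezone.utc),  # 12 days old
}
expired = select_for_deletion(recordings, now)
print(expired)
```

Using timezone-aware timestamps throughout avoids off-by-a-day errors when the storage region and the business are in different time zones.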

Question 6

How do you handle multiple languages?

Accepted Answer

Single-language deployment is straightforward (English, German, French, Spanish, Italian, Portuguese, plus most European and major Asian languages via GPT-4o Realtime). Handling a call that switches between two languages is harder and requires more careful prompt engineering; we have done it for clients with mixed EN+DE and EN+SR customer bases. We will tell you what works and what doesn't before we build.

Question 7

What's the SLA / uptime story?

Accepted Answer

Twilio's voice SLA is 99.95%. Our infrastructure (Cloud Functions or Cloud Run) is configured for high availability with regional failover where the customer needs it. Model providers (OpenAI, Anthropic) have their own SLAs. We monitor end-to-end call success rate on the dashboard and alert when it drops below a configured threshold.
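
To make these figures concrete, the sketch below converts an availability percentage into a monthly downtime budget, combines several components in series, and checks a call success rate against an alert threshold. The function names and the 95% default threshold are hypothetical, and the series calculation assumes components fail independently, which is a simplification.

```python
def downtime_budget_minutes(availability_pct, days=30):
    """Minutes of downtime allowed over `days` at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def composite_availability_pct(*component_pcts):
    """Availability of components in series, assuming independent failures."""
    result = 1.0
    for pct in component_pcts:
        result *= pct / 100
    return result * 100


def should_alert(successful_calls, total_calls, threshold=0.95):
    """True if the observed call success rate is below the threshold."""
    if total_calls == 0:
        return False  # no traffic, nothing to judge
    return successful_calls / total_calls < threshold


budget = downtime_budget_minutes(99.95)
print(f"99.95% over 30 days allows about {budget:.1f} minutes of downtime")
```

Because every component in the call path has to be up at once, the end-to-end figure is always lower than the best single SLA, which is why measuring call success rate directly is more informative than quoting any one vendor's number.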

Question 8

How long does a voice agent take to build?

Accepted Answer

Simple booking line with one intent (book / cancel / reschedule): 3–4 weeks. Multi-intent qualification line with CRM logging and warm transfer: 4–6 weeks. Multi-language or multi-business-line: 6–10 weeks. We always ship a working version first that handles the top three intents, then iterate.

Layer	Default choice
Telephony	Twilio Voice (Media Streams)
Realtime model	OpenAI GPT-4o Realtime
ASR fallback (if non-realtime)	Whisper-large-v3
TTS fallback	OpenAI TTS / ElevenLabs
Function calling	OpenAI tools / Anthropic tools
Backend	Node.js on Cloud Run with WebSocket support
State	Firestore or Postgres
Recording / transcript	Twilio + your cloud storage
Observability	Langfuse + structured logs + Slack alerts

Engagement	Timeline	Investment
Discovery + scripted demo	5–7 days	€3,500–6,000
Single-intent booking line	3–4 weeks	€15,000–25,000
Multi-intent qualification line + CRM	4–6 weeks	€25,000–50,000
Multi-language or multi-line program	6–10 weeks	€50,000–100,000
Ongoing retainer (eval + tuning)	Monthly	from €1,500/month

Voice & Phone AI Agents

What a voice agent actually does

When a voice agent is the right call

The stack

Anatomy of a production voice agent

Process

1. Discovery — 3 to 5 days

2. Script + prompt design — 3 to 5 days

3. Build — 1 to 3 weeks

4. Eval + tuning — 1 week

5. Launch — phased

6. Iterate

What good observability looks like for voice

Pricing — the honest version

What we will not do

Frequently asked questions
