Does it sound like a robot?

Modern realtime voice models with the right voice and pacing settings are uncanny — most callers don't realise it's AI for the first 20–30 seconds. We tune voice, pacing, and personality during the build, and we always disclose at call start that the caller is speaking with an AI. Disclosure builds trust; pretending to be human breaks it the moment something goes wrong.

Can it interrupt and be interrupted?

Yes. GPT-4o Realtime supports proper barge-in — the agent stops talking when the caller starts. This is the single biggest UX difference between a 2026 voice agent and the 2020-era IVR you remember. We tune the sensitivity so the agent doesn't stop on background noise or short backchannels ('mm-hm', 'right').
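
The sensitivity tuning can be pictured as a gate in front of the "stop talking" signal. The sketch below is a minimal illustration, not our production logic: `shouldBargeIn`, the 250 ms threshold, and the backchannel list are hypothetical names and values chosen for the example.

```typescript
// Hypothetical barge-in gate: decides whether detected caller speech
// should interrupt the agent. All names and thresholds are assumptions.
interface SpeechSegment {
  durationMs: number;        // how long the caller has been making sound
  partialTranscript: string; // best-effort ASR text so far ("" if none yet)
}

const BACKCHANNELS = new Set([
  "mm-hm", "mhm", "uh-huh", "right", "yeah", "ok", "okay",
]);

// Lowercase and strip punctuation so "Right." matches "right".
function normalise(text: string): string {
  return text.toLowerCase().replace(/[^a-z\s-]/g, "").trim();
}

function shouldBargeIn(segment: SpeechSegment, minSpeechMs = 250): boolean {
  // Very short bursts are treated as background noise.
  if (segment.durationMs < minSpeechMs) return false;
  const text = normalise(segment.partialTranscript);
  // Sound with no recognisable words yet: keep talking, wait for more audio.
  if (text === "") return false;
  // A lone backchannel means "go on", not "stop".
  if (BACKCHANNELS.has(text)) return false;
  return true;
}
```

Raising `minSpeechMs` makes the agent harder to interrupt; lowering it makes the agent more responsive but more likely to stop on noise.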

What if the caller asks something the agent doesn't know?

The agent escalates on explicit triggers: a 'speak to a human' request, repeated misunderstanding, sensitive topics, or anything outside the configured scope. During business hours it makes a warm transfer with the conversation context attached, so the human picks up where the agent left off. Outside business hours, the agent logs a callback request with CRM context.
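
The triggers above can be sketched as a small decision function. This is an illustrative sketch under stated assumptions: the `TurnState` fields, the `nextAction` name, and the default of two consecutive misunderstandings are hypothetical, not values from a real deployment.

```typescript
// Hypothetical escalation decision. Field names and the default
// misunderstanding limit are assumptions for illustration.
type EscalationReason =
  | "human_requested"
  | "repeated_misunderstanding"
  | "sensitive_topic"
  | "out_of_scope";

type Action =
  | { kind: "continue" }
  | { kind: "warm_transfer"; reason: EscalationReason; summary: string }
  | { kind: "callback_request"; reason: EscalationReason; summary: string };

interface TurnState {
  humanRequested: boolean;
  consecutiveMisunderstandings: number;
  sensitiveTopic: boolean;
  inScope: boolean;
  summary: string; // conversation context handed to the human
}

function escalationReason(s: TurnState, maxMisunderstandings: number): EscalationReason | null {
  if (s.humanRequested) return "human_requested";
  if (s.sensitiveTopic) return "sensitive_topic";
  if (!s.inScope) return "out_of_scope";
  if (s.consecutiveMisunderstandings >= maxMisunderstandings) return "repeated_misunderstanding";
  return null;
}

function nextAction(s: TurnState, withinBusinessHours: boolean, maxMisunderstandings = 2): Action {
  const reason = escalationReason(s, maxMisunderstandings);
  if (reason === null) return { kind: "continue" };
  // Same triggers in and out of hours; only the destination changes.
  return withinBusinessHours
    ? { kind: "warm_transfer", reason, summary: s.summary }
    : { kind: "callback_request", reason, summary: s.summary };
}
```

Either escalation path carries the `summary`, which is what lets the human avoid restarting the conversation.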

How does it integrate with our calendar / CRM?

We integrate directly with Google Calendar, Microsoft 365 / Outlook, Calendly, Cal.com, Acuity, HubSpot, Salesforce, and Pipedrive. For unusual or custom CRMs we build a thin custom connector. The agent uses function calling: concrete tools like `findAvailability(date, durationMinutes)`, `bookAppointment(slotId, patient)`, and `logLead(qualifyingInfo)`, which we implement against your APIs.
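
To make the tool idea concrete, here is a sketch of how those three tools might be declared to the model. The tool names come from the text above; the descriptions, parameter fields (for example `name` and `dateOfBirth` on `patient`), and the `missingRequired` helper are assumptions. The envelope follows the general JSON Schema style used for function calling, and its exact shape depends on the API you target.

```typescript
// Illustrative tool declarations. Parameter schemas are assumptions,
// not a published contract.
interface ToolDefinition {
  type: "function";
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, unknown>;
    required: string[];
  };
}

const tools: ToolDefinition[] = [
  {
    type: "function",
    name: "findAvailability",
    description: "List open appointment slots on a given date.",
    parameters: {
      type: "object",
      properties: {
        date: { type: "string", description: "ISO date, e.g. 2026-05-26" },
        durationMinutes: { type: "integer", minimum: 5 },
      },
      required: ["date", "durationMinutes"],
    },
  },
  {
    type: "function",
    name: "bookAppointment",
    description: "Book a slot returned by findAvailability.",
    parameters: {
      type: "object",
      properties: {
        slotId: { type: "string" },
        patient: {
          type: "object",
          properties: { name: { type: "string" }, dateOfBirth: { type: "string" } },
          required: ["name", "dateOfBirth"],
        },
      },
      required: ["slotId", "patient"],
    },
  },
  {
    type: "function",
    name: "logLead",
    description: "Record qualifying information for a sales lead.",
    parameters: {
      type: "object",
      properties: { qualifyingInfo: { type: "object" } },
      required: ["qualifyingInfo"],
    },
  },
];

// Cheap guard before executing a call the model proposed.
function missingRequired(toolName: string, args: Record<string, unknown>): string[] {
  const tool = tools.find((t) => t.name === toolName);
  if (!tool) throw new Error(`Unknown tool: ${toolName}`);
  return tool.parameters.required.filter((key) => !(key in args));
}
```

Checking arguments before touching a calendar or CRM lets the agent ask a follow-up question instead of making a malformed API call.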

What about recording compliance?

Recording is opt-in per jurisdiction. The agent's opening message includes consent disclosure where required. Recordings and transcripts are stored in your cloud (Firebase, S3, Azure Blob) with the retention policy you set — typical default is 90 days, then auto-delete. We do not store recordings on our infrastructure.
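
The retention rule reduces to a date comparison. The sketch below assumes a hypothetical `StoredRecording` shape and leaves the actual deletion to whichever storage SDK you use; only the 90-day default comes from the text above.

```typescript
// Hypothetical retention check: selects recordings older than the
// retention window. Actual deletion depends on your storage provider.
interface StoredRecording {
  id: string;
  recordedAt: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function expiredRecordings(
  recordings: StoredRecording[],
  now: Date,
  retentionDays = 90,
): StoredRecording[] {
  const cutoff = now.getTime() - retentionDays * MS_PER_DAY;
  return recordings.filter((r) => r.recordedAt.getTime() < cutoff);
}
```

In practice a storage lifecycle rule can often enforce the same policy without custom code; a function like this is mainly useful for auditing what would be deleted.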

Agent type

Voice & Phone Agent

After-hours bookings, lead qualification, customer service overflow, FAQ lines

What a voice agent actually does

You give it a phone number. It picks up. It listens, speaks, interrupts cleanly, and recovers from confusion. It executes the tools you give it. It hangs up.

That's the whole surface. The interesting parts are everything around it — the model choice, the voice, the pacing, the tool implementations, the observability, the escalation paths — but the user-facing product is exactly that simple.

A typical 90-second booking call goes:

Agent: "Thanks for calling [practice]. You're speaking with an AI assistant — I can help you book, reschedule, or get you to a human if you prefer. How can I help?"
Caller: "I need to book a check-up for next Tuesday."
Agent: [calls findAvailability(2026-05-26, 30)] "I have 10am, 11:30am, or 2pm. Which works?"
Caller: "10am please."
Agent: "Can I have your name and date of birth?"
Caller: "[gives details]"
Agent: [calls bookAppointment(slotId, patient)] "Booked. You'll get a confirmation by SMS. Anything else?"

End: transcript and recording in Firestore, CRM updated, SMS sent, call counted in the dashboard.
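
Behind the two bracketed tool calls in that transcript sits ordinary backend code. The sketch below stands in for a real calendar with an in-memory one. `InMemoryCalendar`, the `Slot` and `Patient` shapes, and the slot IDs are hypothetical; a real implementation would call your calendar or practice-management API instead.

```typescript
// In-memory stand-in for the calendar behind findAvailability and
// bookAppointment. Shapes and IDs are assumptions for illustration.
interface Slot {
  slotId: string;
  start: string; // ISO local time, e.g. "2026-05-26T10:00"
  durationMinutes: number;
  booked: boolean;
}

interface Patient {
  name: string;
  dateOfBirth: string;
}

class InMemoryCalendar {
  private slots: Slot[];
  private bookings = new Map<string, Patient>();

  constructor(slots: Slot[]) {
    this.slots = slots.map((s) => ({ ...s })); // copy so callers' data is untouched
  }

  findAvailability(date: string, durationMinutes: number): Slot[] {
    return this.slots.filter(
      (s) => !s.booked && s.start.startsWith(date) && s.durationMinutes >= durationMinutes,
    );
  }

  bookAppointment(slotId: string, patient: Patient): { confirmed: boolean; slotId: string } {
    const slot = this.slots.find((s) => s.slotId === slotId);
    // Refuse unknown or already-taken slots so the agent can offer another time.
    if (!slot || slot.booked) return { confirmed: false, slotId };
    slot.booked = true;
    this.bookings.set(slotId, patient);
    return { confirmed: true, slotId };
  }
}

const calendar = new InMemoryCalendar([
  { slotId: "s1", start: "2026-05-26T10:00", durationMinutes: 30, booked: false },
  { slotId: "s2", start: "2026-05-26T11:30", durationMinutes: 30, booked: false },
  { slotId: "s3", start: "2026-05-26T14:00", durationMinutes: 30, booked: false },
]);
```

The `confirmed: false` branch matters on a live line: two callers can ask for the same slot seconds apart, and the agent needs a result it can turn into "that one has just gone, how about 11:30?".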

When this agent is the right call

You should consider a voice agent when:

You are losing calls. After-hours, peak hours, vacation cover, unanswered inbound. Voicemail-to-callback has brutal drop-off.
The conversation has shape. Booking, qualification, order status, FAQ — predictable structure where the human work is mostly transactional.
Volume justifies it. ~100+ relevant calls per month at the bottom end.
You can record. Recording is non-negotiable for evaluation; if regulation forbids it, the agent will silently drift.

You should not use a voice agent when:

The work is high-empathy (medical triage, mental health, complaint resolution). Use the agent as a router to a human, not as the conversation itself.
Compliance forbids automation (some debt collection laws, some healthcare scenarios).
Your audience hates phone bots categorically. (More common in some markets than others.)

Our Voice AI buyer's guide covers the decision in detail.

The stack

Layer	Default
Telephony	Twilio Voice (Media Streams for realtime audio)
Realtime model	OpenAI GPT-4o Realtime
Fallback ASR	Whisper-large-v3
Fallback TTS	OpenAI TTS / ElevenLabs
Function calling	OpenAI tools
Backend	Node.js on Cloud Run with WebSocket
State	Firestore or Postgres
Recording	Twilio + your cloud storage
Observability	Langfuse + structured logs + Slack alerts
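
The hand-off between the first and last rows of the backend path starts with a TwiML response that tells Twilio to open a bidirectional audio stream to the WebSocket bridge. Below is a minimal sketch of generating that response. `<Connect>` and `<Stream>` are Twilio TwiML verbs; the `connectStreamTwiml` helper and the example URL are hypothetical.

```typescript
// Builds the TwiML returned from the incoming-call webhook.
// The helper name and any URL passed in are assumptions.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function connectStreamTwiml(websocketUrl: string): string {
  // Media streams are delivered over secure WebSockets.
  if (!websocketUrl.startsWith("wss://")) {
    throw new Error("Media stream URL must use wss://");
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <Stream url="${escapeXml(websocketUrl)}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}
```

The Node.js service on Cloud Run would return this string from the voice webhook and then accept the WebSocket connection at the same URL.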

Anatomy of a production voice agent

The minimum production voice agent has:

System prompt that defines persona, scope, escalation triggers, and the voice's "rules of engagement."
Tool definitions that map intents to concrete actions in your systems.
Function-calling loop that lets the model invoke tools mid-conversation.
Recording + transcription for every call, stored in your cloud.
Eval harness that replays past calls against new prompts/models.
Observability dashboard showing per-call cost, latency, outcomes, and trends.
Escalation paths — warm transfer during business hours, callback queue outside.
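
Of these pieces, the eval harness is the one most often skipped, so here is a deliberately small sketch of the idea: replay the caller's side of recorded calls through a candidate agent and compare the tools it invokes with what the original call did. Everything here is hypothetical, including the `RecordedCall` shape and the toy `keywordAgent`; a real harness would call the model asynchronously and score more than tool order.

```typescript
// Minimal replay harness. Types, names, and the toy agent are assumptions.
interface RecordedCall {
  id: string;
  callerTurns: string[];
  expectedToolCalls: string[]; // tool names, in the order the original call used them
}

// A candidate prompt/model, reduced to "caller turns in, tool names out".
type AgentUnderTest = (callerTurns: string[]) => string[];

interface EvalResult {
  id: string;
  passed: boolean;
  actualToolCalls: string[];
}

function sameSequence(a: string[], b: string[]): boolean {
  return a.length === b.length && a.every((value, i) => value === b[i]);
}

function replayCalls(calls: RecordedCall[], agent: AgentUnderTest): EvalResult[] {
  return calls.map((call) => {
    const actualToolCalls = agent(call.callerTurns);
    return {
      id: call.id,
      passed: sameSequence(actualToolCalls, call.expectedToolCalls),
      actualToolCalls,
    };
  });
}

function passRate(results: EvalResult[]): number {
  if (results.length === 0) return 0;
  return results.filter((r) => r.passed).length / results.length;
}

// Toy stand-in for a model: only understands booking requests.
const keywordAgent: AgentUnderTest = (turns) => {
  const text = turns.join(" ").toLowerCase();
  return text.includes("book") ? ["findAvailability", "bookAppointment"] : [];
};

const recordedCalls: RecordedCall[] = [
  {
    id: "call-1",
    callerTurns: ["I need to book a check-up for next Tuesday.", "10am please."],
    expectedToolCalls: ["findAvailability", "bookAppointment"],
  },
  {
    id: "call-2",
    callerTurns: ["I'd like a quote for a new install."],
    expectedToolCalls: ["logLead"],
  },
];

const evalResults = replayCalls(recordedCalls, keywordAgent);
```

Running the same `recordedCalls` against two candidates and comparing `passRate` gives a quick regression signal before a prompt or model change reaches live callers.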

A reference implementation in code lives in our Twilio + GPT-4o walkthrough post.

Cost economics

Roughly €0.10–€0.40 per call in production:

Twilio voice: ~€0.015/minute
GPT-4o Realtime: ~€0.10 per minute of conversation (audio in both directions counts)
Function-call tool latency / cost: variable
Storage and observability: trivial

A typical 90-second booking call: ~€0.12. A 5-minute support call: ~€0.40. Surface this on the dashboard so you can correlate spend to outcomes.
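
For the dashboard, a per-call estimate can be computed from the per-minute rates. The sketch below is a rough model under stated assumptions: `estimateCallCost` and `modelBilledFraction` are hypothetical. Billing every second at the full model rate gives an upper bound (about €0.17 for 90 seconds); a billed fraction of about 0.65 reproduces the per-call figures quoted above. That fraction is a back-fit for illustration, not a measured number.

```typescript
// Rough per-call cost model. Rates are the approximate figures listed
// above; modelBilledFraction is an assumption, not a measured value.
interface Rates {
  telephonyPerMinute: number; // EUR
  modelPerMinute: number;     // EUR
}

const LISTED_RATES: Rates = { telephonyPerMinute: 0.015, modelPerMinute: 0.1 };

function estimateCallCost(
  durationSeconds: number,
  modelBilledFraction = 1, // share of call time billed at the model rate
  rates: Rates = LISTED_RATES,
): number {
  const minutes = durationSeconds / 60;
  const telephony = minutes * rates.telephonyPerMinute;
  const model = minutes * modelBilledFraction * rates.modelPerMinute;
  return telephony + model;
}

const upperBound90s = estimateCallCost(90);       // every second at full model rate
const fitted90s = estimateCallCost(90, 0.65);     // back-fitted fraction
const fitted5min = estimateCallCost(300, 0.65);
```

Logging the measured cost next to an estimate like this makes it obvious when a prompt change or a longer average call shifts spend.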

Timeline

Scope	Duration
Single-intent booking line	3–4 weeks
Multi-intent qualification + CRM	4–6 weeks
Multi-language or multi-line	6–10 weeks
Enterprise voice platform (many lines, many integrations)	10–16 weeks

Common failure modes we've seen

The voice sounds wrong for the brand. Default voices are too neutral or too enthusiastic. We tune voice + pacing during the build, not after deployment.
The agent answers questions it shouldn't. A booking agent suddenly being asked medical advice. We define scope explicitly and add refusal patterns.
Cold transfer to a human, who restarts from scratch. We always pass conversation context with the transfer.
No recording. Cannot evaluate. Cannot tune. Cannot debug. The agent silently degrades over months. We refuse engagements without recording.
Vendor lock-in to Twilio's "Studio" or similar low-code IVR builders. Easy to start, painful to scale. We default to code from day one.

Where it pairs

Voice agents commonly chain with:

Conversational agents for the same knowledge surface served in chat.
Workflow orchestrators that pick up after the call — send follow-up emails, schedule reminders, kick off downstream automations.
Document processing agents when the caller references documents that need to be retrieved or validated mid-call.

See Voice Concierge for a full end-to-end build, or drop us a note with the call surface you'd like to automate.

Frequently asked questions

Parent service

Voice & Phone AI Agents

AI receptionists, booking lines, and qualification calls — wired to your calendar, CRM, and ticketing.

Case study

Voice Concierge

AI phone agent for after-hours bookings

Article

Building a phone agent with Twilio + GPT-4o: a complete walkthrough

Build a phone agent: Twilio provisions the number and streams audio, a Node.js bridge on Cloud Run pipes the audio to GPT-4o Realtime, function-calling tools execute real actions (book appointment, log lead, transfer). Recording, transcript, and observability on every call. Production deployment in 3-6 weeks.

Article

Voice AI for service businesses: a buyer's guide

Voice AI works for service businesses with predictable call patterns and meaningful inbound volume. Booking, qualification, status, FAQ. Real cost ~€0.10-0.40/call. Real build cost €15-50k for a single-line deployment. Evaluate vendors on recording, escalation paths, and CRM integration — not on the demo.

Article

Why your AI chatbot fails (and what to fix)

Most chatbots that fail in production fail for one of six reasons: no retrieval, bad retrieval, no evals, no escalation, no observability, no scope. Tuning the prompt won't fix any of them. The fix is engineering — and the engineering is well-understood by now.

Want to scope a voice & phone agent project?

Tell us the workflow. We'll come back within one business day with a clear next step.

Get a proposal
