monthly benchmark · open methodology
The Public AI Receptionist Benchmark
Every month we run 50 standardized test calls against every major AI receptionist on the market - including VantaWeb's Anna - and publish the scores. Open methodology, open transcripts, open scoring rubric. No vendor controls the test, including us.
Why this exists
Service business owners considering an AI receptionist read marketing pages, watch demo videos, and try to triangulate which platform actually works for their use case. There is no neutral comparison anywhere: every comparison page is run by a vendor.
So we built the comparison we wanted to exist. We publish the methodology, the test call scripts, the scoring rubric, and the raw transcripts. Anyone can re-run the benchmark. Anyone can challenge a score with evidence.
Calibrated honesty. If every score favored VantaWeb, this benchmark would be worthless as evidence. We include scenarios where Anna underperforms. That is the whole point.
What we measure (8 criteria)
| Criterion | What we check |
|---|---|
| Answer latency | Time from first ring to first word |
| Intent capture | Did the AI gather name, callback number, service type, urgency, and address (where relevant)? |
| Emergency routing | Did the AI correctly escalate clear emergencies to a human? |
| Pricing accuracy | Did the AI quote prices that match the vendor's published rates? |
| Handoff quality | Was the human dispatcher given structured intake data? |
| Calendar booking | Did the AI book directly into the business's scheduling tool or practice management system (PMS)? |
| Voice naturalness | Subjective score on conversational quality (5-judge panel) |
| Failure handling | When the AI didn't know an answer, did it escalate gracefully or hallucinate? |
How a test call works
Each test call follows a scripted scenario. A five-judge panel writes 50 scenarios covering the most common service-business call types: HVAC emergency, dental new-patient intake, plumbing after-hours, roofing storm-week overflow, towing roadside assistance, veterinary appointment, and others. Each scenario defines an "ideal outcome" that the receptionist should reach.
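For readers who want to re-run the benchmark, a scenario can be thought of as a small record: a call type, the caller's scripted lines, and the ideal outcome. The sketch below is illustrative only. The `Scenario` class, its field names, and the example values are hypothetical and are not the published script format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One scripted test call (hypothetical structure, for illustration)."""
    scenario_id: str               # hypothetical identifier
    call_type: str                 # e.g. "HVAC emergency"
    caller_script: tuple[str, ...]  # lines the test caller reads, in order
    ideal_outcome: str             # what the receptionist should achieve


# Made-up example values; not taken from the real scenario set.
example = Scenario(
    scenario_id="hvac-emergency-01",
    call_type="HVAC emergency",
    caller_script=(
        "My furnace stopped working and it's below freezing outside.",
    ),
    ideal_outcome=(
        "Escalate to a human with name, callback number, and address captured."
    ),
)
```

Freezing the record keeps a scenario identical across vendors within a cycle, which is what makes the resulting scores comparable.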
We dial each vendor through its public phone number or a booked demo, run the scripted scenario, record the call (with consent disclosed in the opening), and score it on the 8 criteria above. Scores are averaged across the 50 calls and published monthly.
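The averaging step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published rubric: it assumes every criterion is scored on the same numeric scale (0 to 10 here) and that all 8 criteria carry equal weight. The criterion keys and the function names `call_score` and `vendor_score` are hypothetical.

```python
from statistics import mean

# Hypothetical keys for the 8 criteria listed above.
CRITERIA = (
    "answer_latency",
    "intent_capture",
    "emergency_routing",
    "pricing_accuracy",
    "handoff_quality",
    "calendar_booking",
    "voice_naturalness",
    "failure_handling",
)


def call_score(scores: dict[str, float]) -> float:
    """Average one call's criterion scores (assumed: 0-10 scale, equal weights)."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    return mean(scores[c] for c in CRITERIA)


def vendor_score(calls: list[dict[str, float]]) -> float:
    """Average the per-call scores across all of one vendor's test calls."""
    if not calls:
        raise ValueError("no calls to score")
    return mean(call_score(c) for c in calls)


# Two made-up calls: one scored 8 on every criterion, one scored 6.
calls = [{c: 8.0 for c in CRITERIA}, {c: 6.0 for c in CRITERIA}]
print(vendor_score(calls))  # 7.0
```

A real rubric may weight criteria differently or skip a criterion that does not apply to a given scenario (pricing on a pure emergency call, for example); either change would replace the plain `mean` with a weighted or filtered average.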
Current vendor lineup
| Vendor | Website | First measured | Current score |
|---|---|---|---|
| VantaWeb (Anna) | vantaweb.io | June 2026 (scheduled) | pending |
| Smith.ai | smith.ai | June 2026 (scheduled) | pending |
| Ruby Receptionists | www.ruby.com | June 2026 (scheduled) | pending |
| Goodcall | goodcall.com | June 2026 (scheduled) | pending |
| Rosie | heyrosie.com | June 2026 (scheduled) | pending |
| Arini | arini.ai | June 2026 (scheduled) | pending |
| MyAIFrontDesk | myaifrontdesk.com | June 2026 (scheduled) | pending |
Want your AI receptionist included? Email [email protected] with a public demo or trial link. We add vendors at the start of each monthly cycle.
FAQ
Why is VantaWeb running a benchmark that includes itself?
Because no neutral comparison exists, and shoppers deserve one. We publish the methodology, transcripts, and rubric so anyone can re-run it. The credibility comes from the open methodology, not from our marketing copy.
How often is it updated?
Monthly. New scores and transcripts are published the same day the test cycle completes.
Can my vendor be added?
Yes. Email [email protected] with a public demo or trial link.
Is there a paid tier or sponsorship?
No. The benchmark is free, and we do not accept sponsorship from any included vendor. VantaWeb absorbs the cost of running it.
Try the AI receptionist that's already publishing its own scores.
VantaWeb's Anna runs 24/7 for HVAC, plumbing, dental, roofing, and other service businesses. Plans start at $149/mo.
See pricing · What is an AI receptionist?