monthly benchmark · open methodology

The Public AI Receptionist Benchmark

Every month we run 50 standardized test calls against every major AI receptionist on the market - including VantaWeb's Anna - and publish the scores. Open methodology, open transcripts, open scoring rubric. No vendor controls the test, including us.

v0.1 · first run scheduled June 2026 · 7 vendors · 8 criteria · 50 calls / vendor / month

Why this exists

Service business owners considering an AI receptionist read marketing pages, watch demo videos, and try to triangulate which platform actually works for their use case. There is no neutral comparison anywhere. Every comparison page is run by a vendor.

So we built the comparison we wanted to exist. We publish the methodology, the test call scripts, the scoring rubric, and the raw transcripts. Anyone can re-run the benchmark. Anyone can challenge a score with evidence.

Calibrated honesty. If every score favored VantaWeb, this benchmark would be worthless as evidence. We include scenarios where Anna underperforms. That is the whole point.

What we measure (8 criteria)

Answer latency

Time from first ring to first word

Intent capture

Did the AI gather: name, callback, service type, urgency, address (where relevant)?

Emergency routing

Did the AI correctly escalate clear emergencies to a human?

Pricing accuracy

Did the AI quote prices that match the vendor's published rates?

Handoff quality

Was the human dispatcher given structured intake data?

Calendar booking

Did the AI book directly into the business's scheduling tool (for a dental practice, its PMS)?

Voice naturalness

Subjective score on conversational quality (5-judge panel)

Failure handling

When the AI didn't know an answer, did it gracefully escalate or hallucinate?

How a test call works

Each test scenario is a scripted call. A five-judge panel writes 50 scenarios across the most common service-business call types: HVAC emergency, dental new-patient intake, plumbing after-hours, roofing storm-week overflow, towing roadside, veterinary appointment, and others. Each scenario has a defined "ideal outcome" that the receptionist should reach.
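As an illustration only, a scenario could be stored as a small record that pairs the script with its ideal outcome. The field names and the `check_scenario_set` helper below are hypothetical and are not taken from the published scripts; the only facts borrowed from the methodology are the count of 50 scenarios and the requirement that each one defines an ideal outcome.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Scenario:
    # All field names are assumptions made for this sketch.
    scenario_id: str               # hypothetical identifier, e.g. "hvac-01"
    call_type: str                 # e.g. "HVAC emergency"
    caller_lines: Tuple[str, ...]  # what the test caller says, in order
    ideal_outcome: str             # the outcome the receptionist should reach


def check_scenario_set(scenarios: List[Scenario], expected_count: int = 50) -> List[str]:
    """Return a list of human-readable problems; an empty list means the set looks complete."""
    problems = []
    if len(scenarios) != expected_count:
        problems.append(f"expected {expected_count} scenarios, found {len(scenarios)}")
    ids = [s.scenario_id for s in scenarios]
    if len(set(ids)) != len(ids):
        problems.append("duplicate scenario ids")
    for s in scenarios:
        if not s.ideal_outcome.strip():
            problems.append(f"{s.scenario_id}: missing ideal outcome")
    return problems
```

A check like this would catch an incomplete scenario set before any calls are placed.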

We dial each vendor through its public phone number or a booked demo, run the scripted scenario, record the call (with consent disclosed in the opening), and score it on the 8 criteria above. Scores are averaged across the 50 calls and published monthly.
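The averaging step can be sketched as follows. This is a minimal sketch under stated assumptions, not the benchmark's actual scoring code: it assumes every criterion is scored per call on a 0 to 10 scale (with answer latency already converted to that scale), that all criteria carry equal weight, and that the criterion keys below are an acceptable shorthand for the 8 criteria listed above. The rubric itself may define the scale and weighting differently.

```python
from statistics import mean

# Shorthand keys for the 8 criteria; the key names are assumptions.
CRITERIA = (
    "answer_latency",
    "intent_capture",
    "emergency_routing",
    "pricing_accuracy",
    "handoff_quality",
    "calendar_booking",
    "voice_naturalness",
    "failure_handling",
)


def vendor_scores(calls):
    """Average per-call scores for one vendor.

    calls: list of dicts, one per test call, mapping each criterion key
    to a numeric score (assumed 0-10).
    Returns (per_criterion_means, overall_mean), with equal weighting assumed.
    """
    if not calls:
        raise ValueError("no scored calls for this vendor")
    per_criterion = {c: mean(call[c] for call in calls) for c in CRITERIA}
    overall = mean(per_criterion.values())
    return per_criterion, overall
```

For a monthly run, `calls` would hold 50 entries per vendor. Keeping the per-criterion means alongside the overall figure makes it visible where a vendor underperforms, rather than hiding it in a single number.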

Current vendor lineup

Vendor | Website | First measured | Current score
VantaWeb (Anna) | vantaweb.io | June 2026 (scheduled) | pending
Smith.ai | smith.ai | June 2026 (scheduled) | pending
Ruby Receptionists | www.ruby.com | June 2026 (scheduled) | pending
Goodcall | goodcall.com | June 2026 (scheduled) | pending
Rosie | heyrosie.com | June 2026 (scheduled) | pending
Arini | arini.ai | June 2026 (scheduled) | pending
MyAIFrontDesk | myaifrontdesk.com | June 2026 (scheduled) | pending

Want your AI receptionist included? Email [email protected] with a public demo or trial link. We add vendors at the start of each monthly cycle.

FAQ

Why is VantaWeb running a benchmark that includes itself?

Because no neutral comparison exists, and shoppers deserve one. We publish the methodology, transcripts, and rubric so anyone can re-run it. The credibility comes from the open methodology, not from our marketing copy.

How often is it updated?

Monthly. New scores and transcripts are published the same day the test cycle completes.

Can my vendor be added?

Yes. Email [email protected] with a public demo or trial link.

Is there a paid tier or sponsorship?

No. The benchmark is free. We do not accept sponsorship from any included vendor (including ourselves). The cost is absorbed by VantaWeb.

Try the AI receptionist that's already publishing its own scores.

VantaWeb's Anna runs 24/7 for HVAC, plumbing, dental, roofing, and other service businesses. Plans start at $149/mo.

See pricing · What is an AI receptionist?