Voice — NanoClaw Operations

Verdict

Should you build voice calls?	Not now (low ROI for personal use)
WhatsApp voice notes?	Yes (covers 90% of the use case)
Best voice call approach	OpenAI Realtime SIP (if ever needed)
Cost per minute (voice call)	~$0.30/min (OpenAI Realtime) or ~$0.15/min (DIY pipeline)

1. Why Voice Is a Distraction (For Now)

Where Text (WhatsApp) Is Better

Precision — ticker symbols, numbers, parameters are more reliable in text
Audit trail — searchable, quotable, archivable
Async — respond when ready, not in real-time
Cost — 10-100x cheaper than voice API calls
Complexity — multi-step instructions are clearer in text
Charts/tables — voice cannot show you a chart

Where Voice Adds Value

Driving/walking — dictating trade ideas hands-free
Quick status checks — "What's my PnL today?"
Brainstorming — talking through a thesis is more natural
Emergency alerts — agent calls YOU when something breaks

Assessment: WhatsApp voice notes handle the "I want to talk to my agent" use case at near-zero cost and zero new infrastructure. The 10-second round-trip latency of a voice note is rarely a problem when you're the only user. Build phone calling only if you find yourself repeatedly wishing you could interrupt the agent mid-response.

2. Option Comparison

Approach	Setup	Monthly Fixed	Per Minute	LLM	Best For
WhatsApp Voice Notes	1-2 days	$0	~$0.05	Any (Claude)	Daily async use
OpenAI Realtime SIP	1-2 days	~$1 (Twilio)	~$0.30	GPT only	Real-time phone calls
LiveKit + Claude	3-5 days	~$1 + hosting	~$0.15-0.20	Any (Claude)	Full control, Claude as brain
Vapi.ai	1 hour	$0	~$0.15-0.30	Any	Fast prototype
ElevenLabs Agent	1-2 days	$5-99	~$0.10	Any	Best voice quality
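
To compare options at a given usage level, the fixed and per-minute figures in the table can be combined into a rough monthly estimate. The Python sketch below covers only the WhatsApp and OpenAI Realtime rows. The 60 minutes per month is a hypothetical usage figure chosen for illustration, not a number from this page.

```python
def monthly_cost(fixed_usd: float, per_minute_usd: float, minutes: float) -> float:
    """Rough monthly cost: fixed fee plus usage, rounded to cents."""
    return round(fixed_usd + per_minute_usd * minutes, 2)


# (monthly fixed fee, approximate per-minute cost), copied from the table above
OPTIONS = {
    "WhatsApp Voice Notes": (0.00, 0.05),
    "OpenAI Realtime SIP": (1.00, 0.30),
}

if __name__ == "__main__":
    minutes = 60  # hypothetical monthly usage, for illustration only
    for name, (fixed, per_min) in OPTIONS.items():
        print(f"{name}: ~${monthly_cost(fixed, per_min, minutes):.2f}/month")
```

The gap between the two options scales with usage, because the fixed fees are small compared with the per-minute charges.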

3. How Each Approach Works

A. WhatsApp Voice Notes (Recommended First Step)

NanoClaw already has a /add-voice-transcription skill that adds voice note processing. The flow:

Voice Note (WhatsApp) → Whisper (OpenAI STT) → Claude (processes text) → Text Reply (WhatsApp)

Cost: ~$0.003 per 30-second voice note (Whisper) + Claude API. No new infrastructure. Built-in NanoClaw skill.
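
The flow above can be sketched as a small function. This is an illustrative outline, not the code of the /add-voice-transcription skill: `handle_voice_note`, `whisper_cost_usd`, and the two stub callables are hypothetical names. In a real setup the stubs would wrap a speech-to-text client and an LLM client. The cost helper simply scales the ~$0.003 per 30 seconds figure quoted above.

```python
from typing import Callable


def handle_voice_note(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
) -> str:
    """Voice note -> transcript -> text reply (hypothetical helper)."""
    transcript = transcribe(audio).strip()
    if not transcript:
        # Nothing intelligible came back from STT; ask the user to retry.
        return "Sorry, I couldn't make out that voice note. Could you resend it?"
    return respond(transcript)


def whisper_cost_usd(seconds: float, usd_per_30s: float = 0.003) -> float:
    """Transcription cost, scaled from the ~$0.003 per 30 s figure above."""
    return round(seconds / 30 * usd_per_30s, 4)


# Stubs standing in for the real STT and LLM calls (assumptions for the demo).
def fake_transcribe(audio: bytes) -> str:
    return "What's my PnL today?"


def fake_respond(text: str) -> str:
    return f"You asked: {text}"


if __name__ == "__main__":
    print(handle_voice_note(b"...", fake_transcribe, fake_respond))
    print(whisper_cost_usd(30))
```

Passing the STT and LLM steps in as callables keeps the flow testable without network access and makes it easy to swap providers later.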

B. OpenAI Realtime SIP (If Voice Calls Are Needed Later)

OpenAI added native SIP support to the Realtime API, which makes phone-to-AI calls simple to set up: [1]

Phone Call (any phone) → Twilio SIP (~$1/mo + $0.014/min) → OpenAI Realtime (native speech-to-speech) → Response (voice output)

The model supports function calling mid-conversation — it could check portfolio positions or trigger actions while you're talking. However, it's GPT only (no Claude).
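
Mid-conversation function calling needs a local dispatcher on the agent side. The Python sketch below assumes the model hands back a function name plus JSON-encoded arguments, as function-calling APIs typically do. It does not show the Realtime event wiring. `get_positions`, `TOOLS`, and `dispatch_tool_call` are hypothetical names, and the returned position is placeholder data.

```python
import json
from typing import Any, Callable


def get_positions() -> dict[str, Any]:
    """Hypothetical tool: would query the portfolio; returns placeholder data."""
    return {"positions": [{"ticker": "AAPL", "qty": 10}]}


# Registry mapping tool names (as exposed to the model) to local handlers.
TOOLS: dict[str, Callable[..., Any]] = {"get_positions": get_positions}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the named tool with JSON-encoded arguments; return a JSON string."""
    handler = TOOLS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments_json or "{}")
        return json.dumps({"result": handler(**kwargs)})
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the handler signature.
        return json.dumps({"error": str(exc)})


if __name__ == "__main__":
    print(dispatch_tool_call("get_positions", "{}"))
```

Returning errors as JSON instead of raising lets the model explain the failure to the caller while the call stays connected.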

Cost: ~$0.30/min with gpt-realtime, ~$0.11/min with gpt-realtime-mini. A 5-minute call runs $0.55-1.50.
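
The 5-minute range follows directly from the per-minute rates. A minimal Python check is below. The optional `carrier_usd_per_min` parameter is an assumption for readers who want to add the Twilio per-minute charge from the flow above. The quoted range does not appear to include it.

```python
# Approximate per-minute rates quoted above.
RATES_USD_PER_MIN = {"gpt-realtime": 0.30, "gpt-realtime-mini": 0.11}


def call_cost(minutes: float, model: str, carrier_usd_per_min: float = 0.0) -> float:
    """Estimated cost of one call in USD, rounded to cents."""
    return round(minutes * (RATES_USD_PER_MIN[model] + carrier_usd_per_min), 2)


if __name__ == "__main__":
    for model in RATES_USD_PER_MIN:
        print(f"5 min on {model}: ${call_cost(5, model):.2f}")
```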

C. LiveKit + Claude (If Claude Must Be the Brain)

LiveKit is an open-source WebRTC framework with a Python SDK. You can plug in any STT (e.g. Deepgram), LLM (e.g. Claude), and TTS (e.g. ElevenLabs). It is self-hostable and has native Twilio SIP integration for phone calls. [2]

This is the most flexible option but also the most setup work. Latency is higher (~500-800 ms) than with OpenAI Realtime (~300-500 ms) because audio passes through a chained STT, LLM, and TTS pipeline instead of a native speech-to-speech model.

4. Recommended Priority

Priority	What	Effort	Value
1	WhatsApp voice notes — receive voice, transcribe, respond with text	1-2 days	High (huge convenience gain)
2	Skip to other priorities (email, calendar, scheduled tasks)	—	Higher ROI than voice calls
3	Phone calling via SIP — only after months of WhatsApp use proves the need	1-2 days	Low for personal use

Sources

[1] OpenAI Realtime SIP Integration, Official Docs
[2] LiveKit Voice AI Quickstart, LiveKit Docs
[3] OpenAI Realtime API Pricing, OpenAI
[4] Vapi.ai Pricing
[5] ElevenLabs Pricing
