| Approach | Setup Time | Monthly Fixed Cost | Cost per Minute | LLM | Best For |
|---|---|---|---|---|---|
| WhatsApp Voice Notes | 1-2 days | $0 | ~$0.05 | Any (Claude) | Daily async use |
| OpenAI Realtime SIP | 1-2 days | ~$1 (Twilio) | ~$0.30 | GPT only | Real-time phone calls |
| LiveKit + Claude | 3-5 days | ~$1 + hosting | ~$0.15-0.20 | Any (Claude) | Full control, Claude as brain |
| Vapi.ai | 1 hour | $0 | ~$0.15-0.30 | Any | Fast prototype |
| ElevenLabs Agent | 1-2 days | $5-99 | ~$0.10 | Any | Best voice quality |

NanoClaw already has a `/add-voice-transcription` skill that adds voice note processing. The flow: receive a voice note, transcribe it, and respond with text.
Cost: ~$0.003 per 30-second voice note (Whisper) + Claude API. No new infrastructure. Built-in NanoClaw skill.
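The arithmetic behind this estimate can be sketched as follows. The Whisper rate is taken from the ~$0.003 per 30-second figure above. The Claude cost per note is not stated in these notes, so `claude_cost_per_note` is a placeholder to fill in from your own usage, and the function names are illustrative only.

```python
# Rough cost model for WhatsApp voice notes (all values in USD).
WHISPER_COST_PER_30S = 0.003  # figure quoted above


def whisper_cost(seconds: float) -> float:
    """Approximate transcription cost for one voice note."""
    return seconds / 30 * WHISPER_COST_PER_30S


def monthly_voice_note_cost(
    notes_per_day: int,
    avg_seconds: float,
    claude_cost_per_note: float,  # assumption: supply your own estimate
    days: int = 30,
) -> float:
    """Transcription plus LLM cost for a month of voice notes."""
    per_note = whisper_cost(avg_seconds) + claude_cost_per_note
    return round(notes_per_day * days * per_note, 2)


if __name__ == "__main__":
    # Example: ten 30-second notes per day, with a guessed $0.02 of Claude usage each.
    print(monthly_voice_note_cost(10, 30, claude_cost_per_note=0.02))
```

With any plausible Claude figure, transcription is a small share of the total, which is why this option has no meaningful fixed cost.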
OpenAI added native SIP support, which makes phone-to-AI calls remarkably simple.[1]
The model supports function calling mid-conversation, so it could check portfolio positions or trigger actions while you're talking. The drawback is that it works only with GPT models (no Claude).
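As a rough illustration of mid-conversation function calling, the sketch below defines one tool in the JSON-schema style that OpenAI function tools use and a local dispatcher that runs the matching handler. The tool name `get_portfolio_positions`, the stub holdings, and the dispatcher are all hypothetical. The session setup and event handling of the Realtime API are not shown and should be checked against the official documentation.

```python
import json
from typing import Any, Callable

# Hypothetical tool definition; the name and parameters are illustrative.
GET_POSITIONS_TOOL: dict[str, Any] = {
    "type": "function",
    "name": "get_portfolio_positions",
    "description": "Return the number of shares held for one ticker symbol.",
    "parameters": {
        "type": "object",
        "properties": {"symbol": {"type": "string"}},
        "required": ["symbol"],
    },
}


def get_portfolio_positions(symbol: str) -> dict[str, Any]:
    """Stub handler; a real one would query a brokerage or database."""
    holdings = {"ACME": 10}  # made-up data for the example
    return {"symbol": symbol, "shares": holdings.get(symbol, 0)}


HANDLERS: dict[str, Callable[..., dict[str, Any]]] = {
    "get_portfolio_positions": get_portfolio_positions,
}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the handler the model asked for and return a JSON string result."""
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    args = json.loads(arguments_json)  # the model sends arguments as JSON text
    return json.dumps(handler(**args))


if __name__ == "__main__":
    print(dispatch_tool_call("get_portfolio_positions", '{"symbol": "ACME"}'))
```

Keeping handlers in a plain dictionary makes it easy to see exactly which actions the voice agent is allowed to trigger.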
Cost: ~$0.30/min with gpt-realtime, ~$0.11/min with gpt-realtime-mini. A 5-minute call runs $0.55-1.50.
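The per-call range above is simply minutes multiplied by the per-minute rate. A small sketch, using the two rates quoted here and the ~$1 monthly Twilio charge from the comparison table (the function names are illustrative):

```python
# Per-minute rates quoted above, in USD.
RATES_PER_MIN = {"gpt-realtime": 0.30, "gpt-realtime-mini": 0.11}
TWILIO_MONTHLY_FIXED = 1.00  # approximate fixed cost from the comparison table


def call_cost(minutes: float, model: str) -> float:
    """Usage cost of a single call."""
    return round(minutes * RATES_PER_MIN[model], 2)


def monthly_cost(calls: int, minutes_per_call: float, model: str) -> float:
    """Fixed charge plus usage for a month of calls."""
    usage = calls * minutes_per_call * RATES_PER_MIN[model]
    return round(TWILIO_MONTHLY_FIXED + usage, 2)


if __name__ == "__main__":
    for model in RATES_PER_MIN:
        print(model, call_cost(5, model), monthly_cost(20, 5, model))
```

At low call volumes the fixed charge is negligible and the model choice dominates the bill.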
LiveKit is an open-source WebRTC framework with a Python SDK. You can plug in any STT (e.g. Deepgram), LLM (e.g. Claude), and TTS (e.g. ElevenLabs). It is self-hostable and has native Twilio SIP integration for phone calls.[2]
This is the most flexible option but also the most setup work. Latency is higher (~500-800 ms vs. ~300-500 ms for OpenAI Realtime) because audio passes through a chained STT, LLM, and TTS pipeline rather than a native speech-to-speech model.
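To make the pipeline idea concrete, here is a minimal sketch of a pluggable STT, LLM, and TTS chain that records how long each stage takes. It does not use the LiveKit SDK; `VoicePipeline` and the three stub functions are hypothetical stand-ins that only show why stage latencies add up.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Three swappable stages; each call blocks until the previous one finishes."""

    stt: Callable[[bytes], str]  # audio in, transcript out
    llm: Callable[[str], str]    # transcript in, reply text out
    tts: Callable[[str], bytes]  # reply text in, audio out

    def respond(self, audio: bytes) -> tuple[bytes, dict[str, float]]:
        timings: dict[str, float] = {}
        value: object = audio
        for name, stage in (("stt", self.stt), ("llm", self.llm), ("tts", self.tts)):
            start = time.perf_counter()
            value = stage(value)  # output of one stage feeds the next
            timings[name] = time.perf_counter() - start
        return value, timings  # total latency is the sum of the stage timings


# Stub stages for illustration; real ones would call Deepgram, Claude, ElevenLabs, etc.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
    reply_audio, timings = pipeline.respond(b"hello")
    print(reply_audio, timings)
```

Because each stage waits for the previous one, swapping in a faster STT or TTS provider is the main lever for reducing end-to-end latency in this design.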

| Priority | What | Effort | Value |
|---|---|---|---|
| 1 | WhatsApp voice notes — receive voice, transcribe, respond with text | 1-2 days | High (huge convenience gain) |
| 2 | Skip to other priorities (email, calendar, scheduled tasks) | — | Higher ROI than voice calls |
| 3 | Phone calling via SIP — only after months of WhatsApp use proves the need | 1-2 days | Low for personal use |