I built RyBot, the voice agent on my portfolio, using Hume's EVI (Empathic Voice Interface) for voice I/O and Claude for intelligence. The pairing combines best-in-class emotion understanding with strong reasoning in one package.
This article explains why Hume + Claude is such a powerful pairing and what you can build with it.
Why Hume EVI?
Hume's Empathic Voice Interface handles the hard parts of voice AI:
- Speech-to-text - Converts user speech to text in real-time
- Text-to-speech - Natural voice output with customizable voices
- Emotion detection - Analyzes 48 emotions from voice prosody
- Interruption handling - Users can interrupt naturally
- Voice cloning - Create custom voices from samples
You never touch raw audio. Hume abstracts all of that complexity.
Why Claude?
Claude provides the intelligence layer:
- Natural conversation - Responses feel human, not robotic
- Tool use - Execute functions, fetch data, trigger actions
- Context awareness - Understands nuance and maintains conversation flow
- Customizable personality - Define exactly how your agent should behave
The Architecture
The key insight is using a proxy server between Hume and Claude. This gives you full control over the conversation flow.
```
┌─────────────┐      ┌─────────────┐      ┌─────────────────┐      ┌───────────┐
│    User     │ ←──→ │  Hume EVI   │ ←──→ │  Proxy Server   │ ←──→ │  Claude   │
│  (Browser)  │      │   (Voice)   │      │   (Your Code)   │      │    API    │
└─────────────┘      └─────────────┘      └─────────────────┘      └───────────┘
```
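The proxy's core job in that diagram is translation: take a transcribed user turn from Hume and turn it into a Claude request. A minimal sketch of that step, with simplified message shapes (the field names and model id here are illustrative assumptions, not the exact Hume or Anthropic wire formats):

```typescript
// Simplified stand-ins for the two sides of the proxy.
interface HumeUserMessage {
  type: "user_message";
  text: string;                           // transcript from Hume's speech-to-text
  prosodyScores?: Record<string, number>; // emotion name -> score (illustrative)
}

interface ClaudeMessage {
  role: "user" | "assistant";
  content: string;
}

interface ClaudeRequest {
  model: string;
  max_tokens: number;
  system: string;
  messages: ClaudeMessage[];
}

// Build the next Claude request from the running history plus the
// incoming Hume message. `systemPrompt` is whatever personality you define.
function buildClaudeRequest(
  history: ClaudeMessage[],
  incoming: HumeUserMessage,
  systemPrompt: string,
): ClaudeRequest {
  return {
    model: "claude-model-id", // placeholder; use whichever Claude model you prefer
    max_tokens: 300,          // voice replies should stay short
    system: systemPrompt,
    messages: [...history, { role: "user", content: incoming.text }],
  };
}
```

Claude's reply then flows back the other way: the proxy forwards the text to Hume, which speaks it.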
Your proxy server can:
- Inject context (user data, memories, current page)
- Execute tools when Claude requests them
- Process emotion data to shape responses
- Log conversations for analytics
- Apply rate limiting and security controls
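Context injection from that list can be as simple as rebuilding the system prompt on every turn from whatever the proxy knows. A sketch, with illustrative field names of my own choosing:

```typescript
// What the proxy might know about the current session. These fields are
// examples; store whatever your app actually tracks.
interface SessionContext {
  userName?: string;
  currentPage?: string; // e.g. which portfolio page the user is viewing
  memories: string[];   // facts recalled from previous sessions
}

// Assemble a fresh system prompt from the base personality plus context.
function injectContext(basePrompt: string, ctx: SessionContext): string {
  const parts = [basePrompt];
  if (ctx.userName) parts.push(`The user's name is ${ctx.userName}.`);
  if (ctx.currentPage) parts.push(`They are currently viewing: ${ctx.currentPage}.`);
  if (ctx.memories.length > 0) {
    parts.push("Things you remember about them:\n- " + ctx.memories.join("\n- "));
  }
  return parts.join("\n\n");
}
```

Because the prompt is rebuilt per turn, context stays current as the user navigates your site.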
What Makes It Special
Emotion-aware responses. Hume detects when users are frustrated, excited, confused, or curious. Your agent can adapt its tone accordingly - being more patient when frustration is detected, or matching enthusiasm when excitement is high.
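One way to wire that up: pick the strongest prosody scores each turn and turn them into a tone hint appended to the system prompt. The emotion names and threshold below are illustrative, not Hume's exact taxonomy or score scale:

```typescript
// Convert raw emotion scores into a short steering instruction for Claude.
function toneHint(scores: Record<string, number>, threshold = 0.5): string {
  const top = Object.entries(scores)
    .filter(([, score]) => score >= threshold)
    .sort((a, b) => b[1] - a[1]) // strongest first
    .slice(0, 2)
    .map(([name]) => name);

  if (top.length === 0) return "";
  if (top.includes("frustration")) {
    return "The user sounds frustrated. Be extra patient and concise.";
  }
  if (top.includes("excitement")) {
    return "The user sounds excited. Match their enthusiasm.";
  }
  return `The user's voice suggests: ${top.join(", ")}. Adapt your tone accordingly.`;
}
```

The hint is just text appended to the prompt, so Claude can weigh it against everything else it knows about the conversation.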
Tool execution. Claude can decide to call functions mid-conversation. Weather lookups, web searches, database queries, UI actions - whatever you wire up.
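On the proxy side, tool execution boils down to a lookup table: when Claude's response asks for a tool, find the matching handler and run it. A sketch with a hypothetical `get_weather` tool (the names and shapes are examples, not a real API):

```typescript
// Each tool is an async function from Claude-supplied input to a result string.
type ToolHandler = (input: Record<string, unknown>) => Promise<string>;

// Registry of tools the agent may call. Wire up whatever your app needs.
const tools: Record<string, ToolHandler> = {
  get_weather: async (input) => `It is sunny in ${input.city}.`, // stubbed example
};

interface ToolCall {
  name: string;
  input: Record<string, unknown>;
}

// Execute a tool call requested by Claude, returning a string to send
// back as the tool result. Failures become messages Claude can recover from.
async function executeTool(call: ToolCall): Promise<string> {
  const handler = tools[call.name];
  if (!handler) return `Unknown tool: ${call.name}`;
  try {
    return await handler(call.input);
  } catch (err) {
    return `Tool failed: ${String(err)}`;
  }
}
```

Returning errors as strings rather than throwing lets Claude explain the failure in conversation instead of the call silently dying.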
Memory. Store facts about users across sessions. Your agent remembers names, preferences, and past conversations.
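The memory layer can start very small: a store keyed by user id whose contents get injected into the system prompt each session. A Map stands in here for what would be a database in production:

```typescript
// Minimal cross-session memory: deduplicated facts per user.
class MemoryStore {
  private facts = new Map<string, string[]>();

  // Record a fact about a user, skipping exact duplicates.
  remember(userId: string, fact: string): void {
    const list = this.facts.get(userId) ?? [];
    if (!list.includes(fact)) list.push(fact);
    this.facts.set(userId, list);
  }

  // Everything known about a user, ready to inject into the prompt.
  recall(userId: string): string[] {
    return this.facts.get(userId) ?? [];
  }
}
```

Deciding *what* to remember is the interesting part; a common pattern is to have Claude itself flag memorable facts via a tool call.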
Personality. Voice is fundamentally different from text. Responses need to be shorter, more conversational, and natural-sounding. Claude excels at this with the right prompting.
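One way to bake those voice constraints into the prompt itself. The wording below is an example of the style, not RyBot's actual system prompt:

```typescript
// A voice-first system prompt: every rule exists because the reply
// will be spoken aloud, not read.
const VOICE_SYSTEM_PROMPT = [
  "You are a friendly voice assistant for a developer portfolio.",
  "Your replies will be spoken aloud, so:",
  "- Keep answers to one or two short sentences.",
  "- Use a conversational tone; never output markdown, lists, or code.",
  "- Ask a brief follow-up question when it keeps the conversation going.",
].join("\n");
```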
What You Can Build
With this stack, you can create:
- Portfolio assistants - Like RyBot, answering questions about your work
- Customer support agents - Handle inquiries with empathy
- Interactive tutorials - Voice-guided learning experiences
- Accessibility tools - Voice interfaces for any application
- Companion apps - Agents that remember and grow with users
Getting Started
If you want to build something similar:
- Create a Hume account and explore the EVI playground
- Get an Anthropic API key for Claude
- Set up a simple Node.js server that proxies between them
- Iterate on your system prompt until the personality feels right