Voicemode
This skill provides voice interaction capabilities for AI assistants. This skill should be used when users mention "voice mode", "voicemode", "speak to me", "talk to me", "have a voice conversation", "converse", ask to "check voice service status", "start Whisper", "start Kokoro", ask about "voice configuration", mention "STT", "TTS", "speech-to-text", "text-to-speech", or need help with "voice setup", "voice troubleshooting", or "voice preferences".
VoiceMode
Voice interaction capabilities for Claude Code - enabling natural conversations through speech-to-text (STT) and text-to-speech (TTS) services.
Naming Clarification
There are two related names to be aware of:
| Name | What it is | Example usage |
|------|------------|---------------|
| voicemode | CLI command (no hyphen) | voicemode whisper service status |
| voice-mode | Python package on PyPI (with hyphen) | uvx voice-mode-install |
Check if CLI is installed:
which voicemode # Should show path like ~/.local/bin/voicemode
voicemode --version # Should show version number
If not installed:
# Option 1: Install permanently
uv tool install voice-mode
# Option 2: Run without installing (uses uvx)
uvx voice-mode <command> # Equivalent to: voicemode <command>
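The check-then-fall-back logic above can be sketched in Python. This is only an illustration: `resolve_voicemode_command` is a hypothetical helper, not part of VoiceMode, and it relies solely on the statement above that `uvx voice-mode <command>` is equivalent to `voicemode <command>`.

```python
import shutil


def resolve_voicemode_command(which=shutil.which):
    """Return the argv prefix for running VoiceMode, or None if unavailable.

    Hypothetical helper, not part of VoiceMode. `which` is injectable so the
    lookup can be exercised without depending on the real PATH.
    """
    if which("voicemode"):
        return ["voicemode"]          # CLI is installed permanently
    if which("uvx"):
        return ["uvx", "voice-mode"]  # run without installing
    return None                       # neither tool was found on PATH
```

A caller could append a subcommand to the returned prefix, for example `resolve_voicemode_command() + ["--version"]`, after checking that the result is not None.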
When to Use MCP Tools vs CLI
| Use Case | Recommended | Why |
|----------|-------------|-----|
| Voice conversations | MCP (voicemode:converse) | Faster - MCP server already running |
| Service management | CLI (voicemode service) | Works without MCP server |
| Installation | CLI (voice-mode-install) | One-time setup |
| Model management | CLI (voicemode whisper model) | Administrative task |
| Configuration | CLI (voicemode config) | Edit settings directly |
Claude Code Plugin
VoiceMode is available as a Claude Code plugin from the marketplace:
# Install from marketplace
/plugin marketplace add mbailey/voicemode
/plugin install voicemode
The plugin provides:
- MCP Server - Full voice capabilities via voicemode-mcp
- Slash Commands - /voicemode:converse, /voicemode:status, etc.
- Hooks - Sound feedback during tool execution
After installing the plugin, install voice services:
/voicemode:install
For detailed plugin documentation, see docs/guides/claude-code-plugin.md in the voicemode repo.
Quick Start
When a user wants to use voice mode for the first time, guide them through these steps:
1. Check Service Status
First, check if voice services are already running:
# Check STT service (Whisper)
voicemode:service("whisper", "status")
# Check TTS service (Kokoro)
voicemode:service("kokoro", "status")
2. Install Services if Needed
If services aren't installed, guide the user to install them:
Prerequisites:
- FFmpeg (required for audio processing)
- Python 3.11+ (for VoiceMode installation)
Installation:
# Install VoiceMode with UV (recommended)
uvx voice-mode-install --yes
This installs the VoiceMode package and CLI. It does NOT install local speech services.
Local Voice Services (Apple Silicon Recommended)
When to offer local services:
- On Apple Silicon Macs, local services are highly recommended: they are private and fast, and they work offline
- Check architecture with: uname -m (arm64 = Apple Silicon)
- If Apple Silicon, ask the user: "Would you like to install local voice services? This provides faster, private, offline voice capabilities."
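The architecture check can also be done from Python. This is a minimal sketch using the standard `platform` module; `is_apple_silicon` is a hypothetical helper name, and the extra check for a Darwin (macOS) system is an assumption added here to avoid treating other arm64 machines as Macs.

```python
import platform


def is_apple_silicon(system=None, machine=None):
    """Return True when running on an Apple Silicon Mac.

    Hypothetical helper. Arguments default to the current platform and can
    be overridden for testing.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    # `uname -m` reports arm64 on Apple Silicon; the Darwin check is an
    # assumption added for this sketch.
    return system == "Darwin" and machine == "arm64"
```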
Get informed consent before installing:
Tell the user what will be downloaded:
| Service | Download Size | Disk Space | First Start Time |
|---------|---------------|------------|------------------|
| Whisper (tiny) | ~75MB | ~150MB | 30 seconds |
| Whisper (base) | ~150MB | ~300MB | 1-2 minutes |
| Whisper (small) | ~460MB | ~1GB | 2-3 minutes |
| Kokoro TTS | ~350MB | ~700MB | 2-3 minutes |
Recommended setup for most users: Whisper base + Kokoro = ~500MB download, ~1GB disk space.
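The totals quoted above follow from adding up the rows of the table. A small sketch of that arithmetic, useful when telling the user what a different combination will cost; the dictionary keys and `total_footprint` are hypothetical names, the figures are the approximate ones from the table, and 1GB is treated as 1000MB.

```python
# Approximate sizes in MB, copied from the table above: (download, disk).
SIZES_MB = {
    "whisper-tiny": (75, 150),
    "whisper-base": (150, 300),
    "whisper-small": (460, 1000),  # ~1GB treated as 1000MB
    "kokoro": (350, 700),
}


def total_footprint(components):
    """Return (download_mb, disk_mb) for the chosen components."""
    download = sum(SIZES_MB[name][0] for name in components)
    disk = sum(SIZES_MB[name][1] for name in components)
    return download, disk


# Recommended setup: Whisper base + Kokoro
print(total_footprint(["whisper-base", "kokoro"]))  # (500, 1000)
```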
After user consents, install services:
# Install Whisper for local STT (base model recommended)
voicemode whisper service install
# Install Kokoro for local TTS
voicemode kokoro install
Services auto-start after installation and are configured to start on login.
First Run - Model Downloads:
When services start for the first time, they download AI models. The first converse call may be slow while models load. Subsequent starts are instant.
Check Model Download Progress:
# Whisper model location - check if download complete
ls -lh ~/.voicemode/services/whisper/models/
# Kokoro model location
ls -lh ~/.voicemode/services/kokoro/models/
# Watch service logs during download
voicemode whisper service logs -f
voicemode kokoro logs -f
Choose a Different Whisper Model:
# Smaller/faster (good for testing)
voicemode whisper install --model tiny # ~75MB
# Larger/more accurate
voicemode whisper install --model small # ~460MB
voicemode whisper install --model medium # ~1.5GB
3. Start Your First Conversation
Once services are running, start a voice conversation:
# Simple greeting
voicemode:converse("Hello! I'm ready to talk. What would you like to discuss?")
# The tool will:
# - Speak the message using TTS
# - Listen for the user's response
# - Return the transcribed text
That's it! You're now in a voice conversation.
Core Capabilities
Voice Conversations
The converse tool is your primary interface for voice interactions:
# Basic usage - speak and listen
voicemode:converse("How can I help you today?")
# Speak without waiting for response (for narration)
voicemode:converse("Let me search for that information", wait_for_response=False)
# With specific voice
voicemode:converse(
message="I found the answer",
voice="nova",
tts_provider="openai"
)
Key Parameters:
- message (required): Text to speak
- wait_for_response (default: true): Whether to listen for user response
- voice: TTS voice name (auto-selected if not specified)
- tts_provider: "openai" or "kokoro" (auto-selected based on availability)
- listen_duration_max: Maximum listening time in seconds (default: 120)
Service Management
Manage voice services using the service tool:
# Check status
voicemode:service("whisper", "status")
voicemode:service("kokoro", "status")
# Start/stop services
voicemode:service("whisper", "start")
voicemode:service("kokoro", "stop")
voicemode:service("whisper", "restart")
# View logs for troubleshooting
voicemode:service("whisper", "logs", lines=50)
Available Services:
- whisper: Local STT using Whisper.cpp
- kokoro: Local TTS with multiple voices
- livekit: Room-based real-time communication (advanced)
Service Actions:
- status: Check if running and resource usage
- start: Start the service
- stop: Stop the service
- restart: Restart the service
- logs: View recent logs
- enable: Configure to start at boot/login
- disable: Remove from startup
Common Workflows
Having a Voice Conversation
Pattern 1: Question and Answer
# Ask a question
voicemode:converse("What would you like to work on today?")
# User responds via voice
# Response text is returned for you to process
# Continue the conversation
voicemode:converse("Great! Let me help you with that.")
Pattern 2: Narrating Actions (Default Behavior)
When performing actions, speak without waiting to create natural flow:
# Announce action without waiting
voicemode:converse("Let me search the codebase for that", wait_for_response=False)
# Perform the action in parallel
Grep(pattern="function_name", path="/path/to/code")
# Announce results
voicemode:converse("I found 5 matches. Would you like me to show them?")
Pattern 3: Step-by-Step Guidance
When asking questions in voice mode:
- Ask one question at a time
- Wait for the answer before proceeding
- Keep questions clear and concise
# Good - one question at a time
voicemode:converse("Would you like to use local or cloud TTS?", wait_for_response=True)
# Wait for answer...
voicemode:converse("Should I install Kokoro for you?", wait_for_response=True)
# Avoid - multiple questions bundled together
# This is overwhelming in voice conversations
Checking and Troubleshooting Setup
Check if everything is working:
# Check service status
voicemode:service("whisper", "status")
voicemode:service("kokoro", "status")
# If services aren't running, start them
voicemode:service("whisper", "start")
voicemode:service("kokoro", "start")
Using CLI for diagnostics:
# Check all dependencies
voicemode deps
# Diagnostic information
voicemode diag info
voicemode diag devices # List audio devices
voicemode diag registry # Show provider registry
# View service logs
voicemode whisper service logs
voicemode kokoro logs
Managing Voice Preferences
Voice Selection:
Available voices depend on your TTS provider:
OpenAI Voices: alloy, echo, fable, onyx, nova, shimmer
Kokoro Voices: Multiple voices (check with voicemode kokoro voices)
Configuration:
# View current configuration
voicemode config list
# Set default voice
voicemode config set VOICEMODE_TTS_VOICE nova
# Set default provider
voicemode config set VOICEMODE_TTS_PROVIDER kokoro
# Edit full configuration
voicemode config edit
Project and User Preferences:
- Project-level: .voicemode file in project root
- User-level: ~/.voicemode file in home directory
- System config: ~/.voicemode/config/config.yaml
Provider Options
VoiceMode supports both cloud and local voice services. You can use either or both.
OpenAI API (Cloud)
If OPENAI_API_KEY is set, VoiceMode can use OpenAI's cloud services:
- STT: OpenAI Whisper API
- TTS: OpenAI voices (alloy, echo, fable, onyx, nova, shimmer)
This works without installing local services - just set the API key.
Local Services
For privacy, speed, and offline use, install local services:
| Service | Port | Purpose |
|---------|------|---------|
| Whisper | 2022 | Speech-to-text (STT) |
| Kokoro | 8880 | Text-to-speech (TTS) |
Provider Priority
VoiceMode automatically selects providers based on availability:
- If local services are running, they're used by default (faster, private)
- If local services aren't available, falls back to OpenAI API (if key is set)
- You can override with tts_provider and stt_provider parameters
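The priority rules above can be read as a short decision function. The sketch below illustrates that reading for TTS only; `choose_tts_provider` and its arguments are hypothetical and do not reflect how VoiceMode implements selection internally.

```python
def choose_tts_provider(kokoro_running, openai_key_set, override=None):
    """Pick a TTS provider following the priority rules described above.

    Hypothetical illustration, not VoiceMode's actual selection code.
    Returns "kokoro", "openai", or None when nothing is available.
    """
    if override is not None:
        return override   # an explicit tts_provider always wins
    if kokoro_running:
        return "kokoro"   # local service preferred: faster, private
    if openai_key_set:
        return "openai"   # cloud fallback when OPENAI_API_KEY is set
    return None           # no provider available
```

Returning None rather than raising leaves it to the caller to decide whether to offer installation or ask the user for an API key.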
Checking Provider Status
# Check what providers are available
voicemode diag registry
# Check specific service ports
nc -z localhost 2022 && echo "Whisper running" || echo "Whisper not running"
nc -z localhost 8880 && echo "Kokoro running" || echo "Kokoro not running"
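Where `nc` is not available, the same port probe can be written with Python's standard `socket` module. This is a rough equivalent of the `nc -z` checks above; `is_port_open` and `service_report` are hypothetical helper names, and an open port only shows that something is listening, not that the service is healthy.

```python
import socket

# Ports from the Local Services table above.
SERVICES = {"Whisper": 2022, "Kokoro": 8880}


def is_port_open(port, host="localhost", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host could not be resolved
        return False


def service_report():
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(port) for name, port in SERVICES.items()}
```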
Advanced Topics
Provider System Details
VoiceMode uses OpenAI-compatible endpoints for all services, enabling seamless switching between providers.
The system automatically:
- Discovers available providers
- Performs health checks
- Fails over to working providers
- Negotiates audio formats
Audio Processing
Requirements:
- FFmpeg for format conversion
- WebRTC VAD for voice activity detection
Supported Formats:
- PCM, MP3, WAV, FLAC, AAC, Opus
Configuration Options:
- disable_silence_detection: Keep listening even during silence
- vad_aggressiveness: 0-3 (default: 2) - how strict voice detection is
- listen_duration_min: Minimum recording time before silence detection (default: 2.0s)
- speed: Speech rate 0.25-4.0 (default: 1.0)
- chime_enabled: Enable/disable audio feedback chimes
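The documented ranges can be checked before a call is made. The following is a minimal sketch that validates only the two options with stated ranges; `validate_audio_options` is a hypothetical helper, and VoiceMode may apply its own validation or clamping.

```python
def validate_audio_options(vad_aggressiveness=2, speed=1.0):
    """Return a list of problems with the given options (empty if valid).

    Hypothetical helper. Ranges and defaults are the ones listed above.
    """
    errors = []
    if vad_aggressiveness not in (0, 1, 2, 3):
        errors.append("vad_aggressiveness must be an integer from 0 to 3")
    if not 0.25 <= speed <= 4.0:
        errors.append("speed must be between 0.25 and 4.0")
    return errors
```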
Batching Voice Announcements with Audio
When playing audio files, you can batch multiple announcements and playback commands. Tools execute sequentially within the batch:
# Batch announce-play sequences
voicemode:converse("Chapter 1 - Introduction", wait_for_response=False)
Bash(command="mpv --start=00:00 --length=3 song.mp3")
voicemode:converse("Chapter 2 - Main Theme", wait_for_response=False)
Bash(command="mpv --start=00:10 --length=5 song.mp3")
This creates natural narration with audio playback.
Environment Variables
Configure VoiceMode behavior:
- VOICEMODE_TTS_VOICE: Default TTS voice
- VOICEMODE_TTS_PROVIDER: Default TTS provider (openai, kokoro)
- VOICEMODE_STT_PROVIDER: Default STT provider
- VOICEMODE_AUDIO_FORMAT: Audio format preference
- VOICEMODE_DEBUG: Enable debug logging
Logging and Debugging
VoiceMode maintains logs in ~/.voicemode/:
Log Structure:
- logs/conversations/: Daily conversation transcripts
- logs/events/: Operational events and errors
- audio/: Saved audio recordings
- config/: Configuration files
Enable Debug Mode:
# Via environment variable
export VOICEMODE_DEBUG=true
# Via CLI flag
voicemode converse --debug
# Via MCP parameter
voicemode:converse(message="Test", debug=True)
Quick Reference
Essential MCP Tool Calls
# Start conversation
voicemode:converse("Hello!")
# Speak without waiting
voicemode:converse("Working on it...", wait_for_response=False)
# Check service status
voicemode:service("whisper", "status")
voicemode:service("kokoro", "status")
# Start services
voicemode:service("whisper", "start")
voicemode:service("kokoro", "start")
# View logs
voicemode:service("whisper", "logs", lines=50)
Common CLI Commands
# Check status
voicemode whisper service status
voicemode kokoro status
# Start services
voicemode whisper service start
voicemode kokoro start
# View logs
voicemode whisper service logs
voicemode kokoro logs
# Configuration
voicemode config list
voicemode config set VOICEMODE_TTS_VOICE nova
voicemode config edit
# Diagnostics
voicemode deps
voicemode diag info
voicemode diag devices
Conversation History Search
VoiceMode logs all exchanges and provides powerful search capabilities to find and replay past conversations.
Load conversation history into SQLite:
# Load all new exchanges since last sync
voicemode history load
# Load all exchanges (ignore last sync)
voicemode history load --all
# Load from specific date
voicemode history load --since 2025-12-01
# Load last 7 days
voicemode history load --days 7
Search conversations:
# Full-text search
voicemode history search "minion indirectly"
# Search only agent speech (TTS)
voicemode history search --type tts "hello"
# Search only user speech (STT)
voicemode history search --type stt "hello"
# Search specific date
voicemode history search --date 2025-12-27 "keyword"
# Search and play first result automatically
voicemode history search --play "memorable quote"
# Limit results
voicemode history search --limit 50 "conversation"
Play audio clips:
# Play by exchange ID (from search results)
voicemode history play ex_abc123def456
Search Features:
- Full-text search using SQLite FTS5 (fast, supports complex queries)
- Filter by type (stt/tts), date, or conversation
- Audio files automatically resolved from timestamp
- Incremental loading - won't duplicate already-loaded exchanges
- All conversations stored in ~/.voicemode/cache/conversations.db
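To illustrate what full-text search with a type filter looks like in SQLite FTS5, here is a self-contained sketch against an in-memory database. The table name, column names, and sample rows are invented for illustration and are not VoiceMode's actual schema; the example also assumes the local SQLite build includes FTS5.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 table from (kind, text) pairs."""
    conn = sqlite3.connect(":memory:")
    # `kind` is stored but not indexed, so MATCH only searches `text`.
    conn.execute("CREATE VIRTUAL TABLE exchanges USING fts5(kind UNINDEXED, text)")
    conn.executemany("INSERT INTO exchanges (kind, text) VALUES (?, ?)", rows)
    return conn


def search(conn, query, kind=None, limit=20):
    """Full-text search, optionally restricted to "stt" or "tts" rows."""
    sql = "SELECT kind, text FROM exchanges WHERE exchanges MATCH ?"
    params = [query]
    if kind is not None:
        sql += " AND kind = ?"
        params.append(kind)
    sql += " ORDER BY rank LIMIT ?"  # best matches first
    params.append(limit)
    return conn.execute(sql, params).fetchall()


conn = build_index([
    ("tts", "Hello there, how can I help?"),
    ("stt", "hello, please check the service status"),
    ("stt", "goodbye for now"),
])
print(search(conn, "hello", kind="stt"))
```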
Use Cases:
- Find memorable moments or important discussions
- Review what was said in past conversations
- Create clips of agent responses for testing
- Debug conversation issues by reviewing exact exchanges
Token Efficiency Tip
When using CLI commands directly (not MCP tools), redirect STDERR to save tokens:
# Suppresses FFmpeg warnings and debug output
voicemode converse -m "Hello" 2>/dev/null
# Omit when debugging
voicemode converse -m "Hello" # Shows all diagnostic info
This only applies to Bash tool calls - MCP tools handle this automatically.
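When the CLI is driven from a script instead of a shell, the same effect as `2>/dev/null` can be had by discarding STDERR in `subprocess`. This is a minimal sketch; `run_quiet` is a hypothetical helper name.

```python
import subprocess


def run_quiet(argv):
    """Run a command, discard its STDERR, and return STDOUT as text."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # same effect as 2>/dev/null
        text=True,
        check=False,                # let the caller decide how to treat failures
    )
    return result.stdout
```

For example, `run_quiet(["voicemode", "converse", "-m", "Hello"])` mirrors the first command above. When debugging, call `subprocess.run` without the `stderr` argument so the diagnostics stay visible.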
Best Practices
- Use parallel operations: Speak without waiting when narrating actions
- One question at a time: Don't bundle multiple questions in voice mode
- Check status first: Always verify services are running before starting conversations
- Let VoiceMode auto-select: Don't hardcode providers unless user has preference
- Use local services: Whisper and Kokoro provide privacy and speed
- Monitor logs: Check service logs when troubleshooting issues
- Set user preferences: Configure default voice and provider in ~/.voicemode
Integration Notes
- VoiceMode runs as an MCP server via stdio transport
- Compatible with Claude Code and other MCP clients
- Supports concurrent instances with audio playback management
- Works with tmux and terminal multiplexers
- Requires microphone access when listening for responses
Additional Resources
For detailed documentation:
- VoiceMode README: Installation and overview
- docs/reference/: Complete API and parameter documentation
- docs/tutorials/: Step-by-step guides
- docs/services/: Service-specific documentation
- docs/testing/installer-testing.md: Installer testing guide for Tart VMs
- VoiceMode CLAUDE.md: Project-specific Claude guidance
Troubleshooting
First conversation is slow or times out:
This is normal on first run - the services are downloading AI models:
- Check Whisper logs: voicemode whisper service logs -f
- Check Kokoro logs: voicemode kokoro logs -f
- Wait for downloads to complete (2-5 minutes total)
- Subsequent starts will be instant
Model not loading:
- Check disk space: Whisper base + Kokoro need ~1GB of disk space (~500MB download)
- Verify model files exist: ls -lh ~/.voicemode/services/whisper/models/
- Try reinstalling: voicemode whisper install --model base
Services won't start:
- Check FFmpeg is installed: ffmpeg -version
- View service logs: voicemode:service("whisper", "logs")
- Try restart: voicemode:service("whisper", "restart")
Audio quality issues:
- Check audio devices: voicemode diag devices
- Adjust VAD aggressiveness: vad_aggressiveness=1 (more permissive)
- Review conversation logs in ~/.voicemode/logs/conversations/
Conversations not working:
- Verify services are running: voicemode:service("whisper", "status")
- Check provider registry: voicemode diag registry
- Enable debug mode to see detailed logs
- Ensure microphone permissions are granted
Configuration issues:
- List current config: voicemode config list
- Check for environment variable conflicts
- Review config file: ~/.voicemode/config/config.yaml
- Reset to defaults: Remove config file and restart services
Installation
/plugin marketplace add mbailey/voicemode
/plugin install voicemode@voicemode