Voice AI
Sarathi Voice Agent
Production voice assistant for NE India in Assamese, Bodo & Hindi. Local STT/TTS on a $24/month server.
About the Project
Sarathi Voice Agent is a production voice assistant that helps citizens in Northeast India navigate government services in their native languages. It runs its speech-to-text and text-to-speech models locally instead of calling cloud APIs, because two of the languages it supports, Assamese and Bodo, are poorly served by the major cloud providers. Cross-lingual RAG lets queries in Assamese, Bodo, or Hindi retrieve English knowledge-base documents directly, without a translation step.
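The retrieval step can be illustrated with a minimal sketch. It assumes a multilingual embedding model that maps Assamese, Bodo, Hindi, and English text into one shared vector space. The project does not name its encoder here, so the three-dimensional vectors and document IDs below are hypothetical toy values standing in for real embeddings. A vector database such as Pinecone or ChromaDB performs the same nearest-neighbor ranking at scale.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec: Vector, docs: Dict[str, Vector], k: int = 2) -> List[Tuple[str, float]]:
    """Rank English documents by similarity to a query embedded in the same multilingual space."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings standing in for a multilingual encoder's output.
# With a real encoder, an Assamese question and an English passage with the same
# meaning land close together because the model was trained across languages.
english_docs = {
    "ration_card_howto": [0.9, 0.1, 0.0],
    "pension_eligibility": [0.1, 0.9, 0.1],
    "land_records": [0.0, 0.2, 0.9],
}
assamese_query_vec = [0.85, 0.15, 0.05]  # hypothetical: a question about ration cards

results = retrieve(assamese_query_vec, english_docs, k=1)
```

Because the query and the documents share one embedding space, no translation happens before the similarity search. Retrieval quality then depends on how well the encoder aligns the languages.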
Key Features
- Local STT (IndicConformer, 0.5s latency) and TTS (VITS in ONNX format) -- no cloud speech APIs
- Cross-lingual RAG: Assamese/Bodo/Hindi queries match English documents
- Streaming audio via NDJSON -- text and audio play in parallel
- Circuit-breaker fallback: if Pinecone fails, retrieval switches to ChromaDB
- Runs on a $24/month DigitalOcean server (4 vCPU, 8GB RAM)
- Live at regional-agent.sarathi.studio serving real citizens
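The NDJSON streaming feature can be sketched as a generator that emits one JSON object per line, so the client can act on each event as soon as it arrives. This is a minimal sketch under assumptions: the event fields (`type`, `seq`, `text`, `audio_b64`) are hypothetical, since the project's wire format is not documented here, and base64 is simply one common way to carry audio bytes inside JSON.

```python
import base64
import json
from typing import Iterable, Iterator, List, Tuple


def ndjson_stream(chunks: Iterable[Tuple[str, bytes]]) -> Iterator[str]:
    """Yield one JSON object per line: a text event, then an audio event, per sentence."""
    for index, (text, audio) in enumerate(chunks):
        # Text goes first so the UI can render it while the audio is still being decoded.
        yield json.dumps({"type": "text", "seq": index, "text": text}, ensure_ascii=False) + "\n"
        # Raw audio bytes are not valid JSON, so they are base64-encoded.
        yield json.dumps(
            {"type": "audio", "seq": index, "audio_b64": base64.b64encode(audio).decode("ascii")}
        ) + "\n"


def parse_ndjson(lines: Iterable[str]) -> List[dict]:
    """Client-side counterpart: decode each non-empty line independently."""
    return [json.loads(line) for line in lines if line.strip()]


# Hypothetical input: (sentence text, synthesized audio bytes) pairs.
events = parse_ndjson(ndjson_stream([("নমস্কাৰ", b"\x00\x01"), ("Second sentence", b"\x02")]))
```

In FastAPI, a generator like `ndjson_stream` could be returned through `fastapi.responses.StreamingResponse(..., media_type="application/x-ndjson")`. The client splits the body on newlines and can start playing the first audio chunk while later text is still arriving.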
Impact
Live in production and serving real citizens, the project shows that speech AI for underserved languages can run on commodity hardware.
Tech Stack
FastAPI, ONNX Runtime, IndicConformer, VITS TTS, Pinecone, ChromaDB, Next.js
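The Pinecone-to-ChromaDB fallback follows the circuit-breaker pattern. The sketch below assumes a simple policy: trip after `failure_threshold` consecutive failures, then retry the primary once `reset_timeout` seconds have passed. The class, thresholds, and the `query_pinecone`/`query_chromadb` stand-ins are hypothetical and do not call the real Pinecone or ChromaDB clients.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Route calls to a primary backend, switching to a fallback after repeated failures."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # Once the timeout elapses, the next call may try the primary again (half-open).
        return self.clock() - self.opened_at < self.reset_timeout

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            return fallback()  # skip the primary entirely while the circuit is open
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the circuit
            return fallback()
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


now = [0.0]  # fake clock so the example is deterministic
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])
primary_calls = []


def query_pinecone():  # hypothetical stand-in that simulates an outage
    primary_calls.append(now[0])
    raise ConnectionError("Pinecone unreachable")


def query_chromadb():  # hypothetical stand-in for a ChromaDB lookup
    return ["doc-from-chromadb"]


answers = [breaker.call(query_pinecone, query_chromadb) for _ in range(3)]
# The first two calls try Pinecone and fail; the third skips it because the circuit is open.
primary_calls_while_open = len(primary_calls)
now[0] = 11.0  # past reset_timeout: the next call retries Pinecone once (half-open)
answers.append(breaker.call(query_pinecone, query_chromadb))
```

Skipping the primary while the circuit is open avoids paying a network timeout on every request during an outage, which helps keep voice responses prompt.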
Metrics
Live in production
3 languages
Local STT/TTS
$24/month server
Cross-lingual RAG