SIMO Avatar
Vision-aware interactive AI avatar with voice, emotion detection, and gesture recognition. Built for Dubai AI Summit.
About the Project
SIMO is a real-time interactive AI avatar that sees, hears, thinks, and speaks. Built as a live demo for the Dubai AI Summit, it uses GPT-4 Vision and MediaPipe to detect faces, read emotions, and recognize hand gestures, and it automatically greets visitors as they approach. It holds natural voice conversations using Whisper for speech-to-text, GPT-4o-mini for responses, and ElevenLabs for voice synthesis, all synchronized with HeyGen avatar lip movements.
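The conversation loop described above can be sketched as a single turn through the pipeline. This is a minimal illustration, not SIMO's actual code: the four stages are injected as callables so that the real providers (Whisper, GPT-4o-mini, ElevenLabs, HeyGen) can be swapped in or stubbed out, and all names here are hypothetical.

```python
import asyncio

async def conversation_turn(audio_chunk, stt, llm, tts, avatar):
    """One voice-conversation turn: STT -> LLM -> TTS -> avatar playback.

    All four stages are injected callables, so providers (e.g. Whisper,
    GPT-4o-mini, ElevenLabs, HeyGen) can be swapped or stubbed in tests.
    """
    text = await stt(audio_chunk)    # speech-to-text (e.g. Whisper)
    reply = await llm(text)          # generate a response (e.g. GPT-4o-mini)
    speech = await tts(reply)        # synthesize voice (e.g. ElevenLabs)
    await avatar(speech)             # play with lip sync (e.g. HeyGen)
    return reply
```

Keeping each stage behind a plain async callable is what lets the same loop run over a WebSocket stream in production and against in-memory fakes in tests.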
Key Features
- GPT-4 Vision for real-time scene understanding, face detection, and badge reading
- MediaPipe + face-api.js for gesture recognition and emotion detection
- Real-time voice conversation via Whisper STT + ElevenLabs TTS + HeyGen lip sync
- Auto-greets approaching visitors and reacts to hand waves and thumbs-up gestures
- Multi-language support: English, Arabic, French
- WebSocket-based real-time communication pipeline
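As one concrete example of the gesture recognition above, a thumbs-up can be classified with a simple geometric heuristic over MediaPipe's 21 hand landmarks (normalized image coordinates, with y growing downward). The landmark indices follow MediaPipe's hand model; the heuristic itself is a hypothetical sketch, not SIMO's production classifier.

```python
# MediaPipe hand-landmark indices: 4 = thumb tip,
# 8/12/16/20 = fingertips, 6/10/14/18 = the matching PIP joints.
THUMB_TIP = 4
FINGER_TIPS = (8, 12, 16, 20)
FINGER_PIPS = (6, 10, 14, 18)

def is_thumbs_up(landmarks):
    """landmarks: sequence of 21 (x, y) pairs in normalized image coords.

    Heuristic: the thumb tip is the highest point of the hand (smallest y)
    and the other four fingers are curled (each tip below its PIP joint).
    """
    thumb_y = landmarks[THUMB_TIP][1]
    if any(thumb_y >= landmarks[i][1] for i in FINGER_TIPS):
        return False
    return all(landmarks[tip][1] > landmarks[pip][1]
               for tip, pip in zip(FINGER_TIPS, FINGER_PIPS))
```

A pure function over landmark coordinates like this is easy to unit-test with synthetic hands, independent of the camera and MediaPipe runtime.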
Impact
Demonstrated live at the Dubai AI Summit, SIMO showcases real-time multi-modal AI interaction, combining vision, voice, and natural language in a single coherent experience.
Tech Stack
- GPT-4 Vision and GPT-4o-mini (OpenAI)
- Whisper (speech-to-text)
- ElevenLabs (voice synthesis)
- HeyGen (avatar lip sync)
- MediaPipe and face-api.js (gesture and emotion detection)
- WebSockets (real-time communication)