
SIMO Avatar

Vision-aware interactive AI avatar with voice, emotion detection, and gesture recognition. Built for the Dubai AI Summit.

About the Project

SIMO is a real-time interactive AI avatar that sees, hears, thinks, and speaks. Built for a Dubai AI Summit demo, it uses GPT-4 Vision and MediaPipe to detect faces, read emotions, and recognize hand gestures. When someone approaches, it auto-greets them. It holds natural voice conversations using Whisper for speech-to-text, GPT-4o-mini for responses, and ElevenLabs for voice synthesis, all synced with HeyGen avatar lip movements.
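The voice loop described above can be sketched as a simple three-stage pipeline. This is an illustrative sketch, not the project's actual code: the stage functions stand in for the Whisper, GPT-4o-mini, and ElevenLabs API calls, and the type names (`Stt`, `Llm`, `Tts`, `runTurn`) are assumptions introduced here so the flow is testable in isolation.

```typescript
// Hypothetical sketch of one SIMO conversation turn.
// Each stage is injected so real API clients (Whisper STT,
// GPT-4o-mini, ElevenLabs TTS) can be swapped in without
// changing the orchestration.

type Stt = (audio: ArrayBuffer) => Promise<string>;
type Llm = (prompt: string) => Promise<string>;
type Tts = (text: string) => Promise<ArrayBuffer>;

interface TurnResult {
  transcript: string; // what the visitor said
  reply: string;      // what the avatar will say
  speech: ArrayBuffer; // synthesized audio, forwarded for lip sync
}

// One conversational turn: audio in -> text -> reply -> audio out.
async function runTurn(
  audio: ArrayBuffer,
  stt: Stt,
  llm: Llm,
  tts: Tts
): Promise<TurnResult> {
  const transcript = await stt(audio); // speech-to-text
  const reply = await llm(transcript); // generate response
  const speech = await tts(reply);     // text-to-speech
  return { transcript, reply, speech };
}
```

In the real system each stage would stream rather than run request/response, and the synthesized audio would be handed to the HeyGen avatar so lip movement stays in sync with playback.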

Key Features

  • GPT-4 Vision for real-time scene understanding, face detection, and badge reading
  • MediaPipe + face-api.js for gesture recognition and emotion detection
  • Real-time voice conversation via Whisper STT + ElevenLabs TTS + HeyGen lip sync
  • Auto-greets visitors, reacts to hand waves and thumbs up
  • Multi-language support: English, Arabic, French
  • WebSocket-based real-time communication pipeline
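The auto-greet and gesture reactions listed above reduce to a small decision function over vision events. The sketch below is hypothetical: the event shape, the `pickReaction` name, and the face-size threshold used as a proximity proxy are all assumptions, not taken from the project.

```typescript
// Hypothetical sketch of the auto-greet / gesture reaction logic.

type Gesture = "wave" | "thumbs_up" | "none";

interface VisionEvent {
  faceDetected: boolean;
  faceAreaRatio: number; // fraction of the frame the face occupies (proximity proxy)
  gesture: Gesture;      // e.g. from MediaPipe hand landmarks
}

type Reaction = "greet" | "wave_back" | "acknowledge" | "idle";

// Assumed tuning value: a face filling >= 8% of the frame counts as "approaching".
const APPROACH_THRESHOLD = 0.08;

function pickReaction(e: VisionEvent, alreadyGreeted: boolean): Reaction {
  if (e.gesture === "wave") return "wave_back";       // mirror the visitor's wave
  if (e.gesture === "thumbs_up") return "acknowledge"; // react to thumbs up
  if (e.faceDetected && e.faceAreaRatio >= APPROACH_THRESHOLD && !alreadyGreeted) {
    return "greet"; // auto-greet the first time someone comes close
  }
  return "idle";
}
```

A function like this would sit on the server side of the WebSocket pipeline, turning per-frame vision events into avatar actions while the `alreadyGreeted` flag prevents re-greeting the same visitor every frame.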

Impact

Demoed at the Dubai AI Summit, SIMO demonstrates real-time, multi-modal AI interaction, combining vision, voice, and natural language in a single coherent experience.

Tech Stack

TypeScript · React · Express · WebSocket · GPT-4 Vision · Whisper · ElevenLabs · HeyGen · MediaPipe

Metrics

  • Real-time voice + vision
  • Emotion detection
  • Gesture recognition
  • Multi-language (EN/AR/FR)

Interested in this project?

Let's discuss how I can build something similar for you.