Data Engineering
Hyves WhatsApp Scraper
Real-time WhatsApp group scraper for a luxury watch marketplace. Live on DigitalOcean.
About the Project
A production data pipeline that captures messages from WhatsApp watch-trading groups in real time, deduplicates them, enriches them with contact metadata, and stores them in Google Sheets. It is built with production-grade patterns: a circuit breaker for resilience, deduplication so each message is stored only once, rate limiting to respect API quotas, and a dead letter queue for error recovery.
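To illustrate the circuit breaker pattern mentioned above, here is a minimal sketch in Python. It is not the project's actual code: the class name, the `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are all assumptions chosen for illustration. The idea is that after a run of consecutive failures (for example, failed Google Sheets writes) the breaker stops calling the downstream service for a cooling-off period, then lets a single trial call through.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    # Hypothetical helper; names and defaults are illustrative assumptions.
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Still cooling off: fail fast without touching the service.
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each outbound request, for example `breaker.call(write_batch, rows)` where `write_batch` is a hypothetical function that writes to the sheet, and route rows to the dead letter queue when `CircuitOpenError` is raised.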
Key Features
- Real-time WhatsApp message capture via WAHA webhooks
- Circuit breaker pattern for production resilience
- SQLite deduplication with LID resolution for participant tracking
- Rate-limited batch writes to Google Sheets API
- NOWEB engine for stable WhatsApp connection
- Live and deployed on DigitalOcean
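The SQLite deduplication step in the list above could look roughly like the following sketch. The table name `seen_messages`, the use of a message ID as the dedup key, and both function names are assumptions for illustration; the project's real schema is not described here, and LID resolution is not covered. The sketch relies on a primary key plus `INSERT OR IGNORE`, so a repeated webhook delivery of the same message becomes a no-op.

```python
import sqlite3


def open_dedup_store(path=":memory:"):
    """Open (or create) the dedup database. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_messages ("
        " message_id TEXT PRIMARY KEY,"
        " first_seen TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def is_new_message(conn, message_id):
    """Record message_id and return True only the first time it is seen."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO seen_messages (message_id) VALUES (?)",
            (message_id,),
        )
    # rowcount is 1 when the row was inserted, 0 when the insert was ignored.
    return cur.rowcount == 1
```

A webhook handler would call `is_new_message(conn, message_id)` before queuing a row for the sheet and drop the message when it returns `False`.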
Impact
A live production pipeline that enables market intelligence from WhatsApp communities. This is the boring reliability engineering that keeps data pipelines from breaking at 3am.
Tech Stack
FastAPI, WAHA, Google Sheets API, SQLite, Docker, Next.js
Metrics
Live on DigitalOcean
Real-time webhooks
Circuit breaker
Dedup pipeline
Links
Source Code
Interested in this project?
Let's discuss how I can build something similar for you.