AI Real-Time Interpreting Platform - Audio and Video

Customer: AI | Published: 05.12.2025
Budget: $10,000

I am building a platform that takes live or pre-recorded video and delivers a second-by-second translation overlay so viewers hear and/or read the content instantly in another language. The goal is seamless real-time interpretation (think keynote streams, online classes, or multilingual meetings) without noticeable lag.

Core needs
• A pipeline that accepts common video inputs (RTMP, WebRTC, MP4)
• Speech recognition, machine translation, and speech-synthesis modules chained together with latency consistently under two seconds
• Dynamic caption generation that can be burned into the video or delivered as a separate subtitle track
• Modular language models so new language pairs can be plugged in quickly; the initial pair will be decided together during discovery

Tech flexibility
I am open to whichever stack (Python, Node, Rust) best meets the latency target, but please be comfortable working with tools such as Whisper, DeepL, Google Cloud Speech-to-Text or equivalent, plus FFmpeg and media servers for routing.

Acceptance criteria
1. Live demo translating a sample video stream end-to-end in real time
2. Translation accuracy ≥ 85 % on a 10-minute test clip I provide
3. Clear deployment instructions (Docker or similar) and API documentation

If you have shipped low-latency audio/video or NLP products before, I would love to see them.
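
To make the chained-module and latency requirements concrete, here is a rough Python sketch of how the three stages could be wired together and timed per audio chunk. It is only an illustration under assumptions: `Pipeline`, `within_budget`, `recognize`, `translate`, and `synthesize` are hypothetical names, and the stage bodies are placeholders, not real Whisper, DeepL, or Google Cloud calls. Only the two-second budget comes from the requirements above.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Taken from the "latency consistently under two seconds" requirement.
LATENCY_BUDGET_S = 2.0


@dataclass
class Pipeline:
    """Hypothetical chain of named stages applied to one chunk at a time."""
    stages: List[Tuple[str, Callable[[Any], Any]]]

    def process(self, chunk: Any) -> Tuple[Any, Dict[str, float]]:
        """Run a chunk through every stage and record per-stage seconds."""
        timings: Dict[str, float] = {}
        data = chunk
        for name, stage in self.stages:
            start = time.perf_counter()
            data = stage(data)
            timings[name] = time.perf_counter() - start
        timings["total"] = sum(timings.values())
        return data, timings


def within_budget(timings: Dict[str, float],
                  budget: float = LATENCY_BUDGET_S) -> bool:
    """True if the summed stage time for this chunk fits the budget."""
    return timings["total"] <= budget


# Placeholder stages (assumptions): swap in real ASR / MT / TTS clients.
def recognize(audio_chunk: bytes) -> str:
    return audio_chunk.decode("utf-8")  # pretends the bytes are already text


def translate(text: str) -> str:
    return text.upper()  # stand-in for a real translation call


def synthesize(text: str) -> dict:
    # Returns both a caption string and (fake) audio bytes for the overlay.
    return {"caption": text, "audio": text.encode("utf-8")}


pipeline = Pipeline(stages=[
    ("asr", recognize),
    ("mt", translate),
    ("tts", synthesize),
])

result, timings = pipeline.process(b"hello world")
print(result["caption"], within_budget(timings))
```

Keeping each stage behind a plain callable is one way to meet the "modular language models" need: a new language pair would replace only the `mt` (and possibly `tts`) entry, and the per-stage timings show which module is consuming the budget. Note that this measures processing time only; network transport and media-server buffering would add to the end-to-end latency.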