Project: Web Panel for AI Outbound Calling with Dynamic Agent Configuration

Core Concept: Develop a fully functional web application to manage outbound calls powered by an AI agent. The system is built around a local LLM (Llama, Deepseek, Gemma) and must include a configuration panel to tailor the agent's behavior per call (voice, language, prompt), a lead management module, and detailed call analytics.

Key Quality Requirements: end-to-end latency under 800 ms and natural, human-like speech with appropriate pacing and pauses.

Core Modules:

1. Agent Configuration Panel (Web UI)
Allows users to select the following before a call:
- Language: EN, DE, ES, NL (determines available voices and transcription accuracy)
- STT Model: choose the transcription engine (Deepgram / Cartesia / Gemini)
- TTS Provider & Model: choose the synthesis backend (Cartesia / Deepgram / ElevenLabs)
- Voice Selection: pick a specific voice to define tone and style
- Silence Timeout: delay before re-prompting or ending the call (default: 30 s)
- First Message Mode: toggle between "Bot Speaks First" and "Wait for User"
- Background Noise: add ambient sound (office, call center) for realism
- Prompt & Context: field for custom LLM prompts (full conversation flow)
- Example Dialogues: upload sample dialogues for few-shot learning, plus export for model training/feeding

2. Lead & Call Management (Web UI)
- Upload and delete contact lists (CSV or manual entry)
- Real-time call controls in the browser: Start, Pause, Stop
- Automatic call recording linked to each lead

3. Reporting & Analytics
Per-call data includes:
- AI-generated call summary
- Call duration
- Full audio recording
- Translated transcript (English translation of the conversation)

4. Integrations & Telephony
- WebRTC calling directly from the browser
- Integration with external SIP trunks (IP-to-IP, SIP-based) and Asterisk
5. Technical Requirements
- End-to-end latency of 800 ms or less
- Telegram notifications for call start, call end, and results delivery
- Server recommendation and setup guidance to meet performance targets

Preferred Tech Stack:
- Backend: Python (FastAPI / Django / Flask)
- Frontend: React, Vue, or plain HTML/JS
- AI:
  - Local LLM as the core reasoning engine (Llama, Deepseek, Gemma); the developer must select and optimize the most suitable model for speed and quality.
  - Cloud APIs for low-latency STT/TTS (Deepgram, Cartesia, Gemini, ElevenLabs) to ensure performance.

Ideal Candidate: An experienced full-stack developer with expertise in orchestrating complex voice pipelines and the ability to choose the optimal, fastest, and most cost-effective model for each component (STT, local LLM, TTS) based on the specific use case and requirements.

Start: as soon as possible (ASAP)
Fixed budget: $1000, with full source code included (a justified budget increase is possible)
Long-term Cooperation: We are also considering candidates available for paid ongoing support and future project enhancements after the initial MVP is delivered.

Please include in your proposal:
- Links to or descriptions of similar past work (AI calling, voice bots)
- Confirmation that you can independently choose and justify the LLM + STT + TTS stack
- The deadline by which you can deliver a working pipeline with latency ≤ 800 ms

Communication languages: UA / RU / EN

*The LLM names listed are just examples from my experience. If you know better, faster, or cheaper solutions for this task, feel free to suggest them. We are looking for a motivated candidate for long-term collaboration with appropriate financial reward.