I’m building a real-time speech-to-text application for Android that must run completely offline and handle medical vocabulary accurately. In short, I need Whisper (or an equivalently powerful model) ported and optimised so an Android tablet can transcribe spoken consultations on the fly, without sending any audio to the cloud.

Here is what I’m aiming for:

• Real-time streaming from the device microphone, with latency low enough to follow natural conversation.
• Robust recognition of medical terminology, drug names, and common clinical phrases.
• 100% offline operation: model, inference, and punctuation all on device.
• A clean Android front-end demonstrating live captions (Kotlin is preferred, but Java is fine), plus a lightweight settings panel for language/model selection.
• Build instructions and source so I can reproduce the APK and tune the model size later.

Key technical expectations:

– Whisper or another transformer-based engine (or a lighter Kaldi-based option such as Vosk) compiled for ARM64, with on-device quantisation to keep memory usage manageable.
– An efficient audio streaming pipeline (e.g., Oboe/NDK) feeding the inference thread without dropped frames; see the capture and inference sketches at the end of this brief for the kind of architecture I have in mind.
– A clear README covering model download, conversion, quantisation, and licensing considerations.

Acceptance will be based on a demo APK running on a Snapdragon-class tablet, showing live, accurate medical transcription while the device is in airplane mode. After that, I’ll run my own test cases with extended medical dialogues to confirm the model’s coverage.

If you’ve already worked with Whisper, PyTorch Mobile, TensorFlow Lite, or similar on Android, I’d love to see an example. Let’s discuss milestones and get this prototype running.
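
For reference, here is a minimal sketch of the capture side I have in mind, assuming a plain AudioRecord path on the Kotlin side (an Oboe/NDK callback could replace it later). The class name MicStreamer, the 30 ms hop size, and the drop-oldest queue policy are my illustrative assumptions, not requirements:

```kotlin
import android.annotation.SuppressLint
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import java.util.concurrent.ArrayBlockingQueue

// Sketch: 16 kHz mono PCM frames pushed into a bounded queue so the
// inference thread can lag briefly without stalling the mic thread.
class MicStreamer(
    private val sampleRate: Int = 16_000,           // Whisper expects 16 kHz input
    frameMillis: Int = 30                           // ~30 ms hops keep latency low
) {
    private val frameSize = sampleRate * frameMillis / 1000
    val frames = ArrayBlockingQueue<ShortArray>(64) // bounded, drop-oldest on overflow

    @Volatile private var running = false

    // Assumes the RECORD_AUDIO runtime permission has already been granted.
    @SuppressLint("MissingPermission")
    fun start() {
        val minBuf = AudioRecord.getMinBufferSize(
            sampleRate,
            AudioFormat.CHANNEL_IN_MONO,
            AudioFormat.ENCODING_PCM_16BIT
        )
        val recorder = AudioRecord(
            MediaRecorder.AudioSource.VOICE_RECOGNITION,
            sampleRate,
            AudioFormat.CHANNEL_IN_MONO,
            AudioFormat.ENCODING_PCM_16BIT,
            maxOf(minBuf, frameSize * 2 * 4)        // bytes: room for ~4 frames
        )
        running = true
        Thread({
            recorder.startRecording()
            val buf = ShortArray(frameSize)
            while (running) {
                val read = recorder.read(buf, 0, buf.size)
                if (read > 0) {
                    // Drop the oldest frame rather than block the capture loop.
                    if (!frames.offer(buf.copyOf(read))) {
                        frames.poll()
                        frames.offer(buf.copyOf(read))
                    }
                }
            }
            recorder.stop()
            recorder.release()
        }, "mic-capture").start()
    }

    fun stop() { running = false }
}
```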
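
And a rough sketch of how I picture the inference side consuming those frames. The WhisperBridge JNI names below are placeholders for whatever the chosen engine’s wrapper (whisper.cpp or otherwise) actually exposes, and the 3-second non-overlapping window is just a starting point to tune against latency:

```kotlin
// Placeholder JNI bridge: library name and native signatures are assumptions,
// to be replaced by the real wrapper around the chosen engine.
object WhisperBridge {
    init { System.loadLibrary("whisper_android") }  // assumed .so name

    external fun initContext(modelPath: String): Long
    external fun transcribeChunk(ctx: Long, pcm: FloatArray): String
    external fun freeContext(ctx: Long)
}

// Consumer loop, run on a dedicated worker thread: convert 16-bit PCM to the
// [-1, 1] float range, accumulate a window, and emit captions as they arrive.
fun runInference(streamer: MicStreamer, ctx: Long, onCaption: (String) -> Unit) {
    val window = ArrayList<Float>(16_000 * 5)
    while (true) {
        val frame = streamer.frames.take()          // blocks until audio arrives
        for (s in frame) window.add(s / 32768f)     // normalise to [-1, 1]
        if (window.size >= 16_000 * 3) {            // ~3 s window; tune for latency
            // onCaption runs on this worker thread: post to the main thread
            // before touching views. A real build would keep overlap/context
            // between windows instead of clearing outright.
            onCaption(WhisperBridge.transcribeChunk(ctx, window.toFloatArray()))
            window.clear()
        }
    }
}
```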