Czech ASR Model Fine-Tuning (Whisper)

Client: AI | Published: 28.01.2026

## Job Title: Expert Audit of Whisper Large-v3 LoRA Script – Code Review & Bug Fixes

### Job Description:

I have developed a custom Python script for fine-tuning **Whisper Large-v3** using **LoRA (PEFT)** on the Czech Common Voice dataset. While the script runs, I am facing specific technical debt and logic issues that require a senior-level ML engineer.

**STRICT REQUIREMENT:** I am NOT looking for a new script or a "black-box" solution. I want to work with **my existing codebase**. The goal is to identify, explain, and fix the bugs within my logic so I can understand the underlying issues and continue developing this specific script.

### Current Challenges to Solve:

1. **Persistent Attention Mask Warning:** *"The attention mask is not set and cannot be inferred from input because pad token is same as eos token."* I have attempted several fixes (modifying `model.config`, `generation_config`, and the `DataCollator`), but the warning persists. I need a definitive fix within my code's structure.
2. **Subjective Quality Degradation:** My validation logs show improving WER (Word Error Rate), yet actual inference output is worse than the base model's: hallucinations, lost punctuation, and repetitive loops. I need an audit of my training hyperparameters and data preparation logic.
3. **Catastrophic Forgetting:** I need advice on how to tune LoRA (rank, alpha, target modules) to preserve the robust pre-trained capabilities of Large-v3 while adapting to Czech.

(Minimal illustrative sketches of the patterns in question are included at the end of this post for context.)

### Your Role:

* **Deep Dive Code Review:** Go through my script line by line.
* **Bug Identification:** Tell me exactly where my `DataCollator`, attention mask, or padding implementation fails.
* **Knowledge Transfer:** Explain the fixes clearly so I can avoid these pitfalls in future iterations.
* **Optimization:** Suggest surgical improvements to my existing training loop and inference setup (temperature, beam search, etc.).

### Requirements:

* Extensive experience with **OpenAI Whisper** and **Hugging Face Transformers**.
* Mastery of **PEFT / LoRA** architectures.
* Ability to debug complex Transformer-based training pipelines.
* Patience and communication skills to explain "the why" behind the fixes.
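
### Context for Applicants (Illustrative Sketches):

To give applicants a concrete sense of the patterns involved, here are three minimal sketches. They are **not** taken from my codebase; class names, checkpoint names, and values are generic placeholders, and the bugs I need found live in my own versions of this logic.

The first sketch is the common speech seq2seq collator pattern (padding audio features and label ids separately, then masking label padding with `-100`), i.e. the general area where my `DataCollator` and padding implementation needs auditing:

```python
from dataclasses import dataclass
from typing import Any, Dict, List

import torch


@dataclass
class DataCollatorSpeechSeq2SeqWithPadding:
    """Illustrative collator sketch; names and structure are hypothetical."""

    processor: Any
    decoder_start_token_id: int

    def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, torch.Tensor]:
        # Pad the log-mel input features to a uniform batch tensor.
        input_features = [{"input_features": f["input_features"]} for f in features]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")

        # Pad the tokenized transcripts separately.
        label_features = [{"input_ids": f["labels"]} for f in features]
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")

        # Replace label padding with -100 so the loss ignores those positions.
        labels = labels_batch["input_ids"].masked_fill(
            labels_batch["attention_mask"].ne(1), -100
        )

        # Drop the leading start token if the tokenizer already added it;
        # the model prepends it again during training.
        if (labels[:, 0] == self.decoder_start_token_id).all().cpu().item():
            labels = labels[:, 1:]

        batch["labels"] = labels
        return batch
```

A collator like this is typically instantiated with `decoder_start_token_id=model.config.decoder_start_token_id` and passed to the trainer as its `data_collator`.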
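
The second sketch shows the inference-side pattern I expect the attention-mask discussion to touch on: requesting an explicit attention mask from the feature extractor and passing it to `generate`, alongside deterministic beam-search decoding. The checkpoint name, dummy audio, and decoding parameters are placeholders, not my current setup:

```python
import numpy as np
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")
model.eval()

# Placeholder: one second of silence at 16 kHz; replace with real audio.
audio_array = np.zeros(16_000, dtype=np.float32)

inputs = processor(
    audio_array,
    sampling_rate=16_000,
    return_tensors="pt",
    return_attention_mask=True,  # ask the feature extractor for an explicit mask
)

with torch.no_grad():
    predicted_ids = model.generate(
        inputs.input_features,
        attention_mask=inputs.attention_mask,  # pass the mask explicitly rather than letting it be inferred
        language="cs",
        task="transcribe",
        num_beams=5,       # beam search for more stable output than greedy decoding
        do_sample=False,   # temperature only matters when sampling is enabled
    )

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```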
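
Finally, a sketch of the kind of PEFT/LoRA configuration whose knobs (rank, alpha, target modules) I want advice on for the catastrophic-forgetting question. The values shown are common starting points from public examples, not my current configuration:

```python
from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")

# Illustrative values only; how these choices trade Czech adaptation against
# forgetting the base model's abilities is exactly what I want audited.
lora_config = LoraConfig(
    r=16,                                 # adapter rank: smaller = gentler update
    lora_alpha=32,                        # scaling factor, often ~2x the rank
    target_modules=["q_proj", "v_proj"],  # attention projections only
    lora_dropout=0.05,
    bias="none",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # sanity check: only a small fraction should be trainable
```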