I have a Vision Transformer + CNN hybrid that classifies prostate cancer on histopathological slides. Even after adding dropout layers, aggressive data augmentation, and standard regularization, training accuracy hovers in the 70–85% range while validation accuracy lags notably, so the model is still overfitting.

The task is to diagnose why the gap persists, adjust the architecture or training pipeline, and push validation accuracy closer to the training score without sacrificing recall on minority classes. You will receive the current PyTorch codebase, preprocessing scripts, and a representative slide subset to reproduce the issue. Typical avenues might include fine-tuning ViT blocks, stain normalization, mixup/cutmix, smarter learning-rate schedules, class-balanced sampling, or loss-function tweaks, but I'm open to any evidence-based approach you favour.

Deliverables
• Refactored code (clearly commented)
• A training run that beats the present benchmark and shows minimal divergence between train/val curves
• A brief report outlining the changes, hyper-parameters, and the resulting metrics

I'll handle full-dataset re-training once your solution proves its generalisation on the provided validation split.
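For reference, a minimal sketch of two of the candidate avenues mentioned above, class-balanced sampling and mixup, using standard PyTorch utilities. Function names (`make_balanced_sampler`, `mixup`, `mixup_loss`) and all hyper-parameter values are illustrative assumptions, not part of the existing codebase:

```python
# Hedged sketch: class-balanced sampling + mixup for an imbalanced classifier.
# All names and defaults here are placeholders, not the project's actual code.
import torch
from torch.utils.data import WeightedRandomSampler

def make_balanced_sampler(labels):
    """Weight each sample by the inverse frequency of its class,
    so minority-class slides are drawn more often per epoch."""
    labels = torch.as_tensor(labels)
    class_counts = torch.bincount(labels).float()
    sample_weights = 1.0 / class_counts[labels]
    return WeightedRandomSampler(sample_weights,
                                 num_samples=len(labels),
                                 replacement=True)

def mixup(x, y, alpha=0.2):
    """Blend each batch element with a randomly permuted partner.
    Returns mixed inputs, both label tensors, and the mixing weight lam."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    return x_mixed, y, y[perm], lam

def mixup_loss(criterion, logits, y_a, y_b, lam):
    """Convex combination of the losses against both label sets."""
    return lam * criterion(logits, y_a) + (1.0 - lam) * criterion(logits, y_b)
```

In use, the sampler would be passed to the training `DataLoader` (e.g. `DataLoader(train_ds, batch_size=32, sampler=make_balanced_sampler(train_labels))`), and each training step would call `mixup` on the batch before the forward pass and `mixup_loss` in place of the plain criterion.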