I have a Unity project already wired up with the ML-Agents toolkit and a working simulation; I now need an experienced hand to get a Proximal Policy Optimization (PPO) agent training smoothly and converging to a reliable policy. The environment is fully operational, but reward shaping, observation design, and hyperparameter tuning still need attention.

Your job is to guide or implement the complete training pipeline inside Unity ML-Agents: set up the YAML trainer configuration, launch and monitor runs, diagnose instability or divergence, and iterate until the agent consistently reaches the performance target I will share at project start. Experience with the ML-Agents training back-ends (PyTorch in current releases, TensorFlow in older ones), TensorBoard logging, and curriculum learning will be valuable, because I want transparent metrics and reproducible results.

Deliverables
• Updated ML-Agents configuration files and any supporting scripts
• A trained PPO model (.onnx) that meets the agreed-upon reward threshold in the live environment
• A concise write-up (or annotated notebook) describing key settings, training duration, and how to reproduce the results on my machine

Please be comfortable screensharing or documenting each step so I can maintain and extend the setup after hand-off.
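For orientation, the kind of trainer configuration I have in mind looks roughly like the sketch below. This follows the standard ML-Agents YAML schema; the behavior name (MyAgentBehavior) is a placeholder for whatever my agent's Behavior Parameters component uses, and the numeric values are illustrative starting points, not settings tuned for my environment.

behaviors:
  MyAgentBehavior:          # placeholder: must match the Behavior Name in Unity
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024      # illustrative; tune per observation/action sizes
      buffer_size: 10240    # experience collected before each update
      learning_rate: 3.0e-4
      beta: 5.0e-3          # entropy regularization strength
      epsilon: 0.2          # PPO clipping range
      lambd: 0.95           # GAE lambda
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true       # normalize vector observations
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 5.0e6        # total training steps before stopping
    time_horizon: 64
    summary_freq: 10000     # how often stats are written for TensorBoard

A run would then be launched with something like mlagents-learn config/ppo_agent.yaml --run-id=ppo_baseline (pressing Play in the Editor or pointing it at a built executable), monitored with tensorboard --logdir results, and curriculum thresholds would live in an environment_parameters section of the same file. Treat all of this as a starting point to be refined, not the final setup.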