A public-sector dataset that combines relational tables, PDFs, and free-text notes is ready for a predictive modeling pipeline. The scope covers the full journey: cleaning mixed-format inputs, engineering features that blend structured fields with NLP-derived signals, training and tuning one or more models, and packaging the final solution so it can be reproduced and deployed. Python, with libraries such as pandas, scikit-learn, and TensorFlow or PyTorch, is preferred, but I am open to alternatives if they better suit the data. The core deliverables are clear, well-commented notebooks or scripts and a short report explaining model choices, metrics, and limitations. When we are done, I want to be able to run a single command, point it at fresh data, and receive the prediction output along with a summary of key performance statistics.
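
To make the "single command" expectation concrete, here is a minimal sketch of what such an entry point could look like: a Python script that loads a serialized scikit-learn pipeline (so the same cleaning and feature steps run at inference time), scores a CSV of fresh records, writes predictions, and prints summary statistics. Every name here (predict.py, model.joblib, the --target flag, the CSV format) is an illustrative assumption, not a requirement of the brief.

```python
# predict.py -- hypothetical single-command scoring entry point.
# All file names, flags, and column names below are illustrative assumptions.
import argparse
import json

import joblib
import pandas as pd


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Score fresh data with a trained pipeline and report key stats."
    )
    parser.add_argument("data", help="Path to a CSV of fresh input records")
    parser.add_argument(
        "--model", default="model.joblib", help="Serialized scikit-learn pipeline"
    )
    parser.add_argument(
        "--out", default="predictions.csv", help="Where to write predictions"
    )
    parser.add_argument(
        "--target",
        default=None,
        help="Optional ground-truth column; if present, performance metrics are reported",
    )
    args = parser.parse_args()

    # Load the fitted pipeline: preprocessing and model bundled in one object,
    # so fresh data goes through exactly the training-time transformations.
    pipeline = joblib.load(args.model)

    df = pd.read_csv(args.data)
    preds = pipeline.predict(df)

    # Write the input rows back out with a prediction column attached.
    df.assign(prediction=preds).to_csv(args.out, index=False)

    # Build the summary: descriptive stats always, performance metrics only
    # when the fresh data actually carries labels to score against.
    summary = {
        "rows_scored": int(len(df)),
        "prediction_mean": float(pd.Series(preds).mean()),
    }
    if args.target and args.target in df.columns:
        from sklearn.metrics import accuracy_score

        summary["accuracy"] = float(accuracy_score(df[args.target], preds))
    print(json.dumps(summary, indent=2))


if __name__ == "__main__":
    main()
```

Under these assumptions, a run would look like `python predict.py fresh.csv --model model.joblib --target outcome`, producing predictions.csv plus a printed JSON summary; the actual interface, metrics, and serialization format would be settled during the project.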