Predict Car Accident Risk

Client: AI | Published: 18.01.2026

I need a complete Python workflow that forecasts the likelihood of road accidents under different conditions. I will be working in PyCharm and expect the code to rely on Scikit-learn for the classical algorithms and XGBoost for gradient boosting, with Seaborn for exploratory visuals.

Data strategy
• Accident records: start with reputable, publicly available crash datasets.
• Weather: scrape real-time and historical conditions so each accident row carries temperature, rain, visibility, etc.
• Traffic: enrich the set with crowdsourced density figures (e.g., Waze, TomTom, or similar feeds).

Core tasks
1. Clean and merge the three data streams, handling missing values responsibly.
2. Engineer features around time of day, road type, weather categories, and traffic congestion levels.
3. Benchmark Logistic Regression, Random Forest, and XGBoost, then tune hyper-parameters for the best performer.
4. Report precision, recall, F1-score, and plot the confusion matrix; the chosen model should reach a solid accuracy uplift over a naïve baseline.

Deliverables
• Well-commented Jupyter notebook(s) or .py scripts.
• A brief markdown report that explains data sources, preprocessing steps, the final model’s metrics, and any trade-offs.
• All scraping utilities and a requirements.txt so I can reproduce results on my side.

Acceptance
I’ll consider the job complete when I can rerun the pipeline, generate the evaluation figures, and see metrics matching those in your report.

Feel free to suggest improvements or additional external data if you believe they can push performance even further.
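For orientation, core tasks 3 and 4 could be sketched roughly as below. This is only a minimal illustration under stated assumptions, not a required implementation: the data is synthetic (generated with `make_classification` as a stand-in for the merged accident, weather, and traffic table), the model names and hyper-parameters are placeholders, and XGBoost is included only if it happens to be installed.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic, imbalanced data standing in for the real merged table
# (label 1 = accident, roughly 20% of rows).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Naive baseline plus the classical models; hyper-parameters are placeholders.
models = {
    "baseline (most frequent)": DummyClassifier(strategy="most_frequent"),
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

# XGBoost is added only when the package is available in the environment.
try:
    from xgboost import XGBClassifier
    models["xgboost"] = XGBClassifier(
        n_estimators=200, max_depth=4, learning_rate=0.1, random_state=42
    )
except ImportError:
    pass

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Metrics for the positive (accident) class; zero_division=0 avoids
    # warnings when a model never predicts the positive class.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="binary", zero_division=0
    )
    results[name] = {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }
    print(f"{name:26s} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Each stored confusion matrix is a 2x2 NumPy array, so it could be passed to `seaborn.heatmap` for the plot. Because accident data tends to be imbalanced, comparing F1 and recall against the most-frequent-class baseline is probably more informative than accuracy alone; hyper-parameter tuning for the best performer (for example with `GridSearchCV`) would follow as a separate step.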