Ensemble Churn Classification Models

Budget: $250

I have a telecom-style churn dataset that needs a full machine-learning pipeline, from preprocessing through model comparison. The raw file contains some missing values, several categorical columns, and features with different scales.

Here is what I need done:

• Data preparation: impute the gaps, one-hot (or target) encode the categoricals, and standardize or normalize the numeric features so every algorithm receives a clean, comparable matrix.
• Class-imbalance strategy: any sensible approach works (SMOTE, class weighting, etc.) as long as minority churn cases are properly represented during training and validation.
• Model building: train three ensemble learners: Random Forest, Gradient Boosting, and AdaBoost. For each, run a systematic hyper-parameter search (grid, randomized, or Bayesian) with cross-validation to squeeze out the best performance.
• Evaluation: report Accuracy, Precision, Recall, and ROC-AUC on a separate hold-out test set. Please include the confusion matrix and ROC curves to make the performance story visually clear.
• Comparison & recommendation: summarise which model you would deploy in production and why, backed by the metrics above.

Deliverables:

1. Clean, well-commented Python notebook or .py script (scikit-learn, imbalanced-learn, pandas, NumPy, matplotlib/seaborn).
2. A brief PDF/Markdown summary highlighting preprocessing choices, tuned parameters, and metric tables.
3. Saved model objects or joblib files so I can reproduce the results quickly.

I’ll consider the job complete when I can rerun the notebook end-to-end, reproduce the metrics on my machine, and clearly see which ensemble wins the churn challenge.
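To make the scope concrete, the pipeline described above could be sketched roughly as follows. This is a minimal illustration and not the final deliverable: the data is a synthetic stand-in, the column names (tenure, monthly_charges, contract, churn) are hypothetical, the parameter grids are tiny placeholders, and class imbalance is handled with balanced sample weights as one of the "sensible approaches" rather than SMOTE. Confusion matrices, ROC curves, and joblib export are left out for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.impute import SimpleImputer
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.utils.class_weight import compute_sample_weight

# Synthetic stand-in for the real churn file (hypothetical column names).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "tenure": rng.integers(1, 72, n).astype(float),
    "monthly_charges": rng.normal(70, 20, n),
    "contract": rng.choice(["monthly", "yearly"], n),
})
churn_prob = np.where(df["contract"] == "monthly", 0.30, 0.05)
df["churn"] = (rng.random(n) < churn_prob).astype(int)  # imbalanced target
df.loc[rng.choice(n, 20, replace=False), "monthly_charges"] = np.nan  # gaps

numeric, categorical = ["tenure", "monthly_charges"], ["contract"]

# Impute + scale numeric columns; impute + one-hot encode categoricals.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]),
     categorical),
])

# Three ensemble learners with deliberately small placeholder grids.
candidates = {
    "RandomForest": (RandomForestClassifier(random_state=0),
                     {"model__n_estimators": [50, 100],
                      "model__max_depth": [3, None]}),
    "GradientBoosting": (GradientBoostingClassifier(random_state=0),
                         {"model__learning_rate": [0.05, 0.1]}),
    "AdaBoost": (AdaBoostClassifier(random_state=0),
                 {"model__n_estimators": [50, 100]}),
}

X, y = df.drop(columns="churn"), df["churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Balanced sample weights up-weight the minority (churn) class in training.
weights = compute_sample_weight("balanced", y_train)

results = {}
for name, (model, grid) in candidates.items():
    pipe = Pipeline([("prep", preprocess), ("model", model)])
    search = GridSearchCV(pipe, grid, scoring="roc_auc", cv=3)
    search.fit(X_train, y_train, model__sample_weight=weights)
    pred = search.predict(X_test)
    proba = search.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, proba),
    }

# One row per model, evaluated on the hold-out test set.
print(pd.DataFrame(results).T.round(3))
```

Keeping the preprocessing inside the same Pipeline as the model means imputation and scaling are refit on each cross-validation fold, so the hold-out metrics are not contaminated by leakage from the test split.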

Python
