IMDB Review Sentiment Classification

Budget: $30

I’m sitting on roughly 10 k–50 k raw IMDB movie reviews and need a clean, reproducible sentiment-analysis pipeline that lets me compare classical machine-learning approaches with a deep-learning baseline.

Here is the workflow I expect:
• Pre-process and tokenise the text, then generate both TF-IDF vectors and Word2Vec embeddings.
• Train Logistic Regression, Multinomial Naïve Bayes, SVM, Random Forest and AdaBoost on those features.
• Build an LSTM network for sequential modelling.
• Use an 80 % / 20 % train-test split throughout for a fair head-to-head.
• Report accuracy, precision, recall, F1 and confusion matrices for every model, plus remarks on training time and memory footprint.
• Summarise the trade-offs you observe between engineered features and the neural network.

Deliverables
1. A single, well-documented Jupyter notebook (Python) that executes end-to-end on my machine.
2. requirements.txt (or environment.yml) listing all libraries (scikit-learn, gensim, TensorFlow/Keras, etc.) needed to reproduce the results.
3. A short write-up (markdown inside the notebook is fine) interpreting the metrics and highlighting where each approach shines or falls short.

Acceptance criteria
• Notebook runs without modification.
• LSTM surpasses at least one classical model on F1 score.
• Every metric requested is clearly printed and, where helpful, plotted.

If anything is unclear, flag it before diving in; otherwise, I look forward to seeing your code and insights.
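To make the reporting requirement concrete, here is a rough sketch of the kind of per-model report I have in mind, using only the Python standard library. The helper names `binary_report` and `measure` are hypothetical (not from any library), and the sketch assumes labels are encoded as 0 = negative, 1 = positive. In the notebook itself, the scikit-learn metrics functions would likely be the more natural choice.

```python
import time
import tracemalloc
from typing import Callable, Sequence


def binary_report(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Accuracy, precision, recall, F1 and confusion matrix.

    Assumes binary labels where 1 is the positive class (an assumption
    made for this sketch, not something fixed by the dataset).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Rows are true labels, columns are predictions: [[TN, FP], [FN, TP]].
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }


def measure(fn: Callable[[], object]) -> tuple:
    """Run fn once and return (result, elapsed_seconds, peak_traced_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn()
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    # Toy labels only, to show the output shape; not real results.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    report, seconds, peak = measure(lambda: binary_report(y_true, y_pred))
    print(report)
    print(f"elapsed: {seconds:.6f} s, peak traced memory: {peak} bytes")
```

Wrapping each model's fit call in something like `measure` would give comparable timing and memory remarks across all models. Note that `tracemalloc` only sees allocations made through Python's memory allocator, so memory held in native buffers (for example inside a deep-learning framework) may be undercounted and would need a separate measurement.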

Python
