Skip to content
This repository was archived by the owner on Aug 13, 2020. It is now read-only.

luispintoc/Mini-project2

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

73 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Mini-project2

In order to run these scripts, the positive-words and negative-words textfiles should be in the same folder. Also, the packages used are scikit-learn, nltk and pattern. Moreover, to download one of the lexicons use nltk.download('movie_reviews') (only needed in the Naive Bayes Model)

LinearSVC_gridsearch.py

  • *Task: Gets the k best features using GridSearchCV and SelectKBest(chi2)
  • Preprocessing: Removes punctuation, replaces special characters with spaces
  • Features: Uni-grams, bi-grams and tri-grams using tf-idf vectorization
  • Validation Pipeline: 4-fold cross validation
  • Classifiers: Linear SVC

LogRegression_gridsearch.py

  • *Task: Gets the k best features using GridSearchCV and SelectKBest(chi2)
  • Preprocessing: Removes punctuation, replaces special characters with spaces
  • Features: Uni-grams and bi-grams using tf-idf vectorization
  • Validation Pipeline: 4-fold cross validation
  • Classifiers: Logistic Regression

RandomForest_gridsearch.py

  • *Task: Gets the k best features using GridSearchCV and SelectKBest(chi2)
  • Preprocessing: Removes punctuation, replaces special characters with spaces
  • Features: Uni-grams, bi-grams and tri-grams using tf-idf vectorization
  • Validation Pipeline: 4-fold cross validation
  • Classifiers: Random Forest

Log+SVC_gridsearch.py

  • *Task: Gets the k best features using GridSearchCV and SelectKBest(chi2)
  • Preprocessing: Removes punctuation, replaces special characters with spaces
  • Features: Uni-grams, bi-grams and tri-grams using tf-idf vectorization and also uni-grams using count vectorization and the Bing Liu's opinion lexicon
  • Validation Pipeline: 4-fold cross validation
  • Classifiers: Ensemble of Logistic Regression and Linear SVC

SVC+Log+Rf_gridsearch.py

  • *Task: Gets the k best features using GridSearchCV and SelectKBest(chi2)
  • Preprocessing: Removes punctuation, replaces special characters with spaces
  • Features: Uni-grams, bi-grams and tri-grams using tf-idf vectorization and also uni-grams using count vectorization and the Bing Liu's opinion lexicon
  • Validation Pipeline: 4-fold cross validation
  • Classifiers: Ensemble of Logistic Regression, Linear SVC and Random Forest

Standard_algorithm_printCSV.py

  • Outputs a CSV file to submit on Kaggle evaluating the model on the test set
  • Only works for Linear SVC, Logistic Regression and Random Forest (uncomment to test each classifier)

Ensemble_algorithm_printCSV.py

  • Outputs a CSV file to submit on Kaggle evaluating the model on the test set
  • Only works for the two ensembles mentioned in the write-up (uncomment to test each classifier)

bin_naive_bayes.py *Implements binary naive bayes using Bing Liu's opinion lexicon (both positive and negative words) and nltk.corpus list of words from movie reviews as features *Outputs accuracy, error rate, precision, recall, and specificity.

bin_naive_bayes_bl.py *Implements binary naive bayes using Bing Liu's opinion lexicon (both positive and negative words) as features *Outputs accuracy, error rate, precision, recall, and specificity.

bin_naive_bayes_pos.py *Implements binary naive bayes using Bing Liu's opinion lexicon (only positive words) as features *Outputs accuracy, error rate, precision, recall, and specificity.

bin_naive_bayes_neg.py *Implements binary naive bayes using Bing Liu's opinion lexicon (only negative words) as features *Outputs accuracy, error rate, precision, recall, and specificity.

make_plot.py *Plots accuracy of binary naive bayes for different feature sets.

submission_kaggle.csv *Empty CSV where the output of Standard_algorithm_printCSV or Ensemble_algorithm_printCSV will be written.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages