fix typo
Carolyn Liu authored and Carolyn Liu committed May 23, 2023
1 parent 4bc1006 commit cf84bd3
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -29,10 +29,10 @@ The project is divided into two main parts and used PySpark run on Google Colabo
1. Data cleaning
1. Tokenize text
2. Clean & normalize tokens: remove stop words and punctuation, lowercase all text, and lemmatize words (reduce each word to its base form, e.g., "running" becomes "run")
- 2. Convert text into matrix format: convert preprocessed tokens to a numerical format models can take in using a count vectorizer which takes in n-grams from the corpus and counts the number of instances that n-gram is seen in the example
+ 2. Embed text: convert preprocessed tokens to a numerical format models can take in using a count vectorizer which takes in n-grams from the corpus and counts the number of instances that n-gram is seen in the example

2. Build a machine learning pipeline to obtain predictions (each notebook performs requisite text cleaning and preprocessing before applying models)
- 1. Build and tune two models, logistic regression and gradient boosted trees, to predict whether an article is from an unreliable source (fake)
+ 1. Build and tune two models, logistic regression and gradient boosted tree, to predict whether an article is unreliable (fake)
* Code: [fake_news_prediction.ipynb](https://github.com/macs30123-s23/final-project-fake_news/blob/main/fake_news_prediction.ipynb)
2. Perform Latent Dirichlet Allocation (LDA) topic modeling to analyze which topics are more likely to be manipulated into fake news.
* Code: [lda.ipynb](https://github.com/macs30123-s23/final-project-fake_news/blob/main/lda.ipynb)
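
The preprocessing and count-vectorizer steps listed above can be sketched in plain Python. This is a minimal illustration only, not the notebooks' PySpark code: the helper names (`tokenize`, `clean`, `ngrams`, `build_vocabulary`, `count_vector`), the tiny stop-word set, and the two sample sentences are all assumptions made for the example, and lemmatization is left out because it needs a dictionary-backed library.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (an assumption; real pipelines use a much larger list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in"}


def tokenize(text):
    """Lowercase the text and keep only alphabetic tokens, which also drops punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def clean(tokens):
    """Remove stop words. Lemmatization is intentionally omitted in this sketch."""
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(docs, n):
    """Map every n-gram seen in the corpus to a column index (sorted for stable order)."""
    vocab = sorted({g for doc in docs for g in ngrams(doc, n)})
    return {g: i for i, g in enumerate(vocab)}


def count_vector(tokens, vocab, n):
    """Count how often each vocabulary n-gram occurs in one document."""
    counts = Counter(ngrams(tokens, n))
    return [counts.get(g, 0) for g in vocab]


# Hypothetical two-document corpus, used only to show the shape of the output.
corpus = [
    "The senator is running, and the senator is winning.",
    "Aliens are running the senate!",
]
docs = [clean(tokenize(text)) for text in corpus]
vocab = build_vocabulary(docs, n=1)
matrix = [count_vector(doc, vocab, n=1) for doc in docs]

print(list(vocab))  # column labels
print(matrix)       # one row of counts per document
```

Each row of `matrix` is one article and each column is one n-gram from the corpus vocabulary, which is the numerical format a classifier such as logistic regression can take as input. Passing `n=2` to the same helpers would count bigrams instead of single tokens.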