Topic and Polarity Classification of Dutch news related to the Corona virus outbreak.
Input Data: Google News articles from the USA related to the COVID-19 outbreak, covering the topics Healthcare, Science, Economy, and Travel. Google News allows filtering the news by country, by COVID-19 relevance, and by topic.
Main goal: Classify the news based on the topic (Healthcare, Science, Economy, and Travel) and on the polarity (positive, negative, neutral).
End result: Deploy a dashboard that shows the detected topic and polarity labels for each news item.
- Scraping news from Google News
  We scraped the news from 18-04-2020 until 10-05-2020, during the coronavirus outbreak.
- Topic Classification
  To get better results, we used an ensemble model that combines the results of two machine learning algorithms:
  a) Logistic Regression
  b) FastText
- Polarity Classification with the VADER algorithm
- Dashboard, which displays the news with their labels
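The ensemble step in Topic Classification can be illustrated with a minimal sketch. It assumes that each model returns one probability per topic and that the two outputs are combined by simple averaging; the README does not state how the results are actually combined, and the function name `ensemble_predict` is hypothetical.

```python
# Minimal sketch of an averaging ensemble (an assumption, not necessarily
# the combination rule used in this repository).
TOPICS = ["Healthcare", "Science", "Economy", "Travel"]


def ensemble_predict(probs_lr, probs_ft, labels=TOPICS):
    """Average two per-topic probability lists and return (label, averaged)."""
    if not (len(probs_lr) == len(probs_ft) == len(labels)):
        raise ValueError("probability lists must match the label list in length")
    # Element-wise mean of the two models' probabilities.
    averaged = [(a + b) / 2 for a, b in zip(probs_lr, probs_ft)]
    # Index of the highest averaged probability (first one wins on ties).
    best = max(range(len(labels)), key=averaged.__getitem__)
    return labels[best], averaged


# Hypothetical outputs of Logistic Regression and FastText for one news item.
label, scores = ensemble_predict([0.6, 0.2, 0.1, 0.1], [0.3, 0.4, 0.2, 0.1])
print(label)
```

Averaging lets a confident model outvote an uncertain one; other rules, such as weighted averaging or majority voting, would fit the same interface.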
Clone the repository and run main.py
Input Data:
• Corona-ScrapedData: folder that contains the scraped data and the Python scripts used to scrape the data from Google News
Main implementation:
• preprocessing.py: reads and preprocesses the scraped data
• main.py: the main script, which reads the training data, trains the models for topic and polarity classification, and predicts the labels for unseen news items
• simple_text_classification.py: implements TF-IDF and Logistic Regression training and prediction
• FastText.py: implements the FastText algorithm for Topic Classification.
Training: It uses the Google News articles labeled with the topic categories (Healthcare, Science, Economy, and Travel).
Prediction: Given a news item, it predicts its topic label (Healthcare, Science, Economy, or Travel).
• polarity_analysis.py: implements the polarity classification algorithm (VADER). It uses a rule-based technique.
Prediction: Given a news item, it predicts its polarity label (positive, negative, or neutral).
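The polarity prediction described above could look roughly like the following sketch. It assumes the `vaderSentiment` package and the commonly used compound-score cutoff of ±0.05; the threshold and the function names `polarity_label` and `classify_polarity` are assumptions and may differ from what polarity_analysis.py does.

```python
def polarity_label(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a polarity label.

    The +/-0.05 cutoff is the conventional choice, assumed here.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def classify_polarity(text):
    """Score a news item with VADER and return its polarity label."""
    # Imported here so the helper above works without the package installed.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    # polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound'.
    compound = analyzer.polarity_scores(text)["compound"]
    return polarity_label(compound)
```

With the package installed, `classify_polarity("some headline")` returns one of the three labels. Since VADER is rule-based, no training step is needed for polarity.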
Results:
• topic_classification_predictions.csv: prediction results of topic classification
• polarity_predictions.csv: prediction results of polarity classification