Learning how to process text is an essential skill for data scientists. In this project, you will put this skill into practice to identify whether a news headline is real or fake news.
In the file dataset/data.csv you will find a dataset containing news headlines and their labels:
- 0: the headline is fake news.
- 1: the headline is real news.
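A minimal sketch of loading the file with Python's standard `csv` module. The column layout of dataset/data.csv is not described here, so this assumes each row holds the 0/1 label in the first column and the headline in the second, optionally preceded by a header row. The function name `load_headlines` is hypothetical; inspect the file and adjust the column order, delimiter and header handling to match it.

```python
import csv
import io
from typing import Iterable, List, Tuple


def load_headlines(lines: Iterable[str], delimiter: str = ",",
                   has_header: bool = False) -> Tuple[List[str], List[int]]:
    """Parse (label, headline) rows into parallel lists of texts and labels.

    ASSUMPTION: the label (0 = fake, 1 = real) is in the first column and the
    headline in the second; check dataset/data.csv and adjust if it differs.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    if has_header:
        next(reader, None)  # skip the header row
    texts: List[str] = []
    labels: List[int] = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        label = int(row[0])
        if label not in (0, 1):
            raise ValueError(f"unexpected label {label!r} in row {row!r}")
        # Re-join extra fields in case a headline contains an unquoted delimiter.
        texts.append(delimiter.join(row[1:]))
        labels.append(label)
    return texts, labels


if __name__ == "__main__":
    sample = io.StringIO(
        'label,headline\n0,Aliens built the pyramids\n1,"Council approves budget, again"\n'
    )
    texts, labels = load_headlines(sample, has_header=True)
    print(list(zip(labels, texts)))
```

With the real file, open it with `newline=""` as the `csv` documentation recommends, for example (assuming UTF-8 and a header row): `with open("dataset/data.csv", newline="", encoding="utf-8") as f: texts, labels = load_headlines(f, has_header=True)`.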
Your goal is to build a classifier that is able to distinguish between the two.
You will need to split the dataset into training and testing sets. Use the training set to build your classifier and then use the testing set to evaluate its performance.
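One way to make the split, sketched with the standard library only: a stratified split that sends the same fraction of each class to the test set, so both sets keep the original proportion of real and fake headlines. The helper name `stratified_split`, the 20% test fraction and the seed are illustrative choices, not requirements of the project. scikit-learn's `train_test_split(..., stratify=labels)` does the same job with less code.

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_split(texts: Sequence[str], labels: Sequence[int],
                     test_size: float = 0.2, seed: int = 42
                     ) -> Tuple[List[str], List[int], List[str], List[int]]:
    """Return (train_texts, train_labels, test_texts, test_labels).

    Each class is shuffled separately and the same fraction of it goes to
    the test set, so both sets keep the original class proportions.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed -> reproducible split
    by_class: Dict[int, List[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    # Shuffle again so the classes are interleaved rather than grouped.
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)
    return ([texts[i] for i in train_idx], [labels[i] for i in train_idx],
            [texts[i] for i in test_idx], [labels[i] for i in test_idx])
```

Typical use: `train_x, train_y, test_x, test_y = stratified_split(texts, labels)`. Keep the test set aside until the final evaluation so it gives an honest measure of performance.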
As in a real-life scenario, you are free to make your own choices about how to treat the text. Use the techniques you have learned and common Python packages to process the data and classify the headlines.
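As an illustration of one possible pipeline (not the required approach), the sketch below lowercases and tokenizes each headline, drops a small hand-picked stop-word list, and trains a multinomial naive Bayes classifier with Laplace smoothing, written from scratch to show the mechanics. The stop-word list, the token pattern and the class name `NaiveBayes` are assumptions. In practice, scikit-learn's `TfidfVectorizer` combined with `MultinomialNB` or `LogisticRegression` is a common, shorter alternative.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence

# ASSUMPTION: a tiny illustrative stop-word list; NLTK or spaCy ship fuller ones.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "on", "and", "is", "for"}
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text, keep runs of letters/digits/apostrophes, drop stop words."""
    return [tok for tok in TOKEN_RE.findall(text.lower()) if tok not in STOP_WORDS]


class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words counts, with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # add-alpha smoothing constant

    def fit(self, texts: Sequence[str], labels: Sequence[int]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        doc_counts = Counter(labels)
        self.word_counts: Dict[int, Counter] = {c: Counter() for c in self.classes}
        for text, y in zip(texts, labels):
            self.word_counts[y].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        # log P(class): fraction of training headlines in each class
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        return self

    def _log_score(self, tokens: List[str], c: int) -> float:
        """log P(class) + sum of log P(token | class) over known tokens."""
        denom = self.total[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
        return score

    def predict(self, texts: Sequence[str]) -> List[int]:
        predictions = []
        for text in texts:
            tokens = tokenize(text)
            predictions.append(max(self.classes, key=lambda c: self._log_score(tokens, c)))
        return predictions
```

Typical use: `model = NaiveBayes().fit(train_x, train_y)` followed by `model.predict(test_x)`. Scoring in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing constant `alpha` stops a word that never appeared with one class from zeroing out that class's score.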
Your submission must include:
- Python code: Provide well-documented Python code that performs the analysis.
- Predictions: A CSV file with the predicted labels (0 or 1) for the test set.
- Accuracy estimation: Provide your teacher with an estimate of your model's accuracy on unseen data.
- Presentation: You will present your model in a 10-minute presentation. Your teacher will provide further instructions.
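For the predictions and accuracy deliverables, a minimal sketch: `accuracy_with_interval` reports test-set accuracy together with an approximate 95% confidence interval (the normal-approximation, or Wald, interval `acc ± z * sqrt(acc * (1 - acc) / n)`), which is one way to back up the estimate you give your teacher, and `write_predictions` writes one label per row. Both names are hypothetical, and the single `label` column is a placeholder for whatever file format your teacher specifies.

```python
import csv
import io
import math
from typing import Iterable, Sequence, TextIO, Tuple


def accuracy_with_interval(y_true: Sequence[int], y_pred: Sequence[int],
                           z: float = 1.96) -> Tuple[float, float, float]:
    """Return (accuracy, lower, upper) using the normal-approximation interval.

    z = 1.96 gives an approximate 95% interval; the approximation is rough
    for small test sets or accuracies very close to 0 or 1.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    half_width = z * math.sqrt(acc * (1 - acc) / n)
    # Clip to [0, 1], since accuracy cannot leave that range.
    return acc, max(0.0, acc - half_width), min(1.0, acc + half_width)


def write_predictions(out: TextIO, predictions: Iterable[int],
                      header: str = "label") -> None:
    """Write one predicted label per row under a single header column.

    ASSUMPTION: the one-column layout and the "label" header are placeholders;
    follow the format your teacher asks for.
    """
    writer = csv.writer(out)
    writer.writerow([header])
    for p in predictions:
        if p not in (0, 1):
            raise ValueError(f"prediction must be 0 or 1, got {p!r}")
        writer.writerow([p])


if __name__ == "__main__":
    acc, lo, hi = accuracy_with_interval([1] * 100, [1] * 80 + [0] * 20)
    print(f"accuracy {acc:.3f}, approx. 95% CI [{lo:.3f}, {hi:.3f}]")
    buf = io.StringIO()
    write_predictions(buf, [0, 1, 1])
    print(buf.getvalue())
```

To write the real file, open it with `newline=""`, e.g. `with open("predictions.csv", "w", newline="", encoding="utf-8") as f: write_predictions(f, preds)` (the file name is a placeholder). The interval is only meaningful if the test set played no part in choosing the preprocessing or model settings; for those choices, k-fold cross-validation on the training set (for example scikit-learn's `cross_val_score`) is the usual tool.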