This repository focuses on Natural Language Processing (NLP) with Python.
The News-Classifier notebook focuses on categorizing news articles by processing the body of each article.
You can tweak the code to combine both Body and Title for processing, which may improve the accuracy of the predictions.
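A minimal sketch of that tweak, assuming the data sits in a pandas DataFrame with TITLE and BODY columns as in the notebook. The TEXT column name and the two sample rows are made up for illustration and are not part of the repository:

```python
import pandas as pd

# Tiny illustrative frame; the real data comes from dataset/trainset.txt.
news = pd.DataFrame({
    'CLASS': ['business', 'sports'],
    'TITLE': ['Markets rally', None],
    'BODY': ['Stocks closed higher today.', 'The home team won the final.'],
})

# Join title and body into one text column (TEXT is a hypothetical name).
# fillna('') keeps a row with a missing title or body from turning into NaN.
news['TEXT'] = (news['TITLE'].fillna('') + ' ' + news['BODY'].fillna('')).str.strip()
```

The classifier can then be trained on the TEXT column instead of BODY. Whether this actually helps depends on the data, so compare the scores before and after.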
First, clone the repository.
git clone https://github.com/DevDHera/Guide-to-NLP-with-Python.git
Now open the Jupyter notebook and classify news articles of your choice.
All the datasets are included in the dataset directory.
The following are some of the packages we use to build our classifier:
- nltk - Stopwords, Stemming
- pandas - To read TSV, create dataframes
- matplotlib, seaborn - Data visualizations
- string - To remove punctuation
- sklearn - CountVectorizer, TfidfTransformer, MultinomialNB, Pipeline
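To show how the string and stop word steps fit together, here is a rough sketch of a text_process style function of the kind passed to CountVectorizer as its analyzer. This is an assumption about its general shape, not the notebook's actual code: the small STOP_WORDS set is a stand-in for nltk's English stop words, and stemming is left out.

```python
import string

# Stand-in stop word list for illustration only; the notebook uses
# nltk's English stop words instead.
STOP_WORDS = {'the', 'a', 'an', 'is', 'are', 'of', 'to', 'in', 'and'}


def text_process(text):
    """Strip punctuation, lowercase, and drop stop words; return a token list."""
    no_punct = text.translate(str.maketrans('', '', string.punctuation))
    return [word for word in no_punct.lower().split() if word not in STOP_WORDS]


tokens = text_process("The markets are up, and investors cheer!")
```

Returning a list of tokens matters here, because a callable analyzer replaces CountVectorizer's own tokenization entirely.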
We import the data into the notebook as shown below.
news = pd.read_csv('dataset/trainset.txt', sep='\t', names=['CLASS', 'TITLE', 'DATE', 'BODY'])
Also, we use pipelines to make our life easier 😴.
pipeline = Pipeline([
    ('bow', CountVectorizer(analyzer=text_process)),
    ('tfidf', TfidfTransformer()),
    ('classifier', MultinomialNB())
])
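Once built, a pipeline like this is trained and queried like a single estimator. The sketch below is self-contained and makes two assumptions: the four training sentences and their labels are invented for illustration, and CountVectorizer runs with its default analyzer rather than the notebook's text_process.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy training data, invented for illustration only.
texts = [
    'stocks rise as markets rally',
    'bank profits beat market forecasts',
    'team wins the championship game',
    'striker scores twice in cup game',
]
labels = ['business', 'business', 'sports', 'sports']

pipeline = Pipeline([
    ('bow', CountVectorizer()),      # default analyzer, not text_process
    ('tfidf', TfidfTransformer()),   # reweight raw counts by TF-IDF
    ('classifier', MultinomialNB()),
])

# fit() runs every step in order; predict() reuses the fitted steps.
pipeline.fit(texts, labels)
predicted = pipeline.predict(['markets and stocks fall'])
```

Because the vectorizer, the TF-IDF transform, and the classifier live in one object, new text goes through exactly the same preprocessing as the training data.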
Improve the classifier and ❤️ share the knowledge 💐 😊