This repository contains an implementation of text classification and next-word prediction using Logistic Regression. The project applies Natural Language Processing (NLP) techniques to preprocess text data and train models for both tasks.
- Implements text classification using Logistic Regression.
- Performs text prediction using n-gram models.
- Utilizes scikit-learn and NLTK for text processing and model training.
- Processes text using tokenization, vectorization, and feature extraction.
- Evaluates model performance using accuracy, precision, and recall metrics.
The project works with a labeled text dataset:
- Text samples with class labels, split into training and test sets.
- Preprocessing applies tokenization, stopword removal, and vectorization, as sketched below.
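A minimal preprocessing sketch, assuming the standard NLTK tokenizer and English stopword list; `preprocess` is a hypothetical helper name, not necessarily the function used in this repository:

```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# One-time downloads of the NLTK resources used below.
nltk.download("punkt")
nltk.download("stopwords")

STOP_WORDS = set(stopwords.words("english"))

def preprocess(text: str) -> str:
    """Lowercase, strip non-letter characters, tokenize, and drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    tokens = word_tokenize(text)
    return " ".join(t for t in tokens if t not in STOP_WORDS)

print(preprocess("The quick brown fox jumps over the lazy dog!"))
# -> quick brown fox jumps lazy dog
```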
- TF-IDF Vectorization: Converts text data into numerical feature representations.
- Logistic Regression Classifier: A simple yet effective model for binary or multi-class classification.
- Probability Estimation: Uses the sigmoid function, sigmoid(z) = 1 / (1 + exp(-z)), to map model scores to class probabilities (softmax in the multi-class case); see the sketch below.
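A rough sketch of this setup, chaining TfidfVectorizer and LogisticRegression in a scikit-learn Pipeline; the toy texts and labels are illustrative assumptions, not the project's dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data; the real project loads its labeled dataset instead.
texts = ["great movie, loved it", "terrible plot, waste of time",
         "wonderful acting", "boring and predictable"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# TF-IDF turns each document into a sparse weight vector;
# Logistic Regression learns a linear decision boundary on top of it.
clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("logreg", LogisticRegression(max_iter=1000)),
])
clf.fit(texts, labels)

# predict_proba maps the model's scores to class probabilities.
print(clf.predict(["loved the acting"]))        # predicted label
print(clf.predict_proba(["loved the acting"]))  # class probabilities
```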
- N-gram Language Model: Uses previous words to predict the next word.
- Feature Extraction: Computes frequency-based features for prediction.
- Logistic Regression: Treats next-word prediction as multi-class classification over the vocabulary, choosing the most probable next word (sketched below).
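One way to frame this, sketched under stated assumptions: contexts of the previous n words are vectorized with CountVectorizer (a bag-of-words simplification that ignores word order within the context), and a multi-class Logistic Regression predicts the next token. The toy corpus and n = 2 are assumptions, not the repository's exact setup:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
n = 2  # context length: the two previous words predict the next one

# Build (context, next word) training pairs from the token stream.
contexts = [" ".join(corpus[i:i + n]) for i in range(len(corpus) - n)]
next_words = [corpus[i + n] for i in range(len(corpus) - n)]

# Frequency-based features computed over the context words.
vec = CountVectorizer(token_pattern=r"\S+")
X = vec.fit_transform(contexts)

# Multi-class Logistic Regression over the vocabulary.
model = LogisticRegression(max_iter=1000)
model.fit(X, next_words)

print(model.predict(vec.transform(["sat on"])))  # likely: ['the']
```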
- Data Preprocessing:
- Tokenization and text cleaning.
- Vectorization using TF-IDF or CountVectorizer.
- Model Training:
- Logistic Regression trained with cross-entropy loss.
- Optimization using Stochastic Gradient Descent (SGD) or other solvers (see the end-to-end sketch after this list).
- Validation & Evaluation:
- Accuracy, precision, recall, and F1-score for classification tasks.
- Sample text predictions and evaluation of model accuracy.
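An end-to-end sketch of the workflow above, where SGDClassifier with loss="log_loss" is Logistic Regression optimized by SGD under the cross-entropy objective (requires scikit-learn >= 1.1; the toy data and parameter choices are assumptions, not the repository's exact configuration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the project's labeled dataset.
texts = ["loved it", "hated it", "brilliant film", "awful film",
         "great cast", "poor script", "would watch again", "never again"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0)

# loss="log_loss" makes SGDClassifier a Logistic Regression model
# trained by Stochastic Gradient Descent on the cross-entropy loss.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("sgd", SGDClassifier(loss="log_loss", random_state=0)),
])
model.fit(X_train, y_train)

# Accuracy, precision, recall, and F1 on the held-out test split.
print(classification_report(y_test, model.predict(X_test)))
```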
- Accuracy: The proportion of test samples classified correctly.
- Precision & Recall: Measure per-class performance, which is especially informative on imbalanced datasets.
- Confusion Matrix: Visualizes model performance on test data (a small example follows).
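A minimal confusion-matrix example with hypothetical labels; scikit-learn's ConfusionMatrixDisplay.from_predictions can render the same matrix as a plot:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels for illustration.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]

# Rows are true classes, columns are predicted classes.
print(confusion_matrix(y_true, y_pred))
# [[2 1]
#  [1 2]]
```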
- The text classification model predicts the category of input text.
- The text prediction model generates the next word in a given sequence using n-grams and Logistic Regression.