
Text Classification for Resumes: exploratory data analysis (EDA) on a large collection of resumes, text vectorization with Bag of Words (BoW) and TF-IDF, and training and evaluation of multiple models, with Logistic Regression performing best. Includes word clouds and histograms for visualization.


R-Mahesh45/HR---Resume-Text-Classification


Step-by-Step Explanation 🌟

1. Data Preparation 📋

  • Objective: Load the dataset and prepare it for analysis.
  • Actions:
    • Create a sample DataFrame containing document information such as file names, text content, and categories.
    • This DataFrame will be used for further processing.
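A minimal sketch of this step. The column names (file_name, resume_text, category) and the sample rows are illustrative; the actual project loads real resume files:

```python
import pandas as pd

# Hypothetical sample data standing in for the real resume collection.
df = pd.DataFrame({
    "file_name": ["resume_001.txt", "resume_002.txt", "resume_003.txt"],
    "resume_text": [
        "Experienced Python developer skilled in Django and REST APIs",
        "HR generalist with a background in recruiting and onboarding",
        "Data analyst proficient in SQL, Excel, and Tableau",
    ],
    "category": ["Software Engineer", "HR", "Data Analyst"],
})
print(df.head())
```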

2. Text Preprocessing 🧹

  • Objective: Clean and transform the raw text data into a format suitable for analysis.
  • Actions:
    • Convert to Lowercase: Convert all text to lowercase to ensure uniformity.
    • Tokenize: Split the text into individual words.
    • Remove Stopwords: Remove common words (e.g., "and", "the") that do not carry significant meaning.
    • Stemming: Reduce words to their root forms (e.g., "running" to "run").
    • Lemmatization: Map words to their dictionary base forms (e.g., "better" to "good").
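One way to implement this pipeline with NLTK (a sketch; both stemming and lemmatization are applied here because both are listed above, though in practice one of the two is usually enough):

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time resource downloads; newer NLTK releases may also need "punkt_tab".
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

def preprocess(text: str) -> str:
    tokens = word_tokenize(text.lower())                  # lowercase + tokenize
    tokens = [t for t in tokens if t.isalpha()]           # keep alphabetic tokens only
    tokens = [t for t in tokens if t not in stop_words]   # remove stopwords
    tokens = [stemmer.stem(t) for t in tokens]            # stem to root form
    tokens = [lemmatizer.lemmatize(t) for t in tokens]    # lemmatize to base form
    return " ".join(tokens)

# Continuing from the DataFrame built in step 1.
df["clean_text"] = df["resume_text"].apply(preprocess)
```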

3. Feature Extraction 🔍

  • Objective: Convert the processed text data into numerical features using TF-IDF.
  • Actions:
    • TF-IDF Vectorization: Transform the text data into a matrix of TF-IDF features. TF-IDF (Term Frequency-Inverse Document Frequency) captures the importance of each word in the document relative to the entire corpus.
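A sketch using scikit-learn's TfidfVectorizer on the clean_text column created above; the max_features cap is an illustrative choice, and the result X is a sparse document-term matrix:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(max_features=5000)   # cap the vocabulary size
X = vectorizer.fit_transform(df["clean_text"])    # sparse TF-IDF matrix
print(X.shape)                                    # (n_documents, n_features)
```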

4. Encode the Target Variable 🔢

  • Objective: Convert the categorical target variable (document category) into numerical values.
  • Actions:
    • Use LabelEncoder to encode the categories into numerical values.
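With scikit-learn this is a short step; a sketch, continuing from the DataFrame above:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y = le.fit_transform(df["category"])   # e.g., "Data Analyst" -> 0, "HR" -> 1, ...
print(dict(zip(le.classes_, le.transform(le.classes_))))  # category -> code mapping
```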

5. Split the Data ✂️

  • Objective: Split the dataset into training and testing sets for model evaluation.
  • Actions:
    • Use train_test_split to split the data into training (80%) and testing (20%) sets.
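A sketch of the split; random_state=42 is an illustrative choice for reproducibility:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42   # 80% train / 20% test
)
```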

6. Model Training 🧠

  • Objective: Train a machine learning model on the training data.
  • Actions:
    • Model Selection: Use a Logistic Regression model.
    • Training: Fit the model on the training data.
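A minimal training sketch; max_iter=1000 is an illustrative setting, since the solver's default iteration limit can be reached on high-dimensional TF-IDF features:

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000)   # raise iteration limit for TF-IDF input
model.fit(X_train, y_train)
```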

7. Model Evaluation 📊

  • Objective: Evaluate the model's performance on both the training and testing sets.
  • Actions:
    • Predictions: Make predictions on the training and testing sets.
    • Accuracy: Calculate the accuracy of the model on both sets.
    • Classification Report: Generate a classification report to provide detailed metrics (precision, recall, F1-score) for each class.
    • Bar Plot: Visualize the accuracy on the training and testing sets using a bar plot.
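A sketch covering all four evaluation actions, assuming matplotlib for the bar plot:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, classification_report

train_pred = model.predict(X_train)
test_pred = model.predict(X_test)

train_acc = accuracy_score(y_train, train_pred)
test_acc = accuracy_score(y_test, test_pred)
print(f"Train accuracy: {train_acc:.3f}, Test accuracy: {test_acc:.3f}")

# Per-class precision, recall, and F1-score.
print(classification_report(y_test, test_pred, target_names=le.classes_))

# Bar plot comparing train vs. test accuracy.
plt.bar(["Train", "Test"], [train_acc, test_acc])
plt.ylabel("Accuracy")
plt.title("Model accuracy on training vs. testing sets")
plt.show()
```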

8. Create Word Cloud ☁️

  • Objective: Visualize the most important words in the dataset based on their TF-IDF scores.
  • Actions:
    • TF-IDF Means: Calculate the average TF-IDF scores for each feature (word).
    • Word Cloud Generation: Generate a word cloud where the size of each word indicates its importance (TF-IDF score).
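One way to build the word cloud from mean TF-IDF scores, using the wordcloud package (a sketch; the figure dimensions and colors are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Average TF-IDF score of each term across all documents.
mean_tfidf = np.asarray(X.mean(axis=0)).ravel()
freqs = dict(zip(vectorizer.get_feature_names_out(), mean_tfidf))

# Word size is proportional to the term's mean TF-IDF score.
wc = WordCloud(width=800, height=400, background_color="white")
wc.generate_from_frequencies(freqs)

plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```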

Summary 🌟

This walkthrough covers preparing text data, extracting TF-IDF features, training and evaluating a Logistic Regression classifier, and visualizing the most influential words with a word cloud. Together, these steps show which words matter most in the dataset and how well the classification model performs.
