# Job Recommendation System

## Table of Contents

- Overview
- Features
- Requirements
- Installation
- Usage
- Explanation of Key Components
- Data Availability
- File Structure
- Contributing
- License
- Acknowledgements
## Overview

This project is a Job Recommendation System built with Streamlit. Users upload their CVs (in PDF format) and receive tailored job recommendations based on the contents of the CV. The system extracts the CV text, compares it with job descriptions, and uses several machine learning techniques to recommend the most relevant job openings.
## Features

- PDF CV Upload: Users can upload their CVs in PDF format.
- Text Preprocessing: The system tokenizes, removes stop words, and lemmatizes the text from both the CV and job descriptions.
- Machine Learning Models: Utilizes TF-IDF Vectorizer, CountVectorizer, and K-Nearest Neighbors (KNN) to compute similarity scores and recommend jobs.
- Customizable Recommendations: Users can select the number of job recommendations they wish to receive.
- Interactive UI: Built with Streamlit for an intuitive user experience.
## Requirements

- Python 3.8+
- Streamlit 1.14.0
- Pandas
- pdfplumber
- NLTK
- Scikit-learn
- NumPy
## Installation

To run this project locally, follow these steps:
1. Clone the repository:

   ```bash
   git clone https://github.com/thisisvk45/Job-Recommendation-System.git
   cd Job-Recommendation-System
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows use `venv\Scripts\activate`
   ```

3. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```

4. Download NLTK resources:

   ```python
   import nltk
   nltk.download('punkt')
   nltk.download('stopwords')
   nltk.download('wordnet')
   ```
## Usage

To run the job recommendation system, execute the following command in your terminal:

```bash
streamlit run candidate_app.py
```

This starts a local server. Open your browser and navigate to http://localhost:8501 to interact with the application.
## Explanation of Key Components

- PDF Extraction: The `pdfplumber` library is used to extract text from PDF files.
- Text Preprocessing: The text is tokenized, stop words are removed, and lemmatization is performed using NLTK.
- Similarity Calculation: TF-IDF Vectorizer, CountVectorizer, and KNN models are used to calculate similarities between the CV text and the job descriptions.
- Weighted Scoring: Final recommendations are based on a weighted combination of the similarity scores.
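As a rough sketch of how the similarity and scoring steps could fit together with scikit-learn. The sample texts, the choice of cosine similarity, and the blend weights below are illustrative assumptions, not the values the app actually uses:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.neighbors import NearestNeighbors

# Toy corpus standing in for preprocessed job descriptions and a CV.
jobs = [
    "python developer machine learning pandas",
    "frontend engineer react javascript css",
    "data scientist statistics python sklearn",
]
cv = "experienced python programmer with machine learning background"

# TF-IDF similarity between the CV and each job description.
tfidf = TfidfVectorizer()
job_tfidf = tfidf.fit_transform(jobs)
cv_tfidf = tfidf.transform([cv])
tfidf_scores = cosine_similarity(cv_tfidf, job_tfidf).ravel()

# Raw term-count similarity via CountVectorizer.
count = CountVectorizer()
job_counts = count.fit_transform(jobs)
cv_counts = count.transform([cv])
count_scores = cosine_similarity(cv_counts, job_counts).ravel()

# KNN over the TF-IDF space: smaller cosine distance -> higher score.
knn = NearestNeighbors(n_neighbors=len(jobs), metric="cosine").fit(job_tfidf)
distances, indices = knn.kneighbors(cv_tfidf)
knn_scores = np.zeros(len(jobs))
knn_scores[indices.ravel()] = 1.0 - distances.ravel()

# Weighted blend of the three scores; weights here are arbitrary.
final = 0.5 * tfidf_scores + 0.3 * count_scores + 0.2 * knn_scores
ranking = final.argsort()[::-1]  # job indices, best match first
```

With this toy data the Python/ML posting shares the most vocabulary with the CV, so it ranks first; the number of entries kept from `ranking` corresponds to the user-selected recommendation count.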
## Data Availability

The job data used for this project was scraped with Selenium. However, I am not sharing the scraped data or the web-scraping code. Instead, I have provided a sample dataset (`job_recommendations.csv`) that can be used for basic tasks and for testing the application.
## File Structure

```
Job-Recommendation-System/
│
├── app.py                    # Main application file
├── job_recommendations.csv   # Sample job data used for recommendations
├── requirements.txt          # Python dependencies
├── README.md                 # Project documentation
└── .gitignore                # Files and directories to ignore in Git
```
## Contributing

Contributions are welcome! Please follow these steps:

- Fork the repository.
- Create a new branch (`git checkout -b feature-branch`).
- Make your changes.
- Commit your changes (`git commit -m 'Add new feature'`).
- Push to the branch (`git push origin feature-branch`).
- Open a pull request.
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgements

- Streamlit for the intuitive framework.
- pdfplumber for PDF text extraction.
- NLTK for text preprocessing tools.
- Scikit-learn for machine learning libraries.