This project uses Natural Language Processing (NLP) and Machine Learning techniques to classify emails as Spam or Ham. It includes data preprocessing, model training, evaluation, and a web-based deployment using Streamlit.
- Preprocesses email content for classification.
- Classifies emails as spam or ham with a trained machine learning model.
- Provides a user-friendly web interface for real-time email classification.
- Load the dataset and clean the data by removing unnecessary columns and handling null values.
- Map labels (`ham` and `spam`) to numerical values for machine learning.
- Convert email text into numerical features using CountVectorizer (bag-of-words approach).
- The Multinomial Naive Bayes algorithm is used because it is fast to train and works well with the word-count features produced by CountVectorizer.
- Train the Naive Bayes model using the preprocessed and vectorized data.
- Evaluate the model's accuracy on a test dataset.
- Save the trained model and vectorizer using Pickle.
- Build and deploy the classification interface using Streamlit.
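The preprocessing, training, evaluation, and saving steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the tiny in-memory dataset, the `ham -> 0` / `spam -> 1` encoding, the split ratio, and the file names `model.pkl` and `vectorizer.pkl` are all assumptions made for the example. Loading the real dataset and dropping its unused columns is left out because the file name and column layout are not given here.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy stand-in for the real dataset (hypothetical examples, not project data).
emails = [
    ("Win a free prize now, click here", "spam"),
    ("Free cash offer, claim your reward today", "spam"),
    ("Urgent: you won a free lottery ticket", "spam"),
    ("Cheap loans, click now for a free quote", "spam"),
    ("Are we still meeting for lunch tomorrow?", "ham"),
    ("Please review the attached project report", "ham"),
    ("Can you send me the notes from class?", "ham"),
    ("See you at the team meeting on Monday", "ham"),
]

texts = [text for text, _ in emails]

# Map string labels to integers (assumed encoding: ham -> 0, spam -> 1).
label_map = {"ham": 0, "spam": 1}
labels = [label_map[label] for _, label in emails]

# Hold out part of the data for evaluation (ratio chosen for the example).
X_train_text, X_test_text, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# Bag-of-words: learn the vocabulary on the training text only,
# then reuse the same vocabulary for the test text.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

# Train Multinomial Naive Bayes on the word-count features.
model = MultinomialNB()
model.fit(X_train, y_train)

# Evaluate on the held-out emails.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")

# Save both objects: the app needs the vectorizer as well as the model.
with open("model.pkl", "wb") as f:  # hypothetical file name
    pickle.dump(model, f)
with open("vectorizer.pkl", "wb") as f:  # hypothetical file name
    pickle.dump(vectorizer, f)

# Reload to confirm the saved objects behave like the originals.
with open("model.pkl", "rb") as f:
    loaded_model = pickle.load(f)
with open("vectorizer.pkl", "rb") as f:
    loaded_vectorizer = pickle.load(f)
```

The vectorizer is pickled alongside the model because new emails must be converted with the same vocabulary the model was trained on. With only a handful of toy emails the printed accuracy says nothing about real performance.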
Run the app with the following command:
streamlit run SpamDetect.py
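For orientation, a Streamlit script such as `SpamDetect.py` might look roughly like the sketch below. It is not the project's actual code: the file names `model.pkl` and `vectorizer.pkl`, the `0 -> Ham` / `1 -> Spam` mapping, and the helper names `load_artifacts`, `classify`, and `main` are assumptions made for illustration.

```python
import pickle

# Assumed label encoding; it must match the one used during training.
LABELS = {0: "Ham", 1: "Spam"}


def load_artifacts(model_path="model.pkl", vectorizer_path="vectorizer.pkl"):
    """Load the pickled model and vectorizer (file names are hypothetical)."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    with open(vectorizer_path, "rb") as f:
        vectorizer = pickle.load(f)
    return model, vectorizer


def classify(text, model, vectorizer):
    """Vectorize one email and return "Spam" or "Ham"."""
    features = vectorizer.transform([text])
    return LABELS[int(model.predict(features)[0])]


def main():
    # Imported here so the helpers above can be used without Streamlit.
    import streamlit as st

    model, vectorizer = load_artifacts()

    st.title("Email Spam Classifier")
    text = st.text_area("Paste the email text here")

    if st.button("Classify"):
        if text.strip():
            st.write(f"Prediction: {classify(text, model, vectorizer)}")
        else:
            st.warning("Please enter some text first.")


# Streamlit runs the script from top to bottom on every interaction,
# so the real script would end with a call to main(). It is left out
# here so the helpers can be imported without launching the app.
```

Keeping the prediction logic in `classify` separate from the interface code makes it easy to check the model's output without starting the web app.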