Guhan Kabbina
Harshita Vidapanakal
Hanuraag Baskaran
Rohan M
This repository contains source code for the following projects:
Run the script files present in the config folder to install both Hadoop and Spark on your Linux machine.
Run the requirements script files present in the config folder to install the libraries required by all the projects in this repository.
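The setup step above can be sketched as a small helper that finds the scripts in the config folder and runs them in name order. This is only an illustration: the README does not name the scripts, so the `.sh` suffix, the use of `bash`, and the function names `find_setup_scripts` and `run_setup_scripts` are assumptions. The helper defaults to a dry run so nothing is installed until the commands have been reviewed.

```python
import subprocess
from pathlib import Path


def find_setup_scripts(config_dir="config", suffix=".sh"):
    """Return the script files in config_dir, sorted by name.

    The ".sh" suffix is an assumption; adjust it to match the real files.
    """
    root = Path(config_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix == suffix)


def run_setup_scripts(scripts, dry_run=True):
    """Build one `bash <script>` command per script and optionally run it."""
    commands = [["bash", str(script)] for script in scripts]
    for command in commands:
        if dry_run:
            # Only show what would be executed.
            print("would run:", " ".join(command))
        else:
            # Stop at the first script that exits with a non-zero status.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    run_setup_scripts(find_setup_scripts(), dry_run=True)
```

Pass `dry_run=False` once the printed commands look right for your machine.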
The required data files for all the projects are present in the data folder.
The data files are pre-processed and only a sample of the data is stored; the link to the entire dataset is provided in the data\README.md file.
The source code for all the projects is present in the src folder.
PLEASE READ THE DOCUMENTATION AND REPORT TO UNDERSTAND THE WORKING OF THE CODE
Run the respective script files present in the tools folder for each project.
The output for each project is present in the sample folder.
Pre-trained models for Spam_Ham_Classification are present in the build folder. Use them to classify emails with the test file src\Spam_Ham\models\model_test.py.
The performance analysis of the models in the projects is provided in the report\images folder.
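As a quick sanity check after cloning, the folder layout described in this README can be verified with a short script. This is a minimal sketch: the folder names are taken from the text above, while the function name `missing_dirs` and the assumption that the script runs from the repository root are illustrative.

```python
from pathlib import Path

# Folders mentioned in this README, relative to the repository root.
EXPECTED_DIRS = ["config", "data", "src", "tools", "sample", "build", "report/images"]


def missing_dirs(repo_root="."):
    """Return the expected folders that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All expected folders are present.")
```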
Please raise a GitHub issue if you have any questions or suggestions.