Email Spam Classification
- Each record has three fields: the subject, the email content, and the label.
- Each email belongs to one of two classes: spam or ham.
- The training set has 30k examples and the test set has 3k.
Dataset Link: Email spam
Run the Python script, which sends the data over a TCP connection:
python3 stream.py -f <dataset name> -b <batch size>
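The wire format used by stream.py is not described here. As a rough sketch of the idea, assume each batch is sent as one newline-terminated JSON line and the sender listens on a hypothetical localhost:6100 until Spark connects. The function names, the port, and the record keys used in the usage note are illustrative and not taken from the project. The streaming side might then look something like this:

```python
import json
import socket


def make_batches(records, batch_size):
    """Yield consecutive slices of `records` with at most `batch_size` items each."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def encode_batch(batch):
    """Serialize one batch as a single newline-terminated JSON line (assumed format)."""
    return (json.dumps(batch) + "\n").encode("utf-8")


def stream_batches(records, batch_size, host="localhost", port=6100):
    """Wait for one client on (host, port), then send it every batch in order.

    The host, port, and one-line-per-batch framing are assumptions made for
    this sketch; they are not the documented behaviour of stream.py.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen(1)
        conn, _ = server.accept()  # blocks until the Spark job connects
        with conn:
            for batch in make_batches(records, batch_size):
                conn.sendall(encode_batch(batch))
```

Calling `stream_batches(records, 1000)` with a list of dictionaries such as `{"subject": ..., "body": ..., "label": ...}` would send the records in groups of at most 1000. The sender plays the server role here because Spark's socket sources connect outward as clients.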
Run the Spark fetch job with spark-submit:
$SPARK_HOME/bin/spark-submit spark_fetch.py 2>log.txt
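The contents of spark_fetch.py are not shown here, so the following is only a sketch of what a receiving job could look like. It assumes the sender emits one JSON array of records per line on a hypothetical localhost:6100, that each record has `subject`, `body`, and `label` keys, and that the legacy DStream API (`pyspark.streaming`) is available. All of these are assumptions, as are the function names.

```python
import json


def parse_batch(line):
    """Decode one JSON line into a list of (subject, body, label) tuples."""
    return [(rec["subject"], rec["body"], rec["label"]) for rec in json.loads(line)]


def run_fetch(host="localhost", port=6100, interval_seconds=5):
    """Connect to the sender and print per-label record counts for each interval."""
    # Imported inside the function so parse_batch stays usable without Spark.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="spam-fetch-sketch")
    ssc = StreamingContext(sc, interval_seconds)

    lines = ssc.socketTextStream(host, port)   # one text line per batch sent
    records = lines.flatMap(parse_batch)       # flatten batches into records

    # Count spam vs. ham records seen in each streaming interval.
    counts = records.map(lambda rec: (rec[2], 1)).reduceByKey(lambda a, b: a + b)
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()
```

A script submitted with spark-submit would call `run_fetch()` at the end of the file. Counting labels per interval is just a placeholder for whatever preprocessing or model training the real job performs.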
The batch size still needs tuning; experiment with values above 1000.
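To get a feel for the trade-off, it can help to see how many batches a given batch size produces. The sketch below assumes roughly 30,000 training records (the "30k" figure above, taken as exact for illustration). The candidate sizes are arbitrary examples.

```python
import math


def num_batches(num_records, batch_size):
    """Number of batches needed to send `num_records`, the last one possibly partial."""
    return math.ceil(num_records / batch_size)


# Candidate batch sizes to compare; these values are illustrative only.
for batch_size in (1000, 2000, 5000):
    print(batch_size, num_batches(30_000, batch_size))
```

Larger batches mean fewer, bigger messages over the TCP connection. Which size works best depends on the streaming interval and available memory, so it has to be measured.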