Please begin with "neural_network.ipynb" and refer to the files in the "modules" and "saved_data" folders for clarity. This project was completed using Gradient Notebooks and their PyTorch repository (Python version 3.8.12). A list of the necessary packages can be found in "requirements.txt."
As of March 28th, 2022, "neural_network.ipynb" is missing some outputs. Please refer to "neural_network.pdf" or download and open "neural_network.html."
For this project, I challenged myself to build and train a neural network to successfully detect spam text messages. I manually coded the functions for gradient checking, forward propagation, and backpropagation using matrix multiplication. The training dataset was derived from https://www.kaggle.com/rushirdx/spam-and-ham-dataset. Throughout the project, I explore various topics such as the bias-variance trade-off, regularization, optimization, hyperparameter tuning, and batch gradient descent. Then, I compare the models' accuracy and speed using matplotlib. After determining the best model, I ran a test on my own text messages, which turned out to be a great success!
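To give a concrete sense of the matrix-multiplication approach, here is a minimal sketch of a one-hidden-layer forward pass, the cross-entropy cost, and the centered-difference idea behind gradient checking. The function names, layer sizes, and example data below are purely illustrative; the real implementations live in the "modules" folder.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_prop(X, W1, b1, W2, b2):
    # Each layer is just a matrix product plus a bias, followed by a sigmoid.
    A1 = sigmoid(W1 @ X + b1)   # hidden activations, shape (hidden, m)
    A2 = sigmoid(W2 @ A1 + b2)  # spam probability per example, shape (1, m)
    return A2

def cost(A2, Y):
    # Binary cross-entropy averaged over the m training examples.
    m = Y.shape[1]
    return float(-np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m)

def numerical_grad_W1(X, Y, W1, b1, W2, b2, i, j, eps=1e-7):
    # Gradient checking: nudge one weight up and down and take the centered
    # difference of the cost; this should closely match the backprop gradient.
    W_plus, W_minus = W1.copy(), W1.copy()
    W_plus[i, j] += eps
    W_minus[i, j] -= eps
    return (cost(forward_prop(X, W_plus, b1, W2, b2), Y)
            - cost(forward_prop(X, W_minus, b1, W2, b2), Y)) / (2 * eps)

# Tiny random example: 5 input features, 4 hidden units, 3 messages.
rng = np.random.default_rng(0)
X, Y = rng.random((5, 3)), rng.integers(0, 2, (1, 3))
W1, b1 = rng.standard_normal((4, 5)), np.zeros((4, 1))
W2, b2 = rng.standard_normal((1, 4)), np.zeros((1, 1))
print(numerical_grad_W1(X, Y, W1, b1, W2, b2, 0, 0))
```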
I started this project to apply what I've learned from Professor Andrew Ng's Coursera lectures and other online courses. I purposely avoided using APIs to better demonstrate my knowledge of neural networks. What I had anticipated taking at most two weeks turned into two and a half months. Initially, I had outlined an even larger project in which I would tune the hyperparameters and compare other regularization and optimization techniques: exponential learning rate decay, batch normalization, principal component analysis (PCA), and dropout regularization. However, I quickly realized the barriers to taking on such a large project, the main one being time. I had tested a short script to tune the learning rate, and the estimated running time came out to be over a week. Thankfully, due to Adam optimization's genius, I was able to avoid tuning hyperparameters.
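For context on why Adam's defaults were good enough to skip a learning-rate search, here is a rough sketch of a single Adam update using the commonly cited default hyperparameters. It is only an illustration of the algorithm, not a copy of the optimizer code in my modules.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update for a single parameter array, using the usual defaults.
    m = beta1 * m + (1 - beta1) * grad             # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * np.square(grad)  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                   # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Example: update one weight matrix at training step t = 1.
W, grad = np.zeros((2, 2)), np.ones((2, 2))
W, m, v = adam_step(W, grad, m=np.zeros_like(W), v=np.zeros_like(W), t=1)
```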
If you have taken a look at "module.ipynb" and "model.ipynb," you may have noticed that the cost function isn't embedded in the model. Originally, the cost function was called after the forward propagation function: it received the outputs from forward prop and calculated the cost. But due to an issue, which I believe was a rounding error caused by NumPy, gradient checking indicated something was wrong. I tested the model with a simpler 2 x 2 dataset and compared the outputs with my manual calculations, and, weirdly, everything seemed to be in order. While searching for the cause, I discovered a couple of minor mistakes. My dictionary-to-vector function was flattening the parameters in the wrong order, but, to my surprise, it had no bearing on the results. Regardless, it was fixed. After two weeks of unnecessary tinkering with my forward prop, backprop, and cost function, I found the issue and embedded the cost function inside forward propagation instead of the model. I've left some of the previous versions of the functions in the module for the reader's viewing, but also, admittedly, because I'm afraid to erase them.
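As a rough illustration of the final structure (with hypothetical names and shapes, not my module's exact code), the forward pass now computes and returns the cost directly, and the dictionary-to-vector helper flattens the parameters in one fixed, documented order:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dict_to_vector(params, order=("W1", "b1", "W2", "b2")):
    # Flatten every parameter array into one column vector in a fixed order,
    # so analytic and numerical gradients line up element by element.
    return np.concatenate([params[key].reshape(-1, 1) for key in order])

def forward_prop(X, Y, params):
    # Forward pass that computes and returns the cost itself, so the model
    # and gradient checking both consume the exact same value.
    A1 = sigmoid(params["W1"] @ X + params["b1"])
    A2 = sigmoid(params["W2"] @ A1 + params["b2"])
    m = Y.shape[1]
    cost = float(-np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m)
    return cost, (A1, A2)  # the cached activations are reused by backprop
```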
Frustrations aside, I genuinely enjoyed working on this project. It has given me a wealth of valuable hands-on experience, not just with neural networks, but with coding in general. If you have any questions about the project, or would like to provide feedback, please email me at [email protected].