
Development of a spam filter using a custom multinomial Naive Bayes algorithm.


billy-moore-98/spam_filter


spam_filter

In this project I have explored the multinomial Naive Bayes algorithm and applied it to build an SMS spam filter. The model achieves an accuracy of over 90% on the test data.

The dataset used for both training and testing was created by Tiago A. Almeida and José María Gómez Hidalgo and can be found in the UCI Machine Learning Repository. It contains SMS messages that are already labelled as spam or not spam.
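As a minimal sketch of reading labelled messages, the function below assumes the tab-separated layout of the UCI `SMSSpamCollection` file (one message per line, the label `ham` or `spam` first, then a tab, then the raw text). The function name `load_sms_collection` and the two sample lines are invented for illustration; this is not the repository's own loading code.

```python
import io
from typing import Iterable, List, Tuple


def load_sms_collection(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse 'label<TAB>message' lines into (label, message) pairs.

    Assumed layout: label first, a single tab, then the message text.
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so tabs inside the message survive.
        label, message = line.split("\t", 1)
        pairs.append((label, message))
    return pairs


# Invented sample lines in the assumed format.
sample = io.StringIO(
    "ham\tSee you at lunch?\n"
    "spam\tWIN a free prize! Reply now\n"
)
data = load_sms_collection(sample)
print(data)
```

Because the function accepts any iterable of lines, the same call works on an open file handle for the real dataset file.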

The Naive Bayes algorithm decides whether each SMS message is spam by evaluating the words it contains. Because the algorithm is 'naive', it assumes that the words in a message are conditionally independent of one another given the class (spam or not spam), an assumption that rarely holds exactly for real text.
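The independence assumption is what makes the model tractable: a message is assigned the class c that maximises log P(c) + Σ log P(w | c) over its words w. The sketch below uses the standard multinomial estimate with additive (Laplace) smoothing, P(w | c) = (N(w, c) + α) / (N(c) + α·|V|), where N(w, c) counts occurrences of w in class c, N(c) is the total word count of class c and |V| is the vocabulary size. It is an illustrative implementation, not the repository's code; the class name, the regex tokenizer and the default `alpha=1.0` are assumptions. Summing logarithms instead of multiplying probabilities avoids floating-point underflow on long messages.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(message: str) -> List[str]:
    # Assumed cleaning step: lower-case and keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", message.lower())


class MultinomialNB:
    """Word-count Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, data: List[Tuple[str, str]]) -> "MultinomialNB":
        class_counts = Counter(label for label, _ in data)
        self.word_counts: Dict[str, Counter] = {}
        for label, message in data:
            self.word_counts.setdefault(label, Counter()).update(tokenize(message))
        total = sum(class_counts.values())
        # Prior P(c): share of training messages carrying label c.
        self.log_prior = {c: math.log(n / total) for c, n in class_counts.items()}
        # Vocabulary: every word seen in any class.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        # N(c): total number of words observed in class c.
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def _log_likelihood(self, word: str, label: str) -> float:
        # Smoothed log P(word | label); Counter returns 0 for unseen words.
        count = self.word_counts[label][word]
        denom = self.total_words[label] + self.alpha * len(self.vocab)
        return math.log((count + self.alpha) / denom)

    def predict(self, message: str) -> str:
        scores = {}
        for label, log_prior in self.log_prior.items():
            score = log_prior
            for word in tokenize(message):
                if word in self.vocab:  # ignore words never seen in training
                    score += self._log_likelihood(word, label)
            scores[label] = score
        return max(scores, key=scores.get)


# Tiny invented training set, for illustration only.
train = [
    ("spam", "WIN a free prize now"),
    ("spam", "free entry win cash"),
    ("ham", "are we still on for lunch"),
    ("ham", "see you at lunch tomorrow"),
]
model = MultinomialNB(alpha=1.0).fit(train)
print(model.predict("win free cash"))    # spam
print(model.predict("lunch tomorrow?"))  # ham
```

Skipping words outside the training vocabulary is one common choice; another is to keep them and let smoothing assign them a small equal probability in every class.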

Overall, the algorithm correctly classifies 98.7% of the test data. The misclassified messages contained elements that a word-based model may not capture well, such as emojis, abbreviations and acronyms.
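The 98.7% figure is an accuracy: the fraction of test messages whose predicted label matches the given one. The sketch below computes that metric and collects the wrongly predicted messages for inspection. The messages and predictions are invented, and the regex tokenizer is an assumed cleaning step rather than the repository's; it does, however, illustrate how an emoji can be discarded entirely before the model ever sees it.

```python
import re
from typing import List, Tuple


def tokenize(message: str) -> List[str]:
    # Assumed cleaning: lower-case and keep only runs of [a-z0-9].
    return re.findall(r"[a-z0-9]+", message.lower())


def accuracy(predicted: List[str], actual: List[str]) -> float:
    """Fraction of messages whose predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def misclassified(
    messages: List[str], predicted: List[str], actual: List[str]
) -> List[Tuple[str, str, str]]:
    """Return (message, predicted, actual) for every wrong prediction."""
    return [(m, p, a) for m, p, a in zip(messages, predicted, actual) if p != a]


# Invented test messages and predictions.
messages = ["Call now 4 ur prize 🎁", "ok c u l8r", "lunch at 1?"]
actual = ["spam", "ham", "ham"]
predicted = ["ham", "ham", "ham"]

print(accuracy(predicted, actual))  # 2 of 3 correct
print(misclassified(messages, predicted, actual))
print(tokenize("Call now 4 ur prize 🎁"))  # the emoji produces no token
```

Reading the output of `misclassified` is a quick way to see which message features, such as emojis or abbreviations like "ur" and "l8r", the word counts fail to capture.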
