Phish-Sight-replication

This repository replicates and extends the work of Pandey and Mishra (2023), who use the color palette of a website to determine whether the site is a phishing page.
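To make the idea of a color palette feature concrete, here is a minimal sketch that finds the dominant colors in a list of RGB pixels by coarse quantization and counting. It is an illustration only, not the feature extraction used by Pandey and Mishra or by this repository; the function name `dominant_colors` and the parameters `k` and `step` are assumptions introduced for the example.

```python
from collections import Counter


def dominant_colors(pixels, k=5, step=32):
    """Return the k most common quantized RGB colors with their pixel share.

    `pixels` is an iterable of (r, g, b) tuples with channel values in 0..255.
    `step` is a hypothetical bucket width; larger values merge more shades.
    """
    pixels = list(pixels)
    if not pixels:
        return []
    # Snap each channel down to its bucket edge so near-identical shades merge.
    quantized = [tuple((c // step) * step for c in px) for px in pixels]
    counts = Counter(quantized)
    total = len(quantized)
    return [(color, n / total) for color, n in counts.most_common(k)]


# Toy "screenshot": mostly white, with a blue banner and a few red pixels.
pixels = [(255, 255, 255)] * 70 + [(10, 20, 200)] * 25 + [(250, 5, 5)] * 5
palette = dominant_colors(pixels, k=3)
```

In practice the pixels would come from a screenshot of the page, and the resulting colors and shares could be flattened into a fixed-length feature vector for a classifier.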

Our direct replication did not reproduce the state-of-the-art results reported in the original paper. We hypothesize that the original work did not preprocess its data carefully and therefore made predictions on 404 (page not found) pages.
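One way to guard against this problem is to drop dead pages before any features are computed. The sketch below is a hedged illustration of such a filter, not the logic in `preprocessing.py`; the marker phrases, the `min_chars` threshold, and the function name `looks_like_dead_page` are all assumptions. The status code is optional because Selenium's WebDriver does not expose the HTTP status directly, so a scraper may only have the page title and text to go on.

```python
import re

# Hypothetical marker phrases; a real list should come from inspecting the data.
ERROR_MARKERS = re.compile(
    r"\b(404|page not found|not found|account suspended|domain (is )?for sale)\b",
    re.IGNORECASE,
)


def looks_like_dead_page(status_code, title, body_text, min_chars=200):
    """Heuristically flag scraped pages that are error or placeholder pages."""
    # An explicit HTTP error status is the strongest signal, when available.
    if status_code is not None and status_code >= 400:
        return True
    # Many servers return 200 but put the error in the page title.
    if ERROR_MARKERS.search(title or ""):
        return True
    # Otherwise, only flag short pages whose visible text mentions an error.
    text = (body_text or "").strip()
    return len(text) < min_chars and bool(ERROR_MARKERS.search(text))


# Example: keep only live pages from a list of scraped records.
records = [
    {"status": 200, "title": "My Bank Login", "text": "Welcome back. " * 50},
    {"status": None, "title": "404 Not Found", "text": ""},
    {"status": 200, "title": "Shop", "text": "Sorry, page not found."},
]
live = [
    r for r in records
    if not looks_like_dead_page(r["status"], r["title"], r["text"])
]
```

A heuristic like this will produce some false positives (for example, a title that legitimately contains "404"), so the flagged pages are worth spot-checking by hand.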

Our extension adds a more careful preprocessing pipeline and NLP-based features derived from the URL and the text of the website. The highest accuracy we achieve is 84%, using a random forest on our new feature set; this is higher than any of the replication models.
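As a rough illustration of what URL-based features can look like, the sketch below computes a few simple lexical properties of a URL using only the standard library. These particular features and the function name `url_features` are assumptions chosen for the example; they are not necessarily the features used in this repository.

```python
import re
from urllib.parse import urlparse


def url_features(url):
    """Compute a few simple lexical features from a URL string."""
    # urlparse only fills in the host when a scheme is present.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        # Raw IPv4 hosts are a common trait of throwaway phishing pages.
        "host_is_ip": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "uses_https": parsed.scheme == "https",
        "path_depth": len([part for part in parsed.path.split("/") if part]),
    }


features = url_features("http://192.168.0.1/secure-login/update.php?id=42")
```

A dictionary like this can be turned into one row of a feature table, alongside the text and color features, before being passed to a classifier such as a random forest.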

Preprocessing

Replication Results (vs. reported vs. extension)

Usage

To use any of these files, first install the dependencies:

pip install -r requirements.txt

Contents

/data contains the links and scraped data used for the research.
/metrics contains the metrics from our run of the repo.
/models contains pickled models for future use.
web_scraping.py scrapes the websites. It uses Selenium and performs the bulk of the feature extraction.
preprocessing.py cleans the data and performs some additional feature extraction.
classical_trainer.py tunes and trains the classical ML models and stores their results in /metrics.

Contributors

Neh Majmudar

Daniel Yakubov
