CLI Utility
Invoke the CLI utility with the following command:
docker-compose exec cli streamingphish
Users should immediately be presented with the main menu:
wes@phishtest-4:~$ sudo docker-compose exec cli streamingphish
_____ __ _
/ ___// /_________ ____ _____ ___ (_)
\__ \/ __/ ___/ _ \/ __ `/ __ `__ \/ / __ \/ __ `/
___/ / /_/ / / __/ /_/ / / / / / / / / / / /_/ /
/____/\__/_/ \___/\__,_/_/ /_/ /_/_/_/ /_/\__, /
____ __ _ __ /____/
/ __ \/ /_ (_)____/ /
/ /_/ / __ \/ / ___/ __ \
/ ____/ / / / (__ ) / / /
/_/ /_/ /_/_/____/_/ /_/ by Wes Connell
@wesleyraptor
1. Deploy phishing classifier against certstream feed.
2. Operate phishing classifier in manual mode.
3. Manage classifiers (list active classifier and show available classifiers).
4. Train a new classifier.
5. Print configuration.
6. Exit.
Please make a selection [1-6]:
Select option 1 to run the active classifier against the Certificate Transparency log network. If no classifier has been trained yet, an error message is shown; in that case, select option 4 to train a classifier and try again.
Please make a selection [1-6]: 1
[*] Fetching active classifier name from config.
[*] Fetching classifier artifacts from database.
[+] Loaded feature extractor.
[+] Loaded 4_22_v1 classifier.
[*] Analysis started - press CTRL+C to quit at anytime.
[cPanel, Inc.] [HIGH] [SCORE:1.000] cpanel.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] mail.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] webdisk.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] webmail.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] www.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] cpanel.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] mail.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] webdisk.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] webmail.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [HIGH] [SCORE:1.000] www.unverifieduser-agreementauthlogin-detailinformation-paypal.tk
[cPanel, Inc.] [SUSPICIOUS] [SCORE:0.841] amazon-services-com.gq
[cPanel, Inc.] [SUSPICIOUS] [SCORE:0.841] www.amazon-services-com.gq
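The detection lines above follow a regular layout (issuer, severity label, score, FQDN), and the same FQDN can appear more than once. As a minimal sketch, assuming that layout holds for all detection lines, captured output could be parsed and de-duplicated as follows. The names LINE_RE, Detection, parse_line, and unique_detections are illustrative and not part of streamingphish.

```python
import re
from typing import NamedTuple, Optional

# Layout inferred from the sample output: [issuer] [LEVEL] [SCORE:x.xxx] fqdn
LINE_RE = re.compile(
    r"^\[(?P<issuer>[^\]]*)\]\s+\[(?P<level>[A-Z]+)\]\s+"
    r"\[SCORE:(?P<score>\d+\.\d+)\]\s+(?P<fqdn>\S+)$"
)


class Detection(NamedTuple):
    issuer: str
    level: str
    score: float
    fqdn: str


def parse_line(line: str) -> Optional[Detection]:
    """Parse one detection line; return None for status lines such as '[*] ...'."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Detection(m["issuer"], m["level"], float(m["score"]), m["fqdn"])


def unique_detections(lines):
    """Keep the first detection seen for each FQDN, preserving order."""
    seen = set()
    result = []
    for line in lines:
        det = parse_line(line)
        if det is None or det.fqdn in seen:
            continue
        seen.add(det.fqdn)
        result.append(det)
    return result
```

Status lines such as "[+] Loaded feature extractor." do not match the pattern and are skipped.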
Select option 2 to run the classifier in manual mode, where users may manually type FQDNs on the command line to be scored by the active classifier.
Please make a selection [1-6]: 2
[*] Fetching active classifier name from config.
[*] Fetching classifier artifacts from database.
[+] Loaded feature extractor.
[+] Loaded 4_22_v1 classifier.
[+] Deploying in manual mode. Type 'exit' or 'quit' at any time to return to the main menu.
FQDN/Host/URL: chasebnk-com.ml
[PHISHING]: 0.976
FQDN/Host/URL: apppleid.support-forgot.reset-password.mweiewjsdfewt.com
[PHISHING]: 0.969
FQDN/Host/URL: apple.com
[NOT PHISHING]: 0.002
FQDN/Host/URL: paypal.org
[NOT PHISHING]: 0.002
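The prompt accepts an FQDN, host, or URL. How streamingphish reduces that input to a hostname is not shown here; the sketch below is one way to do it with the Python standard library, for example to prepare a list of inputs before pasting them into manual mode. The function name to_hostname is hypothetical.

```python
from urllib.parse import urlsplit


def to_hostname(user_input: str) -> str:
    """Reduce an FQDN, host, or URL to a lowercase hostname ('' if none is found)."""
    text = user_input.strip()
    # urlsplit only populates .hostname when a network location is present,
    # so bare hosts such as "apple.com" need a leading "//".
    if "://" not in text:
        text = "//" + text
    return urlsplit(text).hostname or ""
```

With this sketch, both "apple.com" and "https://apple.com/login" reduce to "apple.com".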
Select option 3 of the main menu to view a summary of performance metrics from all trained classifiers, change the active classifier, or delete a trained classifier. The classifier management menu looks like this:
Please make a selection [1-6]: 3
[+] Active classifier: better_training_data
[+] Other available classifiers:
- wesley_v1
- wesley_test_v2
- who_dat
- no_fqdn_keywords
1. Summarize accuracy metrics across all trained classifiers.
2. Show performance metrics from a single classifier.
3. Change the active classifier.
4. Delete a classifier.
5. Return to the main menu.
Select option 1 of the classifier management menu to see a summary of accuracy metrics for all trained classifiers. The purpose of training additional classifiers is to explore how changes to the independent variables affect classifier performance (e.g. adding new training data, expanding or reducing features, using different algorithms, or using different algorithm parameters). One perk of building the application with docker-compose is that classifiers survive code changes and rebuilds of the cli container, because they are persisted to the db container.
Please make a selection [1-5]: 1
[+] Summary of classifier accuracy metrics:
[--- Test Set Accuracy ---]
0.9964 wesley_v1
0.9948 no_fqdn_keywords
0.9944 better_training_data
0.9944 wesley_test_v2
0.9936 no_tlds_included
[--- AUC [50%] ---]
0.9964 wesley_v1
0.9948 no_fqdn_keywords
0.9944 better_training_data
0.9944 wesley_test_v2
0.9935 no_tlds_included
[--- Recall [50%] ---]
0.9952 wesley_v1
0.9936 no_fqdn_keywords
0.9928 better_training_data
0.9928 wesley_test_v2
0.9902 no_tlds_included
[--- Precision [50%] ---]
0.9976 wesley_v1
0.9968 better_training_data
0.9968 wesley_test_v2
0.9959 no_tlds_included
0.9952 no_fqdn_keywords
[--- Feature Vector Size ---]
467 wesley_v1
465 better_training_data
465 wesley_test_v2
465 no_fqdn_keywords
414 no_tlds_included
[--- Training Set Accuracy ---]
0.9992 better_training_data
0.9992 wesley_test_v2
0.9989 no_fqdn_keywords
0.9988 wesley_v1
0.9968 no_tlds_included
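The summary lists each metric separately, ordered from the highest value to the lowest. A minimal sketch of that ranking, assuming the metrics are available as a dictionary keyed by classifier name, is shown below. The function name rank_by_metric and the dictionary layout are assumptions; the values are copied from the output above.

```python
def rank_by_metric(metrics_by_classifier, metric):
    """Return (value, name) pairs for one metric, highest value first."""
    pairs = [(m[metric], name) for name, m in metrics_by_classifier.items()]
    # sorted() is stable, so classifiers with equal values keep their input order.
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


# Values copied from the summary output above.
summary = {
    "wesley_v1": {"test_set_accuracy": 0.9964, "feature_vector_size": 467},
    "no_fqdn_keywords": {"test_set_accuracy": 0.9948, "feature_vector_size": 465},
    "better_training_data": {"test_set_accuracy": 0.9944, "feature_vector_size": 465},
    "wesley_test_v2": {"test_set_accuracy": 0.9944, "feature_vector_size": 465},
    "no_tlds_included": {"test_set_accuracy": 0.9936, "feature_vector_size": 414},
}

for value, name in rank_by_metric(summary, "test_set_accuracy"):
    print(f"{value:.4f} {name}")
```

Comparing the rankings side by side shows, for instance, that no_tlds_included has the smallest feature vector and also the lowest test set accuracy.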
Select option 2 to view the performance metrics for a single trained classifier:
Please make a selection [1-5]: 2
Please enter the name of the classifier you want accuracy metrics from: sample_classifier
[+] Accuracy metrics for classifier sample_classifier:
{
"accuracy": {
"true_positive_rate": "0.9893",
"precision": "0.9874",
"confusion_matrix": [
[
1006,
13
],
[
13,
1016
]
],
"recall": "0.9883",
"false_positive_rate": "0.0128",
"test_set_accuracy": "0.9873",
"auc_score": "0.9873",
"training_set_accuracy": "0.9958"
},
"info": {
"algorithm": "LogisticRegression",
"training_samples": {
"phishing": 4170,
"not_phishing": 4019
},
"training_date": "2018-04-24 22:50:10.094880",
"feature_vector_size": 348,
"parameters": {
"dual": false,
"fit_intercept": true,
"n_jobs": 1,
"class_weight": null,
"max_iter": 100,
"random_state": null,
"warm_start": false,
"solver": "liblinear",
"verbose": 0,
"intercept_scaling": 1,
"tol": 0.0001,
"C": 10,
"penalty": "l2",
"multi_class": "ovr"
}
}
}
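Several of the values above can be derived from the confusion matrix. The sketch below assumes the scikit-learn layout (rows are true labels, columns are predicted labels, with "not phishing" first); because both off-diagonal cells are 13 in this sample, the result does not depend on that assumption. The function name metrics_from_confusion is hypothetical.

```python
def metrics_from_confusion(matrix):
    """Derive precision, false positive rate, and accuracy from a 2x2 matrix."""
    # Assumed layout: [[true negatives, false positives],
    #                  [false negatives, true positives]]
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    return {
        "precision": tp / (tp + fp),
        "false_positive_rate": fp / (fp + tn),
        "test_set_accuracy": (tp + tn) / total,
    }


derived = metrics_from_confusion([[1006, 13], [13, 1016]])
for key, value in derived.items():
    print(f"{key}: {value:.4f}")
```

For the sample matrix this gives precision 0.9874, false_positive_rate 0.0128, and test_set_accuracy 0.9873, matching the reported values. The reported recall and true_positive_rate differ slightly from tp / (tp + fn) for this matrix, so they may be computed differently and are not covered by this sketch.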
Select option 3 to change the active classifier if more than one classifier is available in the database:
Please make a selection [1-5]: 3
Please enter the name of the classifier you'd like to activate: newest_classifier
[+] Activated new classifier, newest_classifier, in configuration.
[+] Active classifier: newest_classifier
Select option 4 to delete a trained classifier:
Please make a selection [1-5]: 4
Please enter the name of the classifier you'd like to delete: newest_classifier
[+] Deleted classifier newest_classifier.
The system doesn't include any trained classifiers by default, so select option 4 from the main menu to train one. The metrics from the trained classifier are printed to the screen as soon as training is complete (if you're unfamiliar with what the metrics mean, take a look at the accompanying Jupyter notebook). Follow the prompts to save the classifier, give it a name, and activate it:
Please make a selection [1-6]: 4
[*] Loading benign data.
[*] Loading malicious data.
[+] Completed loading training data.
[*] Computing features...
[+] Training complete.
[*] Computing classifier metrics...
[+] Classifier metrics available.
The metrics from the newly trained classifier are as follows:
{
"info": {
"feature_vector_size": 467,
"training_samples": {
"phishing": 5000,
"not_phishing": 5000
},
"parameters": {
"penalty": "l2",
"solver": "liblinear",
"C": 10,
"multi_class": "ovr",
"intercept_scaling": 1,
"n_jobs": 1,
"class_weight": null,
"fit_intercept": true,
"tol": 0.0001,
"warm_start": false,
"verbose": 0,
"random_state": null,
"dual": false,
"max_iter": 100
},
"training_date": "2018-03-27 07:18:47.759669",
"algorithm": "LogisticRegression"
},
"accuracy": {
"precision": "0.9959",
"training_set_accuracy": "0.9991",
"auc_score": "0.9915",
"recall": "0.9871",
"test_set_accuracy": "0.9916",
"false_positive_rate": "0.0040",
"true_positive_rate": "0.9903"
}
}
Would you like to keep the classifier? [y/N] y
Please enter a name (no spaces) for the classifier: wesley_test_v1
[+] Saved new classifier wesley_test_v1.
Would you like to activate the classifier? [y/N] y
[+] Activated new classifier, wesley_test_v1, in configuration.
Training a new classifier might be necessary for several reasons:
- Exploring new features to extract from FQDNs.
- Updating the keywords, brands, or TLDs in the training_data folder.
- Updating the training sets - perhaps correcting false positives from running against the Certificate Transparency log network.
The training_data folder is bind-mounted to the host, so updating any of the data in it doesn't require rebuilding the cli container before retraining. Select option 4 in the main menu to train another classifier, give it a unique name, and activate it. The new classifier, along with any previously trained classifiers, is persisted to the db container.
Making changes to the feature extraction code (i.e. anything in cli/streamingphish/streamingphish/features.py) requires rebuilding the cli container and then selecting option 4 in the main menu. As noted above, previously trained classifiers are not lost, because they are persisted to the db container. Trained classifiers will only be lost if the db container goes down.
By default, features are extracted by every method in cli/streamingphish/streamingphish/features.py whose name starts with _fe_. Removing, adding, or updating these methods warrants a retrain. Each method returns a dictionary and documents what it does. The initial methods for extracting features are as follows:
def _fe_extract_tld(self, sample)
def _fe_brand_presence(self, sample)
def _fe_keyword_match(self, sample)
def _fe_keyword_match_fqdn_words(self, sample)
def _fe_compute_domain_entropy(sample)
def _fe_check_phishing_similarity_words(self, sample)
def _fe_number_of_dashes(sample)
def _fe_number_of_periods(sample)
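The prefix convention described above can be sketched as follows. This is an illustration only, not the code in features.py: the class ExampleFeatures, the method compute_features, the dictionary keys, and the method bodies are all assumptions. In particular, the entropy here is Shannon entropy over the characters of the whole input string, which may differ from what the project computes.

```python
import math
from collections import Counter


class ExampleFeatures:
    """Hypothetical stand-in for the project's feature extractor."""

    @staticmethod
    def _fe_number_of_dashes(sample):
        return {"dashes": sample.count("-")}

    @staticmethod
    def _fe_number_of_periods(sample):
        return {"periods": sample.count(".")}

    @staticmethod
    def _fe_compute_domain_entropy(sample):
        # Shannon entropy (bits per character) of the input string.
        counts = Counter(sample)
        n = len(sample)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        return {"entropy": entropy}

    def compute_features(self, sample):
        """Call every method whose name starts with _fe_ and merge the results."""
        features = {}
        for name in sorted(dir(self)):
            if name.startswith("_fe_"):
                features.update(getattr(self, name)(sample))
        return features


print(ExampleFeatures().compute_features("amazon-services-com.gq"))
```

With this pattern, adding a new _fe_ method is enough for it to be picked up at the next retrain; no registration step is needed.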
Rebuilding the cli container after modifying cli/streamingphish/streamingphish/features.py can be done with the following command:
sudo docker-compose up -d --build
The db and notebook containers should remain unchanged, whereas the cli container should be rebuilt.