HoussemBL/Spark_mlib

Spark_mlib

This project is based on the example given in:
https://towardsdatascience.com/a-tutorial-using-spark-for-big-data-an-example-to-predict-customer-churn-9078ac9a1e85

We analyze 19 GB of data (taken from the link below):
https://www.kaggle.com/mryanm/luflow-network-intrusion-detection-data-set
This dataset describes potential cases of malicious cyber intrusion.

The dataset contains many entries. Each entry describes possible characteristics of a potential cyber threat. Each entry also has a label with one of the following values:

  1. Malicious
  2. Benign
  3. Outlier

Our goal is to train a model capable of predicting the labels of entries similar to those in these files.
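
Before a classifier can be trained on these labels, the three string values usually have to be converted to numeric class indices. The snippet below is a minimal sketch of that step in plain Python, not the code used in this repository. The helper names `build_label_index` and `encode`, the capitalized label spellings, and the sample rows are assumptions made for illustration.

```python
from collections import Counter

# Label values as listed in this README; the spelling in the raw files may differ.
LABELS = ("Malicious", "Benign", "Outlier")


def build_label_index(labels):
    """Map each label string to an integer index, most frequent label first."""
    counts = Counter(labels)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ordered = sorted(counts, key=lambda name: (-counts[name], name))
    return {name: index for index, name in enumerate(ordered)}


def encode(labels, label_index):
    """Replace every label string with its integer index."""
    return [label_index[name] for name in labels]


# Hypothetical sample of labels, not taken from the dataset.
sample = ["Benign", "Malicious", "Benign", "Outlier", "Benign", "Malicious"]
label_index = build_label_index(sample)
encoded = encode(sample, label_index)
```

Rejecting unknown label values early makes spelling or casing mismatches visible before training, which matters when the input is spread over many files.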

Another solution to this problem (implemented in Python) is available on the Kaggle platform (see the link below):
https://www.kaggle.com/houssembenlahmar/prediction-of-intrusion
