Spark_mlib

This project is based on the example given in:
https://towardsdatascience.com/a-tutorial-using-spark-for-big-data-an-example-to-predict-customer-churn-9078ac9a1e85

We analyze 19 GB of data (taken from the link below):
https://www.kaggle.com/mryanm/luflow-network-intrusion-detection-data-set
This dataset describes potentially malicious network intrusions.

The dataset contains many entries. Each entry describes the characteristics of a potential cyber threat and carries a label with one of the following values:

Malicious
Benign
Outlier

Our goal is to train a model that can predict the label of entries similar to those in these files.
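
The following is a minimal sketch of how such a model could be trained with Spark MLlib in Scala; it is not necessarily the approach used in this repository. It assumes the data is stored as CSV files with a header row, that the path is passed as the first program argument, and that the columns are named `label`, `bytes_in`, `bytes_out`, `num_pkts_in`, `num_pkts_out`, `entropy` and `duration`. These names, the choice of a random forest, and the 80/20 split are assumptions made for illustration and should be adjusted to the actual files.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object IntrusionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("IntrusionSketch")
      .master("local[*]") // assumption: local run; remove when submitting to a cluster
      .getOrCreate()

    // Assumption: args(0) points to CSV files that have a header row.
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(args(0))

    // Hypothetical numeric feature columns; replace with the real column names.
    val featureCols = Array(
      "bytes_in", "bytes_out", "num_pkts_in", "num_pkts_out", "entropy", "duration")

    // Keep only the columns we use and drop rows with missing values.
    val data = raw.select("label", featureCols: _*).na.drop()
    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)

    // Map the string labels (malicious, benign, outlier) to numeric indices.
    val labelIndexer = new StringIndexer()
      .setInputCol("label")
      .setOutputCol("labelIndex")
      .setHandleInvalid("skip") // skip labels unseen during fitting

    // Pack the feature columns into the single vector column MLlib expects.
    val assembler = new VectorAssembler()
      .setInputCols(featureCols)
      .setOutputCol("features")

    // Illustrative classifier choice, not taken from the original project.
    val forest = new RandomForestClassifier()
      .setLabelCol("labelIndex")
      .setFeaturesCol("features")
      .setNumTrees(50)

    val model = new Pipeline()
      .setStages(Array(labelIndexer, assembler, forest))
      .fit(train)

    // Score the held-out split with the multi-class F1 metric.
    val predictions = model.transform(test)
    val f1 = new MulticlassClassificationEvaluator()
      .setLabelCol("labelIndex")
      .setPredictionCol("prediction")
      .setMetricName("f1")
      .evaluate(predictions)

    println(s"F1 on the held-out split: $f1")
    spark.stop()
  }
}
```

Wrapping the indexer, assembler and classifier in one `Pipeline` means the same transformations are applied to both the training and the test split, which avoids fitting the label index on data the model is later evaluated on.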

Another solution to this problem, implemented in Python, is available on the Kaggle platform (see the link below):
https://www.kaggle.com/houssembenlahmar/prediction-of-intrusion

HoussemBL/Spark_mlib

Folders and files

Latest commit

History

Repository files navigation

Spark_mlib

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages