Course website with videos and slides: https://sparktraining.web.cern.ch/
See also the notebooks on display in the CERN SWAN Gallery
Contact: [email protected]
Tutorial-DataFrame.ipynb
Solutions-DataFrame.ipynb
Examples-Pandas on Spark
Tutorial-SparkSQL.ipynb
HandsOn-SparkSQL_exercises.ipynb
HandsOn-SparkSQL_with_solutions.ipynb
Tutorial-SparkStreaming.ipynb
ML_Demo1_Classifier.ipynb
ML_Demo2_Regression.ipynb
Spark_JDBC_Oracle.ipynb
Demo_Spark_on_Hadoop.ipynb
Demo_Dimuon_mass_spectrum.ipynb
NXCals-example.ipynb
NXCals-example_bis.ipynb
TPCDS_PySpark_CERN_SWAN_getstarted.ipynb
LHCb_OpenData_Spark.ipynb
Dimuon_Spark_ROOT_RDataFrame.ipynb
- Open SWAN and clone the repo:
  - note this can take a couple of minutes
  - as an alternative, you can clone the repo from the SWAN GUI at https://swan.web.cern.ch
    - find and click the button "Download project from git"
    - when prompted, clone the repo https://github.com/cerndb/SparkTraining.git
- Open the tutorial notebooks at SparkTraining -> notebooks
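
If you prefer to clone from a terminal or a notebook cell instead of the SWAN GUI, the steps above can be sketched in Python as follows. This is a minimal sketch, not part of the course material: it assumes `git` is available on the PATH, and the helper names `clone_command` and `list_notebooks` are hypothetical, introduced here only for illustration. The repository URL and the `notebooks` folder name are taken from the instructions above.

```python
from pathlib import Path

# Repository URL as given in the instructions above.
REPO_URL = "https://github.com/cerndb/SparkTraining.git"


def clone_command(repo_url=REPO_URL, dest=None):
    """Build the git command that clones the training repo (hypothetical helper)."""
    cmd = ["git", "clone", repo_url]
    if dest is not None:
        cmd.append(str(dest))
    return cmd


def list_notebooks(repo_dir):
    """Return the sorted notebook file names found under <repo_dir>/notebooks."""
    notebooks_dir = Path(repo_dir) / "notebooks"
    return sorted(p.name for p in notebooks_dir.glob("*.ipynb"))


# Usage (needs network access, so it is left commented out here):
#   import subprocess
#   subprocess.run(clone_command(), check=True)   # can take a couple of minutes
#   print(list_notebooks("SparkTraining"))
```

Building the command as a list rather than a single string avoids shell quoting issues when it is passed to `subprocess.run`.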