The course consists of seven practical sessions in which students work directly with large datasets to develop technical skills that are in high demand in big data projects. The sessions start by discovering big data sources, performing descriptive analytics, and plotting different datasets to identify trends and correlations. The course then moves on to the spatial and temporal dimensions of big datasets and how to represent features from these multiple dimensions graphically. The last part of the course deals with data management tasks, such as splitting, aggregating, merging and summarising datasets, to improve analysis and visualisation.
The course follows a practical methodology in which students study key concepts during the sessions and complete exercises to understand them in depth. Students have access to a GitHub repository containing source code and examples for the topics and tools used during the course. Alongside the exercises, students will define their own analysis scenario in which to apply the concepts seen in class. An analysis scenario might focus on a specific problem, a defined set of variables to analyse, and tools to represent the results visually. Students will choose their analysis scenarios after the third session, once they have explored multiple datasets and analysis tools.
2020 - Diego Pajarito