The purpose of this project is to analyze air pollution data for the USA using Big Data technologies.
It consists of a "notebooks" directory containing one file per technology used, including MapReduce, Spark, SparkDF, SparkSQL, and Hive, and a "report" directory containing a comprehensive discussion and the project conclusions.
Contributing: We welcome contributions from the community! If you find an issue or have a suggestion for improvement, please open an issue or submit a pull request. Your contributions will help make this project better for others.
License: This project is provided under the MIT License, which allows you to use, modify, and distribute the code freely.