Iceland is a playground project for Apache Iceberg.
It contains several components that make it easy to run, test, and benchmark Apache Iceberg with different flavors (different catalogs, different engines, ...).
The objectives are:
- Provide resources (docker images, helm charts, notebooks, ...) to easily start with Apache Iceberg using different flavors
- Easily run use cases with concrete datasets, representing concrete usage
- Compare the performance on the use cases with the different flavors
The components are:
- datasets: contains concrete data used in the samples and tests
- usecases: contains samples and examples using the datasets
- benchmark: contains benchmarks of the use cases
- icekube: contains Docker images, Helm charts, ... Basically everything needed to start with Apache Iceberg
The GDELT Project stores all news articles as "events": http://data.gdeltproject.org/events/index.html
Every day, a zip file is published, containing a CSV file with all events, one event per line, in the following format:
545037848 20150530 201505 2015 2015.4110 JPN TOKYO JPN 1 046 046 04 1 7.0 15 1 15 -1.06163552535792 0 4 Tokyo, Tokyo, Japan JA JA40 35.685 139.751 -246227 4 Tokyo, Tokyo, Japan JA JA40 35.685 139.751 -246227 20160529 http://deadline.com/print-article/1201764227/
The format is described here: http://data.gdeltproject.org/documentation/GDELT-Data_Format_Codebook.pdf
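As an illustration of the daily file layout described above, here is a minimal Python sketch that reads every event row from a downloaded archive. It assumes the file inside the zip is tab-delimited despite its CSV extension and has no header row; the function name `read_events` and the archive path are hypothetical and not part of this project.

```python
import csv
import io
import zipfile
from typing import Iterator, List


def read_events(zip_path: str) -> Iterator[List[str]]:
    """Yield each event row of a daily archive as a list of string fields.

    Assumption: members are tab-delimited text files without a header row.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries, if any
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                # QUOTE_NONE keeps quote characters inside fields untouched
                yield from csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
```

Each yielded row keeps empty fields as empty strings, so column positions stay aligned with the codebook even when an event has no value for a field.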
gdelt/spark/di
gdelt/spark/q1
This query extracts all events for a specific location, using Spark engine.
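A query of this kind could be sketched with the Spark DataFrame API as follows. This is not the project's actual query: the helper `events_for_location`, the table name used in the usage note, and the choice of `ActionGeo_FullName` (a column name taken from the GDELT codebook) as the location field are all assumptions for illustration.

```python
def events_for_location(spark, table_name, location, column="ActionGeo_FullName"):
    """Return a DataFrame with the events whose location column equals `location`.

    `spark` is an existing SparkSession; `table_name` is the (hypothetical)
    Iceberg table holding the GDELT events.
    """
    events = spark.table(table_name)
    # Equality filter on the location column; Spark evaluates it lazily
    return events.where(events[column] == location)
```

With a session configured for an Iceberg catalog, a call such as `events_for_location(spark, "demo.gdelt.events", "Tokyo, Tokyo, Japan").show()` would print the matching events; the table name here is a placeholder.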
icekube