Copernicus-Bench: Add Level-3 Sentinel-5P #2607
Merged
6 commits
6797b31 draft adding senbench-aq-no2-s5p and senbench-aq-o3-s5p datasets (wangyi111)
eb97cea Merge branch 'main' into senbench-aq-s5p (adamjstewart)
c43213b Add AQ test data (adamjstewart)
ef154e8 Add AQ-NO2-S5P (adamjstewart)
69c5a62 Add AQ-O3-S5P (adamjstewart)
f8ddab9 Fix support for older torch (adamjstewart)
Binary file added (+4.36 KB): ...icus/l3_airquality_s5p/airquality_s5p/no2/label_annual/EEA_1kmgrid_2021_no2_avg_34_13.tif
Binary file added (+4.36 KB): ...5p/airquality_s5p/no2/s5p_annual/EEA_1kmgrid_2021_no2_avg_34_13/2021-01-01_2021-12-31.tif
Binary file added (+4.36 KB): .../airquality_s5p/no2/s5p_seasonal/EEA_1kmgrid_2021_no2_avg_34_13/2021-01-01_2021-04-01.tif
Binary file added (+4.36 KB): .../airquality_s5p/no2/s5p_seasonal/EEA_1kmgrid_2021_no2_avg_34_13/2021-04-01_2021-07-01.tif
Binary file added (+4.36 KB): .../airquality_s5p/no2/s5p_seasonal/EEA_1kmgrid_2021_no2_avg_34_13/2021-07-01_2021-10-01.tif
Binary file added (+4.36 KB): .../airquality_s5p/no2/s5p_seasonal/EEA_1kmgrid_2021_no2_avg_34_13/2021-10-01_2021-12-31.tif
tests/data/copernicus/l3_airquality_s5p/airquality_s5p/no2/test.csv (1 addition, 0 deletions)
@@ -0,0 +1 @@
EEA_1kmgrid_2021_no2_avg_34_13

tests/data/copernicus/l3_airquality_s5p/airquality_s5p/no2/train.csv (1 addition, 0 deletions)
@@ -0,0 +1 @@
EEA_1kmgrid_2021_no2_avg_34_13

tests/data/copernicus/l3_airquality_s5p/airquality_s5p/no2/val.csv (1 addition, 0 deletions)
@@ -0,0 +1 @@
EEA_1kmgrid_2021_no2_avg_34_13
Binary file added (+4.36 KB): ...rnicus/l3_airquality_s5p/airquality_s5p/o3/label_annual/EEA_1kmgrid_2021_o3_avg_34_13.tif
Binary file added (+4.36 KB): ..._s5p/airquality_s5p/o3/s5p_annual/EEA_1kmgrid_2021_o3_avg_34_13/2021-01-01_2021-12-31.tif
Binary file added (+4.36 KB): ...5p/airquality_s5p/o3/s5p_seasonal/EEA_1kmgrid_2021_o3_avg_34_13/2021-01-01_2021-04-01.tif
Binary file added (+4.36 KB): ...5p/airquality_s5p/o3/s5p_seasonal/EEA_1kmgrid_2021_o3_avg_34_13/2021-04-01_2021-07-01.tif
Binary file added (+4.36 KB): ...5p/airquality_s5p/o3/s5p_seasonal/EEA_1kmgrid_2021_o3_avg_34_13/2021-07-01_2021-10-01.tif
Binary file added (+4.36 KB): ...5p/airquality_s5p/o3/s5p_seasonal/EEA_1kmgrid_2021_o3_avg_34_13/2021-10-01_2021-12-31.tif
tests/data/copernicus/l3_airquality_s5p/airquality_s5p/o3/test.csv (1 addition, 0 deletions)
@@ -0,0 +1 @@
EEA_1kmgrid_2021_o3_avg_34_13

tests/data/copernicus/l3_airquality_s5p/airquality_s5p/o3/train.csv (1 addition, 0 deletions)
@@ -0,0 +1 @@
EEA_1kmgrid_2021_o3_avg_34_13

tests/data/copernicus/l3_airquality_s5p/airquality_s5p/o3/val.csv (1 addition, 0 deletions)
@@ -0,0 +1 @@
EEA_1kmgrid_2021_o3_avg_34_13
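The test fixtures listed above follow a regular layout: one annual label, one annual image, four seasonal images, and three split files per variable. A minimal sketch of that layout, using only the standard library, is shown here. The helper name `expected_files` and its `tile` parameter are hypothetical and not part of the PR; the paths are relative to the `l3_airquality_s5p` test data directory.

```python
import posixpath

# File names as they appear in the diff above.
ANNUAL_FILE = '2021-01-01_2021-12-31.tif'
SEASONAL_FILES = (
    '2021-01-01_2021-04-01.tif',
    '2021-04-01_2021-07-01.tif',
    '2021-07-01_2021-10-01.tif',
    '2021-10-01_2021-12-31.tif',
)
SPLITS = ('train', 'val', 'test')


def expected_files(variable: str, tile: str = '34_13') -> list[str]:
    """List the fixture paths for one variable (hypothetical helper)."""
    pid = f'EEA_1kmgrid_2021_{variable}_avg_{tile}'
    base = posixpath.join('airquality_s5p', variable)
    paths = [
        posixpath.join(base, 'label_annual', f'{pid}.tif'),
        posixpath.join(base, 's5p_annual', pid, ANNUAL_FILE),
    ]
    paths += [posixpath.join(base, 's5p_seasonal', pid, f) for f in SEASONAL_FILES]
    paths += [posixpath.join(base, f'{split}.csv') for split in SPLITS]
    return paths


for variable in ('no2', 'o3'):
    for path in expected_files(variable):
        print(path)
```

Each variable therefore contributes six GeoTIFFs and three CSV files, which matches the file list in this diff.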
@@ -0,0 +1,90 @@
#!/usr/bin/env python3

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

import os
import shutil

import numpy as np
import rasterio as rio
from rasterio import Affine
from rasterio.crs import CRS

SIZE = 32

np.random.seed(0)

profile = {
    'driver': 'GTiff',
    'dtype': 'float32',
    'height': SIZE,
    'width': SIZE,
    'count': 1,
    'crs': CRS.from_wkt("""
    PROJCS["ETRS89-extended / LAEA Europe",
        GEOGCS["ETRS89",
            DATUM["European_Terrestrial_Reference_System_1989",
                SPHEROID["GRS 1980",6378137,298.257222101,
                    AUTHORITY["EPSG","7019"]],
                AUTHORITY["EPSG","6258"]],
            PRIMEM["Greenwich",0,
                AUTHORITY["EPSG","8901"]],
            UNIT["degree",0.0174532925199433,
                AUTHORITY["EPSG","9122"]],
            AUTHORITY["EPSG","4258"]],
        PROJECTION["Lambert_Azimuthal_Equal_Area"],
        PARAMETER["latitude_of_center",52],
        PARAMETER["longitude_of_center",10],
        PARAMETER["false_easting",4321000],
        PARAMETER["false_northing",3210000],
        UNIT["metre",1,
            AUTHORITY["EPSG","9001"]],
        AXIS["Northing",NORTH],
        AXIS["Easting",EAST],
        AUTHORITY["EPSG","3035"]]
    """),
    'transform': Affine(1113.2, 0.0, 3307317.2, 0.0, -1113.2, 3575598.4000000004),
}

Z = np.random.random(size=(profile['height'], profile['width']))
files = [
    '2021-01-01_2021-04-01.tif',
    '2021-04-01_2021-07-01.tif',
    '2021-07-01_2021-10-01.tif',
    '2021-10-01_2021-12-31.tif',
]
for variable in ['no2', 'o3']:
    pid = f'EEA_1kmgrid_2021_{variable}_avg_34_13'

    # Image (annual)
    directory = os.path.join('airquality_s5p', variable, 's5p_annual', pid)
    os.makedirs(directory, exist_ok=True)
    file = '2021-01-01_2021-12-31.tif'
    path = os.path.join(directory, file)
    with rio.open(path, 'w', **profile) as src:
        src.write(Z, 1)

    # Images (seasonal)
    directory = os.path.join('airquality_s5p', variable, 's5p_seasonal', pid)
    os.makedirs(directory, exist_ok=True)
    for file in files:
        path = os.path.join(directory, file)
        with rio.open(path, 'w', **profile) as src:
            src.write(Z, 1)

    # Label (annual)
    directory = os.path.join('airquality_s5p', variable, 'label_annual')
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f'{pid}.tif')
    with rio.open(path, 'w', **profile) as src:
        src.write(Z, 1)

    # Splits
    directory = os.path.join('airquality_s5p', variable)
    for split in ['train', 'val', 'test']:
        with open(os.path.join(directory, f'{split}.csv'), 'w') as f:
            f.write(f'{pid}\n')

# Zip
shutil.make_archive('airquality_s5p', 'zip', '.', 'airquality_s5p')
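The `transform` entry in the profile above determines where the 32 x 32 test raster sits in the EPSG:3035 grid. As a rough illustration, the sketch below applies the same six affine coefficients by hand, without rasterio, to find the raster's corner coordinates. The helper `pixel_to_map` is hypothetical; it assumes the usual affine convention `x = a*col + b*row + c`, `y = d*col + e*row + f`.

```python
# Coefficients copied from the Affine(...) call in the script above,
# in the order (a, b, c, d, e, f).
TRANSFORM = (1113.2, 0.0, 3307317.2, 0.0, -1113.2, 3575598.4000000004)
SIZE = 32


def pixel_to_map(col: float, row: float, transform=TRANSFORM) -> tuple[float, float]:
    """Map a pixel corner (col, row) to projected coordinates in metres."""
    a, b, c, d, e, f = transform
    return a * col + b * row + c, d * col + e * row + f


# Upper-left corner of pixel (0, 0) and lower-right corner of the raster.
left, top = pixel_to_map(0, 0)
right, bottom = pixel_to_map(SIZE, SIZE)

print(f'x range: {left:.1f} .. {right:.1f}')
print(f'y range: {bottom:.1f} .. {top:.1f}')
print(f'side length: {right - left:.1f} m')
```

With a pixel size of 1113.2 m, the synthetic raster covers roughly 35.6 km per side. The negative `e` coefficient means the y coordinate decreases as the row index increases, which is the standard north-up orientation.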
@@ -0,0 +1,108 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

"""Copernicus-Bench AQ-NO2-S5P dataset."""

import os
from collections.abc import Callable, Sequence
from typing import Literal

import torch
from torch import Tensor

from ..utils import Path, stack_samples
from .base import CopernicusBenchBase


class CopernicusBenchAQNO2S5P(CopernicusBenchBase):
    """Copernicus-Bench AQ-NO2-S5P dataset.

    AQ-NO2-S5P is a regression dataset based on Sentinel-5P NO2 images and
    EEA air quality data products. Specifically, this dataset combines 2021
    measurements of NO2 (annual average concentration) from EEA with S5P NO2
    ("tropospheric NO2 column number density") from GEE.

    This benchmark supports both annual (1 image/location) and seasonal
    (4 images/location) modes, the former is used in the original benchmark.

    If you use this dataset in your research, please cite the following papers:

    * https://arxiv.org/abs/2503.11849
    * https://www.researchgate.net/profile/Jan-Horalek/publication/389165501_Air_quality_maps_of_EEA_member_and_cooperating_countries_for_2022/links/67b72628207c0c20fa8ec116/Air-quality-maps-of-EEA-member-and-cooperating-countries-for-2022.pdf

    .. versionadded:: 0.7
    """

    url = 'https://hf.co/datasets/wangyi111/Copernicus-Bench/resolve/9d252acd3aa0e3da3128e05c6f028647f0e48e5f/l3_airquality_s5p/airquality_s5p.zip'
    md5 = '92081c7437c5c1daf783868ad7669877'
    zipfile = 'airquality_s5p.zip'
    directory = os.path.join('airquality_s5p', 'no2')
    filename = '{}.csv'
    dtype = torch.float
    filename_regex = r'(?P<start>\d{4}-\d{2}-\d{2})_(?P<stop>\d{4}-\d{2}-\d{2})'
    date_format = '%Y-%m-%d'
    all_bands = ('NO2',)
    rgb_bands = ('NO2',)
    cmap = 'Wistia'

    def __init__(
        self,
        root: Path = 'data',
        split: Literal['train', 'val', 'test'] = 'train',
        mode: Literal['annual', 'seasonal'] = 'annual',
        bands: Sequence[str] | None = None,
        transforms: Callable[[dict[str, Tensor]], dict[str, Tensor]] | None = None,
        download: bool = False,
        checksum: bool = False,
    ) -> None:
        """Initialize a new CopernicusBenchAQNO2S5P instance.

        Args:
            root: Root directory where dataset can be found.
            split: One of 'train', 'val', or 'test'.
            mode: One of 'annual' or 'seasonal'.
            bands: Sequence of band names to load (defaults to all bands).
            transforms: A function/transform that takes input sample and its target as
                entry and returns a transformed version.
            download: If True, download dataset and store it in the root directory.
            checksum: If True, check the MD5 of the downloaded files (may be slow).

        Raises:
            DatasetNotFoundError: If dataset is not found and *download* is False.
        """
        self.mode = mode
        super().__init__(root, split, bands, transforms, download, checksum)

    def __getitem__(self, index: int) -> dict[str, Tensor]:
        """Return an index within the dataset.

        Args:
            index: Index to return.

        Returns:
            Data and labels at that index.
        """
        pid = self.files[index]
        match self.mode:
            case 'annual':
                file = '2021-01-01_2021-12-31.tif'
                path = os.path.join(self.root, self.directory, 's5p_annual', pid, file)
                sample = self._load_image(path)
            case 'seasonal':
                files = [
                    '2021-01-01_2021-04-01.tif',
                    '2021-04-01_2021-07-01.tif',
                    '2021-07-01_2021-10-01.tif',
                    '2021-10-01_2021-12-31.tif',
                ]
                root = os.path.join(self.root, self.directory, 's5p_seasonal', pid)
                samples = [self._load_image(os.path.join(root, file)) for file in files]
                sample = stack_samples(samples)

        path = os.path.join(self.root, self.directory, 'label_annual', f'{pid}.tif')
        sample |= self._load_mask(path)

        if self.transforms is not None:
            sample = self.transforms(sample)

        return sample
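The `filename_regex` and `date_format` attributes in the class above describe how an image file name encodes its time interval. How `CopernicusBenchBase` consumes them is not visible in this diff, so the following is only a standalone sketch of what the two attributes can express together. The function `parse_interval` is a hypothetical helper, not a TorchGeo API.

```python
import re
from datetime import datetime

# Values copied from the class attributes in the diff above.
FILENAME_REGEX = r'(?P<start>\d{4}-\d{2}-\d{2})_(?P<stop>\d{4}-\d{2}-\d{2})'
DATE_FORMAT = '%Y-%m-%d'


def parse_interval(filename: str) -> tuple[datetime, datetime]:
    """Extract the (start, stop) datetimes encoded in an image file name."""
    match = re.match(FILENAME_REGEX, filename)
    if match is None:
        raise ValueError(f'Unexpected file name: {filename!r}')
    start = datetime.strptime(match.group('start'), DATE_FORMAT)
    stop = datetime.strptime(match.group('stop'), DATE_FORMAT)
    return start, stop


for name in ('2021-01-01_2021-12-31.tif', '2021-04-01_2021-07-01.tif'):
    start, stop = parse_interval(name)
    print(name, (stop - start).days, 'days')
```

The annual file and the four seasonal files share the same naming pattern, so one regex covers both modes.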
Could make a shared CopernicusBenchAQS5P base class, but then we need to document it as well.
This could be useful once we have more variables and they can be aligned; it could come in a future update of Copernicus-Bench.
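To make the shared base class idea from the comments above concrete, here is a minimal, dependency-free sketch of how the per-variable path logic could be factored out. It is not the PR's implementation: the class names `AQS5PPathsSketch`, `NO2PathsSketch`, and `O3PathsSketch` are hypothetical, and the sketch only covers path construction, not image loading or the `CopernicusBenchBase` interface.

```python
import os
from typing import ClassVar, Literal


class AQS5PPathsSketch:
    """Hypothetical shared base: subclasses only set ``variable``."""

    variable: ClassVar[str]
    annual_file: ClassVar[str] = '2021-01-01_2021-12-31.tif'
    seasonal_files: ClassVar[tuple[str, ...]] = (
        '2021-01-01_2021-04-01.tif',
        '2021-04-01_2021-07-01.tif',
        '2021-07-01_2021-10-01.tif',
        '2021-10-01_2021-12-31.tif',
    )

    def __init__(
        self, root: str = 'data', mode: Literal['annual', 'seasonal'] = 'annual'
    ) -> None:
        self.root = root
        self.mode = mode

    @property
    def directory(self) -> str:
        return os.path.join('airquality_s5p', self.variable)

    def image_paths(self, pid: str) -> list[str]:
        """Return one path in annual mode, four in seasonal mode."""
        if self.mode == 'annual':
            files = (self.annual_file,)
        else:
            files = self.seasonal_files
        subdir = os.path.join(self.root, self.directory, f's5p_{self.mode}', pid)
        return [os.path.join(subdir, file) for file in files]

    def label_path(self, pid: str) -> str:
        return os.path.join(self.root, self.directory, 'label_annual', f'{pid}.tif')


class NO2PathsSketch(AQS5PPathsSketch):
    variable = 'no2'


class O3PathsSketch(AQS5PPathsSketch):
    variable = 'o3'


print(NO2PathsSketch(mode='seasonal').image_paths('EEA_1kmgrid_2021_no2_avg_34_13'))
print(O3PathsSketch().label_path('EEA_1kmgrid_2021_o3_avg_34_13'))
```

With this shape, adding another variable would only require a new subclass that sets `variable`. As the first comment notes, the trade-off is that the shared base class would then need to be documented as well.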