New data mappings approach #686

Merged
merged 20 commits into from
Jun 3, 2024
20 changes: 10 additions & 10 deletions .github/workflows/tests-docker.yml
Original file line number Diff line number Diff line change
Expand Up @@ -61,8 +61,8 @@ jobs:
python3 wis2box-ctl.py execute wis2box metadata station add-topic --territory-name $TERRITORY $CHANNEL
curl -s http://localhost/oapi/collections/discovery-metadata/items/$DISCOVERY_METADATA_ID --output /tmp/$DISCOVERY_METADATA_ID
check-jsonschema --schemafile /tmp/wcmp2-bundled.json /tmp/$DISCOVERY_METADATA_ID
python3 wis2box-ctl.py execute wis2box data ingest -th $TOPIC_HIERARCHY -p $TEST_DATA
python3 wis2box-ctl.py execute wis2box data ingest -th $TOPIC_HIERARCHY -p $TEST_DATA_UPDATE
python3 wis2box-ctl.py execute wis2box data ingest -mdi $DISCOVERY_METADATA_ID -p $TEST_DATA
python3 wis2box-ctl.py execute wis2box data ingest -mdi $DISCOVERY_METADATA_ID -p $TEST_DATA_UPDATE
- name: add Italy synop data (bufr2bufr) 🇮🇹
env:
TOPIC_HIERARCHY: it-roma_met_centre.data.core.weather.surface-based-observations.synop
Expand All @@ -76,7 +76,7 @@ jobs:
python3 wis2box-ctl.py execute wis2box metadata station add-topic --territory-name $TERRITORY $CHANNEL
curl -s http://localhost/oapi/collections/discovery-metadata/items/$DISCOVERY_METADATA_ID --output /tmp/$DISCOVERY_METADATA_ID
check-jsonschema --schemafile /tmp/wcmp2-bundled.json /tmp/$DISCOVERY_METADATA_ID
python3 wis2box-ctl.py execute wis2box data ingest -th $TOPIC_HIERARCHY -p $TEST_DATA
python3 wis2box-ctl.py execute wis2box data ingest -mdi $DISCOVERY_METADATA_ID -p $TEST_DATA
- name: add Algeria synop data (bufr2bufr) 🇩🇿
env:
TOPIC_HIERARCHY: dz-alger_met_centre.data.core.weather.surface-based-observations.synop
Expand All @@ -90,7 +90,7 @@ jobs:
python3 wis2box-ctl.py execute wis2box metadata station add-topic --territory-name $TERRITORY $CHANNEL
curl -s http://localhost/oapi/collections/discovery-metadata/items/$DISCOVERY_METADATA_ID --output /tmp/$DISCOVERY_METADATA_ID
check-jsonschema --schemafile /tmp/wcmp2-bundled.json /tmp/$DISCOVERY_METADATA_ID
python3 wis2box-ctl.py execute wis2box data ingest -th $TOPIC_HIERARCHY -p $TEST_DATA
python3 wis2box-ctl.py execute wis2box data ingest -mdi $DISCOVERY_METADATA_ID -p $TEST_DATA
- name: add Romania synop data (synop2bufr and csv2bufr aws-template) 🇷🇴
env:
TOPIC_HIERARCHY: ro-rnimh.data.core.weather.surface-based-observations.synop
Expand All @@ -104,7 +104,7 @@ jobs:
python3 wis2box-ctl.py execute wis2box metadata station add-topic --territory-name $TERRITORY $CHANNEL
curl -s http://localhost/oapi/collections/discovery-metadata/items/$DISCOVERY_METADATA_ID --output /tmp/$DISCOVERY_METADATA_ID
check-jsonschema --schemafile /tmp/wcmp2-bundled.json /tmp/$DISCOVERY_METADATA_ID
python3 wis2box-ctl.py execute wis2box data ingest -th $TOPIC_HIERARCHY -p $TEST_DATA
python3 wis2box-ctl.py execute wis2box data ingest -mdi $DISCOVERY_METADATA_ID -p $TEST_DATA
- name: add Congo synop data (synop2bufr) 🇨🇩
env:
TOPIC_HIERARCHY: cd-brazza_met_centre.data.core.weather.surface-based-observations.synop
Expand All @@ -118,7 +118,7 @@ jobs:
python3 wis2box-ctl.py execute wis2box metadata station add-topic --territory-name $TERRITORY $CHANNEL
curl -s http://localhost/oapi/collections/discovery-metadata/items/$DISCOVERY_METADATA_ID --output /tmp/$DISCOVERY_METADATA_ID
check-jsonschema --schemafile /tmp/wcmp2-bundled.json /tmp/$DISCOVERY_METADATA_ID
python3 wis2box-ctl.py execute wis2box data ingest -th $TOPIC_HIERARCHY -p $TEST_DATA
python3 wis2box-ctl.py execute wis2box data ingest -mdi $DISCOVERY_METADATA_ID -p $TEST_DATA
- name: add example ship data (bufr2bufr) WMO
env:
TOPIC_HIERARCHY: int-wmo-test.data.core.weather.surface-based-observations.ship
Expand All @@ -135,7 +135,7 @@ jobs:
python3 wis2box-ctl.py execute wis2box metadata station add-topic --wsi 0-22000-0-EUCDE34 $CHANNEL
curl -s http://localhost/oapi/collections/discovery-metadata/items/$DISCOVERY_METADATA_ID --output /tmp/$DISCOVERY_METADATA_ID
check-jsonschema --schemafile /tmp/wcmp2-bundled.json /tmp/$DISCOVERY_METADATA_ID
python3 wis2box-ctl.py execute wis2box data ingest -th $TOPIC_HIERARCHY -p $TEST_DATA
python3 wis2box-ctl.py execute wis2box data ingest -mdi $DISCOVERY_METADATA_ID -p $TEST_DATA
- name: add example buoy data (bufr2bufr) WMO
env:
TOPIC_HIERARCHY: int-wmo-test.data.core.weather.surface-based-observations.buoy
Expand All @@ -148,7 +148,7 @@ jobs:
python3 wis2box-ctl.py execute wis2box metadata station add-topic --wsi 0-22000-0-1400011 $CHANNEL
curl -s http://localhost/oapi/collections/discovery-metadata/items/$DISCOVERY_METADATA_ID --output /tmp/$DISCOVERY_METADATA_ID
check-jsonschema --schemafile /tmp/wcmp2-bundled.json /tmp/$DISCOVERY_METADATA_ID
python3 wis2box-ctl.py execute wis2box data ingest -th $TOPIC_HIERARCHY -p $TEST_DATA
python3 wis2box-ctl.py execute wis2box data ingest -mdi $DISCOVERY_METADATA_ID -p $TEST_DATA
- name: add example wind profiler data (bufr2bufr) WMO
env:
TOPIC_HIERARCHY: int-wmo-test.data.core.weather.surface-based-observations.wind_profiler
Expand All @@ -161,7 +161,7 @@ jobs:
python wis2box-ctl.py execute wis2box metadata station add-topic --wsi 0-702-0-48698 $CHANNEL
curl -s http://localhost/oapi/collections/discovery-metadata/items/$DISCOVERY_METADATA_ID --output /tmp/$DISCOVERY_METADATA_ID
check-jsonschema --schemafile /tmp/wcmp2-bundled.json /tmp/$DISCOVERY_METADATA_ID
python3 wis2box-ctl.py execute wis2box data ingest -th $TOPIC_HIERARCHY -p $TEST_DATA
python3 wis2box-ctl.py execute wis2box data ingest -mdi $DISCOVERY_METADATA_ID -p $TEST_DATA
- name: add China GRIB2 data (universal pipeline) 🇨🇳
env:
TOPIC_HIERARCHY: cn-cma.data.core.weather.prediction.forecast.medium-range.probabilistic.global
Expand All @@ -172,7 +172,7 @@ jobs:
python3 wis2box-ctl.py execute wis2box dataset publish $DISCOVERY_METADATA
curl -s http://localhost/oapi/collections/discovery-metadata/items/$DISCOVERY_METADATA_ID --output /tmp/$DISCOVERY_METADATA_ID
check-jsonschema --schemafile /tmp/wcmp2-bundled.json /tmp/$DISCOVERY_METADATA_ID
python3 wis2box-ctl.py execute wis2box data ingest -th $TOPIC_HIERARCHY -p $TEST_DATA
python3 wis2box-ctl.py execute wis2box data ingest -mdi $DISCOVERY_METADATA_ID -p $TEST_DATA
- name: sleep 30 seconds then run integration tests ⚙️
run: |
sleep 30
Expand Down
14 changes: 7 additions & 7 deletions docs/source/reference/auth.rst
Original file line number Diff line number Diff line change
Expand Up @@ -32,19 +32,19 @@ To add a token to PUT/POST/DELETE requests to the stations collection, use the f

This will generate a random token that can be used to update the stations collection.

Adding Access Control on topics
-------------------------------
Adding Access Control on datasets
---------------------------------

All topic hierarchies in wis2box are open by default. A topic becomes closed, with access control applied, the
first time a token is generated for a topic hierarchy.
All datasets in wis2box are open by default. A dataset becomes closed, with access control applied, the
first time a token is generated for that dataset.

.. note::

Make sure you are logged into the wis2box-management container when using the wis2box CLI

.. code-block:: bash

wis2box auth add-token --topic-hierarchy mw-mw_met_centre.data.core.weather.surface-based-observations.synop mytoken
wis2box auth add-token --metadata-id urn:wmo:md:mw-mw_met_centre:surface-weather-observations mytoken


If no token is provided, a random string will be generated. Be sure to record the token now, there is no
Expand All @@ -58,8 +58,8 @@ Token credentials can be validated using the wis2box command line utility.
.. code-block:: bash

wis2box auth show
wis2box auth has-access-topic --topic-hierarchy mw-mw_met_centre.data.core.weather.surface-based-observations.synop mytoken
wis2box auth has-access-topic --topic-hierarchy mw-mw_met_centre.data.core.weather.surface-based-observations.synop notmytoken
wis2box auth has-access-topic --metadata-id urn:wmo:md:mw-mw_met_centre:surface-weather-observations mytoken
wis2box auth has-access-topic --metadata-id urn:wmo:md:mw-mw_met_centre:surface-weather-observations notmytoken


Once a token has been generated, access to any data of that topic in the WAF or API requires token authentication.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,28 +27,27 @@ Explicit topic hierarchy workflow
.. code-block:: bash

# process a single CSV file
wis2box data ingest --topic-hierarchy foo.bar.baz -p /path/to/file.csv
wis2box data ingest --metadata-id urn:wmo:md:centre-id:mydata -p /path/to/file.csv

# process a directory of CSV files
wis2box data ingest --topic-hierarchy foo.bar.baz -p /path/to/dir
wis2box data ingest --metadata-id urn:wmo:md:centre-id:mydata -p /path/to/dir

# process a directory of CSV files recursively
wis2box data ingest --topic-hierarchy foo.bar.baz -p /path/to/dir -r
wis2box data ingest --metadata-id urn:wmo:md:centre-id:mydata -p /path/to/dir -r


Implicit topic hierarchy workflow
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Implicit metadata_id workflow
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: bash

# process incoming data; topic hierarchy is inferred from fuzzy filepath equivalent
# wis2box will detect 'foo/bar/baz' as topic hierarchy 'foo.bar.baz'
# process incoming data; metadata_id is inferred from fuzzy filepath equivalent
# wis2box will detect 'mydata' as metadata_id 'urn:wmo:md:centre-id:mydata'
wis2box data ingest -p /path/to/foo/bar/baz/data/file.csv


Event driven ingest, processing and publishing
----------------------------------------------

Once all metadata and topic hierarchies are setup, event driven workflow
Once all datasets are set up, the event driven workflow
will immediately start to listen on files in the ``wis2box-incoming`` storage bucket as they are
placed in the appropriate topic hierarchy directory.
placed in the appropriate directory that can be matched to a metadata_id.
22 changes: 14 additions & 8 deletions docs/source/user/data-ingest.rst
Original file line number Diff line number Diff line change
Expand Up @@ -79,16 +79,21 @@ Select 'browse' on the ``wis2box-incoming`` bucket and select 'Choose or create
:alt: MinIO new folder path

.. note::
The folder in which the file is placed defines the topic you want to share the data on and should match the datasets defined in your data mappings.
The folder in which the file is placed will be used to determine the dataset to which the file belongs.

The first 3 levels of the WIS2 topic hierarchy ``origin/a/wis2`` are automatically included by wis2box when publishing data notification messages.

For example:
The wis2box-management container matches the path of the file to a dataset defined in the data mappings by checking whether the path contains either the metadata identifier or the topic (excluding ``origin/a/wis2/``).

* data to be published on: ``origin/a/wis2/cd-brazza_met_centre/data/core/weather/surface-based-observations/synop``
* upload data in the path: ``cd-brazza_met_centre/data/core/weather/surface-based-observations/synop``
For example, using a filepath matching the metadata identifier:

* Metadata identifier: ``urn:wmo:md:it-roma_met_centre:surface-weather-observations.synop``
* upload data in path containing: ``it-roma_met_centre:surface-weather-observations.synop``

For example, using a filepath matching the topic hierarchy:

The error message ``Topic Hierarchy validation error: No plugins for minio:9000/wis2box-incoming/... in data mappings`` indicates you stored a file in a folder for which no matching dataset was defined in the data mappings.
* Topic Hierarchy: ``origin/a/wis2/cd-brazza_met_centre/data/core/weather/surface-based-observations/synop``
* upload data in the path containing: ``cd-brazza_met_centre/data/core/weather/surface-based-observations/synop``

The error message ``Path validation error: Could not match http://minio:9000/wis2box-incoming/... to dataset, ...`` indicates that a file was stored in a directory that could not be matched to a dataset.
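The matching rule described above can be sketched roughly as follows. This is a minimal illustration, not the wis2box implementation: the function name ``match_path_to_dataset``, the shape of the ``data_mappings`` dictionary (metadata identifier mapped to a dict with a ``topic_hierarchy`` key) and the choice to return the first match are all assumptions made for the example.

```python
from typing import Optional

# the first levels of the WIS2 topic are not expected in the upload path
WIS2_PREFIX = 'origin/a/wis2/'
URN_PREFIX = 'urn:wmo:md:'


def match_path_to_dataset(path: str, data_mappings: dict) -> Optional[str]:
    """Return the metadata identifier of the first dataset whose
    identifier or topic appears in the path, or None if nothing matches.

    data_mappings is a hypothetical structure for this sketch:
    {metadata_id: {'topic_hierarchy': 'origin/a/wis2/...'}}
    """
    for metadata_id, mapping in data_mappings.items():
        # accept the identifier with or without the 'urn:wmo:md:' prefix
        local_id = metadata_id
        if local_id.startswith(URN_PREFIX):
            local_id = local_id[len(URN_PREFIX):]

        # compare against the topic without 'origin/a/wis2/'
        topic = mapping.get('topic_hierarchy', '')
        if topic.startswith(WIS2_PREFIX):
            topic = topic[len(WIS2_PREFIX):]

        if local_id in path or (topic and topic in path):
            return metadata_id

    # corresponds to the "could not match ... to dataset" situation
    return None
```

A caller would treat a ``None`` result as the case reported by the path validation error quoted above.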

After uploading a file to ``wis2box-incoming`` storage bucket, you can browse the content in the ``wis2box-public`` bucket. If the data ingest was successful, new data will appear as follows:

Expand Down Expand Up @@ -132,7 +137,8 @@ See below a Python example to upload data using the MinIO package:
from minio import Minio

filepath = '/home/wis2box-user/local-data/mydata.bin'
minio_path = '/it-roma_met_centre/data/core/weather/surface-based-observations/synop/'
# path should match the metadata or the topic in the data mappings
minio_path = 'urn:wmo:md:it-roma_met_centre:surface-weather-observations'

endpoint = 'http://localhost:9000'
WIS2BOX_STORAGE_USERNAME = 'wis2box'
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
wis2box:
retention: P180D
topic_hierarchy: cd-brazza_met_centre.data.core.weather.surface-based-observations.synop
topic_hierarchy: cd-brazza_met_centre/data/core/weather/surface-based-observations/synop
country: cog
centre_id: cd-brazza_met_centre
data_mappings:
Expand Down
2 changes: 1 addition & 1 deletion tests/data/metadata/discovery/cn-grapes-geps-global.yml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
wis2box:
retention: P30D
topic_hierarchy: cn-cma.data.core.weather.prediction.forecast.medium-range.probabilistic.global
topic_hierarchy: cn-cma/data/core/weather/prediction/forecast/medium-range/probabilistic/global
country: chn
centre_id: cn-cma
data_mappings:
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
wis2box:
retention: P30D
topic_hierarchy: dz-alger_met_centre.data.core.weather.surface-based-observations.synop
topic_hierarchy: dz-alger_met_centre/data/core/weather/surface-based-observations/synop
country: dza
centre_id: dz-alger_met_centre
data_mappings:
Expand Down
2 changes: 1 addition & 1 deletion tests/data/metadata/discovery/int-wmo-test-buoy.yml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
wis2box:
retention: P30D
topic_hierarchy: int-wmo-test.data.core.weather.surface-based-observations.buoy
topic_hierarchy: int-wmo-test/data/core/weather/surface-based-observations/buoy
country: int
centre_id: int-wmo-test
data_mappings:
Expand Down
2 changes: 1 addition & 1 deletion tests/data/metadata/discovery/int-wmo-test-ship.yml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
wis2box:
retention: P30D
topic_hierarchy: int-wmo-test.data.core.weather.surface-based-observations.ship
topic_hierarchy: int-wmo-test/data/core/weather/surface-based-observations/ship
country: int
centre_id: int-wmo-test
data_mappings:
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
wis2box:
retention: P30D
topic_hierarchy: int-wmo-test.data.core.weather.surface-based-observations.wind_profiler
topic_hierarchy: int-wmo-test/data/core/weather/surface-based-observations/wind_profiler
country: int
centre_id: int-wmo-test
data_mappings:
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
wis2box:
retention: P30D
topic_hierarchy: it-roma_met_centre.data.core.weather.surface-based-observations.synop
topic_hierarchy: it-roma_met_centre/data/core/weather/surface-based-observations/synop
country: ita
centre_id: it-roma_met_centre
data_mappings:
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
wis2box:
retention: P30D
topic_hierarchy: mw-mw_met_centre.data.core.weather.surface-based-observations.synop
topic_hierarchy: mw-mw_met_centre/data/core/weather/surface-based-observations/synop
country: mwi
centre_id: mw-mw_met_centre
data_mappings:
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
wis2box:
retention: P30D
topic_hierarchy: ro-rnimh.data.core.weather.surface-based-observations.synop
topic_hierarchy: ro-rnimh/data/core/weather/surface-based-observations/synop
country: rou
centre_id: ro-rnimh
data_mappings:
Expand Down
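The discovery metadata files above all change ``topic_hierarchy`` from the dotted form to the slash-separated form. For configurations that still use the old notation, the conversion could be sketched as below. The helper name ``dotted_to_slashed`` is hypothetical, and the sketch assumes that no individual topic level contains a dot, which holds for the examples in this diff.

```python
def dotted_to_slashed(topic_hierarchy: str) -> str:
    """Convert a dotted topic hierarchy to the slash-separated form.

    A value that is already slash-separated is returned unchanged,
    because it contains no dots to replace.
    """
    return topic_hierarchy.replace('.', '/')


if __name__ == '__main__':
    old = 'ro-rnimh.data.core.weather.surface-based-observations.synop'
    print(dotted_to_slashed(old))
```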
18 changes: 8 additions & 10 deletions tests/integration/test_workflow.py
Original file line number Diff line number Diff line change
Expand Up @@ -167,7 +167,7 @@ def test_data_ingest():

assert item_api['reportId'] == 'WIGOS_0-454-2-AWSNAMITAMBO_20210707T145500'
assert item_api['properties']['resultTime'] == '2021-07-07T14:55:00Z' # noqa
item_source = f'2021-07-07/wis/mw-mw_met_centre/data/core/weather/surface-based-observations/synop/{item_api["reportId"]}.bufr4' # noqa
item_source = f'2021-07-07/wis/{ID}/{item_api["reportId"]}.bufr4' # noqa
r = SESSION.get(f'{URL}/data/{item_source}') # noqa
assert r.status_code == codes.ok

Expand Down Expand Up @@ -247,12 +247,12 @@ def test_message_api():

# test messages per test dataset
counts = {
'mw_met_centre': 25,
'roma_met_centre': 33,
'alger_met_centre': 29,
'rnimh': 50,
'brazza_met_centre': 15,
'wmo-test': 11,
'mw-mw_met_centre': 25,
'it-roma_met_centre': 33,
'dz-alger_met_centre': 29,
'ro-rnimh': 50,
'cd-brazza_met_centre': 15,
'int-wmo-test': 11,
'cn-cma': 11
}
for key, value in counts.items():
Expand All @@ -267,9 +267,7 @@ def test_message_api():
assert r['numberMatched'] == sum(counts.values())

# we want to find a particular message with data ID
target_data_id = "cd-brazza_met_centre/data/core/weather/" \
"surface-based-observations/synop/" \
"WIGOS_0-20000-0-64406_20230803T090000"
target_data_id = "cd-brazza_met_centre:surface-weather-observations/WIGOS_0-20000-0-64406_20230803T090000" # noqa

msg = None
for feature in r['features']:
Expand Down
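The updated ``target_data_id`` suggests that data IDs are now built from the metadata identifier rather than from the topic hierarchy. A minimal sketch of that relationship follows. It assumes the data ID is the metadata identifier with its ``urn:wmo:md:`` prefix removed, followed by the report identifier; the function name ``build_data_id`` and the full metadata identifier used in the example are assumptions inferred from the test value, not code taken from this pull request.

```python
URN_PREFIX = 'urn:wmo:md:'


def build_data_id(metadata_id: str, report_id: str) -> str:
    """Join the local part of a metadata identifier and a report id."""
    local_id = metadata_id
    # drop the URN prefix if present (assumed convention for this sketch)
    if local_id.startswith(URN_PREFIX):
        local_id = local_id[len(URN_PREFIX):]
    return f'{local_id}/{report_id}'


if __name__ == '__main__':
    print(build_data_id(
        'urn:wmo:md:cd-brazza_met_centre:surface-weather-observations',
        'WIGOS_0-20000-0-64406_20230803T090000'
    ))
```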
12 changes: 6 additions & 6 deletions wis2box-management/docker/entrypoint.sh
Original file line number Diff line number Diff line change
Expand Up @@ -29,18 +29,18 @@ set -e
#ensure environment-variables are available for cronjob
printenv | grep -v "no_proxy" >> /etc/environment

# ensure cron is running
service cron start
service cron status

# wis2box commands
# TODO: avoid re-creating environment if it already exists
# TODO: catch errors and avoid bounce in conjunction with restart: always
wis2box metadata discovery setup
wis2box metadata station setup
wis2box environment create
wis2box environment show
wis2box api setup
wis2box metadata discovery setup
wis2box metadata station setup

# ensure cron is running
service cron start
service cron status

echo "Caching topic hierarchy CSVs"
pywis-topics bundle sync
Expand Down
5 changes: 1 addition & 4 deletions wis2box-management/wis2box/api/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -98,10 +98,7 @@ def setup_collection(meta: dict = {}) -> bool:
LOGGER.error(f'Invalid configuration: {meta}')
return False

if 'topic_hierarchy' in meta:
data_name = meta['topic_hierarchy']
else:
data_name = meta['id']
data_name = meta['id']

backend = load_backend()
if not backend.has_collection(data_name):
Expand Down
10 changes: 5 additions & 5 deletions wis2box-management/wis2box/api/config/pygeoapi.py
Original file line number Diff line number Diff line change
Expand Up @@ -140,15 +140,15 @@ def prepare_collection(self, meta: dict) -> bool:

editable = False

if meta['id'] in ['discovery-metadata', 'messages', 'stations']:
resource_id = meta['id']
else:
resource_id = meta['topic_hierarchy']
resource_id = meta['id']

if meta['id'] in ['discovery-metadata', 'stations']:
editable = True
else:
# avoid colons in resource id
resource_id = resource_id.lower().replace(':', '-')

LOGGER.debug(f'Resource id: {resource_id}')
LOGGER.info(f'Prepare collection with resource_id={resource_id}')

type_ = meta.get('type', 'feature')

Expand Down
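The change to ``prepare_collection`` above derives the resource id from ``meta['id']`` and, for collections other than ``discovery-metadata`` and ``stations``, lowercases it and replaces colons with hyphens. The standalone sketch below isolates that rule so it can be tried outside pygeoapi. The function name ``make_resource_id`` and the constant ``EDITABLE_COLLECTIONS`` are hypothetical names introduced for illustration.

```python
# collections that keep their id unchanged and are editable in the diff above
EDITABLE_COLLECTIONS = ('discovery-metadata', 'stations')


def make_resource_id(collection_id: str) -> str:
    """Return an id without colons for non-editable collections."""
    if collection_id in EDITABLE_COLLECTIONS:
        return collection_id
    # avoid colons in resource id, mirroring the logic in the diff
    return collection_id.lower().replace(':', '-')


if __name__ == '__main__':
    print(make_resource_id(
        'urn:wmo:md:mw-mw_met_centre:surface-weather-observations'
    ))
```

Under this rule a metadata identifier in URN form maps to a colon-free collection name, while the built-in collection ids pass through unchanged.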