update migrations #667

Merged
merged 11 commits into from
Apr 30, 2024


tomkralidis
Collaborator

No description provided.

@tomkralidis tomkralidis added the discovery metadata Discovery metadata label Apr 28, 2024
@tomkralidis tomkralidis added this to the sprint-014 milestone Apr 28, 2024


def migrate(dryrun):
    LOGGER.info('Updating station data in Elasticsearch index')
Member

@david-i-berry david-i-berry Apr 29, 2024

Duplicated below (line 60), delete.

try:
    res = es.search(index=es_index,
                    query={'match_all': {}},
                    size=maxrecords)
Member

@david-i-berry david-i-berry Apr 29, 2024

Whilst maxrecords is large for the metadata, do we want/need to consider paging?

Collaborator Author

Given that this is a migration, and given the state of existing installations, we are probably safe without paging.
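
For reference, should paging ever be needed, a minimal sketch of offset-based paging follows. It is not part of this PR. The helper name `iter_hits` and the default `page_size` are hypothetical; `search` is assumed to behave like the Python client's `es.search`, which accepts `from_` and `size`. Offset paging is bounded by Elasticsearch's `index.max_result_window` setting (10,000 by default), so larger result sets would need `search_after` or the scroll API instead.

```python
from typing import Any, Callable, Dict, Iterator


def iter_hits(search: Callable[..., Dict[str, Any]], index: str,
              page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every hit in `index`, requesting `page_size` hits at a time.

    `search` is assumed to accept the same keyword arguments as
    `Elasticsearch.search` (index, query, from_, size).
    """
    offset = 0
    while True:
        res = search(index=index, query={'match_all': {}},
                     from_=offset, size=page_size)
        hits = res['hits']['hits']
        if not hits:
            # An empty page means everything has been read
            return
        yield from hits
        offset += len(hits)
```

In the migration this could be used as `for hit in iter_hits(es.search, es_index): ...` in place of the single `es.search(..., size=maxrecords)` call.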

th = record['wis2box']['topic_hierarchy']

if th not in DATA_MAPPINGS['data'].keys():
    print("TH", th)
Member

Use logger?
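
A minimal sketch of what this suggestion could look like, assuming a module-level `LOGGER` as in the hunk above. The function name `find_unmapped_topics` and the choice of the `warning` level are hypothetical, not taken from the PR.

```python
import logging

LOGGER = logging.getLogger(__name__)


def find_unmapped_topics(records, data_mappings):
    """Return topic hierarchies that have no entry in `data_mappings`,
    logging each one instead of printing it."""
    unmapped = []
    for record in records:
        th = record['wis2box']['topic_hierarchy']
        if th not in data_mappings:
            # Replaces print("TH", th)
            LOGGER.warning('No data mapping for topic hierarchy: %s', th)
            unmapped.append(th)
    return unmapped
```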

record['wis2box']['data_mappings'] = DATA_MAPPINGS['data'][th]

if dryrun:
    LOGGER.info('dryrun == True, writing updates to stdout')
Member

Do we want this logged for each record?

    LOGGER.info('dryrun == True, writing updates to stdout')
    print(record)
else:
    LOGGER.info('Updating index ...')
Member

Move outside of loop over records?
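
The two comments about per-record logging could be addressed along these lines. This is a sketch only: `apply_updates` and the `update` callable are hypothetical stand-ins for the loop in `migrate` and for whatever call writes a record back to the index.

```python
import logging

LOGGER = logging.getLogger(__name__)


def apply_updates(records, dryrun, update):
    """Process all records, logging the mode once rather than per record.

    `update` is a hypothetical callable that writes one record to the index.
    """
    # Log once, before the loop
    if dryrun:
        LOGGER.info('dryrun == True, writing updates to stdout')
    else:
        LOGGER.info('Updating index ...')

    for record in records:
        if dryrun:
            print(record)
        else:
            update(record)
    return len(records)
```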

@maaikelimper
Collaborator

I tried to test the migration script and got the following error:

mlimper@wis2box-migration-test-server:~/wis2box$ python3 wis2box-ctl.py execute python3 /app/migrations/v1_0b6_to_v1_0b7/update_wcmp2_identifiers.py
[2024-04-29T14:07:24Z] {/app/wis2box/log.py:63} DEBUG - Logging initialized
[2024-04-29T14:07:24Z] {/app/migrations/v1_0b6_to_v1_0b7/update_wcmp2_identifiers.py:110} INFO - Running wis2box migration from v1_0b6 to v1_0b7 (update wcmp2 identifiers)
[2024-04-29T14:07:24Z] {/app/migrations/v1_0b6_to_v1_0b7/update_wcmp2_identifiers.py:52} INFO - Updating discovery data in Elasticsearch index
[2024-04-29T14:07:24Z] {/app/migrations/v1_0b6_to_v1_0b7/update_wcmp2_identifiers.py:53} INFO - Connecting to API ...
[2024-04-29T14:07:24Z] {/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py:246} DEBUG - Starting new HTTP connection (1): elasticsearch:9200
[2024-04-29T14:07:24Z] {/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py:474} DEBUG - http://elasticsearch:9200 "POST /discovery-metadata/_search HTTP/1.1" 200 4493
[2024-04-29T14:07:24Z] {/usr/local/lib/python3.8/dist-packages/elastic_transport/_transport.py:349} INFO - POST http://elasticsearch:9200/discovery-metadata/_search [status:200 duration:0.006s]
[2024-04-29T14:07:24Z] {/app/migrations/v1_0b6_to_v1_0b7/update_wcmp2_identifiers.py:69} INFO - Processing [{'_index': 'discovery-metadata', '_id': 'urn:x-wmo:md:int-wmo:surface-based-observations.synop', '_score': 1.0, '_ignored': ['time.interval'], '_source': {'id': 'urn:x-wmo:md:int-wmo:surface-based-observations.synop', 'conformsTo': ['http://wis.wmo.int/spec/wcmp/2/conf/core'], 'type': 'Feature', 'geometry': {'type': 'Polygon', 'coordinates': [[[-180.0, -90.0], [-180.0, 90.0], [180.0, 90.0], [180.0, -90.0], [-180.0, -90.0]]]}, 'properties': {'identifier': 'urn:x-wmo:md:int-wmo:surface-based-observations.synop', 'title': 'Hourly synoptic observations from fixed-land stations (SYNOP) (int-wmo)', 'description': 'Hourly synoptic observations from fixed-land stations (SYNOP) (int-wmo)', 'themes': [{'concepts': [{'id': 'weather'}], 'scheme': 'https://github.com/wmo-im/wis2-topic-hierarchy/blob/main/topic-hierarchy/earth-system-discipline/index.csv'}], 'language': 'en', 'type': 'dataset', 'created': '2024-04-24T00:00:00Z', 'updated': '2024-04-29T12:14:22Z', 'rights': None, 'contacts': [{'addresses': [{'country': 'Intergovernmental Organization'}], 'roles': ['pointOfContact', 'distributor'], 'organization': 'WMO', 'contactInstructions': 'email', 'emails': [{'value': '[email protected]'}], 'links': [{'rel': 'canonical', 'type': 'text/html', 'href': None}]}], 'keywords': ['surface', 'land', 'observations'], 'wmo:dataPolicy': 'core', 'wmo:topicHierarchy': 'origin/a/wis2/int-wmo/data/core/weather/surface-based-observations/synop', 'id': 'urn:x-wmo:md:int-wmo:surface-based-observations.synop'}, 'links': [{'href': 'http://136.156.130.78/oapi/collections/urn:x-wmo:md:int-wmo:surface-based-observations.synop', 'type': 'OAFeat', 'rel': 'collection', 'title': 'urn:x-wmo:md:int-wmo:surface-based-observations.synop'}, {'href': 'mqtt://everyone:everyone@mosquitto:1883', 'type': 'MQTT', 'rel': 'data', 'title': 'int-wmo.data.core.weather.surface-based-observations.synop', 'channel': 
'origin/a/wis2/int-wmo/data/core/weather/surface-based-observations/synop'}, {'href': 'http://136.156.130.78/oapi/collections/discovery-metadata/items/urn:x-wmo:md:int-wmo:surface-based-observations.synop', 'type': 'OARec', 'rel': 'canonical', 'title': 'urn:x-wmo:md:int-wmo:surface-based-observations.synop'}], 'time': {'interval': ['BEGIN_DATE', '..'], 'resolution': 'P1H'}}}, {'_index': 'discovery-metadata', '_id': 'urn:x-wmo:md:int-wmo:surface-based-observations.temp', '_score': 1.0, '_ignored': ['time.interval'], '_source': {'id': 'urn:x-wmo:md:int-wmo:surface-based-observations.temp', 'conformsTo': ['http://wis.wmo.int/spec/wcmp/2/conf/core'], 'type': 'Feature', 'geometry': {'type': 'Polygon', 'coordinates': [[[-180.0, -90.0], [-180.0, 90.0], [180.0, 90.0], [180.0, -90.0], [-180.0, -90.0]]]}, 'properties': {'identifier': 'urn:x-wmo:md:int-wmo:surface-based-observations.temp', 'title': 'Upper-level temperature/humidity/wind reports from fixed-land stations (TEMP) (int-wmo)', 'description': 'Upper-level temperature/humidity/wind reports from fixed-land stations (TEMP) (int-wmo)', 'themes': [{'concepts': [{'id': 'weather'}], 'scheme': 'https://github.com/wmo-im/wis2-topic-hierarchy/blob/main/topic-hierarchy/earth-system-discipline/index.csv'}], 'language': 'en', 'type': 'dataset', 'created': '2024-04-24T00:00:00Z', 'updated': '2024-04-29T12:14:31Z', 'rights': None, 'contacts': [{'addresses': [{'country': 'Intergovernmental Organization'}], 'roles': ['distributor', 'pointOfContact'], 'organization': 'WMO', 'contactInstructions': 'email', 'emails': [{'value': '[email protected]'}], 'links': [{'rel': 'canonical', 'type': 'text/html', 'href': None}]}], 'keywords': ['upper air', 'humidity', 'wind', 'observations'], 'wmo:dataPolicy': 'core', 'wmo:topicHierarchy': 'origin/a/wis2/int-wmo/data/core/weather/surface-based-observations/temp', 'id': 'urn:x-wmo:md:int-wmo:surface-based-observations.temp'}, 'links': [{'href': 
'http://136.156.130.78/oapi/collections/urn:x-wmo:md:int-wmo:surface-based-observations.temp', 'type': 'OAFeat', 'rel': 'collection', 'title': 'urn:x-wmo:md:int-wmo:surface-based-observations.temp'}, {'href': 'mqtt://everyone:everyone@mosquitto:1883', 'type': 'MQTT', 'rel': 'data', 'title': 'int-wmo.data.core.weather.surface-based-observations.temp', 'channel': 'origin/a/wis2/int-wmo/data/core/weather/surface-based-observations/temp'}, {'href': 'http://136.156.130.78/oapi/collections/discovery-metadata/items/urn:x-wmo:md:int-wmo:surface-based-observations.temp', 'type': 'OARec', 'rel': 'canonical', 'title': 'urn:x-wmo:md:int-wmo:surface-based-observations.temp'}], 'time': {'interval': ['BEGIN_DATE', '..'], 'resolution': 'P12H'}}}] records
[2024-04-29T14:07:24Z] {/app/migrations/v1_0b6_to_v1_0b7/update_wcmp2_identifiers.py:75} INFO - Updating discovery metadata record urn:wmo:md:int-wmo:surface-based-observations.synop
Traceback (most recent call last):
  File "/app/migrations/v1_0b6_to_v1_0b7/update_wcmp2_identifiers.py", line 111, in <module>
    migrate(dryrun=args.dryrun)
  File "/app/migrations/v1_0b6_to_v1_0b7/update_wcmp2_identifiers.py", line 77, in migrate
    th = record['wis2box']['topic_hierarchy']
KeyError: 'wis2box'
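
The records printed in the log above carry no top-level `wis2box` key, which is what the traceback reports. One defensive pattern is sketched below; it is not necessarily the fix adopted in this PR, and the helper name `get_topic_hierarchy` is hypothetical.

```python
import logging

LOGGER = logging.getLogger(__name__)


def get_topic_hierarchy(record):
    """Return record['wis2box']['topic_hierarchy'], or None if it is absent."""
    try:
        return record['wis2box']['topic_hierarchy']
    except KeyError:
        # Skip records that lack the expected structure instead of aborting
        LOGGER.warning('Record %s has no wis2box.topic_hierarchy, skipping',
                       record.get('id'))
        return None
```

A caller would then `continue` to the next record whenever the helper returns `None`.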

@maaikelimper
Collaborator

I've added some commits to this PR to fix various issues, and I can confirm that the dataset editor loads the datasets after migration.
The migration script does not yet update the links or the data collection, which also use x-wmo in their identifiers; this breaks wis2box-ui.
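
To illustrate the gap described here, a sketch of rewriting the prefix everywhere in a record, including nested `links`, follows. The helper name `replace_x_wmo` is hypothetical. The `urn:wmo:md:` target prefix is taken from the log output above. Renaming the data collection itself in the backend is outside the scope of this sketch.

```python
def replace_x_wmo(value):
    """Return a copy of `value` with 'urn:x-wmo:md:' replaced by
    'urn:wmo:md:' in every string, recursing into lists and dicts."""
    old, new = 'urn:x-wmo:md:', 'urn:wmo:md:'
    if isinstance(value, str):
        return value.replace(old, new)
    if isinstance(value, list):
        return [replace_x_wmo(v) for v in value]
    if isinstance(value, dict):
        return {k: replace_x_wmo(v) for k, v in value.items()}
    # Numbers, None and other types are returned unchanged
    return value
```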

@tomkralidis tomkralidis changed the title add WCMP2 migration script (#663) update migrations Apr 30, 2024
@tomkralidis
Collaborator Author

Update: WCMP2 migrations are complex given the significant changes in wis2box-api and associated workflows, so we will instruct users to delete their volumes and start fresh. This PR keeps the station updates and the WCMP2 x-wmo check.

@tomkralidis tomkralidis merged commit 262ffd3 into main Apr 30, 2024
2 checks passed
@tomkralidis tomkralidis deleted the issue-663 branch April 30, 2024 13:16