607 review request to integrate universal pipeline #611

Merged
merged 18 commits into from
Jan 22, 2024


alimand
Collaborator

@alimand commented Jan 10, 2024

Adding grib2-pipeline-plugin demo

grib2-pipeline-plugin

1. Related files

a. universal.py: put it in /wis2box/wis2box-management/wis2box/data

b. data-mappings.yml: append it to /wis2box/tests/data/data-mappings.yml

c. Test data (10 files): create a new directory named 'china' under /wis2box/tests/data/observation and put all the test data samples in it.

d. Discovery metadata GRAPES-GEPS-GLB.yml: put it in /wis2box/tests/data/metadata/discovery/GRAPES-GEPS-GLB.yml

2. Source Code

Create the class UniversalData, which inherits from wis2box.data.base.BaseAbstractData.

Implement the transform method so that it fills in the output_data property and returns True.

universal.py

###############################################################################
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#
###############################################################################
import logging
from pathlib import Path
from typing import Union
from dateutil.parser import parse
from wis2box.data.base import BaseAbstractData

LOGGER = logging.getLogger(__name__)


class UniversalData(BaseAbstractData):
    """Universal data"""

    def __init__(self, defs: dict) -> None:
        super().__init__(defs)

    def transform(self, input_data: Union[Path, bytes],
                  filename: str = '') -> bool:

        filename = Path(filename)
        LOGGER.debug('Processing data')
        input_bytes = self.as_bytes(input_data)

        LOGGER.debug('Deriving datetime')
        match = self.validate_filename_pattern(filename.name)

        if match is None:
            msg = f'Invalid filename format: {filename} ({self.file_filter})'
            LOGGER.error(msg)
            raise ValueError(msg)
        try:
            date_time = match.group(1)
        except IndexError:
            msg = 'Missing date/time in filename pattern'
            LOGGER.error(msg)
            raise ValueError(msg)

        date_time = parse(date_time)

        rmk = filename.stem
        suffix = filename.suffix.replace('.', '')

        self.output_data[rmk] = {
            suffix: input_bytes,
            '_meta': {
                'identifier': rmk,
                'relative_filepath': self.get_local_filepath(date_time),
                'data_date': date_time
            }
        }

        return True

    def get_local_filepath(self, date_):
        yyyymmdd = date_.strftime('%Y-%m-%d')
        return Path(yyyymmdd) / 'wis' / self.topic_hierarchy.dirpath
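To make the shape of output_data concrete, here is a stdlib-only sketch that rebuilds the same entry outside wis2box. It is an illustration under assumptions, not the plugin itself: the file-pattern is copied from the data-mappings entry, `TOPIC_DIRPATH` is a hypothetical stand-in for `self.topic_hierarchy.dirpath`, and `datetime.strptime` replaces dateutil's `parse` so that no third-party package is needed.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumptions (not part of wis2box): the file-pattern is copied from the
# data-mappings.yml entry, and TOPIC_DIRPATH is a hypothetical stand-in
# for self.topic_hierarchy.dirpath in the real plugin.
FILE_PATTERN = r'^.*_(\d{8})\d{2}.*\.grib2$'
TOPIC_DIRPATH = Path('cn-cma-babj/data/core/weather/prediction/forecast/'
                     'short-range/probabilistic/global')


def sketch_output_data(filename: str, input_bytes: bytes) -> dict:
    """Build an output_data entry shaped like UniversalData.transform's."""
    path = Path(filename)
    match = re.match(FILE_PATTERN, path.name)
    if match is None:
        raise ValueError(f'Invalid filename format: {path}')

    # The plugin uses dateutil's parse(); strptime keeps this stdlib-only
    date_time = datetime.strptime(match.group(1), '%Y%m%d')

    rmk = path.stem                        # identifier: name without extension
    suffix = path.suffix.replace('.', '')  # e.g. 'grib2'
    relative_filepath = (Path(date_time.strftime('%Y-%m-%d')) / 'wis'
                         / TOPIC_DIRPATH)

    return {
        rmk: {
            suffix: input_bytes,
            '_meta': {
                'identifier': rmk,
                'relative_filepath': relative_filepath,
                'data_date': date_time,
            },
        }
    }
```

For Z_NAFP_C_BABJ_20231207000000_P_CMA-GEPS-GLB-024.grib2 the key is the file stem, the payload sits under 'grib2', and the data date is 2023-12-07, so the relative path starts with 2023-12-07/wis/.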

3. data-mappings.yml: configure the topic hierarchy for the numerical weather prediction data (using CMA as an example)

data-mappings.yml

data:
    chn.babj.data.core.weather.prediction.forecast.shortrange.probabilistic.global.CMA_GEPS:
        plugins:
            grib2:
                # call the grib2 data pipeline plugin to process CMA_GEPS GRIB2 data
                - plugin: wis2box.data.universal.UniversalData
                  notify: true
                  buckets:
                    - ${WIS2BOX_STORAGE_INCOMING}
                  file-pattern: '^.*_(\d{8})\d{2}.*\.grib2$'
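The mapping selects a plugin by file extension (the grib2 key) and then requires the filename to match file-pattern. The sketch below mirrors the entry as a plain Python dict and walks that lookup; the dict and the `find_plugin` helper are hypothetical illustrations, not wis2box's own loader code.

```python
import re
from pathlib import Path
from typing import Optional

# Plain-dict mirror of the data-mappings.yml entry above (illustrative only)
TOPIC = ('chn.babj.data.core.weather.prediction.forecast.shortrange.'
         'probabilistic.global.CMA_GEPS')
DATA_MAPPINGS = {
    TOPIC: {
        'plugins': {
            'grib2': [{
                'plugin': 'wis2box.data.universal.UniversalData',
                'notify': True,
                'buckets': ['${WIS2BOX_STORAGE_INCOMING}'],
                'file-pattern': r'^.*_(\d{8})\d{2}.*\.grib2$',
            }],
        },
    },
}


def find_plugin(topic: str, filename: str) -> Optional[str]:
    """Return the plugin class path that would handle filename, if any."""
    name = Path(filename).name
    extension = Path(name).suffix.lstrip('.')
    plugins = DATA_MAPPINGS.get(topic, {}).get('plugins', {})
    # Only entries registered under the file's extension are considered,
    # and the filename must also match that entry's file-pattern
    for entry in plugins.get(extension, []):
        if re.match(entry['file-pattern'], name):
            return entry['plugin']
    return None
```

A .grib2 file whose name carries `_YYYYMMDDHH` resolves to wis2box.data.universal.UniversalData; a .txt file or a .grib2 file without the date block resolves to nothing.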

4. Test data list

Z_NAFP_C_BABJ_20231207000000_P_CMA-GEPS-GLB-024.grib2
Z_NAFP_C_BABJ_20231207000000_P_CMA-GEPS-GLB-036.grib2
Z_NAFP_C_BABJ_20231207000000_P_CMA-GEPS-GLB-048.grib2
Z_NAFP_C_BABJ_20231207000000_P_CMA-GEPS-GLB-060.grib2
Z_NAFP_C_BABJ_20231207000000_P_CMA-GEPS-GLB-072.grib2
Z_NAFP_C_BABJ_20231207000000_P_CMA-GEPS-GLB-084.grib2
Z_NAFP_C_BABJ_20231207000000_P_CMA-GEPS-GLB-096.grib2
Z_NAFP_C_BABJ_20231207000000_P_CMA-GEPS-GLB-108.grib2
Z_NAFP_C_BABJ_20231207000000_P_CMA-GEPS-GLB-120.grib2
Z_NAFP_C_BABJ_20231207000000_P_CMA-GEPS-GLB-132.grib2
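The ten sample names differ only in their last three digits. The sketch below decodes them, assuming (not stated in the PR) that the 14 digits after BABJ are the base time and the trailing three digits are the forecast lead time in hours; `describe` and `TEST_FILES` are hypothetical names.

```python
import re
from datetime import datetime, timedelta
from typing import Tuple

# Assumption: the 14 digits after BABJ are the base time (YYYYMMDDhhmmss)
# and the trailing three digits are the forecast lead time in hours.
NAME_RE = re.compile(
    r'^Z_NAFP_C_BABJ_(\d{14})_P_CMA-GEPS-GLB-(\d{3})\.grib2$')


def describe(filename: str) -> Tuple[datetime, int, datetime]:
    """Return (base time, lead hours, valid time) for a test filename."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f'Unexpected filename: {filename}')
    base = datetime.strptime(match.group(1), '%Y%m%d%H%M%S')
    lead = int(match.group(2))
    return base, lead, base + timedelta(hours=lead)


# The ten sample files: lead times 024 to 132 in 12-hour steps
TEST_FILES = [f'Z_NAFP_C_BABJ_20231207000000_P_CMA-GEPS-GLB-{h:03d}.grib2'
              for h in range(24, 133, 12)]
```

Under that assumption the set spans valid times from 2023-12-08 00:00 to 2023-12-12 12:00, and every name matches the data-mappings file-pattern with the captured date 20231207.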

5. GRAPES-GEPS-GLB.yml

wis2box:
    retention: P30D
    topic_hierarchy: cn-cma-babj.data.core.weather.prediction.forecast.short-range.probabilistic.global
    country: chn
    centre_id: cn-cma-babj

mcf:
    version: 1.0

metadata:
    identifier: urn:x-wmo:md:cn-cma-babj:data.core.weather.prediction.forecast.short-range.probabilistic.global
    hierarchylevel: dataset

identification:
    title: CMA GRAPES GEPS v1.3
    abstract: GRAPES GEPS is the main technical means of addressing the uncertainty of CMA-GFS medium-range forecasts and the difficulty of forecasting extreme weather.
    dates:
        creation: 2024-01-17
    keywords:
        default:
            keywords:
                - mean sea level Pressure
                - 2 m above ground Temperature
                - 10 m above ground U-Component of Wind
                - 10 m above ground V-Component of Wind
                - Total Precipitation
                - Geopotential Height
                - Temperature
                - U-Component of Wind
                - V-Component of Wind
        wmo:
            keywords:
                - weatherObservations
            keywords_type: theme
            vocabulary:
                name: WMO Category Code
                url: https://github.com/wmo-im/wcmp-codelists/blob/main/codelists/WMO_CategoryCode.csv
    extents:
        spatial:
            - bbox: [73.66000, 4.00000, 135.08000, 53.52000]
              crs: 4326
        temporal:
            - begin: 2021-11-29
              end: null
              resolution: P12H
    url: http://gisc.wis.cma.cn/wis/portal.pub?M_PID=urn:x-wmo:md:int.wmo.wis::CMA_GEPS
    wmo_data_policy: core

contact:
    pointOfContact: &contact_poc
        organization: China Meteorological Administration (CMA)
        url: https://www.cma.gov.cn/
        individualname: National Meteorological Information Center (NMIC)
        positionname: National Meteorological Information Center (NMIC)
        phone: 86-10-68409329
        fax: null
        address: 46 Zhongguancun Nandajie
        city: Beijing
        administrativearea: Beijing
        postalcode: 100081
        country: China
        email: [email protected]
        hoursofservice: 0000h - 0900h UTC
        contactinstructions: email

    distributor: *contact_poc
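Two values in this file are easy to get out of step: the metadata identifier, which here is `urn:x-wmo:md:` followed by the centre-id and the rest of the topic hierarchy, and the bbox, which is ordered [minx, miny, maxx, maxy] in degrees. The helpers below are hypothetical consistency checks that follow the pattern this file uses; they are not a wis2box or pygeometa API.

```python
from typing import List

# Hypothetical consistency checks for GRAPES-GEPS-GLB.yml (not a wis2box API)


def expected_identifier(topic_hierarchy: str) -> str:
    """urn:x-wmo:md:<centre-id>:<rest of topic>, as used in this file."""
    centre_id, rest = topic_hierarchy.split('.', 1)
    return f'urn:x-wmo:md:{centre_id}:{rest}'


def bbox_problems(bbox: List[float]) -> List[str]:
    """Check a [minx, miny, maxx, maxy] bbox given in lon/lat degrees."""
    minx, miny, maxx, maxy = bbox
    problems = []
    if not (-180 <= minx < maxx <= 180):
        problems.append('longitudes out of range or not ordered')
    if not (-90 <= miny < maxy <= 90):
        problems.append('latitudes out of range or not ordered')
    return problems
```

Applied to the values above, the derived identifier equals metadata.identifier, the first topic segment equals centre_id, and the bbox passes both range checks.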

This is the GRIB2 pipeline plugin from CMA, named universal.py; we recommend putting it in the /wis2box/wis2box-management/wis2box/data directory.
To process GRIB2 files, reference the plugin under the CMA demo topic hierarchy in the data mappings; we recommend appending that entry to /wis2box/tests/data/data-mappings.yml.
The test dataset provides samples that users can use to test the GRIB2 data pipeline plugin.
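The placement steps can be scripted. The sketch below copies the plugin, the discovery metadata and the GRIB2 samples into a wis2box checkout, assuming a hypothetical source layout (src/universal.py, src/GRAPES-GEPS-GLB.yml, src/china/*.grib2). It deliberately does not touch data-mappings.yml, because the topic entry has to be merged under the existing `data:` key rather than appended as raw text.

```python
import shutil
from pathlib import Path
from typing import List


def stage_cma_demo(src: Path, repo: Path) -> List[Path]:
    """Copy the CMA demo files into a wis2box checkout rooted at repo.

    Assumes a hypothetical source layout: src/universal.py,
    src/GRAPES-GEPS-GLB.yml and src/china/*.grib2.
    """
    single_files = {
        'universal.py': repo / 'wis2box-management' / 'wis2box' / 'data',
        'GRAPES-GEPS-GLB.yml': repo / 'tests' / 'data' / 'metadata' / 'discovery',
    }
    copied = []
    for name, target_dir in single_files.items():
        target_dir.mkdir(parents=True, exist_ok=True)
        copied.append(Path(shutil.copy2(src / name, target_dir / name)))

    # Test samples go into tests/data/observation/china
    china_dir = repo / 'tests' / 'data' / 'observation' / 'china'
    china_dir.mkdir(parents=True, exist_ok=True)
    for sample in sorted((src / 'china').glob('*.grib2')):
        copied.append(Path(shutil.copy2(sample, china_dir / sample.name)))
    return copied
```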
@alimand linked an issue on Jan 10, 2024 that may be closed by this pull request
@alimand self-assigned this on Jan 10, 2024
@alimand added the data transformation label on Jan 10, 2024
@maaikelimper
Collaborator

maaikelimper commented Jan 10, 2024

Hi Jin,
Thank you very much for your commit, Tom and I will review how to integrate this in the coming days.
Initial comments:

  • data_date should not be set to the current datetime if the file-pattern match fails; instead, use LOGGER.error and return False.
  • I would like to include the China data in the automated testing (see https://github.com/wmo-im/wis2box/blob/main/.github/workflows/tests-docker.yml). Can you please provide a sample discovery metadata file for the test?
  • Can you please update "chn.babj" to use the new center-id and remove country_code?

As the wis2box is a reference implementation for WIS2, the data examples should align with the WIS2 topic hierarchy.
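The first suggestion (log the error and return False instead of inventing a date) can be sketched as below. This is a standalone illustration under assumptions: `derive_data_date` and this `transform` are hypothetical names, the file-pattern is the one from the CMA data-mappings entry, and the universal.py code in section 2 raises ValueError on a failed match rather than returning False.

```python
import logging
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

LOGGER = logging.getLogger(__name__)

# Assumption: the same file-pattern as the CMA data-mappings entry
FILE_PATTERN = r'^.*_(\d{8})\d{2}.*\.grib2$'


def derive_data_date(filename: str,
                     file_pattern: str = FILE_PATTERN) -> Optional[datetime]:
    """Return the date encoded in the filename, or None if it cannot be derived."""
    name = Path(filename).name
    match = re.match(file_pattern, name)
    if match is None:
        LOGGER.error(f'Invalid filename format: {name} ({file_pattern})')
        return None
    try:
        return datetime.strptime(match.group(1), '%Y%m%d')
    except (IndexError, ValueError) as err:
        # IndexError: pattern has no capture group; ValueError: not a date
        LOGGER.error(f'Cannot derive date from {name}: {err}')
        return None


def transform(filename: str) -> bool:
    """Hypothetical transform step: reject the file rather than guess a date."""
    data_date = derive_data_date(filename)
    if data_date is None:
        return False
    # ... build output_data here, using data_date ...
    return True
```

Returning False lets the caller skip the file, whereas falling back to datetime.now() would silently publish it under the wrong date directory.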

@maaikelimper
Collaborator

Feedback from Anna to correct the topic hierarchy in the example. Replace:
weather.prediction.forecast.shortrange.probabilistic.global.CMA_GEPS
with:
weather.prediction.forecast.short-range.probabilistic.global

See:
https://github.com/wmo-im/tt-nwpmd/blob/main/weather/prediction/type/prediction-system/prediction-system.csv
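Taken together with the centre-id request, the correction turns the original example topic into the form used later in GRAPES-GEPS-GLB.yml. The hypothetical helper below applies exactly those two replacements and nothing more; it is not part of wis2box and does not validate against the tt-nwpmd code list.

```python
# Hypothetical helper applying the two review changes to the example topic:
# the centre-id "cn-cma-babj" instead of "chn.babj", and "short-range"
# without the trailing ".CMA_GEPS" segment. Not part of wis2box.
OLD_PREFIX, NEW_PREFIX = 'chn.babj.', 'cn-cma-babj.'
OLD_TAIL = 'weather.prediction.forecast.shortrange.probabilistic.global.CMA_GEPS'
NEW_TAIL = 'weather.prediction.forecast.short-range.probabilistic.global'


def update_topic(topic: str) -> str:
    """Rewrite the original example topic to the reviewed form."""
    if topic.startswith(OLD_PREFIX):
        topic = NEW_PREFIX + topic[len(OLD_PREFIX):]
    if topic.endswith(OLD_TAIL):
        topic = topic[:-len(OLD_TAIL)] + NEW_TAIL
    return topic
```

The rewrite is idempotent: running it on an already updated topic leaves it unchanged.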

Collaborator

@tomkralidis left a comment


Minor changes in addition to @maaikelimper's comments.

@@ -0,0 +1,55 @@
from datetime import datetime
Collaborator


Add source code header (see example)

@alimand
Collaborator Author

alimand commented Jan 11, 2024

Hi Maaike and Tom,
@maaikelimper @tomkralidis Well received.
Based on the comments and the advice, we will make changes (after internal testing) as soon as possible and submit the pull request again. Thank you!

@alimand force-pushed the 607-review-request-to-integrate-universal-pipeline branch from 569af06 to 69397bb on January 15, 2024 11:41
@alimand
Collaborator Author

alimand commented Jan 17, 2024


@tomkralidis @maaikelimper
Hi Maaike/Tom,
Based on the comments we made an update, could you please review again?
Regarding merging the centre ID and the 3-letter country code: we decided to use "cn-cma-babj", and we will email you once it is officially updated.
Thank you so much for your help.

@maaikelimper
Collaborator

Thanks @alimand for the updates! I've added an additional commit to your PR to add documentation for the new plugin and included a new step in our automated testing to run the universal plugin using your data samples.

I've removed the additional .md you included. Once the code is included in the next release any user will be able to use this as a built-in plugin for the wis2box.

@maaikelimper
Collaborator

@tomkralidis for final review

@tomkralidis
Collaborator

Great work everyone!

@alimand is the GRAPES data a 15-day forecast? Please confirm, thanks.

@tomkralidis
Collaborator

Discussion with @alimand confirms the test data is a 10-day forecast (medium range).

@tomkralidis merged commit dafcecc into main on Jan 22, 2024
3 checks passed
@tomkralidis deleted the 607-review-request-to-integrate-universal-pipeline branch on January 22, 2024 03:37
@tomkralidis mentioned this pull request on Jan 22, 2024
5 tasks
Labels
data transformation Data transformation
Development

Successfully merging this pull request may close these issues.

review request to integrate universal pipeline
3 participants