String parsing and expression evaluation in json_map reader #252
Open: mmpsi wants to merge 4 commits into FAIRmat-NFDI:master from mmpsi:json_map_parsing
@@ -17,9 +17,13 @@
#
"""An example reader implementation for the DataConverter."""
from typing import Tuple, Any
import datetime
import dateutil.parser
import dateutil.tz
import json
import pickle
import numpy as np
import re
import xarray
from mergedeep import merge

@@ -152,6 +156,129 @@ def get_map_from_partials(partials, template, data):
    return mapping


def parse_strings(mapping, data):
    """
    Parse strings, notably date and time, from a custom format.

    The function can do the following operations, in the given order, on string data.
    The result of each operation is passed on as input of the next one.

    1. Extract element from array by index.
    2. Match a regular expression.
    3. Parse date and time using the datetime or dateutil parser.

    The resulting string replaces the mapped value (dictionary) in the mapping dictionary.
    If date parsing is enabled, the resulting string is ISO-formatted as required by the NeXus standard.
    The operations are selected and tuned by the following dictionary items:

    "parse_string": (required) Data path of the string (array) like for regular datasets.
        If this item is missing, string parsing is skipped altogether.
    "index": (optional) Element index to extract from string array.
        The original data must be a string array.
        If this option is not specified, the original data must be a singular string.
    "regexp": (optional) Match regular expression, keeping only the matching part.
        If the expression contains groups, the result will be a space-delimited concatenation of the matching groups.
        If the expression does not contain explicit groups, the whole match is used.
    "datetime": (optional) Format string for datetime.datetime.strptime function.
        The "datetime" and "dateutil" options are mutually exclusive.
    "dateutil": (optional) Date ordering for the dateutil.parser.parse function.
        Possible values 'YMD', 'MDY', 'DMY' (or lower case).
        The dateutil parser recognizes many date and time formats, but may need the order of year, month and day.
        The "datetime" and "dateutil" options are mutually exclusive.
    "timestamp": (optional) Interpret the data item as POSIX timestamp.
    "timezone": (optional) Specify the time zone if the date-time string does not include a UTC offset.
        The time zone must be in a dateutil-supported format, e.g. "Europe/Berlin".
        By default, the local time zone is used.
    """

    for key in mapping:
        parse_opts = mapping[key]

        try:
            value = parse_opts["parse_string"]
            if is_path(value):
                value = get_val_nested_keystring_from_dict(value[1:], data)
        except (KeyError, TypeError):
            continue

        if "index" in parse_opts:
            value = value[int(parse_opts["index"])]

        if "regexp" in parse_opts:
            match = re.match(parse_opts["regexp"], value)
            groups = match.groups('')
Just for consistency you can do this:
Suggested change
            if groups:
                value = " ".join(match.groups(""))
            else:
                value = match.group(0)

        if "timezone" in parse_opts:
            tz = dateutil.tz.gettz(parse_opts["timezone"])
        else:
            tz = dateutil.tz.gettz()

        if "datetime" in parse_opts:
            dt = datetime.datetime.strptime(value, parse_opts["datetime"])
            if dt.tzinfo is None:
                dt = dt.replace(tzinfo=tz)
            value = dt.isoformat()
        elif "dateutil" in parse_opts:
            order = parse_opts["dateutil"].lower()
            y = order.index("y")
            m = order.index("m")
            d = order.index("d")
            dt = dateutil.parser.parse(value, yearfirst=y < m, dayfirst=d < m)
            if dt.tzinfo is None:
                dt = dt.replace(tzinfo=tz)
            value = dt.isoformat()
        elif "timestamp" in parse_opts:
            dt = datetime.datetime.fromtimestamp(float(value), tz=tz)
            value = dt.isoformat()

        mapping[key] = value


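To make the index, regexp and datetime steps described in the `parse_strings` docstring concrete, here is a minimal standalone sketch. It is not the PR's code: `parse_entry` is a hypothetical helper, the sample strings are invented, a fixed UTC+1 offset from the standard library stands in for `dateutil.tz.gettz`, and the guard against a failed match is an addition that the diff does not contain.

```python
import datetime
import re


def parse_entry(value, opts, tz):
    """Apply the index, regexp and strptime steps in the documented order."""
    if "index" in opts:
        # Step 1: pick one element out of a string array.
        value = value[int(opts["index"])]

    if "regexp" in opts:
        # Step 2: keep only the matching part of the string.
        match = re.match(opts["regexp"], value)
        if match is None:  # assumption: fail loudly instead of an AttributeError
            raise ValueError(f"{opts['regexp']!r} does not match {value!r}")
        groups = match.groups("")
        # With groups: space-delimited concatenation; without: the whole match.
        value = " ".join(groups) if groups else match.group(0)

    if "datetime" in opts:
        # Step 3: parse with strptime and attach a time zone if none was parsed.
        dt = datetime.datetime.strptime(value, opts["datetime"])
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=tz)
        value = dt.isoformat()

    return value


# Hypothetical input data and mapping options.
raw = ["scan 12", "started 2023-03-01 at 14:05:09"]
opts = {
    "index": 1,
    "regexp": r"started (\S+) at (\S+)",
    "datetime": "%Y-%m-%d %H:%M:%S",
}
tz = datetime.timezone(datetime.timedelta(hours=1))  # stand-in for a named zone

result = parse_entry(raw, opts, tz)
print(result)  # 2023-03-01T14:05:09+01:00
```

Each step feeds the next, so the two regexp groups are joined into `2023-03-01 14:05:09` before `strptime` sees them.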
def eval_expressions(mapping, data):
    """
    Evaluate Python expressions in mapping.

    If a mapping entry contains a dictionary with an `eval` key,
    the `eval` expression is evaluated using the Python built-in `eval`.
    The expression can use built-in functions, numpy functions in namespace `np`,
    and argXxx variables that are defined in the mapping and can refer to dataset paths.

    The result of the expression replaces the value of the mapping.

    :param mapping: Mapping dictionary
    :param data: Data dictionary
    :return: None
    """

    for key in mapping:
        eval_args = mapping[key]

        try:
            expression = eval_args["eval"]
        except (KeyError, TypeError):
            continue

        args = {}
        for arg, value in eval_args.items():
            if arg[0:3] == "arg":
                if is_path(value):
                    value = get_val_nested_keystring_from_dict(value[1:], data)
                else:
                    try:
                        value = float(value)
                    except TypeError:
                        pass

                args[arg] = value

        value = eval(expression, {"np": np}, args)
        mapping[key] = value


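The following sketch shows how an `eval` entry with `argXxx` variables could be resolved, following the `eval_expressions` docstring. Several things here are assumptions rather than the PR's code: `evaluate` is a hypothetical helper, a path is taken to be any string starting with `/`, a flat dictionary lookup stands in for `get_val_nested_keystring_from_dict`, and the standard-library `math` module replaces numpy so the example has no dependencies. It also catches `ValueError` alongside `TypeError`, because `float()` raises `ValueError` for a non-numeric string.

```python
import math


def evaluate(entry, data):
    """Resolve the arg* variables of one mapping entry and evaluate its expression."""
    expression = entry["eval"]
    args = {}
    for name, value in entry.items():
        if not name.startswith("arg"):
            continue
        if isinstance(value, str) and value.startswith("/"):
            # Assumed path convention: strip the leading slash and look up the data.
            value = data[value[1:]]
        else:
            try:
                value = float(value)
            except (TypeError, ValueError):
                pass  # keep non-numeric constants unchanged
        args[name] = value

    # Module namespace as globals, resolved arguments as locals.
    # eval executes arbitrary code, so the mapping file must be trusted.
    return eval(expression, {"math": math}, args)


# Hypothetical data and mapping entry.
data = {"voltage": 3.0, "current": 0.5}
entry = {
    "eval": "arg_u / arg_i * arg_scale",
    "arg_u": "/voltage",
    "arg_i": "/current",
    "arg_scale": "2",
}

resistance = evaluate(entry, data)
print(resistance)  # 12.0
```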
class JsonMapReader(BaseReader):
    """A reader that takes a mapping json file and a data file/object to return a template."""

@@ -217,6 +344,8 @@ def read(
        )

        new_template = Template()
        parse_strings(mapping, data)
        eval_expressions(mapping, data)
        convert_shapes_to_slice_objects(mapping)

        fill_documented(new_template, mapping, template, data)
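The `"dateutil"` option handled in `parse_strings` turns an ordering string into the `yearfirst` and `dayfirst` keyword arguments of `dateutil.parser.parse`. The sketch below isolates that translation so the three documented orderings can be checked by hand; `order_flags` is a hypothetical helper name, not something the PR defines.

```python
def order_flags(order):
    """Translate 'YMD', 'MDY' or 'DMY' into dateutil's ordering flags."""
    order = order.lower()
    y = order.index("y")
    m = order.index("m")
    d = order.index("d")
    # Year before month -> yearfirst; day before month -> dayfirst.
    return {"yearfirst": y < m, "dayfirst": d < m}


for order in ("YMD", "MDY", "DMY"):
    print(order, order_flags(order))

# The flags could then be passed on as
# dateutil.parser.parse(value, **order_flags("DMY")).
```

An ordering string that lacks one of the three letters makes `str.index` raise `ValueError`, so a misspelled option fails immediately rather than being silently ignored.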
Most of the cases here will raise an exception and hit `continue`. It would be better to replace this try/except block with an `if` statement, which should also perform better in this case.
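One way to read the suggestion above is sketched below, assuming that most mapping values are plain strings or other non-dictionary objects and that only dictionaries can carry an `eval` key. `iter_eval_entries` is a hypothetical helper and the mapping keys are invented; the same pattern would apply to the `parse_string` lookup.

```python
def iter_eval_entries(mapping):
    """Yield (key, expression) for entries that are dicts with an 'eval' key."""
    for key, entry in mapping.items():
        # An explicit test replaces try/except (KeyError, TypeError): the common
        # case (no 'eval' key) is skipped without raising an exception.
        if not isinstance(entry, dict) or "eval" not in entry:
            continue
        yield key, entry["eval"]


# Hypothetical mapping: a plain path, an eval entry and a parse_string entry.
mapping = {
    "/ENTRY[entry]/title": "/title",
    "/ENTRY[entry]/x": {"eval": "arg0 * 2", "arg0": "3"},
    "/ENTRY[entry]/y": {"parse_string": "/y"},
}

found = list(iter_eval_entries(mapping))
print(found)  # [('/ENTRY[entry]/x', 'arg0 * 2')]
```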