-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/add vlmd extract #14
base: master
Are you sure you want to change the base?
Conversation
The style in this PR agrees with This formatting comment was generated automatically by a script in uc-cdis/wool. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
email change looks correct!
heal/vlmd/validate/validate.py
Outdated
@@ -8,7 +8,7 @@ | |||
from heal.vlmd.validate.csv_validator import vlmd_validate_csv | |||
from heal.vlmd.validate.json_validator import vlmd_validate_json | |||
|
|||
logger = get_logger("VALIDATE", log_level="debug") | |||
logger = get_logger("vlmd-validate", log_level="debug") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if the log_level specifications should be removed from child level methods. If they were set at a higher level then we would have more control in changing the level, say when invoked by the CLI.
@@ -8,7 +9,7 @@ | |||
from heal.vlmd.config import ALLOWED_SCHEMA_TYPES | |||
from heal.vlmd.validate.utils import get_schema | |||
|
|||
logger = get_logger("JSON_VALIDATOR", log_level="debug") | |||
logger = get_logger("json-validator", log_level="debug") | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is another example where the log_level in a child method could be removed and controlled from a higher level method.
heal/vlmd/extract/utils.py
Outdated
) | ||
|
||
propnames_exp = "|".join("^" + name + "$" for name in props_to_rearrange) | ||
names_to_rearrange = [name for name in propnames if re.match(propnames_exp, name)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to do re.match()
here? I think what the propnames_exp
and re.match()
do here is the same as if name in props_to_rearrange
, or maybe I'm missing something
is_one_unique = len(flat_fields_df[name].map(str).unique()) == 1 | ||
else: | ||
is_one_unique = flat_fields_df[name].nunique() == 1 | ||
if is_one_unique: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_one_unique
might have not be defined at here
if is_one_unique: | ||
flat_record[name] = flat_fields_df.pop(name).iloc[0] | ||
if isinstance(flat_record[name], pd.Series): | ||
flat_record[name] = flat_record[name].to_list() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I wrote this block of logic in the old healdata-utils
code to patch some urgent bugs. But I didn't get to the root of the problem, so it is more like a band-aid fix
What I really hope we can do is to figure out why flat_record[name]
can have so many possible types (pd.DataFrame, pd.Series, etc.). I feel that we might not need to support dealing with that many different types of flat_record[name]
here, if we can fix the logic upstream to make sure the value of flat_record[name]
will only ends up in one type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also create another ticket for what I said above, and put a FIXME
note here
droplist: dict = None, | ||
item_sep: str = "|", | ||
keyval_sep: str = "=", | ||
) -> dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many variable / function names in this file that should be in snake cases but are not. We should change them
|
||
|
||
def convert_templatejson( | ||
jsontemplate, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comments for snake cases for this file as well
heal/vlmd/extract/utils.py
Outdated
for prop_name in nested_names[:-1]: | ||
if "[" in prop_name: | ||
array_name, array_index = prop_name.split("[") | ||
array_index = int(array_index[:-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably needs some error handling if the casting fails here
heal/vlmd/extract/utils.py
Outdated
else: | ||
prop_json[array_name][array_index] = {array_name: prop} | ||
else: | ||
prop_json[last_prop_name] = prop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L195-L210 is very similar to L176-L193 above, except it does something special at the end for the last element. Probably better if written this way to combine the two
nested_names_length = len(nested_names)
for i, prop_name in enumerate(nested_names):
if "[" in prop_name:
...
if i == nested_names_length - 1
// do something for the last element
else:
// do other things for other elements
logger.debug("Validating data") | ||
jsonschema.validate(instance=data, schema=schema) | ||
logger.debug(data) | ||
# jsonschema.validate(instance=data, schema=schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no longer needed?
JIRA ticket: HP-1727
New Features
Breaking Changes
Bug Fixes
Improvements
Dependency updates
Deployment changes