Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to disable inline representation of tagged field content when writing to ASDF file #1499

Open
ketozhang opened this issue Mar 24, 2023 · 9 comments

Comments

@ketozhang
Copy link

I do not see this in the docs, but currently when writing to ASDF file, all ASDF tags are written inline if it is not nested (i.e., all children are leaf nodes).

Current Behavior:

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 2.14.3}

Desired Behavior:

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 
  author: The ASDF Developers
  homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, 
  version: 2.14.3

Can we add an option to disable this and I much prefer this to be the default.

@ketozhang ketozhang changed the title Add option to disable inline representation when writing to ASDF file Add option to disable inline representation of tagged field content when writing to ASDF file Mar 24, 2023
@ketozhang
Copy link
Author

Here's an example what happens if the tagged field has a child field

node: !<asdf://example.com/tags/node-1.0.0>
  child: !<asdf://example.com/tags/node-1.0.0> {child: null}

This looks a bit silly

@WilliamJamieson
Copy link
Contributor

This should be controlled by the flowStyle setting in the schema describing the tag. See https://asdf-standard.readthedocs.io/en/1.5.0/schemas/yaml_schema.html#Schema%20Definitions for details. Not that in your case I believe you need to set flowStyle: block in the schema.

It should work as the Roman asdf files make extensive use of this. For example if one generates and saves a wfi_image, its schema contains flowStyle: block here. A sample header is:

roman: !<asdf://stsci.edu/datamodels/roman/tags/wfi_image-1.0.0>
  meta:
    aperture: !<asdf://stsci.edu/datamodels/roman/tags/aperture-1.0.0>
      name: WFI_06_FULL
      position_angle: 30.149604523080143
    cal_step: !<asdf://stsci.edu/datamodels/roman/tags/cal_step-1.0.0>
      assign_wcs: INCOMPLETE
      flat_field: SKIPPED
      dark: COMPLETE
      dq_init: COMPLETE
      jump: COMPLETE
      linearity: SKIPPED
      photom: COMPLETE
      ramp_fit: COMPLETE
      saturation: INCOMPLETE
    calibration_software_version: !<asdf://stsci.edu/datamodels/roman/tags/calibration_software_version-1.0.0> Version
      94bb92d0a922867c94ed1fddcb4216a5
    coordinates: !<asdf://stsci.edu/datamodels/roman/tags/coordinates-1.0.0>
      reference_frame: ICRS
    crds_context_used: roman_8226.pmap
    crds_software_version: 63.1.67
    ephemeris: !<asdf://stsci.edu/datamodels/roman/tags/ephemeris-1.0.0>
      earth_angle: 1.301915411283766
      moon_angle: 1.1516403400836825
      sun_angle: 2.736255246678112
      type: PREDICTED
      time: 59865.65463414276
      ephemeris_reference_frame: Frame 3706a1f3
      spatial_x: -1.3623950365305825e+308
      spatial_y: -1.60804538172114e+306
      spatial_z: 1.5921914973582028e+307
      velocity_x: 1.3125923623238684e+308
      velocity_y: -1.6632717836035864e+308
      velocity_z: 2.352229275461757e+307

I believe yaml would have normally chosen to represent the part under aperture in the way you describe if the schemas involved did not specify the flowStyle: block.

@ketozhang
Copy link
Author

How would I control this without defining a schema for the tag?

One may want to overwrite all tag's preference after all this is just for the serialization and does not affect deseralization.

@WilliamJamieson
Copy link
Contributor

This is currently not supported outside of setting the style within the tag definition. However, looking at the early history, adjusting the default for serialization was planned but never finished, see:

asdf/asdf/yamlutil.py

Lines 90 to 95 in d33ca09

# TODO: Again, adjust for preferred flow style, and other stylistic details
# NOTE: For block style this uses the compact omap notation, but for flow style
# it does not.
# TODO: Need to see if I can figure out a mechanism so that classes that
# use this representer can specify which values should use flow style

Please feel free to explore how to do this further if you wish and open PR adding the feature if you figure out how to do it.

Now I do have some broad thoughts on how to accomplish this. Note that the flowStyle key is read into this class attribute:

For the two wrapper objects that are used to wrap tagged objects in the ASDF tree. This is then read by the yaml dumper routines:

for the two wrapper types above. Thus if you examine the raw ASDF tree, once its been converted into data by the converters, you should be able to set the flow_style keyword on the tagged objects you wish to effect and then when that tree is passed through to pyyaml for dumping using the AsdfDumper everything should get correctly read and adjusted accordingly.

How and when to access will require more in-depth investigation and experimentation.

@ketozhang
Copy link
Author

Very informative, thank you! I will take a stab it this.

First, ASDF is using whatever default the pyyaml dumper is using. This is fine as-is, but if ASDF has a preference, the change seems simple and this would not ignore the tag definition schema preference.

- flow_style = _flow_style_map.get(mapping.flow_style, None)
+ flow_style = _flow_style_map.get(mapping.flow_style, _flow_style_map['block'])

For adding user overrides of flow style, I'm seeing this would have to come from the configs elsewhere as the function signature for the representers will not support this (unless I'm mistaken and the object type for data/mapper/sequence do contain user configs).

Lastly, what are the interfaces the users can use to set stylistics preferences. Is it just AsdfFile.write_to() and/or config (e.g., asdf.get_config())?

@WilliamJamieson
Copy link
Contributor

This block style is not necessarily ideal, so I disagree with forcing the default to be "block".

Since ASDF uses pyyaml to write the yaml output to the header, it was decided that we would just let pyyaml decide how best to format the yaml rather than making a single decision. pyyaml uses a heuristic to decide when to switch from "flow" style to "block" style, and we felt like this seemed appropriate way to standardize the decision. Moreover, if users of ASDF disagree with this approach they can simply configure their schemas appropriately.

In so far as, enabling user overrides placing the interface in the asdf-config is a fine idea, this can even enable the AsdfFile.write_to() to set the overrides by simply letting it use the config context to temporally modify the configuration for a given write.

@WilliamJamieson
Copy link
Contributor

Note that "inlining" arrays (storing them in the yaml directly, instead of in a binary block) operates on the same principle as the flow_style, except it has a global configuration option which passes through to a write option.

The global setting for this is controlled by the all_array_storage setting in the AsdfConfig:

asdf/asdf/config.py

Lines 325 to 347 in 8dce02c

@property
def all_array_storage(self):
"""
Override the array storage type of all blocks
in the file immediately before writing. Must be one of the
following strings or `None`:
- ``internal``: The default. The array data will be
stored in a binary block in the same ASDF file.
- ``external``: Store the data in a binary block in a
separate ASDF file.
- ``inline``: Store the data as YAML inline in the tree.
"""
return self._all_array_storage
@all_array_storage.setter
def all_array_storage(self, value):
if value not in (None, "internal", "external", "inline"):
msg = f"Invalid value for all_array_storage: '{value}'"
raise ValueError(msg)
self._all_array_storage = value

I believe something like global_flow_style could be done in a similar way.

@WilliamJamieson
Copy link
Contributor

Here is where all_array_storage passes into the writing of the file:

asdf/asdf/asdf.py

Lines 1188 to 1190 in 8dce02c

with config_context() as config:
if "all_array_storage" in kwargs:
config.all_array_storage = kwargs.pop("all_array_storage")

@ketozhang
Copy link
Author

This block style is not necessarily ideal, so I disagree with forcing the default to be "block".
pyyaml uses a heuristic

Okay I won't push further, but I will say it's not much of a heuristic -- it looks to be just a consistency rule:

By default, PyYAML chooses the style of a collection depending on whether it has nested collections. If a collection has nested collections, it will be assigned the block style. Otherwise it will have the flow style.
-- PyYAML


Thank you, the existing config for array serialization styling helps a lot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants