Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor: template parser answer list support #2211

Merged
merged 21 commits into from
Apr 18, 2024

Conversation

chrismclarke
Copy link
Member

@chrismclarke chrismclarke commented Feb 22, 2024

PR Checklist

  • PR title descriptive (can be used in release notes)

TODO

Description

  • Update parser to parse answer_lists (previously would happen at runtime)
  • Add spec tests for template parser
  • Improve methods for row name auto-generation
  • Update parser cache version (will force reprocess all sheets next sync)

Author Notes

Been a while since taking a closer look at the template parser. A few things noted while working on the PR:

Notes

  • answer_lists need to be defined as local variables with name including _list in them somewhere. Not sure if this was previously documented anywhere, I haven't changed the handling just noted that it is the case. See more info about special row name handling in questions section below

  • Following the change all content repos should see git diffs as answer_lists populated fully (not just string partial representation)

  • One limitation of parsing answer lists in the compiler is harder to determine whether a named list is actually a list or a reference to something else. E.g. {name: "my_answer_list", value: "@local.some_other_answer_list"}. In order to handle this case I've added a check explicitly for ; when specifying template lists. That may break content if a sheet defined a list with only one value and forgot to include, e.g. answer_list local variable value defined as:

"name: value_1 | text: my text"

as opposed to

 "name: value_1 | text: my_text; name: value_2 | text: my text 2"
  • There is currently a default to handle rows without a name which is just setting the name as the row type. This vaguely works if there is just one text or image type row in a template and nothing needs to modify or refer back, however felt quite fragile in general. I've added a minor update so that by default we now also add a suffix to the name derived from the row number being processed, e.g. if the sheet has unnamed text on row 8 the generated name should be text_8. Is this something useful, or would it be better to remove entirely and replace with some quality-control check to ensure all rows have an author-defined name.

  • With the changes above there may be small content diffs when syncing repos if any do contain non-named rows

Questions

  • It appears that parameter lists only ever support simple key-value pairs during parsing (e.g. answer_list: @local.some_list), with additional support for key-only properties (e.g. input_allowed would be implied as input_allowed: 'true'). Do we have any runtime cases of more complex use cases that we might want to reconsider evaluating in the parser? E.g. parameter lists where the value isn't just a simple string but more complex object?

  • There seems support template_group row type, is this something we still use? (don't have strong memory of it)

  • We're currently passing the exclude_from_translation property explicitly with every row entry. Given that this info isn't ever used at runtime it might be worth considering exporting this separately (e.g. a single json file of excluded translations). Possibly a small follow-up issue (might be best to coordinate with translation workflow)

  • Templates have special handling for any variables named _list (convert to array) or _collection (convert to json key-value pairs). This is on top of specific handling for columns ending _list (e.g. condition_list), _collection, _date, or specifically action_list.
    Is this something documented somewhere? Do we know if things like local name and/or column name _collection handling is still used?

Dev Notes

I've left the runtime answer list item parse method in the code for now for any cases where the parsing may not have been done correctly in advance (I don't think it should happen but probably better to be safe for now.. maybe if read in from a global or cell within a data_list (as opposed to full data list row)). I've also added an explicit call to sentry to detect if the code is executed to track whether we can safely remove in the future

I've done my best to add spec tests to cover all the cases covered explicitly within the template parser methods in their current state. I'm not sure if it might make sense to further try to refine/improve as follow-up (e.g. remove template_group handling which I think must be deprecated by now). Spec tests can be run via:

yarn workspace scripts test

The code also includes a placeholder method to run specific QC checks on templates, however I stopped short of adding as the PR was already growing quickly. Might be worth reviewing in the future possible checks worth having (e.g. I had planned to include one to ensure all parameter lists that included answer lists pointed at local variables with correct _list in name)

Git Issues

Closes #2210

Screenshots/Videos

Tests added - see full test details in spec file
image

@chrismclarke chrismclarke marked this pull request as ready for review February 22, 2024 19:44
@chrismclarke chrismclarke changed the title Refactor/answer list parser Refactor: template parser answer list support Feb 22, 2024
Copy link
Collaborator

@esmeetewinkel esmeetewinkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

answer_lists need to be defined as local variables with name including _list in them somewhere.

It seems that the local variable needs to end in _list, containing it as a substring is not enough: A variable answer_list_1 is not parsed differently, but answer_1_list is. (template question of plh_kids_tz deployment)

image
image

There is currently a default to handle rows without a name which is just setting the name as the row type. This vaguely works if there is just one text or image type row in a template and nothing needs to modify or refer back, however felt quite fragile in general. I've added a minor update so that by default we now also add a suffix to the name derived from the row number being processed, e.g. if the sheet has unnamed text on row 8 the generated name should be text_8. Is this something useful, or would it be better to remove entirely and replace with some quality-control check to ensure all rows have an author-defined name.

I like this suggestion in general, makes sense. I did find the following issues with the current code:

  1. This also applies to data_items. I'm understanding that a begin_data_items row "expects" a name column value (populating it with a unique default when missing). This is a bit confusing, since a begin_items row breaks when there is anything in the name column. (template comp_data_items)
    image

  2. When rows sit nested in a display group the proposed default row names are numbered within the scope of the display group, which doesn't make sense to me. When referring to these rows in a parent template the display group is invisible (i.e. as opposed to begin_nested_template there is no begin_nested_display_group) so we need the default row names to be unique within the scope of the template, I'd say. (template feature_display_group)
    image

    I don't see file changes like this in any regular template with a begin_items row, but it does show up for a generated template that has a begin_items row (sheet example_generator_temp generating example_generator_output_w_1)
    image

There seems support template_group row type, is this something we still use? (don't have strong memory of it)

Doesn't ring a bell to me... So definitely we don't use it anymore, not sure we ever did.

@chrismclarke
Copy link
Member Author

chrismclarke commented Mar 15, 2024

It seems that the local variable needs to end in _list, containing it as a substring is not enough: A variable answer_list_1 is not parsed differently, but answer_1_list is.

@esmeetewinkel
Thanks for looking through all these cases

You're right, I can see the default parser only converts special fields (here meaning sheet columns) depending on their name ending

  if (typeof value === "string") {
          if (field.endsWith("_list")) {
            this.row[field] = parseAppDataListString(value);
          }
          if (field.endsWith("_collection")) {
            this.row[field] = parseAppDataCollectionString(this.row[field]);
          }
          if (field.endsWith("action_list")) {
            this.row[field] = this.row[field]
              .map((actionString) => parseAppDataActionString(actionString))
              .filter((action) => action !== null);
          }
        }
        // convert google/excel number dates to dates (https://stackoverflow.com/questions/16229494/converting-excel-date-serial-number-to-date-using-javascript)
        if (field.endsWith("_date")) {
          if (typeof this.row[field] === "number") {
            this.row[field] = parseAppDateValue(this.row[field]);
          }
        }

It is then only at a later stage that the template parser handles based on row names, and supports the text included anywhere in the name

 if (row.name?.includes("_list")) {
        row.value = this.parseTemplateList(row.value);
      }
      if (row.name?.includes("_collection") && row.value && typeof row.value === "string") {
        row.value = parseAppDataCollectionString(row.value);
      }

So that means right now for any data_list the column must end in _list to be processed as expected, and for any template that defines inline then the row must contain _list somewhere in the name.

Do you think this distinction makes sense/is sensible, or better to also allow data_lists to convert if the column has _list anywhere in the name? I don't think it was considered to carefully when first implemented so open to suggestion

@chrismclarke
Copy link
Member Author

  1. This also applies to data_items. I'm understanding that a begin_data_items row "expects" a name column value (populating it with a unique default when missing). This is a bit confusing, since a begin_items row breaks when there is anything in the name column.

I'm not 100% sure if it breaks or not, because I think there's still fixes applied in #2220 that would need merge to work properly.
So it might be worth retesting after - I actually think assigning a name to data items should be a requirement because otherwise it would be possible to have 2 sets of data items with children that have duplicate name references, but can see post #2220

I did pick up one extra issue while testing on the debug repo though related to navigation-bar component which I've pushed a fix for

@chrismclarke
Copy link
Member Author

chrismclarke commented Mar 15, 2024

2. When rows sit nested in a display group the proposed default row names are numbered within the scope of the display group, which doesn't make sense to me. When referring to these rows in a parent template the display group is invisible (i.e. as opposed to begin_nested_template there is no begin_nested_display_group) so we need the default row names to be unique within the scope of the template, I'd say.

Yeah it's a bit of a tricky one. I (think) I can understand what you mean - when referring to the rows from within the same template they can be identified independent of the display group container (this is the case and what you mean, right??) - and so makes sense enforcing uniqueness across the template . The challenge here is that when it comes to the rendering process they are still rendered as child rows of the display group container (which handles the row layout logic).

So I think the anomaly here is being able to refer to the rows directly, say nested_text_1, instead of as my_display_group.nested_text_1. I think it probably wouldn't be very easy to adapt the case for display groups without diving deeper into the overall templating system, so might just be a limitation/bug/feature to be aware of for now...

I'm not sure whether it would be worth the time or not to create an example sheet with the sole purpose of reviewing generated row names or not... It's tricky added all these cases to the test specs as depends somewhat on different levels of processing (e.g. items and data_items apply certain manipulations at runtime so they can iterate over dynamic data while display_groups are state and managed more at compile time)

@esmeetewinkel
Copy link
Collaborator

So that means right now for any data_list the column must end in _list to be processed as expected, and for any template that defines inline then the row must contain _list somewhere in the name.

Do you think this distinction makes sense/is sensible, or better to also allow data_lists to convert if the column has _list anywhere in the name? I don't think it was considered to carefully when first implemented so open to suggestion

Okay, thanks for checking and clarifying. I can see reasons for either convention (it could be handy for list_1, list_2, etc still to be recognised as lists, but it would be annoying if listenor specialist would be recognised as lists). I'd definitely be in favour of handling the same way for sheet columns and template rows.

@chrismclarke
Copy link
Member Author

chrismclarke commented Apr 12, 2024

One issue I encountered: it seems that bumping the cacheVersion of the flowParser is not enough to prompt all sheets to be re-parsed on sync, only the handful of sheets that had new author edits were re-parsed for me. I tried bumping the number myself, which didn't seem to have an effect. What did work was bumping the cacheVersion number as specified in the base processor. After that, syncing makes changes to over a hundred parsed sheets, as expected. I believe this is because the cacheVersion property is used in the BaseProcessor constructor, so the derived class's property does not override it, see this discussion.

Thanks for catching, I've pushed 0089892 which should now force the inheritance correctly by passing to the constructor. Let me know if now working as expected.

Once happy this should be good to merge, I accidentally re-requested review from @esmeetewinkel however I don't think there should be any additional changes to look at

Copy link
Collaborator

@jfmcquade jfmcquade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching, I've pushed 0089892 which should now force the inheritance correctly by passing to the constructor. Let me know if now working as expected.

Can confirm that the cache versioning seems to function as expected, so should be good to merge.

@jfmcquade jfmcquade merged commit d208bed into master Apr 18, 2024
10 checks passed
@jfmcquade jfmcquade deleted the refactor/answer-list-parser branch April 18, 2024 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test - preview Create a preview deployment of the PR test - visual Run visual regression tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Process answer lists within the parser
3 participants