Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: track the answers to allow_list questions in analytics_logs #2487

Closed
wants to merge 7 commits into from

Conversation

Mike-Heneghan
Copy link
Contributor

@Mike-Heneghan Mike-Heneghan commented Nov 27, 2023

What:

  • Passport variable tracking:

    • Begin tracking the fn or val of a node
    • Store this data against a new column node_fn on the analytics_logs
  • Answers to allow_list question tracking:

    • Defined a new array of questions which are part of the allow_list (i.e. useful for analytics but not sensitive and or PII)
    • On component transition check whether the node fn or val is in the allow list and if so update the log with the user's answer

Why:

  • Currently, node are generally identified in analytics by their node_title and node_type.
  • This is useful information but there may be situations where it's usefully to additionally know their fn or val if they're assigned
  • For example the node_title code change but we could easily continue to track via the passport variable which is unlikely to change
  • Also, some questions in flowd could provide really useful insights to users but also not contain any PII or be invasive to track
  • These questions are harmless to track and hence could be on the allow_list

Screen recoding

Screen.Recording.2023-11-28.at.12.21.48.mov

Note

  • At the moment this only really works as a user goes forward in a flow. Do we need to track a user going backwards?
  • There are analytics_logs records created going backwards so even if a user has completed an answer for proposal.projectType that wouldn't be stored on the analytics_log which was generated on hitting Back
  • Would it make sense to track this when going backwards?

Copy link

github-actions bot commented Nov 27, 2023

🤖 Hasura Change Summary compared a subset of table metadata including permissions:

Updated Tables (1)

  • public.analytics_logs permissions:

    insert select update delete
    public / / /
    3 added column permissions
    insert select update
    public ➕ node_fn ➕ node_fn ➕ allow_list_answers

Copy link

github-actions bot commented Nov 27, 2023

Pizza

Deployed 5e70e0d to https://2487.planx.pizza.

Useful links:

@Mike-Heneghan Mike-Heneghan self-assigned this Nov 28, 2023
@Mike-Heneghan Mike-Heneghan requested a review from a team November 28, 2023 12:43
@Mike-Heneghan Mike-Heneghan marked this pull request as ready for review November 28, 2023 12:43
@Mike-Heneghan
Copy link
Contributor Author

Copy link
Member

@jessicamcinchak jessicamcinchak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solution is looking good to me, but I am getting some unexpected results when going through a flow in the analytics_logs table on this pizza that I want to flag:
Screenshot from 2023-11-29 08-44-03

What was happening (db rows here):

  • The first row for proposal.projectType is right
  • But then my answer is incorrectly recorded for two Set Values with un-allow-listed fns of proposal.droppedKerbOnly & proposal.treeWorksOnly (is it possible to make mutation param check stricter to ensure it only writes answers for allow-listed fns?)
  • Then I hit a follow-up, more granular project type question "Select all the works to replace windows or doors" that correctly records answer ["alter.replace.windowsToWindows"]
  • There's also a number of duplicates (maybe writing on component renders or user interactions within the card rather that whole-card transition??) - I realise this may not be introduced here and a wider caveat of analytics tracking / happy for it to be thought about in a separate PR

As for the larger question in your description about currently only working on "forward" and whether we should track on "back":

  • My high-level opinion about this would be that we should try to track allow-list answers as similarly as possible to other interactions like "help click" & "input errors" - so if those are tracked on back, then let's try to track answer changes on back too. Helps keep the "explainability" of analytics methodology more straightforward for external consumers in the long run!

Mike-Heneghan added a commit that referenced this pull request Nov 29, 2023
@Mike-Heneghan Mike-Heneghan force-pushed the mh/track-allowlist-on-analytics-logs branch from 6028404 to 1a06af7 Compare November 29, 2023 12:53
@Mike-Heneghan
Copy link
Contributor Author

Solution is looking good to me, but I am getting some unexpected results when going through a flow in the analytics_logs table on this pizza that I want to flag: Screenshot from 2023-11-29 08-44-03

What was happening (db rows here):

  • The first row for proposal.projectType is right
  • But then my answer is incorrectly recorded for two Set Values with un-allow-listed fns of proposal.droppedKerbOnly & proposal.treeWorksOnly (is it possible to make mutation param check stricter to ensure it only writes answers for allow-listed fns?)
  • Then I hit a follow-up, more granular project type question "Select all the works to replace windows or doors" that correctly records answer ["alter.replace.windowsToWindows"]
  • There's also a number of duplicates (maybe writing on component renders or user interactions within the card rather that whole-card transition??) - I realise this may not be introduced here and a wider caveat of analytics tracking / happy for it to be thought about in a separate PR

As for the larger question in your description about currently only working on "forward" and whether we should track on "back":

  • My high-level opinion about this would be that we should try to track allow-list answers as similarly as possible to other interactions like "help click" & "input errors" - so if those are tracked on back, then let's try to track answer changes on back too. Helps keep the "explainability" of analytics methodology more straightforward for external consumers in the long run!

I'm struggling to replicate the issue 🤔 Looking at the analytics it seems like you were using the find-out-if-you-need-planning-permission flow? Although that doesn't seem to have the questions you encountered.

If the flow you tested with is available would you mind sharing a link to it?

@jessicamcinchak
Copy link
Member

jessicamcinchak commented Nov 29, 2023

@Mike-Heneghan I was using Lambeth's FOI on the pizza (FOI flows outside of active council teams likely don't have maintained content) - I'm almost always the only Linux user in pizza logs 🐧
Screenshot from 2023-11-29 13-58-02

The project type checklist and set values are nested in the permitteddevelopment flow / external portal. The set values will always be hidden from the user journey, but auto-answered in the background based on my project selection - so logging at the same time is expected I suppose (rather than a duplication), but recording the answer is not expected if the fn for that row does not match nor is allow-listed itself if that makes sense ?

@Mike-Heneghan
Copy link
Contributor Author

@Mike-Heneghan I was using Lambeth's FOI on the pizza (FOI flows outside of active council teams likely don't have maintained content) - I'm almost always the only Linux user in pizza logs 🐧 Screenshot from 2023-11-29 13-58-02

The project type checklist and set values are nested in the permitteddevelopment flow / external portal. The set values will always be hidden from the user journey, but auto-answered in the background based on my project selection - so logging at the same time is expected I suppose (rather than a duplication), but recording the answer is not expected if the fn for that row does not match nor is allow-listed itself if that makes sense ?

Thanks @jessicamcinchak! That's really helpful, I imagine the issue is that I wasn't testing auto-answer questions which will completed really quickly and are more susceptible to this issue! I'll use the lambeth foi to have something a bit more representative to the real behaviour.

@Mike-Heneghan Mike-Heneghan marked this pull request as draft November 29, 2023 14:05
- The passport variable of a node is on either the `fn` or `val` optional attribute
- It can be useful going forward for this data to be stored
- Node title might change but the passport variable might not allowing us to better compare trends if rewording for example
- Added a new column to the analytics_log of node_fn to store this value
analytics_logs

- Add new column on the analytics_logs to store allow_list_answers in an
array
- On component transition check if the previous node was an allow list
question
- If allow list then update the previous log with the answer
- This relies on previousCard function which is computationally
expensive
- This can cause itiming ssues where allowList answer of previous node
stored in a log
- The function previousCard is computationally expensive
- Rather than repeatedly call previousCard instead store what the node was in state on component transistion
- Resolves: #2487 (review)
- If there's a timing issue avoid storing sensitive data by verifying db node_fn in allowlist
- When finding answers remove null values
- Note: doesn't resolve edge case that subsequent questions with node_fn may be updated with prevous questions answer
@Mike-Heneghan Mike-Heneghan force-pushed the mh/track-allowlist-on-analytics-logs branch from 1a06af7 to 80512c9 Compare December 13, 2023 12:31
const { data } = flow[nodeId];
const nodeFn = data?.fn || data?.val;
if (nodeFn && allowList.includes(nodeFn)) {
const answerIds = breadcrumbs[nodeId]?.answers;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason this doesn't work going backwards is that the current approach has an assumption that we always want to update the analytics log for the previous node with the answer which is stored in the current breadcrumbs.

When we move backwards through a flow the breadcrumbs are removed and the answers for the question go with them.

We could track what the saved answers were when a user loads a node they've already answered in the analytics logs.

Although I'd be tempted not to and only track allow_list answers when a user explicitly hits continue. I think this might be more consistent with the broad approach to user's answers?

@Mike-Heneghan Mike-Heneghan deleted the mh/track-allowlist-on-analytics-logs branch March 14, 2024 12:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants