Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should trajectories be temporally indexed based on patrol start or end time? #429

Open
cisaacstern opened this issue Nov 13, 2024 · 4 comments
Assignees

Comments

@cisaacstern
Copy link
Collaborator

cisaacstern commented Nov 13, 2024

On app.dev.ecoscope.io, I have a patrols workflow with the following config:

{'workflow_details': {'name': 'CS - Patrols Views Keys Set Comparison - Workflows v0.0.20',
   'description': "it's the charm",
   'image_url': ''},
  'er_client_name': {'EcoscopeWorkflowsNamedConnection': 'mep-dev-jake'},
  'groupers': {'groupers': [{'index_name': 'month', 'directive': '%B'}]},
  'time_range': {'since': '2015-01-01T05:00:00Z',
   'until': '2015-04-01T04:00:00Z'},
  'patrol_obs': {'patrol_type': ['05ad114e-1aff-4602-bc83-efd333cdd8a2']},
  'patrol_traj': {'max_length_meters': 10000.0,
   'max_time_secs': 3600.0,
   'max_speed_kmhr': 120.0},
  'patrol_events': {'patrol_type': ['05ad114e-1aff-4602-bc83-efd333cdd8a2']},
  'patrol_events_bar_chart': {'time_interval': 'month'}}

Note that the time range includes the first 4 hours of April

  'time_range': {'since': '2015-01-01T05:00:00Z',
   'until': '2015-04-01T04:00:00Z'},

I have found that, running against the latest workflows v0.0.21 release, this workflow fails the following assert

assert set(list(gw.views)) == set(
keys_sample
), "All grouped widgets must have the same keys"

because there are events recorded during these first few hours of April, but there are no patrols which started during those four hours, resulting in an April group being created for events data, but none for trajectories data, because for trajectories, we are temporally indexing based on the start time of the patrol, i.e. "extra__patrol_start_time":

- name: Add temporal index to Patrol Trajectories
id: traj_add_temporal_index
task: add_temporal_index
partial:
df: ${{ workflow.patrol_traj.return }}
time_col: "extra__patrol_start_time"
groupers: ${{ workflow.groupers.return }}

Put another way, by the time we reach the above-linked assert, the events data has these groups:

{(('month', '=', 'February'),), (('month', '=', 'March'),), (('month', '=', 'January'),), (('month', '=', 'April'),)}

but the trajectories data only has these groups (because the patrols that extend into April are being grouped with March, the month they started in):

{(('month', '=', 'February'),), (('month', '=', 'March'),), (('month', '=', 'January'),)}

Confirming this diagnosis of the failure, using https://github.com/wildlife-dynamics/compose/pull/195 I monkey-patched a change here

traj_add_temporal_index = (
add_temporal_index.validate()
.partial(
df=patrol_traj,
time_col="extra__patrol_start_time",

like so

 traj_add_temporal_index = ( 
     add_temporal_index.validate() 
     .partial( 
         df=patrol_traj, 
-        time_col="extra__patrol_start_time", 
+        time_col="extra__patrol_end_time", 

Upon re-running the same (previously failing config) with that change, the workflow succeeded. This is because both event and trajectory data now had the following same groups

{(('month', '=', 'February'),), (('month', '=', 'March'),), (('month', '=', 'January'),), (('month', '=', 'April'),)}
@cisaacstern
Copy link
Collaborator Author

So this leads me to wonder, per the title of this issue, should trajectories be temporally indexed based on patrol start or end time?

Are there other adverse groupings that could occur if we default to "extra__patrol_end_time"; that is, will we just be creating different problems that way?

Or rather than changing this behavior do, we need to just handle mismatched grouper sets by displaying blank (or placeholder) widgets in the dashboard in the case of mismatched grouper sets?

@cisaacstern
Copy link
Collaborator Author

On further thought I believe this could also be solved by ensuring that time ranges are always whole unit intervals of the groupers.

@cisaacstern cisaacstern self-assigned this Nov 14, 2024
@cisaacstern
Copy link
Collaborator Author

Per discussion with @walljcg, this problem may be resolved by grouping patrol trajectories based on segment_start (rather than extra__patrol_start_time).

@atmorling
Copy link
Contributor

On a call with @walljcg just now we did a rerun.sh test of a patrol config that runs into this issue and I tried monkey patching the segment_start change into the run. Sadly/Hilariously this resulted in an even larger number of "extra" groups.
Pretty confident this is a more holistic issue than just how we resolve traj segments (as your notes above have hinted)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants