-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Time-Use Tab (Issue) #840
Comments
At this point, we have finished the end-to-end trip level changes on the phone (e-mission/e-mission-phone#917) We need to start making place level changes. One high level design question is whether we should represent place additions in the I have three main thoughts around this design:
|
Do we anticipate that future projects will need place information? To what extent - and how much detail? If we are prepared to have a new However, I am a little lost on when/why we may want to provide a geojson object for a place. Isn't this 'heavier' with more detail? I thought we were moving away from this - what use case might this be useful? (Maybe eventually a details page / show stats for places?) To my understanding the label screen currently benefits from being relatively lightweight. My only high-level concerns are that we could suffer a network performance hit when places are shown in detail. |
@shankari Jack and I decided that for this change we would rather go with the first option for implemeting the places into the server:
The reason we chose this, is because it will be very straighforward to add Because This would require us in to include cpList, and then we would need to have an additional function in
|
@shankari I'll also add that if you have a reason why we should go with the other option:
We are not adverse to going this route, but we believed that it seems too intense of an implementation to merit this small change for "places". We discussed that if you have larger plans for the "Place"s object, then this may be a better way of doing it but for now and the reasons listed above we would rather go with the other way. |
We've been working on e-mission/OpenPATH for several years now and this is the first time we have displayed the place. However, this is also the first time that we are able to support several different use cases for deployers - in the past, our answer was that "open path is open source" so deployers should write their own clients if they needed. Having said that, having the place separate opens up the opportunity for a completely different, place-oriented UI. This would show a map for the place, and just a line indicating "travel" by "mode" to the next place. The old Moves app UI used to look like that.
wrt the "combined" option, we are not constrained by the long-term data model, so we could copy the start and end place information into each trip, for example. The day geojson is heavier than the raw trip/place objects, but primarily due to the embedded trajectories. (you should actually measure this - retrieve the geojson for a day, and see where the size comes from)
If we are looking for the best design, we should theoretically pick the one that is best regardless of time and then plan the steps to get there. The steps may be sideways and may involve a temporary implementation initially as long as we have the final design clear and can work towards it when possible. Having said that, and thinking through this in greater detail while writing this down, I think that there is enough ambiguity in the requirements for displaying the place object that I am fine with your proposal. But we should keep the other option in mind, and if this gets too clunky as we expand the diary, we should re-evaluate switching to the other implementation. I will add the |
for the record
There is a place-like object for every trip-like object. Note that we (aka OpenPATH) create both place and trip objects, they are not user inputs. If there is not, we should fix the pipeline. The one caveat is that the trip-like object may be untracked time (section 5.3.1 of my thesis), so you should really retrieve that as well in the list and stitch together the place-like and trip-like objects. Note that I have code to do that on the server - the starting point is here: |
A concrete example of clunkiness is that we have to reimplement both the trip -> geojson conversion and the timeline creation code on the phone. This is already implemented in the server, so if we returned the combined object, we would not have to maintain two copies of the same codebase. |
Although maybe the argument is that after the label <-> diary unification, maybe we can remove the geojson formatting code from the server. So the entire presentation layer will be on the phone. @sebastianbarry @JGreenlee I hope this gives you an example of how to reason about which design is more principled. We also don't want to make decisions solely based on what is easiest to implement - that how we end up with "ball of wax" code. And even as we choose a decision, we want to keep in mind that we might switch to the other one as we make additional changes, and write code that is easy to switch in the future. |
End to end integration testing:
There are three confirmed trips
while matching the inputs
the matching entry has a write_ts of 2023-01-18T22:29:50.252379-08:00 and a start_ts of 2016-08-11T14:05:09.694147-07:00 And then
|
ok so this is pretty bizarre. I create the confirmed trip array just before the append
And I test the function in the unit tests, which currently pass
|
Also when I run the pipeline, I get an error, which did not show up when @JGreenlee ran the pipeline, so I don't think this is the root case for Jack's issue, but it needs to be fixed regardless.
|
Hm, so the issue is that the trip addition is set to
Ok, so now I think that I understand it a bit better. As part of the change to add in the previous labeled entries while creating the confirmed trip, we are setting trip_additions to The fix is fairly trivial, we should also check for |
Matching for new user inputs broke (regression due to e-mission@a7ee02c) The main issue was that we were filling in the `trip_additions` with None but only checking for existence. More details at: e-mission/e-mission-docs#840 (comment) Fixing it in two different ways: - Returning `[]` instead of None from `get_not_deleted_candidates` if there are no filtered candidates This will ensure that we don't have a None value - check for both None and existence. This will ensure that even if we do have a None value, we can recover gracefully Testing done: Original failure: ``` 2023-01-19T09:58:57.198828-08:00**********UUID e20f451e-c619-46c8-a189-54310231bce5: updating incoming user inputs********** Error while matching incoming user inputs, timestamp is unchanged Traceback (most recent call last): File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 19, in match_incoming_user_inputs last_user_input_done = match_incoming_inputs(user_id, time_query) File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 46, in match_incoming_inputs handle_multi_non_deleted_match(confirmed_trip, ui) File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 70, in handle_multi_non_deleted_match confirmed_trip["data"]["trip_addition"].append(ui) AttributeError: 'NoneType' object has no attribute 'append' ``` New failure (updating incoming), no `match_id` ``` 2023-01-19T09:59:26.680014-08:00**********UUID e20f451e-c619-46c8-a189-54310231bce5: updating incoming user inputs********** Error while matching incoming user inputs, timestamp is unchanged Traceback (most recent call last): File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 19, in match_incoming_user_inputs last_user_input_done = match_incoming_inputs(user_id, time_query) File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 46, in match_incoming_inputs handle_multi_non_deleted_match(confirmed_trip, ui) File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 72, in handle_multi_non_deleted_match after_del_list = [ta for ta in confirmed_trip["data"]["trip_addition"] if ta["match_id"] != ui["match_id"]] File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 72, in <listcomp> after_del_list = [ta for ta in confirmed_trip["data"]["trip_addition"] if ta["match_id"] != ui["match_id"]] KeyError: 'match_id' ``` New failure (creating objects), no `match_id` ``` 2023-01-19T14:47:59.883681-08:00**********UUID e20f451e-c619-46c8-a189-54310231bce5: creating confirmed objects ********** Error while creating confirmed objects, timestamp is unchanged Traceback (most recent call last): File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 81, in create_confirmed_objects last_expected_trip_done = create_confirmed_trips(user_id, time_query) File "/Users/kshankar/e-mission/gis_branch_tests/emission/analysis/userinput/matcher.py", line 110, in create_confirmed_trips esdt.get_additions_for_trip_object(ts, tct) File "/Users/kshankar/e-mission/gis_branch_tests/emission/storage/decorations/trip_queries.py", line 191, in get_additions_for_trip_object return get_not_deleted_candidates(valid_user_input(ts, trip_obj), potential_candidates) File "/Users/kshankar/e-mission/gis_branch_tests/emission/storage/decorations/trip_queries.py", line 168, in get_not_deleted_candidates not_deleted_active = [efpc for efpc in all_active_list if efpc["match_id"] not in all_deleted_id] File "/Users/kshankar/e-mission/gis_branch_tests/emission/storage/decorations/trip_queries.py", line 168, in <listcomp> not_deleted_active = [efpc for efpc in all_active_list if efpc["match_id"] not in all_deleted_id] KeyError: 'match_id' ```
Checked in the fix. Had some errors in testing from old-style trip addition inputs as outlined in e-mission/e-mission-server@fcf5a9c, but after I purged and reloaded the user, everything worked correctly. The actions were retained on reload. |
#840 (comment) I think you had preferred (1) because it is easier to implement and were concerned about performance with We could just start with what is easier to implement and then do the performance evaluation later - there's a lot of other performance stuff we can do with caching for example. But while thinking about choosing between (1) and (2), we may want to look at some longer term considerations as well, if they are very close in terms of ease of implementation.
Three high level considerations: (b) You might also want to think about where the code lives and how many copies we have to keep in sync.. So if we got the composite trip, the trip -> geojson conversion could live on the server and we wouldn't have to do the (c) If we do want to cache trips locally for better performance and to work even when there is no network, then again, would one option work better than the other? We can cache the previous request, but we can also periodically push the most recent trips to the phone, at the same time as we are pushing up data from the phone to the server (as part of a "periodic sync"). So if you look at the server code, the phone -> server happens through |
@JGreenlee @shankari
OR
Thoughts:
WE WILL MAKE THIS FINAL DECISION ON FRIDAY MORNING 9:00AM MST |
You should check with @allenmichael099 - I believe he plans to also store some summary of the section into the trip for the "count every trip" calculations. You should work with him on creating a standard representation that works both for "count every trip" and for displaying the sections in the UI
wrt #840 (comment) |
on the label screen: we retrieve data model objects directly and construct the geojson on the phone |
we will have confirmed_place in the data model. We do not want to tie the internal data model to the representation on the phone or the communication between the server and the phone because it does not give us sufficient flexibility to change if we need to in the future, and is bad from a architectural standpoint. Note that we use geojson for the actual UI representation regardless of the options. The geojson can be complex or simple. The label geojson is simple, the diary geojson is complex. what we are deciding now is whether
Lots of options but once we have the data model (confirmed_place), we can swap between these fairly easily |
To test whether confirmed_place has been created properly, you can just reset the pipeline and re-run and see if it is created (assuming you have created everything upto and including the pipeline) |
My approach has been to store the section modes and distances as their own 'data' fields in an "analysis/confirmed_trip" document:'data': {.... , 'section_modes': ['bicycling', 'car'], 'section_distances': [4553.37, 8721.14]}. I added these fields to my copy of the database by fetching all of the inferred sections ("analysis/inferred_section") for each trip. To fetch them, I used I add up mean section energy intensities * section distances for a trip to get the trip level mean energy consumption. I keep the distances in meters currently.When I find the variance of the energy consumption estimate for a trip or set of trips, I find total distances in each mode based on the section distances. Calculating other metrics will probably involve these section modes and distances as well. |
The answer to our first question (Should we use We will go with
|
Can either of @sebastianbarry and @JGreenlee list out what you understood by |
@shankari my understanding is that |
After @shankari @JGreenlee call today, we found out this: Regarding the way that the label screen currently generates Geojson
What about trajectories (the line we display in the map, typically the line object in the geojson)?
Pros to # 1:
Pros to # 2:
How do we decide between #1 and #2? What should be the determining factor?
Question on # 2: Recreated location points already contain information about what section/mode they belong to? If so, do we not need to put that in composite_trip? (as a part of figuring out the multi-modal sections to generate their respective trajectories)
class MotionTypes(enum.Enum): IN_VEHICLE = 0 BICYCLING = 1 ON_FOOT = 2 STILL = 3 UNKNOWN = 4 TILTING = 5 WALKING = 7 RUNNING = 8
|
Here is what we have decided:We will go with # 2, because this is the currently implemented way on the Label screen, and the label screen has some key benefits over the way the Diary screen functions
With this, The composite_trip object will be generated on the fly, meaning we do not need to store composite_trip as it's own data object on the server. We simply will just receive a call on the server for Then, later on in the phone code when we are ready to get the trajectories, we can make an additional asynchronous call to the server to get the raw entries, and then on the phone we will generate the multi-modal trajectories to be displayed on the maps |
@sebastianbarry @JGreenlee I don't think we are ready to make the decision yet. (This is at least the third decision we have made, and this is a flip from the first decision, #840 (comment)). with these pros/cons, there are some assumptions. what are those assumptions and how can we validate them? aka
and even with the pros/cons, are we thinking through all of them? since this is a long-term design that will form the foundation for diary/label unification, I would encourage you to not treat label screen as perfection. It has some drawbacks too. @sebastianbarry I am not sure if you remember the drawbacks from the user interviews that you attended and the email that I forwarded from Jeanne. |
Not sure if this is going to help or confuse you further, but there is some related community discussion here: |
My thoughts in regards to comparing the magnitude of performance between # 1 method of retrieving trip data vs the # 2 method of retrieving trip data: Time comparisonTypically, in my classes (such as Algorithms and Parallel Computing) we can measure the temporal performance of a method by putting some sort of Accuracy comparisonThe other method we would measure in my classes, were how accurate is the function? This can be done by inputting a set of data which you already know the expected output, and comparing the received output with the expected output. From what I know from reading your thesis, e-mission-server will always give the same output with the same input, so I can trust that we know this is not a factor on the server. Where this might come into play, is how the received raw data from the server will be handled on the phone. I am not sure how to test this in regards to OpenPATH, as typically in my classes we have only done accuracy comparisons on input and output sets of integer numbers, not Latitude/Longitude decimal points. Features comparison(?)The only other way I can think of comparing the 2 methods are by qualitatively measuring the features between them. Here is my comparison table:
@shankari |
@sebastianbarry this is on the right track. But first, I am not sure you are measuring the same things: you say that you are comparing #1 vs. #2 from wrt the time option, use the breakpoints and search through the code to find your answer. You have access to the full source code, and you should know how to search through it. wrt the accuracy option, you are right that it is not relevant here. |
Option 1: separate For Option 2: Assumptions in your decision:
This reads all
So this is called for every trip Even just having a bunch of server calls outstanding at the same time can be performance intensive on the phone as well. Complaints were "app is hanging" after trips load, so I do some bottlenecking on the phone.
My intuition is that the slowness in the diary is not just due to trajectory retrieval, but is primarily due to the database calls required to construct the geojson. Note that while constructing the geojson, we first get all the trips, and then we read all the sections for each trip, and then we read all the trajectories for each section. So if you have 5 trips, each with 3 sections, you will end up with 5 + 5 + 15 = 25 calls to the database before you can construct the geojson and return it. If part of the performance issue is the trajectory size, that is easily fixable by including the downsampled trajectory in the composite trip object With option 2, impl 1:
for the core functionality that you have (trip diary/label screen) you want to do as little as possible on the critical path before returning to the user; that is the motivation for option 2, impl 1 vs. option 2, impl 2 To downsample diary calls: Measurement:
|
For Diary Screen (time before promise to .then / time on webserver.logs):
@sebastianbarry you can look along a spectrum of downsampling instead of a binary (edited to clarify) |
FYI, AWS instance to DocumentDB is ~ 3x slower than laptop -> docker on laptop |
There are 3 collections we read from:
all are collections in the same database this is when we start the call to the timeseries/analysis_timeseries
this is when the timeseries/analysis timeseries calls return (purpose_confirm is a user input so it will be in the timeseries and not in the analysis outputs)
This is when the usercache returns
Note that if we haven't yet processed the user inputs (like purpose_confirm) they will be in the usercache, so we do have to check it In the geojson code, where we are working with processed data, we should not have to check the usercache and we don't - for example:
|
using July 25 test data (2 trips) and no network throttling label screen load
Database queries are not taking a significant amount of time for the label screen flow. diary screen load
Similarly to the label screen, the queries for trips, user inputs, and locations do not take much time. ConclusionThe database calls referenced in #840 (comment) do not significantly impact our performance on either the label screen nor the diary screen. Next discussionWould pre-saving sections and stops in a |
@JGreenlee the database calls in #840 (comment) are examples.
To answer your questions:
|
wrt throttling, we are trying to determine what the network does with large object transfers That will let us estimate the serialization/deserialization overhead on a realistic system. If that is small, I would just say that we don't want to worry about the object size right now. We are currently leaning towards (#840 (comment)) option 2, impl 1 |
Following our meeting today, I wanted to share a couple of UI-related comments: 1- We suggest that time-use activities be left-aligned instead of centered. 2- Currently, when tapping the delete button, the activity is instantly deleted. It would be better to have a pop-up confirmation message asking the person if they are sure they want to delete the file. This would prevent accidental deletion of activities. Thanks. |
@sebastianbarry can you deal with these minor UI issues tomorrow so that @JGreenlee can deal with the hanging UI once he returns? |
Closing this since we have merged the related change to master |
Creating an issue to keep track of the time-use PR's
The text was updated successfully, but these errors were encountered: