user requirements related to "self data" #364

PatGendre · 2019-04-05T08:48:08Z

For the record, I am putting here these requirements related to "self data".
They can be linked to the current work on a new architecture with built-in encryption for addressing privacy.
See https://github.com/njriasan/e-mission-docs/blob/master/docs/future_work/NewArchitecture.md
and issue #330

the user should be able to download his/her data easily (this is currently the case)
he/she should also be able to delete part of his/her data
there should be very clear user consent terms (I hope we can find some) but obviously this depends on each particular application
going further on the same point, e-mission is a framework that can be reused to build several applications; I believe its design should be generic enough to accommodate a large variety of use cases, from pure self-data apps to authorising aggregate studies, and including crowdsourcing and data-sharing apps
there should be a clear separation between the individual data and the aggregate data; in my view, the aggregate data should be in a separate database and provided to the app through a clearly separate API ; even if data is not encrypted and that no technical ; in some ways, we can consider that the aggregate data functions are another module of the e-mission "suite", but it should be on a par with third-party modules, conceptually distinct from the "core e-mission" functionalities
as expressed in this architecture page, I wish the user could refuse to have his/her data aggregated and analysed along with others'. The "core" functionality should be self mobility data and control over sharing it.
one criticism of the new architecture is that it is in a way "closed", because the application must use a certain technology (whether it be Graphene or another). The overview says "the server", as if there were only one server. I believe there are many use cases (including crowdsourcing, as in OpenTraffic or Posmo) that do not require a sophisticated and secure encryption mechanism and that could possibly be developed without much impact on the current architecture. But as you said, this is research, and it is for the longer term, and it could potentially also meet the requirements I express.

shankari · 2019-04-08T14:53:59Z

Couple of quick comments for particular requirements

> he/she should also be able to delete part of his/her data

@jf87 is also interested in this because of the GDPR requirements. The main challenge is reconciling this with the current assumption that all input data is read-only, so all results are reproducible for all time. The read-only assumption is actually fairly standard for analysis based on datasets.

@jf87 have you seen ML work that addresses relaxing this requirement? I will also do a quick search and see if I can find something. I think that there has been some work on detecting changes and recomputing only related results. I have opened #366 for the more detailed discussion.
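To make the "detect changes and recompute only related results" idea concrete, here is a minimal sketch. It is not taken from any ML work or from #366, and `ProvenanceStore` and all of its methods are hypothetical names invented for illustration. It assumes each derived result records which input entries it was computed from, so that deleting an entry invalidates only the results that depended on it.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class ProvenanceStore:
    """Hypothetical store that remembers which inputs each result was built from."""

    inputs: Dict[str, float] = field(default_factory=dict)
    results: Dict[str, float] = field(default_factory=dict)
    deps: Dict[str, Set[str]] = field(default_factory=dict)
    stale: Set[str] = field(default_factory=set)

    def add_input(self, entry_id: str, value: float) -> None:
        self.inputs[entry_id] = value

    def compute_sum(self, name: str, entry_ids: Iterable[str]) -> None:
        # Record the result together with the inputs it depends on.
        ids = {i for i in entry_ids if i in self.inputs}
        self.results[name] = sum(self.inputs[i] for i in ids)
        self.deps[name] = ids
        self.stale.discard(name)

    def delete_input(self, entry_id: str) -> None:
        # Deleting an input only invalidates the results derived from it.
        self.inputs.pop(entry_id, None)
        for name, ids in self.deps.items():
            if entry_id in ids:
                self.stale.add(name)

    def refresh(self) -> None:
        # Recompute stale results from whatever inputs remain.
        for name in sorted(self.stale):
            self.compute_sum(name, self.deps[name])


# Illustrative data: three trips with distances in metres.
store = ProvenanceStore()
for entry_id, distance_m in [("t1", 1000.0), ("t2", 2500.0), ("t3", 400.0)]:
    store.add_input(entry_id, distance_m)
store.compute_sum("week1_distance", ["t1", "t2"])
store.compute_sum("week2_distance", ["t3"])

store.delete_input("t2")              # user asks to delete one trip
stale_after_delete = set(store.stale)  # only week1_distance is affected
store.refresh()
```

The trade-off is the one raised above: once inputs can be deleted, a result is reproducible only relative to the inputs that still exist, so anything already published from the old inputs can no longer be regenerated exactly.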

shankari · 2019-04-08T14:56:51Z

> there should be very clear user consent terms (I hope we can find some) but obviously this depends on each particular application

Yes, the expectation is that every project can have its own consent terms; the consent terms go into the intro/consent.html. The standard e-mission consent terms can be an example (https://e-mission.eecs.berkeley.edu/consent); if you can indicate what is unclear about them, we can modify that and check in a boilerplate consent into the docs that projects can re-use

@ipsita0012 had some thoughts from their deployment.

shankari · 2019-04-08T15:02:14Z

> going further on the same point, e-mission is a framework that can be reused to build several applications; I believe its design should be generic enough to accommodate a large variety of use cases, from pure self-data apps to authorising aggregate studies, and including crowdsourcing and data-sharing apps

This is definitely the goal and the framework has been used for travel surveys, behavior change modification and crowdsourcing (hopefully launching today). Do you have concrete suggestions on how to make it more generic? As the use cases submit their changes, the need for a plugin-based architecture for both the phone and the server has become increasingly clear. Is that what you had in mind?

shankari · 2019-04-08T16:29:58Z

> there should be a clear separation between the individual data and the aggregate data; in my view, the aggregate data should be in a separate database and provided to the app through a clearly separate API ; even if data is not encrypted and that no technical ; in some ways, we can consider that the aggregate data functions are another module of the e-mission "suite", but it should be on a par with third-party modules, conceptually distinct from the "core e-mission" functionalities

This is already true at a conceptual level.

The recommended way to access e-mission data is through the timeseries interface, emission.storage.timeseries.abstract_timeseries (https://github.com/e-mission/e-mission-server/blob/master/emission/storage/timeseries/abstract_timeseries.py), NOT directly through the database[1].

The timeseries interface has two options - you can get data for an individual user (get_time_series(user_id)) or for the aggregate (get_aggregate_time_series()). Algorithms that work on aggregate data should use the aggregate timeseries, algorithms that work on a single user should use the regular time series.
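As a rough illustration of the two options, here is a self-contained sketch of the pattern. It is not the actual e-mission code: only the names `get_time_series` and `get_aggregate_time_series` come from the description above, while the in-memory `_STORE`, the `find_entries` signature and the sample entries are assumptions made so that the example runs on its own.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

Entry = Dict[str, object]

# Hypothetical in-memory store standing in for the real database.
_STORE: List[Entry] = [
    {"user_id": "alice", "key": "analysis/cleaned_trip", "distance": 1200.0},
    {"user_id": "alice", "key": "background/location", "distance": 0.0},
    {"user_id": "bob", "key": "analysis/cleaned_trip", "distance": 3400.0},
]


class TimeSeries(ABC):
    """Abstract read interface; callers never touch the backing store directly."""

    @abstractmethod
    def find_entries(self, keys: Optional[List[str]] = None) -> List[Entry]:
        ...


class _UserTimeSeries(TimeSeries):
    def __init__(self, user_id: str) -> None:
        self.user_id = user_id

    def find_entries(self, keys: Optional[List[str]] = None) -> List[Entry]:
        # Only this user's entries are ever visible through this handle.
        return [e for e in _STORE
                if e["user_id"] == self.user_id
                and (keys is None or e["key"] in keys)]


class _AggregateTimeSeries(TimeSeries):
    def find_entries(self, keys: Optional[List[str]] = None) -> List[Entry]:
        # Entries from all users; intended for aggregate algorithms only.
        return [e for e in _STORE if keys is None or e["key"] in keys]


def get_time_series(user_id: str) -> TimeSeries:
    return _UserTimeSeries(user_id)


def get_aggregate_time_series() -> TimeSeries:
    return _AggregateTimeSeries()


alice_trips = get_time_series("alice").find_entries(["analysis/cleaned_trip"])
all_trips = get_aggregate_time_series().find_entries(["analysis/cleaned_trip"])
```

An algorithm written against `TimeSeries` works unchanged with either handle, and which handle it was given makes explicit whether it is an individual or an aggregate analysis.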

As long as all code follows these conceptual guidelines, the rest of it is implementation detail. Although I don't think that having a separate aggregate DB is flexible enough (which ranges would you use for the stored aggregations?), you could certainly cache some results in a separate aggregate database as long as everybody followed the abstractions. And of course, we could choose to switch to a dedicated timeseries database, or give people a choice of databases (e.g. use embedded SQLite for lighter-weight deployments) in the future.
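To show what caching results in a separate aggregate database could look like, here is a hedged sketch using Python's built-in `sqlite3` module. The `AggregateCache` class, the `agg_cache` table and the sample data are all hypothetical. Because the right ranges are not known in advance, the cache is keyed by whatever range the caller actually asks for, and it falls back to computing from the aggregate data on a miss.

```python
import sqlite3
from typing import Callable, List, Tuple


class AggregateCache:
    """Hypothetical cache of aggregate results, kept in its own SQLite database."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS agg_cache ("
            "metric TEXT, start_ts REAL, end_ts REAL, value REAL, "
            "PRIMARY KEY (metric, start_ts, end_ts))"
        )

    def get_or_compute(self, metric: str, start_ts: float, end_ts: float,
                       compute: Callable[[float, float], float]) -> float:
        row = self._db.execute(
            "SELECT value FROM agg_cache "
            "WHERE metric = ? AND start_ts = ? AND end_ts = ?",
            (metric, start_ts, end_ts),
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: no recomputation
        value = compute(start_ts, end_ts)
        self._db.execute(
            "INSERT INTO agg_cache VALUES (?, ?, ?, ?)",
            (metric, start_ts, end_ts, value),
        )
        self._db.commit()
        return value


# Illustrative aggregate data: (timestamp, distance in metres) across all users.
AGGREGATE_POINTS: List[Tuple[float, float]] = [
    (10.0, 500.0), (20.0, 1500.0), (30.0, 700.0),
]
calls: List[Tuple[float, float]] = []


def total_distance(start_ts: float, end_ts: float) -> float:
    # Stand-in for an algorithm that reads the aggregate timeseries.
    calls.append((start_ts, end_ts))
    return sum(d for ts, d in AGGREGATE_POINTS if start_ts <= ts < end_ts)


cache = AggregateCache()
first = cache.get_or_compute("total_distance", 0.0, 25.0, total_distance)
second = cache.get_or_compute("total_distance", 0.0, 25.0, total_distance)
```

The second call is served from the cache, so `total_distance` runs only once. Callers still go through one function and never see which database, if any, holds the answer.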

The abstractions are what is important. The database is implementation. People should not box themselves into a corner by using the implementation.

[1] I have no idea why every project just wants to access the database directly instead of using the recommended methods in the Timeseries_sample; suggestions for clarifying this are welcome.

shankari · 2019-04-08T16:30:24Z

> as expressed in this architecture page, I wish the user could refuse that his/her data be aggregated and analysed along with others. The "core" functionality shall be self mobility data and control over sharing it.

Yup, will be in the new architecture!

shankari · 2019-04-08T16:39:55Z

> one criticism of the new architecture is that it is in a way "closed", because the application must use a certain technology (whether it be Graphene or another). The overview says "the server", as if there were only one server. I believe there are many use cases (including crowdsourcing, as in OpenTraffic or Posmo) that do not require a sophisticated and secure encryption mechanism and that could possibly be developed without much impact on the current architecture. But as you said, this is research, and it is for the longer term, and it could potentially also meet the requirements I express.

The initial implementation of the architecture will use docker without graphene. But using docker without graphene has serious limitations if the cloud provider is compromised, so secure execution is really the long-term goal and solution.

I am sure it is possible to implement an ad-hoc crowdsourcing solution for certain specific kinds of analysis (e.g. only automobile speeds as in OpenTraffic) but that is not very interesting to me, because it does not fit my overall vision of longitudinal collection of end to end data across all modes. It seems like you would not even need to store data in that case. You can theoretically compute average speed directly on the phone and send it to a server, but you would need to ensure that sequences of speeds from the same user cannot be correlated. There's been prior work done on that; IIRC vPriv https://people.eecs.berkeley.edu/~raluca/vpriv.pdf uses that model. I am not aware of an open source implementation, though. I don't think any of the work out of Hari's lab at MIT is open source.
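For illustration only, the "compute average speed directly on the phone" idea could look like the sketch below. It is not vPriv and not e-mission code; `Fix`, `build_report`, the segment identifier and the 15-minute bucket are all hypothetical. The report carries no user identifier and no raw trace, but that alone does not guarantee that sequences of speeds from the same user cannot be correlated, which is exactly the part that needs a real protocol.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

EARTH_RADIUS_M = 6_371_000.0


@dataclass(frozen=True)
class Fix:
    t: float    # seconds since some epoch
    lat: float  # degrees
    lon: float  # degrees


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance between two fixes, in metres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def average_speed_mps(fixes: List[Fix]) -> float:
    """Path length divided by elapsed time, computed entirely on the device."""
    distance = sum(haversine_m(a, b) for a, b in zip(fixes, fixes[1:]))
    duration = fixes[-1].t - fixes[0].t
    return distance / duration if duration > 0 else 0.0


def build_report(segment_id: str, fixes: List[Fix], bucket_s: int = 900) -> Dict[str, object]:
    # Only a coarse time bucket and a rounded speed leave the phone:
    # no user id, no coordinates, no exact timestamps.
    return {
        "segment_id": segment_id,
        "time_bucket": int(fixes[0].t // bucket_s) * bucket_s,
        "avg_speed_mps": round(average_speed_mps(fixes), 1),
    }


# Two fixes 10 s apart, 0.001 degrees of longitude apart on the equator.
trace = [Fix(t=1000.0, lat=0.0, lon=0.0), Fix(t=1010.0, lat=0.0, lon=0.001)]
report = build_report("segment-42", trace)
```

Even with reports this coarse, a server that sees many of them may still be able to link them, so this covers only the easy half of the problem.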

shankari · 2019-04-08T18:48:07Z

Also, wrt crowdsourcing, this article made a big splash when it came out
https://www.technologyreview.com/s/523346/how-to-track-vehicles-using-speed-data-alone/

PatGendre · 2019-04-09T08:53:08Z

Thanks for your remarks!

Handle deleting data (#366): for the moment I cannot think of further input, sorry, but the requirement (at least to delete all data and sign off) is real
the intro/consent.html seems clear to me, but as you say later on, the consent terms vary from one application to another (e.g. I believe that for some or even many use cases, the user should have the choice not to authorise sharing data with researchers, and to use the app for their own sake as a personal tool)

> the framework has been used for travel surveys, behavior change modification and crowdsourcing (hopefully launching today). Do you have concrete suggestions on how to make it more generic? As the use cases submit their changes, the need for a plugin-based architecture for both the phone and the server has become increasingly clear. Is that what you had in mind?

Yes, exactly. As for suggestions: it was the idea, discussed further below, of separating individual and collective/aggregate data more markedly, as two distinct applications.

> the aggregate data functions should be conceptually distinct from the "core e-mission" functionalities : This is already true at a conceptual level.

Thanks for the explanations! I agree the conceptual level is essential. You're right, I am not competent enough to assert that there should be 2 databases. However, I believe the separation should also be made clear at the implementation level (developer point of view: guidelines / SDK; there is a "human" tendency to access the db directly, if it is possible; maybe having 2 separate timeseries interfaces instead of 2 options would be clearer? I should think more about it in order to be able to make concrete suggestions) and at the application level (user point of view; the aggregate functions should "look" different or be in a different app).

> it is possible to implement an ad-hoc crowdsourcing solution for certain specific kinds of analysis (e.g. only automobile speeds as in OpenTraffic) but that is not very interesting to me, because it does not fit my overall vision of longitudinal collection of end to end data across all modes

Definitely, OpenTraffic would not even need a server pipeline; I agree this is not what e-mission has been designed for. The point was that if the basic e-mission functionalities could also implement the and that we found a "blockbuster" crowdsourcing usage, this could help a lot in finding resources for building more advanced features. But it definitely won't be easy to meet the demand for massive crowdsourcing; it was just a reflection.

PatGendre · 2019-04-09T08:54:56Z

There are also documents in French, but this document in English from the UK about personal data might interest you:
https://www.digicatapult.org.uk/news-and-views/publication/pdr-report/

shankari mentioned this issue Apr 8, 2019

Handle deleting data #366

Open

shankari added the enhancement label Apr 17, 2019
