You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We are exploring possible approaches for storing submission data.
We would like to know you comments on how do you want to retrieve submission data in R script program.
Approach 1: Current way for storing submission data in a "long" format table
Approach 2: Possible way for storing submission data in a "wide" format table.
Previous discussion can be found in issue #164 FYI.
Approach 1: Current way for storing submission data in a "long" format table
When we retrieve a ODK submission:
ODK form structure are stored in entities table
Each ODK form data item and data value is stored as one entity_values record
For example, we will have below records for a submission with a main survey section and two repeat groups.
entities records for storing ODK form structures
main survey
repeat group A
repeat group B
entity_values records for storing ODK data items and values
main survey, item 1
main survey, item 2
repeat group A, item 1
repeat group A, item 2
repeat group A, item 3
repeat group B, item 1
repeat group B, item 2
repeat group B, item 3
repeat group B, item 4
Each data item is stored as a entity_values record, the table is in a "long" format.
When we do data export from entities and entity_values tables, we need to run one query from entity_values table for each single data item.
main survey, 2 queries
repeat group A, 3 queries
repeat group B, 4 queries
For this approach:
No program change required in PHP program
R script program needs to re-assemble each data item and data value by one query (i.e. 4 queries for 4 data items)
R script will create entity_values records for storing the calculated indicator values
Approach 2: Possible way for storing submission data in a "wide" format table.
Dave and I are exploring potential option for storing and retrieving submission data in a more efficient way.
When we retrieve a ODK submission:
ODK form structure are stored in entities table
Each dataset for ODK form data items and data is stored as one entity_values record
For example, we will have below records for a submission with a main survey section and two repeat groups.
entities records for storing ODK form structures
main survey
repeat group A
repeat group B
entity_values records for storing dataset for ODK data items and values
main survey, JSON column for storing item 1 and 2
repeat group A, JSON column for storing item 1, 2, and 3
repeat group B, JSON column for storing item 1, 2, 3 and 4
For this approach:
We will need to update database table structures, submission retrieval program and data export program in PHP program
R script program can re-assemble one dataset with data items and data values by one query (i.e. 1 query for 4 data items)
R script will create entity_values records for storing the calculated indicator values
The text was updated successfully, but these errors were encountered:
I think will probably need a call at some point just to check I am understanding and picturing these structures correctly.
But just on the basis of reading your descriptions I think approach 1 sounds simpler from an R perspective. At least if I am understanding correctly than a data item is an entity, e.g. a single farm, a single crop within a farm etc. In which case this would align with the existing structure of 1 row per farm, 1 row per crop grown etc. Which is the preferable format I would need for joining tables together so I can match IDs to codes or bring required variables up or down a level if needed for a calculation.
I'm not sure if I am picturing approach 2 correctly, so I may need a visual aid to understand how exactly this is storing the data items. But from what I think you mean, I think this approach would be extra steps in R to unpack the JSONs and probably pivot to a longer format anyway to be able to join onto other tables. So in R this might be more steps to get back to what approach 1 is already doing for me.
Yes, a picture approach would be very useful.
Please find below illustrations for databae records for visual aid.
I do agree with you.
For approach 1, R script can just get what it needs for indicator calculations.
For approahc 2, R script will need extra step to handle JSON data. (um.... maybe we can only get the required data names and data values from JSON columns...?)
Thanks that helps me understand more clearly, I think still stand with what I said before that approach 2 introduces extra complexities where as approach 1 is pretty simple and accommodating to working with the data in R.
@alex-thomson222 - Hi Alex, hope you get well soon.
We are exploring possible approaches for storing submission data.
We would like to know you comments on how do you want to retrieve submission data in R script program.
Approach 1: Current way for storing submission data in a "long" format table
Approach 2: Possible way for storing submission data in a "wide" format table.
Previous discussion can be found in issue #164 FYI.
Approach 1: Current way for storing submission data in a "long" format table
When we retrieve a ODK submission:
For example, we will have below records for a submission with a main survey section and two repeat groups.
entities records for storing ODK form structures
entity_values records for storing ODK data items and values
Each data item is stored as a entity_values record, the table is in a "long" format.
When we do data export from entities and entity_values tables, we need to run one query from entity_values table for each single data item.
For this approach:
Approach 2: Possible way for storing submission data in a "wide" format table.
Dave and I are exploring potential option for storing and retrieving submission data in a more efficient way.
When we retrieve a ODK submission:
For example, we will have below records for a submission with a main survey section and two repeat groups.
entities records for storing ODK form structures
entity_values records for storing dataset for ODK data items and values
For this approach:
The text was updated successfully, but these errors were encountered: