Exploration for Storing Submission Data #165

dan-tang-ssd · 2025-02-05T10:56:51Z

@alex-thomson222 - Hi Alex, hope you get well soon.

We are exploring possible approaches for storing submission data.
We would like to know you comments on how do you want to retrieve submission data in R script program.

Approach 1: Current way for storing submission data in a "long" format table
Approach 2: Possible way for storing submission data in a "wide" format table.

Previous discussion can be found in issue #164 FYI.

Approach 1: Current way for storing submission data in a "long" format table

When we retrieve a ODK submission:

ODK form structure are stored in entities table
Each ODK form data item and data value is stored as one entity_values record

For example, we will have below records for a submission with a main survey section and two repeat groups.

entities records for storing ODK form structures

main survey
repeat group A
repeat group B

entity_values records for storing ODK data items and values

main survey, item 1
main survey, item 2
repeat group A, item 1
repeat group A, item 2
repeat group A, item 3
repeat group B, item 1
repeat group B, item 2
repeat group B, item 3
repeat group B, item 4

Each data item is stored as a entity_values record, the table is in a "long" format.

When we do data export from entities and entity_values tables, we need to run one query from entity_values table for each single data item.

main survey, 2 queries
repeat group A, 3 queries
repeat group B, 4 queries

For this approach:

No program change required in PHP program
R script program needs to re-assemble each data item and data value by one query (i.e. 4 queries for 4 data items)
R script will create entity_values records for storing the calculated indicator values

Approach 2: Possible way for storing submission data in a "wide" format table.

Dave and I are exploring potential option for storing and retrieving submission data in a more efficient way.

When we retrieve a ODK submission:

ODK form structure are stored in entities table
Each dataset for ODK form data items and data is stored as one entity_values record

For example, we will have below records for a submission with a main survey section and two repeat groups.

entities records for storing ODK form structures

main survey
repeat group A
repeat group B

entity_values records for storing dataset for ODK data items and values

main survey, JSON column for storing item 1 and 2
repeat group A, JSON column for storing item 1, 2, and 3
repeat group B, JSON column for storing item 1, 2, 3 and 4

For this approach:

We will need to update database table structures, submission retrieval program and data export program in PHP program
R script program can re-assemble one dataset with data items and data values by one query (i.e. 1 query for 4 data items)
R script will create entity_values records for storing the calculated indicator values

alex-thomson222 · 2025-02-05T11:15:33Z

Hi @dan-tang-ssd ,

I think will probably need a call at some point just to check I am understanding and picturing these structures correctly.

But just on the basis of reading your descriptions I think approach 1 sounds simpler from an R perspective. At least if I am understanding correctly than a data item is an entity, e.g. a single farm, a single crop within a farm etc. In which case this would align with the existing structure of 1 row per farm, 1 row per crop grown etc. Which is the preferable format I would need for joining tables together so I can match IDs to codes or bring required variables up or down a level if needed for a calculation.

I'm not sure if I am picturing approach 2 correctly, so I may need a visual aid to understand how exactly this is storing the data items. But from what I think you mean, I think this approach would be extra steps in R to unpack the JSONs and probably pivot to a longer format anyway to be able to join onto other tables. So in R this might be more steps to get back to what approach 1 is already doing for me.

dan-tang-ssd · 2025-02-05T11:47:13Z

Hi @alex-thomson222 - Thanks a lot for your reply and comment.

Yes, a picture approach would be very useful.
Please find below illustrations for databae records for visual aid.

I do agree with you.
For approach 1, R script can just get what it needs for indicator calculations.
For approahc 2, R script will need extra step to handle JSON data. (um.... maybe we can only get the required data names and data values from JSON columns...?)

alex-thomson222 · 2025-02-05T11:55:37Z

Thanks that helps me understand more clearly, I think still stand with what I said before that approach 2 introduces extra complexities where as approach 1 is pretty simple and accommodating to working with the data in R.

dan-tang-ssd assigned alex-thomson222, dan-tang-ssd and dave-mills Feb 5, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Exploration for Storing Submission Data #165

Exploration for Storing Submission Data #165

dan-tang-ssd commented Feb 5, 2025

alex-thomson222 commented Feb 5, 2025

dan-tang-ssd commented Feb 5, 2025

alex-thomson222 commented Feb 5, 2025

Exploration for Storing Submission Data #165

Exploration for Storing Submission Data #165

Comments

dan-tang-ssd commented Feb 5, 2025

alex-thomson222 commented Feb 5, 2025

dan-tang-ssd commented Feb 5, 2025

alex-thomson222 commented Feb 5, 2025