Link R scripts for data processing #55

dave-mills · 2024-12-02T11:59:15Z

First step:

Calculated indicators for Agroecology module are ready; need to add database link in R to get data from database and save results back to database.

Automate process - run R scripts when new submissions come in. (or on schedule?)
Include calculated indicators in data export.
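
A rough sketch of the read, calculate, write-back loop described above. It is written in Python with an in-memory SQLite database purely for illustration; the actual scripts are in R and the real database engine is not stated in this thread. Only `farm_survey_data` is a name from this issue; the `calculated_indicators` table, the `score_a`/`score_b` columns and the averaging rule are hypothetical placeholders, not the real indicator logic.

```python
import sqlite3


def run_indicator_step(conn):
    """Read survey rows, compute a placeholder score, and save it back."""
    rows = conn.execute(
        "SELECT id, score_a, score_b FROM farm_survey_data"
    ).fetchall()
    for row_id, a, b in rows:
        # Placeholder calculation: mean of the inputs that are not NULL.
        # A genuine 0 is a valid input and must not be treated as missing.
        values = [v for v in (a, b) if v is not None]
        score = sum(values) / len(values) if values else None
        # Upsert so that re-running the script does not duplicate results.
        conn.execute(
            "INSERT OR REPLACE INTO calculated_indicators "
            "(farm_survey_data_id, example_score) VALUES (?, ?)",
            (row_id, score),
        )
    conn.commit()


def make_demo_db():
    """Build a throwaway database with hypothetical tables and columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE farm_survey_data "
        "(id INTEGER PRIMARY KEY, score_a REAL, score_b REAL)"
    )
    conn.execute(
        "CREATE TABLE calculated_indicators "
        "(farm_survey_data_id INTEGER PRIMARY KEY, example_score REAL)"
    )
    conn.executemany(
        "INSERT INTO farm_survey_data VALUES (?, ?, ?)",
        [(1, 2.0, 4.0), (2, 0.0, None), (3, None, None)],
    )
    return conn
```

Keying the results table on the submission id means the same step can be run on every new submission, or on a schedule, without creating duplicate rows.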

dave-mills · 2024-12-02T14:48:44Z

If the scripts are going to be run on a schedule, I would also like the option for a user to manually trigger the scripts from the front-end. It might be easier for the scripts to be run as soon as a new submission comes in, or perhaps on a delay (e.g. 10 minutes after new submissions are received), to account for the likely event that many submissions are received in a batch.
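
One possible shape for the "run on a delay" idea, sketched in Python for illustration only. The class and method names are hypothetical, and it assumes the timer restarts whenever another submission arrives, so a batch results in a single run after the batch has gone quiet. The 10-minute value is just the example figure from this comment.

```python
from datetime import datetime, timedelta


class DebouncedTrigger:
    """Decide when to run the scripts after a burst of submissions."""

    def __init__(self, delay=timedelta(minutes=10)):
        self.delay = delay
        self.last_submission_at = None
        self.pending = False

    def on_submission(self, now):
        # Each new submission restarts the waiting period (an assumption).
        self.last_submission_at = now
        self.pending = True

    def should_run(self, now):
        # Run only if something is waiting and the batch has gone quiet.
        return self.pending and now - self.last_submission_at >= self.delay

    def mark_ran(self):
        # Called after a run, whether delayed or manually triggered.
        self.pending = False
```

A manual trigger from the front-end would simply run the scripts straight away and then call `mark_ran()`, so the delayed run does not repeat the same work.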

alex-thomson222 · 2024-12-02T15:38:30Z

@dan-tang-ssd

Just a couple of things to check database-wise to help with the R scripts and data, based on previous projects, though you may have already corrected for them:

* That the binary columns for the options of multiple select questions are available in the database
* That 0s are not being stored or exported as blanks, as this was an issue on TAPE
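
To illustrate both checks, here is a small Python sketch (not the platform's actual code). It assumes the select-multiple answer arrives as a space-separated string of choice ids, which is how ODK stores them, and that `None` means the question was never answered. The `<question>_<choice>` column naming and the helper names are hypothetical.

```python
import csv
import io

# Choice ids of the farm_products question mentioned later in this thread.
CHOICES = ["crops", "livestock", "fish", "trees", "honey"]


def expand_select_multiple(value, choices, prefix):
    """Turn 'crops honey' into one 0/1 column per choice."""
    if value is None:
        # Not answered: keep it missing rather than inventing 0s.
        return {f"{prefix}_{c}": None for c in choices}
    selected = set(value.split())
    return {f"{prefix}_{c}": 1 if c in selected else 0 for c in choices}


def to_cell(value):
    """Format a value for export; only None becomes a blank cell."""
    # Writing `value or ""` here would turn a real 0 into a blank.
    return "" if value is None else str(value)


def write_rows(rows):
    """Export rows as CSV text, keeping 0s distinct from blanks."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow({k: to_cell(v) for k, v in row.items()})
    return buffer.getvalue()
```

The explicit `is None` check is the point of the sketch: a truthiness test treats 0 and missing as the same thing, which is one way 0s end up exported as blanks.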

dan-tang-ssd · 2024-12-02T16:15:47Z

@alex-thomson222

> @dan-tang-ssd
>
> Just a couple things to check database wise to help with the R scripts and data based on previous projects though you may have already corrected for them.
> * That the binary columns for the options of multiple select questions are available in the database

Dan: We do not have any database tables created for the main survey and repeat groups yet. The table structure can be tailor-made for our requirements.

> * That 0s are not being stored or exported as blanks as this was an issue on TAPE

Dan: I will keep it in mind and make sure this issue does not happen here.

alex-thomson222 · 2024-12-03T12:36:36Z

I sent an email to you both detailing the first draft of the agroecology_scores script which can be found at

https://github.com/stats4sd/holpa-r-scripts

alex-thomson222 · 2024-12-04T15:59:43Z

I have the script for key performance indicators maybe 2/3rds complete currently, all of the simple ones are coded so I have begun work on the more complex calculations.

@dave-mills you may have seen my emails to Andrea + Sarah about a couple of the indicators where the script and protocol differ, or where I think there may be errors in their code.

alex-thomson222 · 2024-12-18T13:46:34Z

Tested what I can of the agroecology scores script. I will need the following to test the rest:

* The "products" table populated
* Variables following income_count to be restored, as they are currently blank
* "livestock_count" and "fish_count" to be added to the data structure

dave-mills · 2024-12-18T14:12:41Z

@alex-thomson222 thanks for the review and feedback. We've looked through the issues and Dan's coming up with some fixes. The big one is because these submissions are using forms with the unclosed income group, so the form structure isn't what the platform expects - we've put in a fix for that and now Dan's working through the other issues you've highlighted.

I expect you'll be able to test again tomorrow with a refreshed set of data tables. Tomorrow I want to do a run through of the whole system on the staging site, so we'll be able to test with properly formatted submission data at that point.

alex-thomson222 · 2024-12-18T14:29:25Z

No problem, just let me know when there is a new copy of the database on Dropbox I can use.

Are the fixed versions of the form going to be uploaded to the same project on ODK Central so I can complete some submissions that should have complete data, especially on those variables converting to hectares or kilograms?

dan-tang-ssd · 2024-12-18T14:43:21Z

Thanks a lot Alex.

Some updates:

* farm_survey_data table: columns after "income_count" are now populated. (Thank you Dave for the help.)
* farm_survey_data table: added columns "livestock_count" and "fish_count"; they are populated.
* products table: it is now populated. I realised that the ODK variable names did not match between the submission content and the Data Structure Excel file, so I updated the column names to fix it.

Latest database with retrieved submissions is in below Dropbox folder:
\SSD Dropbox\Stats4SD Projects\HOLPA - 2024\database\20241218 Database with Data Retrieved\newer_version

I will fix the "farm_survey_data_id is empty" issue today.

alex-thomson222 · 2024-12-18T14:57:08Z

With the income issue, I think we almost have it: while everything after the income group has now been populated, the questions that are meant to be in that group, i.e. subsidy to farm_loss_text, are still empty. The indicator code relating to these questions is fairly simple, so I can keep working on other parts before needing this fixed.

For the products table, I think this isn't quite what was in mind. I believe the intention was to combine the other_product_use_sales repeat group with the crop_use, livestock_use, fish_use, tree_use and honey_use groups of questions, which are not repeats but are structured similarly, to create a product-level table. Those variables are currently not in the data anywhere.

alex-thomson222 · 2024-12-18T15:08:13Z

Just a couple of things are missing on the permanent workers table to match the structure of seasonal_workers, though I assume this was on the "to do" list as only a few rows were populated:

* The number of workers, which would be the merging of perm_labour_group_n_workers (household) and perm_labourer_numbers (hired externally). Down the line I should harmonise these 2 variable names to make it clearer, apologies.
* Populating the binary for household_members.

Edit: the seasonal worker table is missing seasonal_labour_months_count.

alex-thomson222 · 2024-12-18T16:14:31Z

Done a test on everything I can for the performance indicators and have fixed the obvious errors I made. Will now go back to the agroecology scores to check what I couldn't before.

To clarify, I am first testing that the code runs successfully without getting stuck, except where existing data table issues would need correcting first. Once I am happy with that, I will check more closely that the final numbers are as expected. I have done a little of this where it is simple to compare quickly, but will go back more thoroughly on the more complex instances.

alex-thomson222 · 2024-12-18T16:59:32Z

@dan-tang-ssd @dave-mills

Finished checking what I can on the agroecology scores script, and have rewritten my code for the connectivity and fairness scores, as they relate to the products table, so the code should now accommodate that structure. This can be tested when the products table is completed.

dan-tang-ssd · 2024-12-18T17:32:05Z

> For the products table, I think this isn't quite what was in mind for this table. I believe the intention was to combine the other_product_use_sales repeat group with crop_use, livestock_use, fish_use, tree_use and honey_use groups of questions which are not repeats but structured similarly to create a product level table. With those variables currently not in the data anywhere

@dave-mills - Um... I am a bit confused... Could you clarify where I can get the data for the products table?

dave-mills · 2024-12-19T08:39:35Z

This was the part we discussed where the structure is flat in the form, but really it's a different data level. In the 'farm_characteristics' section, the user is asked 'what did you produce on your farm?' with a select-multiple, and then there is a repeat group to enter any number of other products.

Then there is a flat section of questions about each product:

Produced Crops
Livestock
Fish
Trees
Honey
Other (this is a repeat group that repeats over any 'other' products entered)

Each of these sections has basically the same set of questions, (with some minor variation; there are a couple of 'trees-only' questions - see the 'notes' column in your screenshot), so we discussed the option of turning it into a new data level with Alex.

So: we get the data for this data level from those questions. There should be 1 row per selected item in farm_products, and one row per repeat entry in other_product_use_sales.

For crops, livestock, fish, trees and honey, the product_id and product_name should be taken from the choice_list, i.e.:

| id | name |
| --- | --- |
| crops | Crops (including perennial crops) |
| livestock | Livestock |
| fish | Fish |
| trees | Trees (e.g., for wood, bark, rubber) |
| honey | Honey |

And the rest of the data comes from the correct section. e.g. crops from:

crop_produce_note
crop_hh_consumption
crop_livestock_consumption
crop_sales
crop_gifts
crop_waster
crop_other_use
crop_other_use_specify
crop_use
crop_sales_buyers
crop_buyer
crop_buyer_other
crop_fair_price

and so on.
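
A rough Python sketch of how one submission could be reshaped into product-level rows along these lines. It is an illustration, not the platform's implementation: the real submission structure is not shown in this thread, so treating it as a flat dict is an assumption, as are the short `FIELDS` list (a subset of the crop variables above, assumed to exist with the same suffixes in every section), the `"other"` product_id and the `other_product_name` key.

```python
# Names as they appear in the ODK form's choice list.
PRODUCT_NAMES = {
    "crops": "Crops (including perennial crops)",
    "livestock": "Livestock",
    "fish": "Fish",
    "trees": "Trees (e.g., for wood, bark, rubber)",
    "honey": "Honey",
}

# Choice id -> variable prefix, following crop_use, livestock_use, etc.
PREFIXES = {
    "crops": "crop",
    "livestock": "livestock",
    "fish": "fish",
    "trees": "tree",
    "honey": "honey",
}

# Assumed shared suffixes; the full list is longer and varies slightly.
FIELDS = ["hh_consumption", "sales", "fair_price"]


def build_product_rows(submission):
    """Return one row per selected product plus one per 'other' entry."""
    rows = []
    for product_id in (submission.get("farm_products") or "").split():
        if product_id not in PRODUCT_NAMES:
            continue  # anything else is covered by the repeat group below
        row = {
            "product_id": product_id,
            "product_name": PRODUCT_NAMES[product_id],
        }
        prefix = PREFIXES[product_id]
        for field in FIELDS:
            # e.g. crop_sales -> sales on the row where product_id = crops
            row[field] = submission.get(f"{prefix}_{field}")
        rows.append(row)
    for entry in submission.get("other_product_use_sales", []):
        row = {
            "product_id": "other",  # hypothetical id for repeat entries
            "product_name": entry.get("other_product_name"),
        }
        for field in FIELDS:
            row[field] = entry.get(field)
        rows.append(row)
    return rows
```

Listing the fields explicitly, rather than matching on the prefix alone, keeps unrelated variables such as `fish_count` or `livestock_count` out of the product rows.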

dan-tang-ssd · 2024-12-19T11:39:55Z

@dave-mills - Thank you Dave for your clarification. This is really helpful.

Questions:

> For crops, livestock, fish, trees and honey, the product_id and product_name should be taken from the choice_list.

I can find product_id, but where can I find product_name...?

There is no product_name in submission content.
I also tried to find it from database table choice_list_entries but failed...

Screen shots:

dave-mills · 2024-12-19T11:56:19Z

> I can find product_id, but where can I find product_name...?

In general, the names/labels for choice list entries are in LanguageStrings, as they are 'translatable'. But I don't think we need to go that far yet, as these products aren't changing any time soon. So we can use the names as they are in the ODK form right now:

| id | name |
| --- | --- |
| crops | Crops (including perennial crops) |
| livestock | Livestock |
| fish | Fish |
| trees | Trees (e.g., for wood, bark, rubber) |
| honey | Honey |

dave-mills assigned alex-thomson222 and dan-tang-ssd Dec 2, 2024

dave-mills added this to the All features ready for testing milestone Dec 2, 2024

dave-mills self-assigned this Dec 16, 2024

dan-tang-ssd mentioned this issue Dec 20, 2024

Call R Script #125

Closed

4 tasks

dave-mills removed this from the All features ready for testing milestone Jan 29, 2025
