Link R scripts for data processing #55
If the scripts are going to be run on a schedule, I would also like the option for a user to manually trigger the scripts from the front-end. It might be easier for the scripts to be run as soon as a new submission comes in, or perhaps on a delay (e.g. 10 minutes after new submissions are received), to account for the likely event that many submissions are received in a batch. |
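A minimal sketch of the "run on a delay" idea, written in Python for illustration only (the actual processing scripts are in R, and the platform's scheduler is not shown in this thread). The class name `DebouncedTrigger` and its methods are hypothetical. It assumes each new submission restarts the waiting period, so a batch of submissions results in a single run once things go quiet:

```python
from datetime import datetime, timedelta


class DebouncedTrigger:
    """Decide when to run processing: only after a quiet period with no new submissions."""

    def __init__(self, delay: timedelta):
        self.delay = delay
        self.last_submission = None  # time of the most recent unprocessed submission

    def on_submission(self, now: datetime) -> None:
        # Assumption: every new submission restarts the waiting period.
        self.last_submission = now

    def should_run(self, now: datetime) -> bool:
        # True only when there is pending work and the quiet period has elapsed.
        return (
            self.last_submission is not None
            and now - self.last_submission >= self.delay
        )

    def mark_run(self) -> None:
        # Call after the scripts have run (scheduled or manually triggered).
        self.last_submission = None
```

A manual trigger from the front-end could simply run the scripts and call `mark_run()`, bypassing `should_run()`.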
Just a couple of things to check database-wise, based on previous projects, to help with the R scripts and data, though you may have already corrected for them.
|
Dan: We have not created any database tables for the main survey and repeat groups yet, so the table structure can be tailor-made for our requirements.
Dan: I will keep this in mind and make sure this issue does not happen here. |
I sent an email to you both detailing the first draft of the agroecology_scores script which can be found at |
The script for the key performance indicators is about two-thirds complete. All of the simple indicators are coded, so I have begun work on the more complex calculations. @dave-mills you may have seen my emails to Andrea + Sarah about a couple of the indicators where the script and protocol differ, or where I think there may be errors in their code |
Tested what I can of the agroecology scores script. I will need the following to test the rest:
|
@alex-thomson222 thanks for the review and feedback. We've looked through the issues and Dan's coming up with some fixes. The big one is because these submissions are using forms with the unclosed I expect you'll be able to test again tomorrow with a refreshed set of data tables. Tomorrow I want to do a run through of the whole system on the staging site, so we'll be able to test with properly formatted submission data at that point. |
No problem, just let me know when there is a new copy of the database on Dropbox that I can use. Are the fixed versions of the form going to be uploaded to the same project on ODK Central, so I can complete some submissions that should have complete data, especially on those variables converting to hectares or kilograms? |
Thanks a lot Alex. Some updates:
The latest database with retrieved submissions is in the Dropbox folder below: I will fix the "farm_survey_data_id is empty" issue today. |
With the income issue, I think we almost have it: everything after the income group has now been populated, but the questions that are meant to be in that group, i.e. subsidy to farm_loss_text, are still empty. The indicator code relating to these questions is fairly simple, so I can keep working on other parts before needing this fixed. For the products table, I don't think this is quite what was intended. I believe the intention was to combine the other_product_use_sales repeat group with the crop_use, livestock_use, fish_use, tree_use and honey_use groups of questions, which are not repeats but are structured similarly, to create a product-level table. With those variables currently not in the data anywhere
Just a couple of things are missing on the permanent workers table to match the structure of seasonal_workers, though I assume this was on the "to do" list as only a few rows were populated
edit - the seasonal worker table is missing seasonal_labour_months_count |
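A check like the one described here could be automated. The following is a rough Python sketch (the real scripts are in R) that reports expected columns missing from each table. The function name `find_missing_columns` is hypothetical, and the column lists are illustrative assumptions apart from `seasonal_labour_months_count`, which is the column named above:

```python
def find_missing_columns(expected, actual):
    """Return {table: [missing columns]} for tables that lack expected columns."""
    report = {}
    for table, columns in expected.items():
        present = set(actual.get(table, []))
        missing = [c for c in columns if c not in present]
        if missing:
            report[table] = missing
    return report


# Illustrative column lists only; the real table structure is not shown in this thread.
expected = {
    "seasonal_workers": ["farm_survey_data_id", "seasonal_labour_months_count"],
}
actual = {
    "seasonal_workers": ["id", "farm_survey_data_id"],
}

print(find_missing_columns(expected, actual))
# {'seasonal_workers': ['seasonal_labour_months_count']}
```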
I have tested everything I can for the performance indicators and have fixed the obvious errors I made. I will now go back to the agroecology scores to check what I couldn't before. To clarify, I am first testing that the code runs successfully without getting stuck, except where existing data table issues would need correcting first. Once I am happy with that, I will check more closely that the final numbers are as expected. I have done a little of this where it is simple to compare quickly, but will go back more thoroughly on the more complex instances. |
I have finished checking what I can on the agroecology scores script and have rewritten my code for the connectivity and fairness scores, as they relate to the products table, so the code should now accommodate that structure. This can be tested when the products table is completed |
@dave-mills - Um... I am a bit confused... Could you clarify where I can get the data for the products table? |
This was the part we discussed where the structure is flat in the form, but really it's a different data level. In the 'farm_characteristics' section, the user is asked 'what did you produce on your farm?' with a select-multiple and then a repeat group to enter any number of other products. Then there is a flat section of questions about each product:
Each of these sections has basically the same set of questions, (with some minor variation; there are a couple of 'trees-only' questions - see the 'notes' column in your screenshot), so we discussed the option of turning it into a new data level with Alex. So: we get the data for this data level from those questions. There should be 1 row per selected item in For crops, livestock, fish, trees and honey, the product_id and product_name should be taken from the choice_list, i.e.:
And the rest of the data comes from the correct section. e.g. crops from:
and so on. |
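The reshaping described above could be sketched roughly as follows. This is an illustrative Python sketch, not the project's implementation (which would live in the platform or the R scripts). The group names `crop_use`, `livestock_use`, `fish_use`, `tree_use`, `honey_use` and `other_product_use_sales` come from this thread; the choice-list values, the `products` and `name` keys, the product names, and the `sold` field in the usage example are all assumptions:

```python
# Assumed mapping: choice-list value -> (product_name, flat question group in the form).
PRODUCTS = {
    "crops": ("Crops", "crop_use"),
    "livestock": ("Livestock", "livestock_use"),
    "fish": ("Fish", "fish_use"),
    "trees": ("Trees", "tree_use"),
    "honey": ("Honey", "honey_use"),
}


def build_product_rows(submission):
    """Turn one flat submission into product-level rows (one row per product)."""
    rows = []
    # One row per item selected in the select-multiple (hypothetical key "products").
    for product_id in submission.get("products", []):
        if product_id not in PRODUCTS:
            continue  # e.g. an "other" option, covered by the repeat group below
        name, group = PRODUCTS[product_id]
        rows.append({
            "submission_id": submission["id"],
            "product_id": product_id,
            "product_name": name,
            **submission.get(group, {}),  # answers from the matching flat section
        })
    # One row per entry in the repeat group of other products.
    for entry in submission.get("other_product_use_sales", []):
        answers = {k: v for k, v in entry.items() if k != "name"}
        rows.append({
            "submission_id": submission["id"],
            "product_id": "other",
            "product_name": entry.get("name"),
            **answers,
        })
    return rows
```

For example, a submission that selected crops and honey and added one other product would yield three rows, each carrying the answers from its own section.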
@dave-mills - Thank you Dave for your clarification. This is really helpful. Questions:
I can find product_id, but where can I find product_name...?
Screen shots: |
In general, the names/labels for choice list entries are in LanguageStrings, as they are 'translatable'. But I don't think we need to go that far yet, as these products aren't changing any time soon. So we can use the names as they are in the ODK form right now:
|
First step:
Next: