Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code comments #11

Open
jrising opened this issue Jan 27, 2025 · 1 comment
Open

Code comments #11

jrising opened this issue Jan 27, 2025 · 1 comment

Comments

@jrising
Copy link

jrising commented Jan 27, 2025

openjournals/joss-reviews#7538

Comments on code in R:

  • d$plant <- ifelse(d$Plant.start < d$Plant.end, ((d$Plant.start + d$Plant.end) / 2 / 30), ((d$Plant.start + d$Plant.end + 365) / 2 / 30))
    : Day-of-year divided by 30 isn't a great approximation for month, unless you're working with 360-day calendar data.
  • # Deal with exceptions (countries by states or regions)
    : This would be less error-prone if encoded in a data file.
  • input_file = "crop_calendar.csv"
    : If you intend for users to modify this, provide them a way to use their own filename.
  • pattern = utils::glob2rx("*precip*rfc*"),
    : Why is only the RFC weighted data used?
  • dplyr::left_join(mirca_harvest_area, by = c("lon", "lat")) %>%
    : This assumes a 1-to-1 mapping between region and grid cell. At the country level, this is usually fine, but would not work at higher resolutions.
  • reg$se_robust <- sandwich::vcovHC(reg, type = "HC", weights = iso)
    : The SE should also be clustered, since there will be correlation between error terms in the each country.
mengqi-z added a commit that referenced this issue Feb 14, 2025
For #11 #10

- flexibility to use user provided crop calendar
- flexibility to select crops
- updated to the latest FAO data
- updated with FAO crop to MIRCA2000 crop mapping
- updated default MIRCA2000 crops to GCAM commodity mapping
- make cleaning sage data its own function
@mengqi-z
Copy link
Collaborator

Dear @jrising,

Thanks for reviewing the code! We've made modifications based on your feedback and have also provided more detailed responses to some of your comments:

Day-of-year divided by 30 isn't a great approximation for month, unless you're working with 360-day calendar data.

Thank you for your comment. We use 30 as a divisor because the model operates at a monthly temporal resolution. The final planting and harvesting months are determined by rounding up the computed value using the ceiling function:
d$plant <- ceiling(ifelse(d$plant > 12, d$plant - 12, d$plant)).
Since the output is ultimately mapped to discrete months, small variations in day length (e.g., using 30 vs. 31 days) usually do not affect the final values of planting and harvesting months.

This would be less error-prone if encoded in a data file.

Thank you for your suggestion. We see the benefit of encoding this in the data file to reduce errors. However, our goal is to keep the data in its raw form while providing the processing codes for transparency. If this part were embedded in the data file, it wouldn’t be as clear for users to check the code. Additionally, the processing behaves differently depending on the selected crops. To improve clarity, we’ve cleaned up the crop calendar and moved the exception-handling code into a separate function.

Why is only the RFC weighted data used?

Thank you for your question. The aggregate nature of the yield data at the country level limits our ability to separately analyze the impacts of climate change on irrigated versus rainfed crops. We use rainfed-land-weighted climate data because rainfed crops are directly dependent on precipitation and temperature, making them more sensitive to climate variability than irrigated crops, which have more stable water availability. By incorporating rainfed land-weighted precipitation and temperature, the model better captures yield responses to changing climate conditions in historical periods. To ensure consistency, future climate projections also use rainfed land-weighted climate to project yield shocks.

This assumes a 1-to-1 mapping between region and grid cell. At the country level, this is usually fine, but would not work at higher resolutions.

Thank you for your insightful comment. Gaia is designed to operate at the country level for several reasons. First, it is intended to support global integrated assessment modeling, which typically relies on coarser spatial resolutions, such as country and regional scales. Additionally, the availability of global crop yield data imposes constraints; for example, FAOSTAT provides yield data at the country level, which is essential for gaia's empirical model fitting.
In addition, gaia's design aligns with available data standards. Key datasets, such as MIRCA2000, which provides crop-specific harvest area data at a 0.5-degree resolution, and ISIMIP’s climate data, which follows the same resolution, dictate the model's spatial scale. Given these constraints, gaia's finest resolution is set at 0.5 degrees, which ensures consistency across input datasets and model outputs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants