Change default query in get_data #33

Open
AldoCompagnoni opened this issue Mar 16, 2017 · 2 comments

@AldoCompagnoni
Contributor

The default query of get_data sometimes returns sparse data frames. I suggest that, by default, we should:

  1. Always include sppcode (at least until we translate sppcode to genus/species)
  2. Always include lat_study_site/lng_study_site
  3. Convert -99999 values to NA
  4. Not return treatment/structure/spatial replication columns if they contain only NAs
  5. Place covariates in the last column instead of the penultimate one?
  6. Include columns with the spat_rep label in the downloaded data set (this idea came up over a month ago, but I never implemented it).

@AldoCompagnoni
Contributor Author

Progress made so far:

  1. Included sppcode, lat_study_site, and lng_study_site in default queries.
  2. The function replaces -99999 with NA, but only in numeric columns.
  3. The function removes columns that contain only NA.
  4. The output of get_data now includes the label of spatial replicates (e.g. spatial_replication_level_1_label).
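
Items 2 and 3 above can be sketched in base R as follows. This is only a minimal illustration, not the package's actual code: the helper name clean_output, its sentinel argument, and the toy data frame are all hypothetical.

```r
# Hypothetical helper (not part of the package): illustrates items 2 and 3.
clean_output <- function(output_data, sentinel = -99999) {
  # Item 2: replace the sentinel with NA, but only in numeric columns.
  num_cols <- vapply(output_data, is.numeric, logical(1))
  if (any(num_cols)) {
    output_data[num_cols] <- lapply(
      output_data[num_cols],
      # which() drops existing NAs from the comparison result
      function(x) replace(x, which(x == sentinel), NA)
    )
  }
  # Item 3: drop columns that contain only NA.
  all_na <- vapply(output_data, function(x) all(is.na(x)), logical(1))
  output_data[, !all_na, drop = FALSE]
}

# Toy data, made up for illustration.
toy <- data.frame(
  sppcode   = c("A", "B", "C"),
  count     = c(4, -99999, 7),
  treatment = c(NA, NA, NA),
  stringsAsFactors = FALSE
)
cleaned <- clean_output(toy)
```

In this sketch the sentinel is replaced before the all-NA check, so a numeric column holding nothing but -99999 would also be dropped.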

@AldoCompagnoni
Contributor Author

AldoCompagnoni commented Apr 4, 2017

New tasks before we close this issue:

  1. Is the substitution of -99999 with NA as fast as it could be? My concern is that the code I use works on only a subset of the output data frame (e.g., the code is: output_data[,col_repl] <- as.data.frame(lapply(output_data[,col_repl], function(x){replace(x, x == -99999,NA)}))).
  2. Find a new column name for the label of spatial replicates. I fear that labels such as spatial_replication_level_1_label will look annoyingly long to the average user.
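
As one possible starting point for the first task, the sketch below compares the quoted expression with a variant that assigns the lapply result directly to the selected columns, without the as.data.frame step. It only checks that the two give the same result on a made-up data frame; it does not measure speed (system.time could be used for that). The example data and the way col_repl is built here are assumptions.

```r
# Made-up example data; col_repl mirrors the name used in the quoted code.
output_data <- data.frame(
  id    = c("a", "b", "c"),
  lat   = c(45.1, -99999, 44.8),
  count = c(-99999, 3, 5),
  stringsAsFactors = FALSE
)
col_repl <- names(output_data)[vapply(output_data, is.numeric, logical(1))]

# Approach quoted in the issue: rebuild the subset as a data frame.
quoted <- output_data
quoted[, col_repl] <- as.data.frame(
  lapply(quoted[, col_repl], function(x) replace(x, x == -99999, NA))
)

# Alternative sketch: assign the list from lapply directly.
direct <- output_data
direct[col_repl] <- lapply(
  direct[col_repl],
  function(x) {
    x[which(x == -99999)] <- NA
    x
  }
)

# Both approaches should yield the same data frame.
same_result <- isTRUE(all.equal(quoted, direct))
```

Indexing with output_data[col_repl] rather than output_data[, col_repl] also keeps the selection a data frame when col_repl names a single column, whereas the comma form would drop it to a plain vector.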

Moreover, I have moved author and authors_contact to the first two lines of the data frames returned by get_data. The rationale is that this keeps author information prominent without putting it "in the way" of the actual data.
