Use DNA/Culture-specific views for Chicago Beach Lab Data #332

Open
levyj opened this issue Aug 21, 2017 · 12 comments

@levyj

levyj commented Aug 21, 2017

As noted in Chicago/opengrid#309, http://plenar.io/explore/event/beach_lab_data has not updated properly since 2016. We reconfigured the underlying dataset to include both Culture and DNA tests, so no single date field is consistently populated.

It probably would make sense to separate the Plenario dataset into separate datasets for these two test types, although DNA seems like the higher priority since there are no culture tests this year. We have created filtered views that can be used as source "datasets":

https://data.cityofchicago.org/Parks-Recreation/Beach-Lab-Data-DNA-Tests/hmqm-anjq
https://data.cityofchicago.org/Parks-Recreation/Beach-Lab-Data-Culture-Tests/hh4t-tnq8

CC: @tomschenkjr @nicklucius

@vforgione
Member

Hey @levyj, is there something on our end we need to do to get these data sets straightened out? If you have a login for the system, you can register them. If not, we can work with you to get them set up and get the data ingested.

@levyj
Author

levyj commented Oct 2, 2017

@vforgione - Sorry for the delay in replying. How about if we did the following?

  1. Revise http://plenar.io/explore/event/beach_lab_data to be called something like Beach Lab Data - Culture Tests and feed it from https://data.cityofchicago.org/Parks-Recreation/Beach-Lab-Data-Culture-Tests/hh4t-tnq8.

  2. Create Beach Lab Data - DNA Tests and feed it from https://data.cityofchicago.org/Parks-Recreation/Beach-Lab-Data-DNA-Tests/hmqm-anjq.

I certainly can register Number 2 but we probably should make sure Number 1 seems acceptable first.

Thanks.

CC: @tomschenkjr @nicklucius

@vforgione
Member

I've updated the existing data set and added a new one. The DNA data set is coming back just fine via the API, but the culture data is still funky. I looked at the exported CSV and there are a lot of null values for the location field and many of the timestamps are being parsed incorrectly in the database.

@levyj
Author

levyj commented Oct 3, 2017

Thanks. There is no way to filter out bad records on your end, is there? If not, how about if I set up a separate, hidden filtered view that excludes problematic records? It would be papering over problems instead of really solving them but that may be acceptable in this case. Data ending in 2016 will not be of huge interest, anyway.

@vforgione
Member

As it stands, there really isn't a good way. We've tried several tweaks to the ETL process and it inevitably breaks something else (hence the work on a new platform).

A view that filters those out would be the better option right now.
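For illustration, the kind of filtering discussed above (dropping records with a null location or an unparseable timestamp) might look roughly like the sketch below when applied to an exported CSV. The column names `Location` and `Sample Timestamp` and the timestamp format are assumptions made for this example, not taken from the actual dataset, and this is not how the Socrata filtered view itself is defined.

```python
import csv
import io
from datetime import datetime

# Hypothetical column names and timestamp format; adjust to the real export.
LOCATION_COL = "Location"
TIMESTAMP_COL = "Sample Timestamp"
TIMESTAMP_FMT = "%m/%d/%Y %I:%M:%S %p"


def is_clean(row):
    """Return True if the row has a non-empty location and a parseable timestamp."""
    if not (row.get(LOCATION_COL) or "").strip():
        return False
    try:
        datetime.strptime((row.get(TIMESTAMP_COL) or "").strip(), TIMESTAMP_FMT)
    except ValueError:
        return False
    return True


def filter_rows(csv_text):
    """Split CSV text into (kept, dropped) lists of row dicts."""
    kept, dropped = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        (kept if is_clean(row) else dropped).append(row)
    return kept, dropped
```

Returning the dropped rows alongside the kept ones makes it easy to see how many records the filter is hiding, which matters here because the view papers over the underlying problems rather than fixing them.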

@levyj
Author

levyj commented Oct 3, 2017

@vforgione
Member

That worked better. I have a match on the number of rows, and can directly query it in the db. The API is still reporting no data and I'm getting nothing in my error report. I'm gonna have to dig deeper on this.

@vforgione
Member

@HeyZoos, I added a new table for the culture tests (dataset name is beach_lab_data_culture_tests) and I'm still getting nothing. Can you look at the API code sometime tomorrow?

@HeyZoos
Collaborator

HeyZoos commented Oct 5, 2017

Hey guys, sorry it took me so long to get on the issue. I was moving and so was out of action for a good while.

Issuing these two queries seems to yield data:

http://plenar.io/v1/api/detail?dataset_name=beach_lab_data_dna_tests&obs_date__ge=2010&limit=3000

returns 2,935 beach lab DNA records (including 2017 data), and:

http://plenar.io/v1/api/detail?dataset_name=beach_lab_data_culture_tests&obs_date__ge=2015

returns 629 beach lab culture records.
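For anyone who wants to repeat these checks from a script, here is a minimal sketch that builds the same query URLs. Only the endpoint and the `dataset_name`, `obs_date__ge`, and `limit` parameters come from the URLs above; the helper names `build_detail_url` and `fetch_detail` are made up for this example, and the layout of the JSON response is not assumed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the query URLs in this comment.
BASE_URL = "http://plenar.io/v1/api/detail"


def build_detail_url(dataset_name, obs_date_ge, limit=None):
    """Build a detail query URL using only the parameters seen in this thread."""
    params = {"dataset_name": dataset_name, "obs_date__ge": obs_date_ge}
    if limit is not None:
        params["limit"] = limit
    return BASE_URL + "?" + urlencode(params)


def fetch_detail(url, timeout=30):
    """Fetch a URL and decode its JSON body (hypothetical helper, needs network)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(build_detail_url("beach_lab_data_dna_tests", 2010, limit=3000))
    print(build_detail_url("beach_lab_data_culture_tests", 2015))
```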

It could be that you guys issued queries before the ETL completed. In that case an empty result would have been cached, and the cache takes about half an hour to refresh, if I remember correctly.
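To make that hypothesis concrete, here is a toy time-to-live cache showing how an empty result stored before the data arrives keeps being served until the entry expires. This is not Plenario's caching code; the `TTLCache` class is invented for illustration, and the 1800-second TTL used in the usage lines is only the half hour recalled above.

```python
import time


class TTLCache:
    """Toy cache that serves a stored value until it is ttl_seconds old."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be simulated
        self._store = {}

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh, even if the stored value is empty
        value = compute()
        self._store[key] = (now, value)
        return value


if __name__ == "__main__":
    fake_now = [0.0]
    table = []  # stands in for the dataset before the ETL finishes
    cache = TTLCache(1800, clock=lambda: fake_now[0])

    print(cache.get_or_compute("culture", lambda: list(table)))  # empty, cached
    table.extend(["row 1", "row 2"])  # ETL completes
    fake_now[0] = 60
    print(cache.get_or_compute("culture", lambda: list(table)))  # stale result
    fake_now[0] = 1800
    print(cache.get_or_compute("culture", lambda: list(table)))  # refreshed
```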

@HeyZoos
Collaborator

HeyZoos commented Oct 5, 2017

Are you able to see the DNA and culture data, @levyj?

@levyj
Author

levyj commented Oct 5, 2017

I can see data but I am not sure it is all the data. Also, we seem to have two versions of the Culture dataset listed in Plenario.

image

@vforgione
Member

I just removed the old data set. I'm not sure about the information reported in the explorer app, but going through the API and checking the database, we definitely have all the data ingested from the view you provided us, and the API is returning it.
