-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#Feature: Build the complete database #47
Comments
The database file is being built, but the data is not being loaded. There are issues with the schema versus the flat file. Not recognizing the UCR table. Not spent a ton of time due to holidays - will continue this work in sprint 19 Suggestion for mismatched columns: Create a sub dataframe containing exactly the headers in the schema and then load from that sub dataframe. @allfires , please upload screenshot of error output about finding the correct database table. |
@J-glove I no longer have the screenshot of the error, but I was able to determine the issue. I was not initially creating the tables correctly. The solution was to ad this function to the db_session.py script: However, I am calling create_tables() in my test driver script, so I believe I'll need to find a way to have this function called automatically. There are some additional issues that I am currently working through and will post any blockers that arise, especially with loading other exposome data. |
Continuing to work on the upload. Had trouble locating files to upload to db. Chengrong has them. @chengrong-us please upload these preprocessed files to the exposome server and tell Ali where they are located. There is a good deal of updating that needs to be done to scripts to allow the datasets to be uploaded. |
The preprocess_ucr_zip9.csv file is saved in exposome server at /data/exposome_csvs/preprocess_ucr_zip9.csv. |
@allfires is currently having to prioritize 17r1 Characterization ahead of holiday, bringing this into next sprint. |
Not finished yet, working on upload scripts to match the datasource to defined model. Just ucr is present in the db as of 1/8. |
@allfires - Please move the database to the on-prem Exposome server. |
@J-glove I have been working on moving the database to the exposome server, however, I am running into some issues. The biggest issue right now is that I am not able to run the latest version of SQLAlchemy on the server due to the current python version. In order to access libraries for SQLAlchemy that include "DeclarativeBase" and "mapped_column", I need to be using sqlalchemy version2 which is only compatible with python 3.7 or higher. What would you suggest as a fix to this? Should we create a docker container? |
@J-glove I wanted to give you an update on this. I was able to figure out how to use pyenv and install the necessary python version that is compatible with the sqlalchemy dependencies. I was able to get the script running, however, it looks like the headers in the preprocess_ucr.csv file are not following the standard model for zip code and time identifiers, therefore I am not able to load the ucr data. |
@J-glove the model for ucr specifically is this: class ucr(exposome_base): If the preprocess files don't match exactly the model, then the file will not load. Do you want me to change the scripts so that it will handle other columns? or should we change the script for the preprocess to include only columns that will read into the model correctly? Please let me know how to proceed. preprocess_ucr_zip9.csv currently has these headers: ZIP_9,YEAR,pop,murder,fso,robbery,assault,burglary,larceny,mvt,total,p_total,p_murder,p_fso,p_rob,p_assault,p_burglary,p_larceny,p_mvt |
Upload all available preprocessed exposome files into the database. This will likely be a large file, please create it on the on prem exposome server.
Be wary of persistent issues with zip_9 adding .0 on the entries
The text was updated successfully, but these errors were encountered: