
# Feature: Build the complete database #47

Open · J-glove opened this issue Nov 20, 2024 · 10 comments
Assignees: allfires
Labels: feature (new code/logic to be written), preprocessing (issue pertains to the preprocessing portion of the project)

Comments

J-glove (Collaborator) commented Nov 20, 2024

Upload all available preprocessed exposome files into the database. This will likely be a large file, so please create it on the on-prem exposome server.

Be wary of the persistent issue where zip_9 entries pick up a trailing .0.
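
The preprocessing code isn't shown in this thread, but a trailing .0 on zip entries is the classic symptom of pandas inferring a float dtype for the column (often triggered by missing values). A minimal sketch of the usual fix, assuming a pandas-based preprocessing step and the ZIP_9 column name used later in this thread; the 9-digit zero-padding is an assumption:

```python
import pandas as pd

# Read the zip column as a string so pandas never coerces it to float
# ("326110000" stays "326110000" instead of becoming 326110000.0).
df = pd.read_csv("preprocess_ucr_zip9.csv", dtype={"ZIP_9": str})

# If a column was already written out as float, strip the artifact and
# restore any leading zeros (9-digit width is an assumption here).
df["ZIP_9"] = (
    df["ZIP_9"].astype(str)
    .str.replace(r"\.0$", "", regex=True)
    .str.zfill(9)
)
```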

J-glove added the feature and preprocessing labels on Nov 20, 2024
J-glove assigned allfires (after briefly self-assigning) on Nov 20, 2024
J-glove (Collaborator, Author) commented Dec 4, 2024

The database file is being built, but the data is not being loaded; there are mismatches between the schema and the flat file.

The loader is not recognizing the UCR table.

I haven't spent much time on this due to the holidays; I'll continue the work in sprint 19.

Suggestion for mismatched columns: create a sub-dataframe containing exactly the headers in the schema, then load from that sub-dataframe.
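
A minimal sketch of that suggestion, assuming a pandas-based loader; schema_cols and the connection URL are placeholders standing in for whatever the schema actually defines, while the UCR table name comes from this thread:

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///exposome.db")  # placeholder URL

# Hypothetical list of headers taken from the table schema
schema_cols = ["ZIP_9", "YEAR", "p_murder", "p_rob"]

df = pd.read_csv("preprocess_ucr.csv", dtype=str)

# Sub-dataframe with exactly the schema's headers; extra flat-file
# columns are dropped up front instead of breaking the load.
sub = df[schema_cols]
sub.to_sql("UCR", con=engine, if_exists="append", index=False)
```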

@allfires, please upload a screenshot of the error output about finding the correct database table.

allfires (Collaborator) commented Dec 4, 2024

@J-glove I no longer have the screenshot of the error, but I was able to determine the issue: I was not initially creating the tables correctly. The solution was to add this function to the db_session.py script:
```python
def create_tables():
    # Create all tables based on the models defined in models.py
    engine = get_engine()
    exposome_base.metadata.create_all(engine)
```

However, I am currently calling create_tables() from my test driver script, so I'll need to find a way to have it called automatically. There are some additional issues that I am working through; I will post any blockers that arise, especially with loading the other exposome data.

To clarify, I am able to load the test preprocess_ucr.csv data into the UCR table, so the script itself is functioning correctly.
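
One way to make the call automatic: metadata.create_all() is safe to run repeatedly (it skips tables that already exist), so the upload script's entry point can call create_tables() unconditionally before loading. A sketch under that assumption, reusing the function above; upload_ucr.py is a hypothetical file name:

```python
# upload_ucr.py (hypothetical entry point)
from db_session import create_tables

def main():
    # Safe on every run: create_all() only creates missing tables
    create_tables()
    # ... then load the preprocessed CSVs into the new tables ...

if __name__ == "__main__":
    main()
```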

J-glove (Collaborator, Author) commented Dec 10, 2024

Continuing to work on the upload. I had trouble locating the files to upload to the db; Chengrong has them. @chengrong-us, please upload these preprocessed files to the exposome server and tell Ali where they are located.

There is a good deal of updating that needs to be done to the scripts before the datasets can be uploaded.

chengrong-us (Collaborator) commented

The preprocess_ucr_zip9.csv file is saved on the exposome server at /data/exposome_csvs/preprocess_ucr_zip9.csv.

J-glove (Collaborator, Author) commented Dec 18, 2024

@allfires currently has to prioritize 17r1 Characterization ahead of the holiday, so this is moving into the next sprint.

J-glove (Collaborator, Author) commented Jan 8, 2025

Not finished yet; working on the upload scripts to match the data source to the defined model.

Only ucr is present in the db as of 1/8.

J-glove (Collaborator, Author) commented Jan 8, 2025

@allfires - Please move the database to the on-prem Exposome server.

allfires (Collaborator) commented Jan 8, 2025

> @allfires - Please move the database to the on-prem Exposome server.

@J-glove I have been working on moving the database to the exposome server; however, I am running into some issues. The biggest one is that I cannot run the latest version of SQLAlchemy on the server due to its current Python version: to access the SQLAlchemy APIs that include DeclarativeBase and mapped_column, I need SQLAlchemy 2.x, which requires Python 3.7 or higher. What would you suggest as a fix? Should we create a Docker container?
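
For reference, a sketch of the 2.0-style imports the models rely on, which is what pins the Python floor; the version guard and the exposome_base definition are illustrative assumptions, not code from the repo:

```python
import sys

# DeclarativeBase / Mapped / mapped_column only exist in SQLAlchemy 2.x,
# and SQLAlchemy 2.x itself requires Python 3.7 or higher.
if sys.version_info < (3, 7):
    raise RuntimeError("SQLAlchemy 2.x needs Python 3.7+ (e.g. via pyenv or a container)")

from sqlalchemy import Float, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class exposome_base(DeclarativeBase):
    """Shared declarative base for the exposome models (definition assumed)."""
```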

allfires (Collaborator) commented

@J-glove I wanted to give you an update on this. I figured out how to use pyenv to install a Python version compatible with the SQLAlchemy dependencies. I was able to get the script running; however, it looks like the headers in the preprocess_ucr.csv file do not follow the standard model for the zip code and time identifiers, so I am not able to load the ucr data.

allfires (Collaborator) commented Jan 10, 2025

@J-glove the model for ucr specifically is this:

```python
class ucr(exposome_base):
    __tablename__ = 'UCR'

    ZIP_9: Mapped[str] = mapped_column(String(9), primary_key=True)
    YEAR: Mapped[str] = mapped_column(String(4), primary_key=True)
    p_assualt: Mapped[float] = mapped_column(Float)
    p_burglary: Mapped[float] = mapped_column(Float)
    p_fso: Mapped[float] = mapped_column(Float)
    p_larceny: Mapped[float] = mapped_column(Float)
    p_murder: Mapped[float] = mapped_column(Float)
    p_mvt: Mapped[float] = mapped_column(Float)
    p_rob: Mapped[float] = mapped_column(Float)
```

If a preprocess file doesn't exactly match the model, it will not load. Do you want me to change the load scripts so that they handle extra columns, or should we change the preprocessing script to output only columns that read into the model correctly? Please let me know how to proceed.

preprocess_ucr_zip9.csv currently has these headers: `ZIP_9,YEAR,pop,murder,fso,robbery,assault,burglary,larceny,mvt,total,p_total,p_murder,p_fso,p_rob,p_assault,p_burglary,p_larceny,p_mvt`. (Note that the model spells p_assualt while the CSV has p_assault, so that column won't match either way.)
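
A hedged sketch of the first option (handle extra columns in the load script), assuming the model above and a pandas-based loader; get_engine comes from db_session.py as discussed earlier, while load_ucr and the models import path are hypothetical names:

```python
import pandas as pd
from sqlalchemy.orm import Session

from db_session import get_engine  # helper referenced earlier in the thread
from models import ucr             # the model shown above (import path assumed)

def load_ucr(csv_path: str) -> None:
    # Keep the key columns as strings so pandas doesn't coerce them to floats
    df = pd.read_csv(csv_path, dtype={"ZIP_9": str, "YEAR": str})

    # Keep only the columns the model defines; extras like pop, murder,
    # total, ... are dropped instead of breaking the load.
    model_cols = ucr.__table__.columns.keys()
    missing = set(model_cols) - set(df.columns)
    if missing:
        # e.g. {'p_assualt'}: the model's spelling differs from the CSV's p_assault
        print(f"CSV is missing model columns: {sorted(missing)}")

    keep = [c for c in model_cols if c in df.columns]
    records = df[keep].to_dict(orient="records")

    with Session(get_engine()) as session:
        session.bulk_insert_mappings(ucr, records)
        session.commit()
```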
