Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setup alembic #441

Closed
wants to merge 64 commits into from
Closed

Setup alembic #441

wants to merge 64 commits into from

Conversation

nrjadkry
Copy link
Member

@nrjadkry nrjadkry commented Jun 4, 2023

No description provided.

@nrjadkry nrjadkry requested a review from spwoodcock June 4, 2023 05:59
@spwoodcock
Copy link
Member

I'll check this over soon 👍

Did we decide against #254 in the end then @nrjadkry @robsavoye?

By using alembic I assume we are sticking with SQLAlchemy and GeoAlchemy.

@robsavoye
Copy link
Collaborator

Actually I prefer using psycopg3 instead of alembic. I'm not sure where this idea came from, so would like to know why so I can make an informed decision.

@nrjadkry
Copy link
Member Author

nrjadkry commented Jun 4, 2023

Since we are using sqlalchemy now, it is a bit more difficult to maintain the changes in the database. Alembic helps to maintain those changes. I just saw the issue to remove sqlalchemy. Why are we moving away from sqlalchemy? I think sqlalchemy is a great package to have. @robsavoye @spwoodcock

@robsavoye
Copy link
Collaborator

sqlalchemy is overkill. It is designed to support multiple different databases, but we'll only use postgres due it it having postgis. I agree that if we continue using sqlalchemy alembic makes sense, but I prefer not adding more layers of dependencies. Sam was about to do this refactoring from sqlalchemy, so this is a good time to talk about the path forward.

@spwoodcock
Copy link
Member

spwoodcock commented Jun 4, 2023

SQLAlchemy is good, and has some advantages, but we don't necessarily need the abstraction. Raw SQL gives more flexibility, easier use of complex queries, and potential performance optimisation.

Look into some comparisons online. Each has their use case, but I think we will rely a lot on raw SQL when using Postgis anyway & we don't need to portability provided by an ORM (we will likely always use Postgres).
Edit example comparison https://lyz-code.github.io/blue-book/architecture/orm_builder_query_or_raw_sql/

I think a good middle ground here could be to use a query builder. The SQL can be generated using familiar Python syntax and then executed directly by the postgres driver. This also protects against SQL injection.

Something I have not tested, but could suit us nicely is: https://github.com/samuelcolvin/buildpg. It's by the creator for Pydantic and has a nice integration with asyncpg too.

On a side note: asyncpg vs psycopg3. Either will do the job, but asyncpg will likely always be more performant. Here is a good blog post by the creator of psycopg suggesting they will prioritise compatibility with psycopg2 and ease of use over performance: https://www.varrazzo.com/blog/2020/05/19/a-trip-into-optimisation/

@spwoodcock
Copy link
Member

And then finally we have database migrations to deal with. A pretty simple approach is to make a migrations directory containing sequential text or .sql files with the SQL logic required for each migration. It's the way I have handled it in some other projects I work on.

I have nothing against migration tools like alembic, as I have never had issues. Others complain that they can't handle complex migrations and have a lack of transparency. Text file migrations are pretty simple and easy to understand, plus have one less dependency to worry about.

@robsavoye
Copy link
Collaborator

Right now we're already using a query generator. All queries have a YAML config file in the data_models directory. Then the backend converts it to the different queries for local postgres, remote Underpass, and Overpass. My main reason for refactoring to psycopg3 is long term maintenance. I also have usually just have an SQL file of the schema, and a simple script to handle migrations. We'll only be using postgres, so the additional overhead and flexibility of sqlalchemy isn't needed.

@spwoodcock
Copy link
Member

spwoodcock commented Jun 4, 2023

This adds a lot more context, as fmtm relies heavily on osm-fieldwork. I think it makes complete sense to move away from SQLAlchemy as we generate raw SQL for the most important logic anyway.

For basic CRUD in fmtm we could use raw SQL too.
My main concerns were:

  • SQL injection (as this is a web API, not a lower level package). If contributors don't write SQL property with variable substitution, then we could have an attacker drop all data (we need backups), access hidden data (not a major concern).

  • Development friendliness. SQLAlchemy is pretty ubiquitous in the Python development world and reasonably easy to pick up. In an ideal world everyone would know SQL, but sometimes this isn't the case.

After discussion with Rob, we decided that using a combination of raw SQL (for very complex queries), and a query generator (for simple queries), would be a good approach.

We also discussed asyncpg vs psycopg3, and it makes sense to go with whatever is the most performant and well adopted.
In this case it's actually asyncpg.

I suggest we give https://github.com/samuelcolvin/buildpg a try (which combines query generation with asyncpg).

@spwoodcock
Copy link
Member

What do you think @nrjadkry?

Apologies if we end up dropping SQLAlchemy and your work here is not used.

@robsavoye
Copy link
Collaborator

I should correct myself. The query generator is in osm-fieldwork, not FMTM. Sorry about that. Most of the FMTM queries are raw SQL, as I used the lower level API of sqlalchemy. asyncpg might be fast, but maybe the syntax is a bit weird. If that was buried under a query generator, that'd be ok.

@nrjadkry
Copy link
Member Author

nrjadkry commented Jun 5, 2023

I made this PR since I was facing a lot of issues with maintaining the changes in the database and alembic could solve this issue. I am fine with whatever you guys decide. @robsavoye @spwoodcock

@nrjadkry nrjadkry marked this pull request as ready for review June 6, 2023 05:46
Copy link
Member

@spwoodcock spwoodcock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the new SQLALCHEMY_URL variable is required in the config.py. I think we can use DB_URL. Otherwise ready to merge 👍

src/backend/app/alembic.ini Show resolved Hide resolved
src/backend/app/config.py Show resolved Hide resolved
src/backend/app/migrations/env.py Show resolved Hide resolved
@spwoodcock
Copy link
Member

I was working on the SQLAlchemy removal / refactor, but haven't had time unfortunately (back to work after a holiday...).

I plan to get back to it in the future, but for now it may be best to merge this in if it's useful for the devs.

The edits don't affect any of the functionality of the code, so all good to merge (after a minor tweak)!

@nrjadkry nrjadkry temporarily deployed to test September 26, 2023 05:54 — with GitHub Actions Inactive
@robsavoye robsavoye added enhancement New feature or request backend Related to backend code labels Sep 26, 2023
@spwoodcock
Copy link
Member

Merging in the old branch is tricky, as a lot has changed since, plus the pre-commit hooks have been run (many merge conflicts).

The easiest way to do this may actually be to create a new branch from development, then manually copy across the alembic stuff from this branch (plus a new PR).

@nrjadkry
Copy link
Member Author

nrjadkry commented Sep 26, 2023

Current files changes shows 353 files.
@spwoodcock you are right. Making a new pull request with copyng files is much easier.

I will make a new Pull Request.

@nrjadkry
Copy link
Member Author

@spwoodcock
If we run this migration in the db already having those tables created, it will throw us an error with the table already exists error.
What do you think is the best possible way to handle this?

@spwoodcock
Copy link
Member

Have you tested that @nrjadkry?
Generally database migrations should be idempotent, meaning they can be applied over an over without issues.
If that's not the case, then we need to look into how alembic is configured.

@nrjadkry
Copy link
Member Author

nrjadkry commented Oct 6, 2023

After careful consideration, we have opted not to proceed with the usage of Alembic for migration. This pull request has been pending for approximately five months, making it the longest-standing PR in our repository.

@nrjadkry nrjadkry closed this Oct 6, 2023
@spwoodcock spwoodcock deleted the setup-alembic branch November 29, 2023 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend Related to backend code enhancement New feature or request
Projects
Development

Successfully merging this pull request may close these issues.

5 participants