-
-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Setup alembic #441
Setup alembic #441
Conversation
…now for migration
I'll check this over soon 👍 Did we decide against #254 in the end then @nrjadkry @robsavoye? By using alembic I assume we are sticking with SQLAlchemy and GeoAlchemy. |
Actually I prefer using psycopg3 instead of alembic. I'm not sure where this idea came from, so would like to know why so I can make an informed decision. |
Since we are using sqlalchemy now, it is a bit more difficult to maintain the changes in the database. Alembic helps to maintain those changes. I just saw the issue to remove sqlalchemy. Why are we moving away from sqlalchemy? I think sqlalchemy is a great package to have. @robsavoye @spwoodcock |
sqlalchemy is overkill. It is designed to support multiple different databases, but we'll only use postgres due it it having postgis. I agree that if we continue using sqlalchemy alembic makes sense, but I prefer not adding more layers of dependencies. Sam was about to do this refactoring from sqlalchemy, so this is a good time to talk about the path forward. |
SQLAlchemy is good, and has some advantages, but we don't necessarily need the abstraction. Raw SQL gives more flexibility, easier use of complex queries, and potential performance optimisation. Look into some comparisons online. Each has their use case, but I think we will rely a lot on raw SQL when using Postgis anyway & we don't need to portability provided by an ORM (we will likely always use Postgres). I think a good middle ground here could be to use a query builder. The SQL can be generated using familiar Python syntax and then executed directly by the postgres driver. This also protects against SQL injection. Something I have not tested, but could suit us nicely is: https://github.com/samuelcolvin/buildpg. It's by the creator for Pydantic and has a nice integration with asyncpg too. On a side note: asyncpg vs psycopg3. Either will do the job, but asyncpg will likely always be more performant. Here is a good blog post by the creator of psycopg suggesting they will prioritise compatibility with psycopg2 and ease of use over performance: https://www.varrazzo.com/blog/2020/05/19/a-trip-into-optimisation/ |
And then finally we have database migrations to deal with. A pretty simple approach is to make a I have nothing against migration tools like alembic, as I have never had issues. Others complain that they can't handle complex migrations and have a lack of transparency. Text file migrations are pretty simple and easy to understand, plus have one less dependency to worry about. |
Right now we're already using a query generator. All queries have a YAML config file in the data_models directory. Then the backend converts it to the different queries for local postgres, remote Underpass, and Overpass. My main reason for refactoring to psycopg3 is long term maintenance. I also have usually just have an SQL file of the schema, and a simple script to handle migrations. We'll only be using postgres, so the additional overhead and flexibility of sqlalchemy isn't needed. |
This adds a lot more context, as fmtm relies heavily on osm-fieldwork. I think it makes complete sense to move away from SQLAlchemy as we generate raw SQL for the most important logic anyway. For basic CRUD in fmtm we could use raw SQL too.
After discussion with Rob, we decided that using a combination of raw SQL (for very complex queries), and a query generator (for simple queries), would be a good approach. We also discussed asyncpg vs psycopg3, and it makes sense to go with whatever is the most performant and well adopted. I suggest we give https://github.com/samuelcolvin/buildpg a try (which combines query generation with asyncpg). |
What do you think @nrjadkry? Apologies if we end up dropping SQLAlchemy and your work here is not used. |
I should correct myself. The query generator is in osm-fieldwork, not FMTM. Sorry about that. Most of the FMTM queries are raw SQL, as I used the lower level API of sqlalchemy. asyncpg might be fast, but maybe the syntax is a bit weird. If that was buried under a query generator, that'd be ok. |
I made this PR since I was facing a lot of issues with maintaining the changes in the database and alembic could solve this issue. I am fine with whatever you guys decide. @robsavoye @spwoodcock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the new SQLALCHEMY_URL variable is required in the config.py. I think we can use DB_URL. Otherwise ready to merge 👍
I was working on the SQLAlchemy removal / refactor, but haven't had time unfortunately (back to work after a holiday...). I plan to get back to it in the future, but for now it may be best to merge this in if it's useful for the devs. The edits don't affect any of the functionality of the code, so all good to merge (after a minor tweak)! |
for more information, see https://pre-commit.ci
Merging in the old branch is tricky, as a lot has changed since, plus the pre-commit hooks have been run (many merge conflicts). The easiest way to do this may actually be to create a new branch from development, then manually copy across the alembic stuff from this branch (plus a new PR). |
Current files changes shows 353 files. I will make a new Pull Request. |
@spwoodcock |
Have you tested that @nrjadkry? |
After careful consideration, we have opted not to proceed with the usage of Alembic for migration. This pull request has been pending for approximately five months, making it the longest-standing PR in our repository. |
No description provided.