Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PostgreSQL as alternate database provider #453

Closed
kspearrin opened this issue Mar 9, 2019 · 146 comments
Closed

PostgreSQL as alternate database provider #453

kspearrin opened this issue Mar 9, 2019 · 146 comments

Comments

@kspearrin
Copy link
Member

Adding another database provider should be fairly straight forward. Bitwarden's data access layer is abstracted away with repository interfaces which can be found under src/Core/Repositories. I've already done the work needed for making this possible. All that is left is re-writing all of the SQL tables, functions, stored procedures, etc over to PostgreSQL and then wiring them up repository implementations for PostgreSQL. I've started this work in the following commits, which can be followed as a model going forward:

84800da
acef40e

Choosing a database provider is as simple as either providing a connection string for MSSQL or PostgreSQL, whichever you prefer.

I do not have a lot of experience in working with PostgreSQL, so I will need some help to complete this task.

Questions:

  • Are PostgreSQL functions equivalent to MSSQL stored procedures?
@jeremyVignelles
Copy link

I was just trying to setup bitwarden on my server and immediately saw ram usage getting high due to the use of SQL Server. I was wondering if there was a way to use another db provider, when I found this post.

I've never used PostgreSQL, but I'm curious to know if that would help my server support the extra load.

Why are you using stored procedures for? What other dbms-specfic feature are you using? Would SQLite be a candidate for small environments?

@ghost
Copy link

ghost commented Mar 13, 2019

Hi there, long time lurker, first time poster.
As somebody who is highly interested in this feature I would like to help with at least my two cents.:

Are PostgreSQL functions equivalent to MSSQL stored procedures?

PosgreSQL stored procedures capabilities surpass MSSQL's, as this was one of the key aspects of its development. For a short info/comparison: https://stackoverflow.com/questions/339744/better-languages-than-sql-for-stored-procedures

I've never used PostgreSQL, but I'm curious to know if that would help my server support the extra load.

What would be your server's setup? Win or *nix? Usually I would say yes, it should reduce some of the load, but I don't want to give you some false hope.

I do not have a lot of experience in working with PostgreSQL, so I will need some help to complete this task.

I never helped I any open source project out of fear of doing something wrong, but if you guys need some help and are willing to help guide me on how to contribute I would gladly help.

@jeremyVignelles
Copy link

I never helped I any open source project out of fear of doing something wrong, but if you guys need some help and are willing to help guide me on how to contribute I would gladly help.

You can't mess up a project on your own by doing bad things. If you have the right development skills, go ahead, the worst that could happen is spending too much time, but you'll always learn something in the process 😉

@kspearrin
Copy link
Member Author

@tamaralo All database interaction in Bitwarden with MSSQL is done with stored procedures. You can find them (all all other schema) listed here: https://github.com/bitwarden/server/tree/master/src/Sql/dbo

My assumption is that we'd do the same in PostgreSQL using stored procedures (seems to behave differently in PostgreSQL) or functions. Our SQL data mapper (Dapper) seems to only work with PostgreSQL functions.

The majority of the work here is just translating the MSSQL schema (tables, indexes, sprocs/functions) over to their equivalent on PostgreSQL. After that it's just a matter of wiring them up to a C# repository class, which is fairly easy to do. Do you have enough experience in MSSQL and PostgreSQL to translate between the two? If so, I'd say you're qualified enough to help with this.

@ghost
Copy link

ghost commented Mar 13, 2019

@jeremyVignelles 😊 I give it a try! I have a lot of fear/respect for opensource projects, as they seem to be so well organized and fast paced. ^^

@kspearrin:
I worked with PostgreSQL, MySQL and some old versions from MSSQL ( in the 2000's ) so I think I could take a crack at it and try it at my server. The thing is, I study alongside my work and ATMI have some finals so, if its ok, I will try to take a look at it by the end of next week. 👍

@kspearrin
Copy link
Member Author

@tamaralo If you have any questions when you get started, you can find me in our chatroom here: https://gitter.im/bitwarden/Lobby

@ghost
Copy link

ghost commented Mar 13, 2019

@kspearrin Thx, I will come online by the end of next week (sorry, forgot the word next in the last post! Thanks for the prompt reply.

@Kovah
Copy link

Kovah commented Mar 24, 2019

The 2GB RAM requirement for SQLServer is the only thing holding me back from using Bitwarden. Would love to see an implementation for Postgres or any other database provider. 💙

@indy-singh
Copy link

Would love to help Bitwarden support PostgreSQL; though I tend to avoid stored procs/functions and prefer to bring that logic up into the code.

@mrahbar
Copy link

mrahbar commented Apr 4, 2019

👍 For supporting PostgreSQL. I'm a big fan of Bitwarden and a happy paying customer. The only reason of not using a self-hosted Bitwarden is MSSQL.

On the topic tough I think that without any official endorsement of the Bitwarden team for PostgreSQL this endeavor is be doomed to failure. After the initial batch of SQL scripts is ported who is going to create new PostgreSQL scripts every time the equivalent MSSQL scripts are updated? Kudos for a defined data-abstraction-layer but I'm not optimistic when it gets to maintain two different database vendors.

So can this process be only an experiment or a real alternative solution for MSSQL?

@FingerlessGlov3s
Copy link

👍
This sounds like a good idea to me. Getting ram usage down and removing minimum 2 GB limit is good. Will make running it more resource efficient on virtual machines etc...

Although we do need support from Bitwarden themselves like @mrahbar says.

@Agraphie
Copy link

I would like to see this as well. Are there any experimental branches or any way I can help make this happen?

@keliansb
Copy link

Any news about this?

@devployment
Copy link

As briefly mentioned here I'll try to volunteer here. Will set things up locally and see where I hit roadblocks.

@jeremyVignelles
Copy link

@devployment : 👏 and good luck.
Do you have any experience in working with Postgre SQL?
Please keep us posted 🙂

@devployment
Copy link

Do you have any experience in working with Postgre SQL?

@jeremyVignelles, yes I do. It's been a while. But we'll see.

@benjaminpreiss
Copy link

benjaminpreiss commented Jul 4, 2019

Is anyone already working on translating the schemes? Are there any branches?

@kspearrin
Copy link
Member Author

Not that I am aware of.

@wusatosi
Copy link

wusatosi commented Aug 9, 2019

Do you have any experience in working with Postgre SQL?

@jeremyVignelles, yes I do. It's been a while. But we'll see.

Any update?

@benjaminpreiss
Copy link

benjaminpreiss commented Aug 9, 2019

I have started with translating but made close to no progress - but I also have very little experience with sql

@Papina
Copy link
Contributor

Papina commented Aug 20, 2019

Just read this while investigating postgreSQL. I have good experience with tsql to pl/pgsql so I might start to give this a crack.

kspearrin pushed a commit that referenced this issue Sep 11, 2019
* PostgreSQL initial commit of translation from SQL Server to PostgreSQL

* snake_case added.
set search path for schema.  schema qualified name no longer needed for creation and access of functions.

* Table DDL for PostgreSQL
kspearrin pushed a commit that referenced this issue Sep 12, 2019
* PostgreSQL initial commit of translation from SQL Server to PostgreSQL

* snake_case added.
set search path for schema.  schema qualified name no longer needed for creation and access of functions.

* Table DDL for PostgreSQL

* Rename User.sql to user.sql

* PostgreSQL views, 
snake_case column fix for user_create, 
rename of users.sql file to lowercase
@Papina
Copy link
Contributor

Papina commented Sep 19, 2019

im down to the nitty gritty now of functions. Im not super familiar with docker, but i cant seem to publish the mssql ports from the docker image successfully. Is there a switch i can add that will expose the 1433 port? when i add it it i get an error os OSX

i modified the function to publish port 1433:

function updateDatabase() {
    pullSetup
    docker run -i --rm --name setup -p 1433:1433 --network container:bitwarden-mssql \
        -v $OUTPUT_DIR:/bitwarden --env-file $ENV_DIR/uid.env bitwarden/setup:$COREVERSION \
        dotnet Setup.dll -update 1 -db 1 -os $OS -corev $COREVERSION -webv $WEBVERSION
    echo "Database update complete"
}

now i get the error:

MacBook-Pro:bitwarden benjamin$ ./bitwarden.sh updatedb
 _     _ _                         _
| |__ (_) |___      ____ _ _ __ __| | ___ _ __
| '_ \| | __\ \ /\ / / _` | '__/ _` |/ _ \ '_ \
| |_) | | |_ \ V  V / (_| | | | (_| |  __/ | | |
|_.__/|_|\__| \_/\_/ \__,_|_|  \__,_|\___|_| |_|

Open source password management solutions
Copyright 2015-2019, 8bit Solutions LLC
https://bitwarden.com, https://github.com/bitwarden

===================================================

Docker version 19.03.2, build 6a30dfc
docker-compose version 1.24.1, build 4667896b

1.32.0: Pulling from bitwarden/setup
Digest: sha256:e88f1611ff88c77a6255c49189ac3c965aaa3576fa6980ba54f2be10a96907b5
Status: Image is up to date for bitwarden/setup:1.32.0
docker.io/bitwarden/setup:1.32.0
docker: Error response from daemon: conflicting options: port publishing and the container type network mode.
See 'docker run --help'.

this is more for testing, so i can make sure the functions im rewriting actually produce the same data

@kspearrin
Copy link
Member Author

You want to connect to the MSSQL instance of a Bitwarden installation? Just bash into the mssql docker container. Then you can run sqlcmd.

$ docker exec bitwarden-mssql bash

@Papina
Copy link
Contributor

Papina commented Sep 20, 2019

no, i want to connect to the mssql instance from the host, so i can run the development tools against the database directly, but still be able to use the bitwarden normally at the same time.

@Papina
Copy link
Contributor

Papina commented Sep 20, 2019

Screen Shot 2019-09-20 at 10 46 47

@kspearrin
Copy link
Member Author

Ok. Then create the following file ./bwdata/docker/docker-compose.override.yml.

version: '3'

services:
  mssql:
    ports:
      - '1433:1433'

Restart. Now you can connect to localhost:1433 from tooling on the host.

@FingerlessGlov3s
Copy link

@FingerlessGlov3s That would be a goal, but I am not sure how yet. You could just export your data vault and reimport it back into the new instance, but obviously that has limitations.

I have multiple users, so I want to avoid any disruption if possible. Hopefully just 30-60 minutes down time, which they'll be fine with because Bitwarden can work offline during that period.

@kspearrin
Copy link
Member Author

kspearrin commented Nov 24, 2022

@FingerlessGlov3s Note that with this new way of deploying you can also continue to use MS SQL as a database provider, so your existing database would still port over.

@lfvjimisola
Copy link

@kspearrin That is great news. FYI, we also need to be able to migrate all data from existing MSSQL to PostgreSQL. Ideally with as little downtime as possible.

@GieltjE
Copy link

GieltjE commented Nov 24, 2022

Wow, this is a really good improvement, easy to setup (even when using your own external db server).
Liking the lack of the old cluster of files and the much simpler setup.

The only thing that I couldn't get working is the /admin/ part, just results in a 404.

@udf2457
Copy link

udf2457 commented Dec 3, 2022

@kspearrin I've been lurking watching this development because MSSQL dependency is pretty much the only thing putting me off Bitwarden.

However taking a quick look around the repo, is this file representative of your proposed Postgres schema ?

Because there's a scary amount of character varying in that schema file that doesn't pass the Postgres smell test. See the "Don't Do This" wiki on the postgres site.

e.g. for varchar(n) the TL;DR is that in many (most?) cases you probably should be using text instead combined (where necessary) with a check constraint.

Similarly timestamp without time zone is discouraged in Postgres land, it should really be with.

Also I hope you're going to (eventually ?) be using Postgres stored functions in order to combat against SQL injection attacks ?

@kspearrin
Copy link
Member Author

kspearrin commented Dec 3, 2022

@udf2457 we use Entity Framework Core ORM for the MySQL and PostegreSQL implementations. EF just generates all the SQL from C#.

@knocte
Copy link

knocte commented Dec 4, 2022

@udf2457 we use Entity Framework Core ORM for the MySQL and PostegreSQL implementations. EF just generates all the SQL from C#.

I guess this is an answer to the question about stored functions, but what about the schema? I'm guessing EFCore didn't generate the schema.

@udf2457
Copy link

udf2457 commented Dec 4, 2022

@udf2457 we use Entity Framework Core ORM for the MySQL and PostegreSQL implementations. EF just generates all the SQL from C#.

I guess this is an answer to the question about stored functions, but what about the schema? I'm guessing EFCore didn't generate the schema.

And even if it did generate the schema, the schema file is not that big that you can't manually tweak it to be better suited for Postgresql instead of blindly using whatever the automated tool spat out.

And if it was automatically generated, then was it reviewed ? For example is the custom en-u-ks-primary collation really necessary ? Or was that just something else the automated tool just spat out ?

@kspearrin
Copy link
Member Author

Yes, the schema, queries, and migrations are all generated from Entity Framework Core. All code is peer reviewed, however, we do not have any Postgres expertise on the team, so we mostly rely on what is generated. Any specific feedback will be helpful since I believe tweaks can be made.

@udf2457
Copy link

udf2457 commented Dec 4, 2022

I'm a bit busy at $work with various deadlines due to change freezes caused by the upcoming holiday season. So I'm not sure I would be able to commit much time before the new year.

However a quick low-hanging fruit would be as above, i.e.:

  1. Any references to varchar/character varying etc. changed to text, and where there is a genuine need for a length constraint, add ADD CONSTRAINT $name CHECK (char_length($col) <= $len);
  2. timestamp without timezone to timestamp with timezone unless there's a genuine reason not to
  3. Review the "Don't Do This" wiki page, it provides a good summary that doesn't require any particular Postgres expertise
  4. Have a think about that en-u-ks-primary collation ? If it was put there by the automated tool, is it needed ? Messing around with custom collations on any database should only be done after some thought. I don't know much about collations - especially the more obscure ones. Looking around the internet en-u-ks-primary seems like it might refer to case & accent insensitive. Is this what you want ?
  5. There is also the bigger question of whether you want or need the case-sensitive table and column names ? All the names in your schema are presently wrapped in double quotes. This means Postgres will treat the names as case-sensitive instead of the default insensitive. If you do not need case-sensitive table/column names then drop all the double quotes.

@kspearrin
Copy link
Member Author

@udf2457

Re 1, Can we change these datatypes without affecting data already stored? Seems like we should if we're going from constrained to un constrained lengths?

Re 3, I don't know a lot about database collation, however, I found this in our code that is initing npgsql: https://github.com/bitwarden/server/blob/master/src/Infrastructure.EntityFramework/Repositories/DatabaseContext.cs#L107-L109

It references https://www.npgsql.org/efcore/misc/collations-and-case-sensitivity.html?tabs=data-annotations#database-collation and appears we are being specific for some reason.

Sounds like maybe we have to specify a collation of some sort? I don't know why this specific value (en-u-ks-primary) was chosen.

@GieltjE
Copy link

GieltjE commented Dec 4, 2022

Been over the MariaDB/MySQL tables, there are some very good things and some weird things.
1: user.Id is ascii, not UTF (great practice, much better than most other databases out there)
1a: user.[PublicKey|PrivateKey|LicenseKey|ApiKey] are UTF8mb4_general_ci (best choice in my opinion for MySQL/MariaDB for text with possible special characters), which is great for text, but these could be ascii too if we're going for consistency and optimization.
1b: I probably missed a few in this category, guess these apply to more tables/fields (e.g. grant.Key)

2: user.[CreationDate|RevisionDate|LastFailedLoginDate] are named Date but are DateTime

3: [authrequest|cipher|device].UserId is not an index, is this table always queried on Id?
3a: [collection].OrganizationId is not an index, is this table always queried on OrganizationId?
3b: [emergencyaccess|event|folder].Id is the only index, is this table always queried on Id?
3c: stopped looking after the previous ones, these should be easy to find and fix

4: the distinction between char and varchar appears to be correct

5: organization.PlanType is unsigned (correct and good practice), but a lot of other integers are signed while they obviously should be unsigned.

With some minor work and proper care on the index front this is should be a very exciting release.

@GieltjE
Copy link

GieltjE commented Dec 4, 2022

@kspearrin Being specific about the collation is a good thing, it shows care being taken when designing the database.

For MariaDB/MySQL there's _ci for Case Insensitive and _bin for hard binary comparisons.

The scheme is generally , so either utf8mb4_bin (no locale!) or something like utf8mb4_general_ci or utf8mb4_danish_ci.
Though the country specific ones are rarely used.
Where utf8mb4 appears to have be gaining much traction in the past few years, it just works for nearly all solutions where any form of special characters outside of the ascii charset are used.

@udf2457
Copy link

udf2457 commented Dec 4, 2022

@kspearrin

Re 1, Can we change these datatypes without affecting data already stored?

I can't think why not, since we're not crossing boundaries like you might be with numeric-type fields. Should be a simple alter table alter column command I would have thought.

I guess the only question might be on the indexing side, but a quick drop/re-index would take care of that.

@GieltjE
Copy link

GieltjE commented Dec 4, 2022

On any modern DB the indexes should be updated accordingly, did this plenty of times and never manuals touched an index before when doing so.

@udf2457
Copy link

udf2457 commented Dec 4, 2022

On any modern DB the indexes should be updated accordingly, did this plenty of times and never manuals touched an index before when doing so.

Indeed ... I was just trying to think hard for anything that might be affected. In reality I think the only thing likely to change during a varchar->text are background PG catalog things rather than the user-provided data itself since its already in the right format, its only the column definition that is changing.

@GieltjE
Copy link

GieltjE commented Dec 4, 2022

Keep in mind that on MariaDB/MySQL [var]char is preferred over [tiny|medium|long|]text types, as the text type cannot be part of indexes on some MySQL versions and they are fixed in size whereas the [var]char type can vary in length.

@kspearrin
Copy link
Member Author

Great. Thank you for the feedback. None of these sound too terrible or fundamental and seem like we could address them without issue with schema updates in the future.

@udf2457
Copy link

udf2457 commented Dec 5, 2022

Sounds good @kspearrin.

I guess potentially the index definitions for Postgres could be up for review, but then I'm not familiar with the sort of queries you're making.

Certainly if you're doing any of the following, you could likely do better than a standard index definition:

  • Looking for things "anywhere" within a textfield (i.e. not just prefixes, i.e. %foo% or %bar rather than foobar%)
  • Making queries against bool fields where you're only really interested in one of the statuses.
  • Doing anything that might benefit from a partial index or an expression index

@GieltjE
Copy link

GieltjE commented Dec 5, 2022

For the idexing checks we could either read the source and check all queries or enable something like general_log on MySQL/MariaDB, perform all possible actions in the UI and read all queries that are now listed in the log file.

Using the EXPLAIN statement should give some more insight (https://mariadb.com/kb/en/explain/).
But for the quick wins just looking at all WHERE ... statements and just using some common sense works fine most of the time.

@kspearrin
Copy link
Member Author

There is certainly room for improving indexing in the EF implementations of MySQL and PostgreSQL, however, I think we've got a good understanding of that and we'll work on tuning that over time.

@kspearrin
Copy link
Member Author

We have announced an official beta release supporting PostgreSQL and MySQL through our new unified Docker image here: https://bitwarden.com/blog/new-deployment-option-for-self-hosting-bitwarden/

Docs on getting started: https://bitwarden.com/help/install-and-deploy-unified-beta/

Closing this issue now.

@danpoltawski
Copy link
Contributor

Is there another issue for tracking migrating out of mssql? (Have a 50+ user deployment I'd really like to migrate onto our existing postgres cluster!)

@kspearrin
Copy link
Member Author

@danpoltawski No, but we are tracking it internally. Feel free to open a public issue.

@danpoltawski
Copy link
Contributor

Since the issue tracker here strongly discourages feature requests i'll 🤞 that this will be implemented :) Great to see this :)

@justindbaur
Copy link
Member

Hi All, since you all are interested in using Postgres I just want to give a little visibility to #2480 where we are going to be tracking any issues with Bitwarden unified which includes the path towards using Postgres. If you run into any issue feel free to check there for if it's being tracked already and mention it if you run into something new.

@holow29
Copy link
Contributor

holow29 commented Dec 8, 2022

Will unified be the only officially supported way of using other DBMS? Would it be wrong to assume that standard self-host architecture has the same support if configured in the config? Haven't dug around the unified release enough yet to see all what it is doing vs. standard.

@justindbaur
Copy link
Member

@holow29 The standard architecture does have a way of using Postgres & MySQL you have to do exactly what you said configure the config to use it. But note that even using the Standard deployment the Postgres and MySQL options are also a beta.

@sjmf
Copy link

sjmf commented Dec 28, 2023

A note for others who stumble across this issue looking for a way to configure an alternate database server in the Linux Manual Deployment process, as I was.

You will need to add the following settings to global.env (or global.override.env) – see also [1, 2]:

globalSettings__DatabaseProvider=Postgres
globalSettings__PostgreSql__connectionString="Host=postgres;Username=postgres;Password=REPLACE_DEFAULT_PASSWORD;Database=vault"

You may also wish to configure an alternate database provider in docker-compose.yml, depending on your setup:

  postgres:
    image: postgres:16
    container_name: postgres
    restart: always
    volumes:
        - ../postgres:/var/lib/postgresql
    env_file:
        - ../env/pgsql.env

I also created a postgres folder to contain the data volume.

My pgsql.env contains the following:

POSTGRES_USER=postgres
POSTGRES_PASSWORD=REPLACE_DEFAULT_PASSWORD
POSTGRES_DB=vault

Hope this is helpful.

Sources:
[1] https://github.com/bitwarden/server/blob/main/.github/workflows/infrastructure-tests.yml
[2] https://github.com/bitwarden/server/blob/main/src/Core/Enums/SupportedDatabaseProviders.cs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests