Data Server: Database level user table isolation #224
For additional reference, the connection pool to the dataset database is set up at runtime here:
matico/matico_server/src/lib.rs
Lines 62 to 67 in 651d270
We need to generalize this a bit to allow for different backend connections, but that's out of the scope of this ticket, I think.
Is there a reason why we aren't taking a more fragmented approach with regard to user roles and permissions? Perhaps I am missing something, but linking individual users to the database's roles feels like something that would limit us in terms of database flavors. I've generally worked with central role and permission tables as opposed to relying on certain authorized connections. Relying on an authorized connection feels a little risky, in that if you could spoof a valid connection you could get A LOT more data than if you just did a simple check for role and permission. Perhaps this would be better to pitch in a meeting, you're right.
On a side note, I'd like to add some columns to our tables for future auditing and reporting purposes.

```rust
#[derive(Debug, Queryable, Deserialize, Serialize, Insertable)]
pub struct Permission {
    pub id: Uuid,
    pub user_id: Uuid,
    pub resource_id: Uuid,
    pub permission: String,
    pub resource_type: String,
    pub created_at: NaiveDateTime,
    pub updated_at: NaiveDateTime,
}
```

Namely, I'd like to add columns that track who created, updated, and deleted a given record.
I think it would be useful to have a session where we discuss the architecture here and talk through some options. I think we will need user-level permissions at the database level in some form, as we allow users to execute arbitrary SQL against the datasets they have permission to read, etc. It might be hard to determine from that SQL exactly what tables are being accessed in order to do an out-of-database permissions check before the query is run.

Might be worth clarifying that we have two databases in the system currently. The first holds all the business logic for the application (Users, Permissions, etc.), while the second we use to actually store a user's uploaded datasets in tables and as an analytics engine using PostGIS. It's that second database that we need to somehow add permissions to. If there is a way we can do that without using database users and permissions at the database level, I am totally open to it. Just struggling to see how we can make that work.
Yeah, not having updated_at and created_at in there is an oversight by me. Do you imagine the column to track who updated / created / deleted the table would point to a given user in the system?
Yup! Just as the following:

```sql
created_by uuid NOT NULL REFERENCES users (id)
updated_by uuid REFERENCES users (id)
deleted_by uuid REFERENCES users (id)
```
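For reference, a sketch of how the Permission model might look with those audit columns added (field types mirror the SQL above, with Option for the nullable references; this is illustrative, not a committed schema):

```rust
use chrono::NaiveDateTime;
use diesel::prelude::*;
use serde::{Deserialize, Serialize};
use uuid::Uuid;

#[derive(Debug, Queryable, Deserialize, Serialize, Insertable)]
pub struct Permission {
    pub id: Uuid,
    pub user_id: Uuid,
    pub resource_id: Uuid,
    pub permission: String,
    pub resource_type: String,
    pub created_at: NaiveDateTime,
    pub updated_at: NaiveDateTime,
    // Hypothetical audit columns discussed above.
    pub created_by: Uuid,         // NOT NULL REFERENCES users (id)
    pub updated_by: Option<Uuid>, // REFERENCES users (id)
    pub deleted_by: Option<Uuid>, // REFERENCES users (id)
}
```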
Agreed! I feel like we could both type up 1,000 words easily on this topic and go back and forth, or just have a 15 minute meeting and come to an agreement relatively quickly. I should be available later on Tuesday, after 3 PM. Throw some time on my calendar and I'll make it happen!
In regards to your original request, do we have a means of preventing certain commands (changing roles, deleting tables, etc.) other than just text parsing?
I think we could do some text parsing of the injected SQL looking for any instance of the offending keyword.
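A naive version of that check might look like the sketch below (hypothetical helper; the keyword list is an assumption based on the role-switching and table-dropping concerns above, and simple string matching like this is easy to defeat with comments, quoting, or odd casing, so it shouldn't be the only line of defence):

```rust
/// Hypothetical guard: reject raw SQL that appears to switch roles or drop
/// tables. This is naive text matching only, not real SQL parsing.
fn contains_forbidden_statement(sql: &str) -> bool {
    let normalized = sql.to_lowercase();
    ["set role", "reset role", "set session authorization", "drop table"]
        .iter()
        .any(|kw| normalized.contains(kw))
}
```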
The more I read over the two solutions, the more I like the second solution. It feels generally more secure and far more contained. I don't like the concept of just leaving connections open once a user has initially connected to a DB.
I could very well be off base here, but I am not sure this will be a significant problem. Given that this is an in-house solution, the application SHOULDN'T be getting hundreds of connection requests. We could also set the maximum threshold for connections per data source to a number that is well below the actual performance threshold for said source.
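For what it's worth, capping the pool is usually a one-liner; a sketch assuming an sqlx-style Postgres pool (the actual pool type used in lib.rs may differ):

```rust
use sqlx::postgres::{PgPool, PgPoolOptions};

/// Hypothetical pool constructor for a single data source, capped well below
/// the database's own connection limit. The cap of 10 is an arbitrary example.
async fn data_source_pool(database_url: &str) -> Result<PgPool, sqlx::Error> {
    PgPoolOptions::new()
        .max_connections(10)
        .connect(database_url)
        .await
}
```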
Great! I agree that the second solution here sounds the best. I think it will make testing easier as well, since we can isolate the ConnectionService middleware and just test that. @michplunkett a good next step might be to look at how the AuthService works as a template for building out this new service. That uses the FromRequest trait (https://docs.rs/actix-web/latest/actix_web/trait.FromRequest.html), but we might need the more advanced middleware setup described here: https://actix.rs/docs/middleware in order to access the application state and get at the connection settings for the DB.
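To make the suggestion concrete, a bare-bones extractor in the AuthService style might look roughly like this (all names here, including ConnectionService, the user-id extension, and the role naming, are placeholders; it assumes actix-web 4's FromRequest, and anything that needs application state or an async pool checkout is where the fuller middleware setup would come in):

```rust
use actix_web::{dev::Payload, Error, FromRequest, HttpMessage, HttpRequest};
use std::future::{ready, Ready};
use uuid::Uuid;

/// Hypothetical extractor that resolves the database role a request's queries
/// should run as. Purely illustrative, not the actual matico API.
pub struct ConnectionService {
    pub db_role: String,
}

impl FromRequest for ConnectionService {
    type Error = Error;
    type Future = Ready<Result<Self, Self::Error>>;

    fn from_request(req: &HttpRequest, _payload: &mut Payload) -> Self::Future {
        // Assumes an earlier auth layer stashed the logged-in user's id in the
        // request extensions, the way AuthService-style extractors usually do.
        let db_role = req
            .extensions()
            .get::<Uuid>()
            .map(|id| format!("user_{}", id))
            .unwrap_or_else(|| "anonymous".to_string());

        ready(Ok(ConnectionService { db_role }))
    }
}
```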
Currently we have dataset assets protected at the API level through our permissions setup:
matico/matico_server/src/models/permissions.rs
Lines 88 to 97 in 651d270
This allows us to check a user against a resource at the route and model level to determine access to said resource.
This works well for this level but we still have the problem of access at the database level.
For example, let's say a user, User1, uploads a dataset, Dataset1, and marks it as private. The dataset gets loaded into a table, dataset_table. When User2 tries to query that table through the API, or requests tiles from that table, etc., the permissions system will reject them for not having the correct permissions.
This doesn't, however, stop User2 from running a query against dataset_table directly. For example, a query along these lines
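(a hypothetical stand-in, since the exact snippet isn't reproduced in this thread):

```sql
SELECT * FROM dataset_table;
```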
will still execute when run by User2, even though User2 doesn't have permissions at the API level, because we currently have no way of controlling permissions at the database level.
Indeed, right now there is only a single user at the database level: the root db user, which has access to everything.
There are a couple of ways we can resolve this issue.
Essentially, what we need to do is create a db user for each application user on signup and then run any query as that user. Then, when permissions change at the API level, make sure we reflect those permissions at the database level.
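As a sketch, the signup hook might issue something along these lines (the role naming scheme and the NOLOGIN choice are assumptions, not an agreed design):

```rust
use uuid::Uuid;

/// Hypothetical: build the statement that creates a database role mirroring a
/// newly signed-up application user. NOLOGIN suits the SET ROLE approach below;
/// the per-user-connection approach would need LOGIN plus credentials instead.
fn create_user_role_sql(user_id: &Uuid) -> String {
    format!("CREATE ROLE \"user_{}\" NOLOGIN;", user_id)
}
```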
Managing access
Focusing on the PostgreSQL Data Service for now (we will need to find a way to generalize this to other backends in the future), there are potentially a couple of ways to do this.
Approach one
We retain the root user, but before we run any query we use Postgres's SET ROLE command to change the user scope to the user who is running the query.
This could potentially be achieved at the QueryBuilder level where we insert the SET ROLE directive before the rest of the query is built.
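A sketch of what that wrapping might look like (the helper name and role naming are hypothetical; note that role names can't be bound as query parameters, which feeds directly into the security concern below):

```rust
/// Hypothetical QueryBuilder step: run the user's query under their database
/// role, then drop back to the root role. Because SET ROLE takes an identifier
/// rather than a bind parameter, `db_role` must be strictly validated before
/// being interpolated here.
fn scope_query_to_user(db_role: &str, query: &str) -> String {
    format!("SET ROLE \"{}\"; {}; RESET ROLE;", db_role, query)
}
```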
Pros
This means we can retain a common connection pool to the database, as we don't need to reconnect with a new user-scoped connection for each query (which might overload the database's active connections for rapid multiple queries, like requesting MVT tiles).
Means we just have to modify the query builder and not the connection logic.
Cons
I am not sure how secure this actually is. For example, if a user runs a raw query against the database, can they include a second SET ROLE command to switch to another user before executing commands? If so, is there a good way to protect against this?
Building this at the QueryBuilder level would require other implementations for other backends to also build the user permissions in at query execution time rather than at connection time (which is likely to be more common).
Approach two
Instead of trying to change the user at query time, we simply do so at connection time. In this case, each API request would need to establish a connection to the database with the user's credentials and use that to execute the query.
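As a sketch, the per-request connection under this approach might look something like the following (it assumes an sqlx-style connection API and that each application user has a LOGIN-capable role with stored credentials; none of that is settled):

```rust
use sqlx::postgres::{PgConnectOptions, PgConnection};
use sqlx::ConnectOptions;

/// Hypothetical: open a connection to the dataset database as the requesting
/// user's own role. Host, port, and database name fall back to the usual
/// Postgres environment defaults here; real code would take them from config.
async fn connect_as_user(db_role: &str, db_password: &str)
    -> Result<PgConnection, sqlx::Error>
{
    let opts = PgConnectOptions::new()
        .username(db_role)
        .password(db_password);
    opts.connect().await
}
```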
It seems to me the easiest way to do this would be through a middleware. Currently the database connection is passed to our route executors through the global state object. For example, in this route we get access to the metadata database connection "state.db" and the data table database connection "state.data_db" through the state object.
matico/matico_server/src/routes/data.rs
Lines 103 to 122 in 651d270
We could change this up to instead get a connection through a Middleware service.
This ConnectionService middleware would need to
Pros
Cons
Managing permissions
The other part of this is that we need to manage changing permissions. So, for example, if User1 grants read access to User2 on a given table, we need to propagate that permission to the database.
This would require modifying the grant_permissions method
matico/matico_server/src/models/permissions.rs
Lines 177 to 195 in 651d270
We would need to
This shouldn't be too hard but noting it here.
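The database side of that propagation might reduce to statements along these lines (a sketch; the role naming and the permission-to-privilege mapping are assumptions, and the real change would live inside grant_permissions):

```rust
use uuid::Uuid;

/// Hypothetical mapping from an API-level permission to the statement that
/// mirrors it on the dataset database.
fn permission_to_sql(grantee: &Uuid, table: &str, permission: &str, grant: bool) -> String {
    let privilege = match permission {
        "write" => "INSERT, UPDATE, DELETE",
        _ => "SELECT", // default to read access
    };
    if grant {
        format!("GRANT {} ON \"{}\" TO \"user_{}\";", privilege, table, grantee)
    } else {
        format!("REVOKE {} ON \"{}\" FROM \"user_{}\";", privilege, table, grantee)
    }
}
```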
Other considerations to note
Recapping the above, we will eventually need this to generalize to other backend connections. Currently we only have Postgres, but we eventually want to expand to MongoDB, flat GeoParquet files, BigQuery, etc., so any solution should ideally keep that in mind.
We want to avoid performance hits with this approach as much as possible. The MVT tile endpoints especially need to be responsive.
Ideally this should be implemented in a way that allows the query builders / routes etc. to be agnostic to the database permission layer.