feat: drop/delete database #25549
Conversation
Force-pushed 264726d to a910cf2
Force-pushed 633fc44 to 780208f
This commit allows soft deletion of a database using the `influxdb3 database delete <db_name>` command. The write buffer and last value cache are cleared as well. Closes: #25523
Force-pushed 8a265cc to 8f0c25a
- In the previous commit, deleting a database immediately triggered clearing the last cache and query buffer, but the same logic then had to be repeated on restart to handle databases deleted at startup. This commit removes the immediate deletion (the explicit calls to the necessary methods) and moves the logic into `apply_catalog_batch`, which already applies `CatalogOp`s; the cache and buffer are now cleared in the `buffer_ops` method, which has hooks into the other call sites. Closes: #25523
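The flow described above can be sketched roughly as follows. This is a hypothetical, std-only sketch: the real influxdb3 `Catalog`, `CatalogOp`, and `apply_catalog_batch` have different shapes and many more fields; the point is that marking the database deleted inside `apply_catalog_batch` lets the same code path serve both live requests and replay on restart.

```rust
use std::collections::HashMap;

// Stand-in types; not the actual influxdb3 definitions.
struct DatabaseSchema {
    name: String,
    deleted: bool,
}

enum CatalogOp {
    DeleteDatabase { db_id: u32 },
}

struct Catalog {
    databases: HashMap<u32, DatabaseSchema>,
}

impl Catalog {
    // Applying the op marks the database deleted instead of removing it,
    // so replaying the catalog log on startup reaches the same state.
    fn apply_catalog_batch(&mut self, ops: &[CatalogOp]) {
        for op in ops {
            match op {
                CatalogOp::DeleteDatabase { db_id } => {
                    if let Some(db) = self.databases.get_mut(db_id) {
                        db.deleted = true;
                    }
                }
            }
        }
    }
}

fn main() {
    let mut catalog = Catalog {
        databases: HashMap::from([(1, DatabaseSchema { name: "foo".into(), deleted: false })]),
    };
    catalog.apply_catalog_batch(&[CatalogOp::DeleteDatabase { db_id: 1 }]);
    println!("db '{}' deleted: {}", catalog.databases[&1].name, catalog.databases[&1].deleted);
}
```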
Force-pushed 8f0c25a to 2b729e4
This mostly looks good. The only comment I think you need to address is to add the `deleted` field into the `DatabaseSnapshot` struct, to ensure the `deleted` field round-trips to and from JSON when the catalog is serialized. Once that is in and you fix the compiler warnings / insta snapshots, I will approve.
The other comments are up to you if you want to address them.
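A minimal sketch of what "round-trips" means here, with stand-in structs (the real `DatabaseSchema`/`DatabaseSnapshot` in influxdb3 carry far more fields and use serde on the snapshot type). If the snapshot type lacks a `deleted` field, the flag is silently dropped on the way to JSON and back:

```rust
// Stand-in types, not the actual influxdb3 definitions.
#[derive(Debug, PartialEq)]
struct DatabaseSchema {
    name: String,
    deleted: bool,
}

// The snapshot type is what actually gets serialized; it must carry
// every field that should survive a serialize/deserialize cycle.
struct DatabaseSnapshot {
    name: String,
    deleted: bool,
}

impl From<&DatabaseSchema> for DatabaseSnapshot {
    fn from(db: &DatabaseSchema) -> Self {
        Self { name: db.name.clone(), deleted: db.deleted }
    }
}

impl From<DatabaseSnapshot> for DatabaseSchema {
    fn from(snap: DatabaseSnapshot) -> Self {
        Self { name: snap.name, deleted: snap.deleted }
    }
}

fn main() {
    let db = DatabaseSchema { name: "foo".into(), deleted: true };
    // Schema -> snapshot -> schema; the deleted flag survives the hop.
    let round_tripped: DatabaseSchema = DatabaseSnapshot::from(&db).into();
    assert_eq!(db, round_tripped);
}
```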
influxdb3/tests/server/configure.rs
Outdated
```rust
.send()
.await
.expect("delete database call succeed");
assert_eq!(StatusCode::INTERNAL_SERVER_ERROR, resp.status());
```
Can we send back `404 NOT_FOUND` in this case?
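A sketch of the suggested behaviour: map a "database not found" error to 404 rather than a blanket 500, so only genuine server-side failures surface as internal errors. The error enum and function names here are illustrative, not the actual influxdb3 error or HTTP types.

```rust
// Hypothetical error type for the delete-database endpoint.
#[derive(Debug)]
enum CatalogError {
    DatabaseNotFound,
    Internal(String),
}

fn status_code_for(err: &CatalogError) -> u16 {
    match err {
        // The caller asked for a db that doesn't exist: a client-side 404.
        CatalogError::DatabaseNotFound => 404,
        // Genuine server-side failures stay 500.
        CatalogError::Internal(_) => 500,
    }
}

fn main() {
    println!("{}", status_code_for(&CatalogError::DatabaseNotFound));
    println!("{}", status_code_for(&CatalogError::Internal("io".into())));
}
```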
influxdb3_catalog/src/catalog.rs
Outdated
```diff
 if let Some(db) = self.databases.get(&catalog_batch.database_id) {
     let existing_table_count = db.tables.len();

-    if let Some(new_db) = db.new_if_updated_from_batch(catalog_batch)? {
-        let new_table_count = new_db.tables.len() - existing_table_count;
-        if table_count + new_table_count > Catalog::NUM_TABLES_LIMIT {
-            return Err(Error::TooManyTables);
-        }
-        let new_db = Arc::new(new_db);
-        self.databases.insert(new_db.id, Arc::clone(&new_db));
-        self.sequence = self.sequence.next();
-        self.updated = true;
-        self.db_map.insert(new_db.id, Arc::clone(&new_db.name));
+    if let Some(new_db) = DatabaseSchema::new_if_updated_from_batch(db, catalog_batch)? {
+        check_overall_table_count(db, &new_db, table_count)?;
+        self.upsert_db(new_db);
     }
 } else {
-    if self.databases.len() >= Catalog::NUM_DBS_LIMIT {
-        return Err(Error::TooManyDbs);
-    }
-
-    let new_db = DatabaseSchema::new_from_batch(catalog_batch)?;
-    if table_count + new_db.tables.len() > Catalog::NUM_TABLES_LIMIT {
-        return Err(Error::TooManyTables);
-    }
-
-    let new_db = Arc::new(new_db);
-    self.databases.insert(new_db.id, Arc::clone(&new_db));
-    self.sequence = self.sequence.next();
-    self.updated = true;
-    self.db_map.insert(new_db.id, Arc::clone(&new_db.name));
+    let new_db = self.check_db_count(catalog_batch, table_count)?;
+    self.upsert_db(new_db);
 }
```
IMHO the `upsert_db` method is nice here to DRY things up, but the logic captured in `check_overall_table_count` and `Self::check_db_count` is simple enough to keep inline. Furthermore, I don't think their names, as is, capture what they are doing very well.
Sure, though I think `check_db_count` does a bit more: it creates the db and checks the table count again. I'll rejig it and see if you still think it should be inlined. Having said that, am I missing something in `check_overall_table_count`? I don't mind inlining, by the way; I just wanted to see if I've missed anything.
No, I don't think you missed anything; the logic is still good. I'm being pedantic, but I felt that moving the logic out to helpers that aren't re-used elsewhere was overkill.
Yeah, definitely. What I wanted to tease out was the table count check, as that gets used in both branches. I've kept the db check and creation inline now.
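The compromise discussed above can be sketched as follows: keep the table-count guard as a shared helper, since it runs in both the "update existing db" and "create new db" branches, while the db-count check and schema creation stay inline. The constant, signature, and error variant here are assumptions, not the actual influxdb3 code.

```rust
// Assumed limit; the real value lives on the Catalog type.
const NUM_TABLES_LIMIT: usize = 2000;

#[derive(Debug)]
enum Error {
    TooManyTables,
}

// Shared guard used by both branches: reject the batch if applying it
// would push the overall table count past the limit.
fn check_overall_table_count(new_table_count: usize, existing_total: usize) -> Result<(), Error> {
    if existing_total + new_table_count > NUM_TABLES_LIMIT {
        return Err(Error::TooManyTables);
    }
    Ok(())
}

fn main() {
    println!("{:?}", check_overall_table_count(10, 100));
    println!("{:?}", check_overall_table_count(1, NUM_TABLES_LIMIT));
}
```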
influxdb3_catalog/src/serialize.rs
Outdated
```rust
// todo: check if it's right to default to false here,
// not sure where this is called
deleted: false,
```
Should be fine to add a `deleted` field on the `DatabaseSnapshot` struct, then base it off of that.
This bit wasn't too obvious when I saw the `From` impls. I see all the types have `..Snapshot` structs for serializing and deserializing; is that to avoid building a deserializer by hand? I guess in the `Deserialize` impls for the partial (`..Snapshot`) structs you can use `deserialize` directly and then build the relevant fields for the target type manually, without a visitor impl. Is that the motivation?
Yeah, that about sums it up. The main motivation is the bi-directional maps that map from ID to name and vice versa. Those only need to be held in memory, because the information in the bi-directional map is contained in the main map from ID to the whole object, which is serialized with the `SerdeVecMap`. No need to also serialize the bi-directional map and waste bytes.
Earlier on in the project, the motivation for the `_Snapshot` types was to capture the info that is contained in the arrow `Schema`, since we could not rely on its `Serialize`/`Deserialize` impls being stable.
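The "waste bytes" point above can be sketched in a few lines: only the ID-to-schema map needs to be serialized, and the ID-to-name lookup map can be rebuilt from it after deserializing. The types here are simplified stand-ins for the influxdb3 catalog.

```rust
use std::collections::HashMap;

// Stand-in for the full database object held in the main map.
struct DatabaseSchema {
    name: String,
}

// Derive the secondary (ID -> name) map from the main map instead of
// serializing it alongside; the information is redundant on disk.
fn rebuild_db_map(databases: &HashMap<u32, DatabaseSchema>) -> HashMap<u32, String> {
    databases
        .iter()
        .map(|(id, db)| (*id, db.name.clone()))
        .collect()
}

fn main() {
    let databases = HashMap::from([(1, DatabaseSchema { name: "foo".into() })]);
    let db_map = rebuild_db_map(&databases);
    println!("{:?}", db_map.get(&1));
}
```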
Co-authored-by: Trevor Hilton <[email protected]>
- `DatabaseSchema` serialization/deserialization is delegated to `DatabaseSnapshot`, so the `deleted` flag should be included in `DatabaseSnapshot` as well.
- insta test snapshots fixed

Closes: #25523
Force-pushed d1a914d to f447bde
Force-pushed f447bde to 5fbf7b4
Force-pushed 5fbf7b4 to c7d62aa
This commit allows soft deletion of a database using the `influxdb3 database delete <db_name>` command. The write buffer and last value cache are cleared as well. Closes: #25523