
feat: drop/delete database #25549

Merged: 5 commits merged into main on Nov 19, 2024
Conversation

@praveen-influx (Contributor) commented Nov 13, 2024

This commit allows soft deletion of a database using `influxdb3 database delete <db_name>`. The write buffer and last value cache are cleared as well.

closes: #25523

@praveen-influx force-pushed the praveen/drop-database branch 4 times, most recently from 264726d to a910cf2, November 14, 2024 15:54
@praveen-influx changed the title from "feat: drop/delete database feature" to "feat: drop/delete database", Nov 14, 2024
@praveen-influx force-pushed the praveen/drop-database branch 2 times, most recently from 633fc44 to 780208f, November 15, 2024 16:12
This commit allows soft deletion of a database using the `influxdb3 database
delete <db_name>` command. The write buffer and last value cache are
cleared as well.

closes: #25523
@praveen-influx force-pushed the praveen/drop-database branch 4 times, most recently from 8a265cc to 8f0c25a, November 18, 2024 17:41
- In the previous commit, deleting a database immediately triggered
  clearing the last cache and query buffer, but the same logic had to be
  repeated on restart to handle databases deleted before startup. This
  commit removes the immediate deletion (no more explicit calls to the
  clearing methods) and moves the logic into `apply_catalog_batch`, which
  already applies each `CatalogOp`; the cache and buffer are now cleared
  in the `buffer_ops` method, which has hooks into the other call sites.

closes: #25523
@praveen-influx marked this pull request as ready for review, November 18, 2024 17:42
@praveen-influx requested review from pauldix, hiltontj and mgattozzi and removed the requests for pauldix and hiltontj, November 18, 2024 17:43
@hiltontj (Contributor) left a comment:

This mostly looks good. The only comment I think you need to address is adding the `deleted` field to the `DatabaseSnapshot` struct, to ensure the `deleted` field round-trips to and from JSON when the catalog is serialized. Once that is in and you fix the compiler warnings / insta snapshots, I will approve.

The other comments are up to you if you want to address them.

.send()
.await
.expect("delete database call succeed");
assert_eq!(StatusCode::INTERNAL_SERVER_ERROR, resp.status());
Contributor:

Can we send back 404 NOT_FOUND in this case?
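A minimal sketch of what the reviewer is suggesting, with a hypothetical error enum and status mapping (not the actual influxdb3 types): distinguish "database does not exist" from genuine server failures so the handler can return 404 instead of a blanket 500.

```rust
// Hypothetical sketch only: the error variants and the mapping function
// are illustrative, not taken from the influxdb3 codebase.

#[derive(Debug, PartialEq)]
pub enum DeleteDatabaseError {
    NotFound(String), // the named database does not exist
    Internal(String), // catalog / object-store failure
}

/// Pick an HTTP status code for a failed delete request.
pub fn status_for(err: &DeleteDatabaseError) -> u16 {
    match err {
        DeleteDatabaseError::NotFound(_) => 404, // NOT_FOUND
        DeleteDatabaseError::Internal(_) => 500, // INTERNAL_SERVER_ERROR
    }
}
```

With a mapping like this, the assertion in the quoted test would become `assert_eq!(StatusCode::NOT_FOUND, resp.status())` for a missing database.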

Comment on lines 388 to 395
  if let Some(db) = self.databases.get(&catalog_batch.database_id) {
-     let existing_table_count = db.tables.len();
-
-     if let Some(new_db) = db.new_if_updated_from_batch(catalog_batch)? {
-         let new_table_count = new_db.tables.len() - existing_table_count;
-         if table_count + new_table_count > Catalog::NUM_TABLES_LIMIT {
-             return Err(Error::TooManyTables);
-         }
-         let new_db = Arc::new(new_db);
-         self.databases.insert(new_db.id, Arc::clone(&new_db));
-         self.sequence = self.sequence.next();
-         self.updated = true;
-         self.db_map.insert(new_db.id, Arc::clone(&new_db.name));
+     if let Some(new_db) = DatabaseSchema::new_if_updated_from_batch(db, catalog_batch)? {
+         check_overall_table_count(db, &new_db, table_count)?;
+         self.upsert_db(new_db);
      }
  } else {
-     if self.databases.len() >= Catalog::NUM_DBS_LIMIT {
-         return Err(Error::TooManyDbs);
-     }
-
-     let new_db = DatabaseSchema::new_from_batch(catalog_batch)?;
-     if table_count + new_db.tables.len() > Catalog::NUM_TABLES_LIMIT {
-         return Err(Error::TooManyTables);
-     }
-
-     let new_db = Arc::new(new_db);
-     self.databases.insert(new_db.id, Arc::clone(&new_db));
-     self.sequence = self.sequence.next();
-     self.updated = true;
-     self.db_map.insert(new_db.id, Arc::clone(&new_db.name));
+     let new_db = self.check_db_count(catalog_batch, table_count)?;
+     self.upsert_db(new_db);
  }
Contributor:

IMHO the `upsert_db` method is nice here to DRY things up, but the logic captured in `check_overall_table_count` and `Self::check_db_count` is simple enough to keep inline. Furthermore, I don't think their names as-is capture what they are doing very well.

praveen-influx (Contributor, Author):

Sure, though `check_db_count` does a bit more: it creates the db and checks the table count again. I'll rejig it and see if you still think it should be inlined. Having said that, am I missing something in `check_overall_table_count`? I don't mind inlining, by the way; I just wanted to see if I've missed anything.

Contributor:

No, I don't think you missed anything; the logic is still good. I'm being pedantic, but I felt that moving the logic out to helpers that aren't otherwise re-used was overkill.

praveen-influx (Contributor, Author):

Yeah, definitely. What I wanted to tease out was the table count check, as that gets used in both branches. I've kept the db check and creation inline now.

Comment on lines 73 to 75
// todo: check if it's right to default to false here,
// not sure where this is called
deleted: false,
Contributor:

Should be fine to add a deleted field on the DatabaseSnapshot struct, then base it off of that.
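A minimal sketch of that suggestion, with hypothetical field sets (the real structs have many more fields, and the snapshot type would derive serde's Serialize/Deserialize, elided here): the serialized `DatabaseSnapshot` carries the `deleted` flag, and the `From` conversions in both directions carry it through, so the flag round-trips with the catalog.

```rust
// Hedged sketch: field sets are illustrative, not the actual influxdb3 structs.

#[derive(Debug, Clone, PartialEq)]
pub struct DatabaseSchema {
    pub name: String,
    pub deleted: bool,
}

// The intermediate type that actually gets (de)serialized for the catalog.
// In the real crate this would derive Serialize/Deserialize.
pub struct DatabaseSnapshot {
    pub name: String,
    pub deleted: bool, // the new field, so soft-deletion survives persistence
}

impl From<&DatabaseSchema> for DatabaseSnapshot {
    fn from(db: &DatabaseSchema) -> Self {
        Self { name: db.name.clone(), deleted: db.deleted }
    }
}

impl From<DatabaseSnapshot> for DatabaseSchema {
    fn from(snap: DatabaseSnapshot) -> Self {
        Self { name: snap.name, deleted: snap.deleted }
    }
}
```

Forgetting the field in either `From` impl would silently reset `deleted` to its default on restart, which is exactly what the round-trip concern above is about.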

praveen-influx (Contributor, Author):

This bit wasn't too obvious when I saw the `From` impls. I see all the types have `..Snapshot` structs for serializing and deserializing; is that to avoid building a deserializer by hand? I guess with the partial (`..Snapshot`) structs you can derive `Deserialize` directly and then build the relevant fields of the target type manually, without a hand-written visitor impl. Is that the motivation?

Contributor:

Yeah, that about sums it up. The main motivation is the bi-directional maps that map from ID to name and vice versa. Those only need to be held in memory, because the information in the bi-directional map is already contained in the main map from ID to the whole object, which is serialized with the SerdeVecMap. There is no need to also serialize the bi-directional map and waste bytes.

Earlier on in the project, the motivation for the _Snapshot types was to capture the info that is contained in the arrow Schema, since we could not rely on its Serialize/Deserialize impls being stable.
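The pattern described above can be sketched as follows, with illustrative names and a trimmed-down catalog (the real crate serializes via a SerdeVecMap and serde derives, both elided here): only the main id-to-object map is persisted, and the derived id-to-name map is rebuilt in memory when converting back from the snapshot.

```rust
use std::collections::BTreeMap;

// Hedged sketch: `Db`, `Catalog`, and `CatalogSnapshot` are illustrative
// stand-ins for the influxdb3 catalog types.

#[derive(Debug, Clone, PartialEq)]
pub struct Db {
    pub name: String,
}

pub struct Catalog {
    pub databases: BTreeMap<u32, Db>,  // source of truth, serialized
    pub db_map: BTreeMap<u32, String>, // derived lookup, in-memory only
}

// What would actually be persisted; note there is no `db_map` field,
// so no bytes are wasted on redundant information.
pub struct CatalogSnapshot {
    pub databases: BTreeMap<u32, Db>,
}

impl From<CatalogSnapshot> for Catalog {
    fn from(snap: CatalogSnapshot) -> Self {
        // Rebuild the derived map from the main map instead of
        // deserializing it.
        let db_map = snap
            .databases
            .iter()
            .map(|(id, db)| (*id, db.name.clone()))
            .collect();
        Catalog { databases: snap.databases, db_map }
    }
}
```

The same shape works in reverse for serialization: `From<&Catalog> for CatalogSnapshot` would simply drop the derived map.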

influxdb3_client/src/lib.rs (outdated; resolved)
praveen-influx and others added 2 commits November 19, 2024 10:09
- `DatabaseSchema` serialization/deserialization is delegated to
 `DatabaseSnapshot`, so the `deleted` flag should be included in
 `DatabaseSnapshot` as well.
- insta test snapshots fixed

closes: #25523
@praveen-influx merged commit 33c2d47 into main, Nov 19, 2024
13 checks passed
@hiltontj deleted the praveen/drop-database branch, November 19, 2024 17:41
Development

Successfully merging this pull request may close these issues.

Add drop database
2 participants