It would be nice to have a cleanup command that fully cleaned up the sqlite database.
Some background:
I was testing query latency with a database of ~1 million records and noticed that queries filtering on metadata were slow, around 8 seconds per query. I wanted to see how database size affected things, so I removed records until ~25k were left.
Inspecting the disk size of the database revealed that the size hadn't changed after removing the records.
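This is standard SQLite behavior: deleting rows marks pages as free for reuse, but the file is only shrunk when you run `VACUUM`. A minimal sketch with the stdlib `sqlite3` module (a throwaway demo table, not Chroma's actual schema) showing that deletes alone leave the file size unchanged while `VACUUM` reclaims the space:

```python
import os
import sqlite3
import tempfile

# Create a throwaway database and fill it with bulky rows.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, blob BLOB)")
con.executemany(
    "INSERT INTO t (blob) VALUES (?)",
    [(b"x" * 10_000,) for _ in range(500)],
)
con.commit()
size_full = os.path.getsize(path)

# Delete most rows: the pages become internal free pages,
# but the file on disk keeps its size.
con.execute("DELETE FROM t WHERE id > 10")
con.commit()
size_after_delete = os.path.getsize(path)

# VACUUM rewrites the database into a minimal file.
con.execute("VACUUM")
size_after_vacuum = os.path.getsize(path)
con.close()

print(size_full, size_after_delete, size_after_vacuum)
```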
Then I ran:
chops clean-wal /path/to/persist_dir
This reduced my sqlite3 database from 7.7 GB to 2.7 GB and sped up my query from ~8 seconds to ~2.5 seconds.
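I don't know the chops internals, but cleaning the WAL presumably amounts to checkpointing and truncating SQLite's write-ahead log. The pragma below is stock SQLite (not a chops API); this sketch shows the WAL file accumulating writes and then being truncated to zero by a checkpoint:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
# Chroma runs SQLite in WAL mode; committed writes accumulate in demo.db-wal.
con.execute("PRAGMA journal_mode=WAL")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, blob BLOB)")
con.executemany(
    "INSERT INTO t (blob) VALUES (?)",
    [(b"x" * 10_000,) for _ in range(200)],
)
con.commit()
wal_path = path + "-wal"
wal_size_before = os.path.getsize(wal_path)

# Flush the WAL into the main database file and truncate the WAL to zero bytes.
con.execute("PRAGMA wal_checkpoint(TRUNCATE)")
wal_size_after = os.path.getsize(wal_path)
con.close()

print(wal_size_before, wal_size_after)
```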
Then I thought, what if I had a fresh database, so I ran:
This reduced my sqlite3 database down to 187 MB and also reduced my vector index from a few GB to several MB. It also sped up my query to <0.2 seconds.
It would be nice, when running in production, to be able to do that same type of cleanup without starting over completely. I'm thinking of applications with a rolling window of data.
Maybe this isn't realistic given how the vector indexing works. Please advise, I'd love to understand more.
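For the rolling-window case I have in mind, the maintenance pass would look something like the sketch below: delete records older than a retention cutoff, then compact. This uses plain sqlite3 with an illustrative `events` table and `ts` column (not Chroma's actual layout), and leaves the vector-index side open:

```python
import os
import sqlite3
import tempfile
import time

DAY = 86_400.0

# Hypothetical table with a timestamp column; schema and retention
# period are illustrative only.
path = os.path.join(tempfile.mkdtemp(), "events.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, ts REAL, payload BLOB)")

now = time.time()
rows = [(now - i * DAY, b"x" * 5_000) for i in range(100)]  # ~100 days of data
con.executemany("INSERT INTO events (ts, payload) VALUES (?, ?)", rows)
con.commit()

def prune(con, retention_days=30):
    """Delete rows older than the retention window, then compact the file."""
    cutoff = time.time() - retention_days * DAY
    con.execute("DELETE FROM events WHERE ts < ?", (cutoff,))
    con.commit()
    con.execute("VACUUM")  # reclaim the freed pages on disk

prune(con)
remaining = con.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(remaining)  # only rows inside the 30-day window remain
con.close()
```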
Hey @shortcipher3, thanks for this. You raise a valid point about the HNSW index: it needs to be rebuilt to optimize it. That's in my backlog of things to add to Chroma, and for now to chops.
Regarding the sqlite3 situation, I'll investigate. The clean-wal command is intended only for the WAL, which in Chroma is unbounded.