Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pagecount #819

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Pagecount #819

wants to merge 4 commits into from

Conversation

glommer
Copy link
Contributor

@glommer glommer commented Jan 29, 2025

This PR implements the Pagecount pragma, as well as its associated bytecode opcode

That's what sqlite does. For example, for an empty database:

$ sqlite3 file:memory "pragma page_count"
0
It is partially supported - only on the existing database.
To do that, we also have to implement the vdbe opcode Pagecount.
@@ -223,7 +223,7 @@ impl Default for DatabaseHeader {
min_embed_frac: 32,
min_leaf_frac: 32,
change_counter: 1,
database_size: 1,
database_size: 0,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is true. The SQLite file always has the database header but more important the sqlite_schema table on page 1. @pereman2, @jussisaurio, please do correct me if i am wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah tests are failing and it seems to be because of that.

On an empty db page_count returns 0 on sqlite - I am investigating why

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, page 1 is allocated with db initialization and never deallocated. One of the reasons why balancing becomes a bit complicated.

Comment on lines +603 to +605
PragmaName::PageCount => {
query_pragma("page_count", header, program)?;
Ok(())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have a transform from PragmaName to str to then PragmaName again. Quite unnecessary. (no need to solve it here)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#823 created an issue

Comment on lines +10 to +22
#[test]
fn test_pragma_page_count() -> Result<(), Error> {
let mut child = spawn_command(run_cli(), Some(1000))?;
child.exp_regex("limbo>")?; // skip everything until limbo cursor appear
child.exp_regex(".?")?;
child.send_line("pragma page_count;")?;
child.exp_string("0")?;
child.exp_regex("limbo>")?;
child.send_line("create table foo(bar); pragma page_count;")?;
child.exp_string("2")?;
quit(&mut child)
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this wizardy. Can we just have a TCL test? You should be able to add the equivalent of this test in testing/pragma.test.

Something like:

do_execsql_test pragma-cache-size {
  PRAGMA page_count
} {2}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't be able to do this in the pragma.test. This is because that test is ran with two different databases as an input (testing.db and testing_norowidalias), and they each have a different page count.

I will see if I can come up with a simpler unit test somewhere.

@glommer
Copy link
Contributor Author

glommer commented Jan 29, 2025

@pereman2 and @penberg , before I touch this again, I am getting a different number of pages than SQLite almost every time. Sometimes it is a simple off-by-one, sometimes it is not.

The SQLite implementation of the relevant function is quite straightforward:

/*
** Return the size of the database file in pages. If there is any kind of
** error, return ((unsigned int)-1).
*/
static Pgno btreePagecount(BtShared *pBt){
  return pBt->nPage;
}
SQLITE_PRIVATE Pgno sqlite3BtreeLastPage(Btree *p){
  assert( sqlite3BtreeHoldsMutex(p) );
  return btreePagecount(p->pBt);
}

That is a property of the BTree, though, not of the database file, and it gets updated in multiple places.
I assume we do want to keep compatibility here, as in, return the same amount of pages for the same database.

It will be very hard for me to trust the result of me just reading through the SQLite code and trying to understand in which situations this is kept updated.

I assume we have some infra in place to fuzz a bunch of potential databases so we can compare them? What's your suggestion here?

@pereman2
Copy link
Collaborator

@glommer In which scenarios do you see differences? In a simple new database I see this:

Enter ".help" for usage hints.
sqlite> pragma page_count;
0
sqlite> PRAGMA journal_mode=WAL;
wal
sqlite> pragma page_count;
1
sqlite> CREATE TABLE t(x);
sqlite> pragma page_count;
2
sqlite> insert into t values(1);
sqlite> pragma page_count;
2
sqlite> 

I interpret this as we have 0 on an empty database without wal, 1 with wal and rest as expected. Meaning, if we start with WAL anyways, we could assume we have size 1 and initialize the header anyways.

@glommer
Copy link
Contributor Author

glommer commented Jan 29, 2025

database_size needs to be set back to 1 as others pointed out. But fine, at that point, all we have to do is subtract one before returning...

But then I see this on SQLite:

sqlite> pragma page_count;
0
sqlite> create table temp (t1 integer, primary key (t1));
sqlite> pragma page_count;
2 <=== note that for some reason this schema adds two pages!

This query, btw, crashes Limbo if database_size is set to start at 0. So if we go back to 1, and subtract one before returning, and run it in-memory:

limbo> pragma page_count;
0
limbo> create table temp (t1 integer, primary key (t1));
limbo> pragma page_count;
2

The problem is that if we save this to a file with SQLite, and then open it with Limbo, database_size now has 2, not 3. And then once we subtract 1, we are now wrong:

sqlite3 /tmp/k.db ".dump"
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE temp (t1 integer, primary key (t1));
COMMIT;
|15:43:50|glaubercosta@Glaubers-MBP:[limbo]> cargo run /tmp/k.db
limbo> pragma page_count;
1

it could be that this is just a matter of special casing database loading, but then the following in-memory schema seems wrong:

sqlite> pragma page_count;
0
sqlite> create table foo(bar);
sqlite> pragma page_count;
2

vs

limbo> pragma page_count;
0
limbo> create table foo(bar);
limbo> pragma page_count;
1

And for full reference, I am running this with the following code atop:

diff --git a/core/storage/sqlite3_ondisk.rs b/core/storage/sqlite3_ondisk.rs
index c2784a6..5543fa7 100644
--- a/core/storage/sqlite3_ondisk.rs
+++ b/core/storage/sqlite3_ondisk.rs
@@ -223,7 +223,7 @@ impl Default for DatabaseHeader {
             min_embed_frac: 32,
             min_leaf_frac: 32,
             change_counter: 1,
-            database_size: 0,
+            database_size: 1,
             freelist_trunk_page: 0,
             freelist_pages: 0,
             schema_cookie: 0,
diff --git a/core/vdbe/mod.rs b/core/vdbe/mod.rs
index a90e184..3c407c0 100644
--- a/core/vdbe/mod.rs
+++ b/core/vdbe/mod.rs
@@ -2401,8 +2401,9 @@ impl Program {
                         // TODO: implement temp databases
                         todo!("temp databases not implemented yet");
                     }
-                    state.registers[*dest] =
-                        OwnedValue::Integer(pager.db_header.borrow().database_size.into());
+                    let pages : u32 = pager.db_header.borrow().database_size;
+                    assert_ne!(pages, 0);
+                    state.registers[*dest] = OwnedValue::Integer((pages - 1).into());
                     state.pc += 1;
                 }
             }

And then, subtract one when returning
@glommer
Copy link
Contributor Author

glommer commented Jan 29, 2025

BTW, I think that this comes from the fact that SQLite does, in fact, write 2 to that field.

It's the last byte:

|15:58:53|glaubercosta@Glaubers-MBP:[turso-server]> hexdump  /tmp/k.db
0000000 5153 694c 6574 6620 726f 616d 2074 0033
0000010 0010 0101 400c 2020 0000 0100 0000 0200

Limbo writes 3:

|16:00:55|glaubercosta@Glaubers-MBP:[limbo]> cargo run /tmp/limbo-k.db
Limbo v0.0.13
Enter ".help" for usage hints.
limbo> create table temp (t1 integer, primary key (t1));
|16:01:05|glaubercosta@Glaubers-MBP:[limbo]> hexdump /tmp/limbo-k.db
0000000 5153 694c 6574 6620 726f 616d 2074 0033
0000010 0010 0202 4000 2020 0000 0100 0000 0300

So my tl;dr is that I think our logic to tracking this is broken today.

@penberg
Copy link
Collaborator

penberg commented Jan 31, 2025

@glommer #830 is now merged, which should fix the page count issue you're having here?

@glommer
Copy link
Contributor Author

glommer commented Jan 31, 2025

yes. I have a new version on top of #844 , and will update this PR once #844 is merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants