KV list() and 'cursor' #109

Open
cfjello opened this issue Jan 25, 2025 · 0 comments

I think KV list iterators have great potential! In a distributed environment they could be fantastic, especially if they could be shared among processes. And they can! Well, almost.

I found the following behavior when trying out list iterators:

1) You cannot provide a cursor name to the iterator when it is first created:

const LIMIT = 5;
const keyPart = ["user"];
const cursor = "USERS";
const itor = kv.list<User>({ prefix: keyPart }, { limit: LIMIT, cursor: cursor });

This code will not return any result, because it appears to be trying to look up an existing cursor/iterator by name.
Also note that the LIMIT above applies to the list iterator as a whole. It is not a SQL-style fetch limit, so you have to create a new list iterator to fetch the next 5 rows. This is probably by design.
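
One way to work with the first observation is to leave the `cursor` option out on the very first call and pass it only once an earlier iterator has produced one. A minimal sketch, where `ListOptions` and `pageOptions` are hypothetical names introduced for illustration (only `limit` and `cursor` come from the code above):

```typescript
// Hypothetical subset of the list options used in this issue.
type ListOptions = { limit: number; cursor?: string };

// Build the options for one page. The cursor is omitted on the first page
// and included only when an earlier iterator has handed one back.
function pageOptions(limit: number, previousCursor?: string): ListOptions {
  return previousCursor ? { limit, cursor: previousCursor } : { limit };
}

const firstPage = pageOptions(5);
// Placeholder string, standing in for a cursor read from a previous iterator.
const nextPage = pageOptions(5, "cursor-from-previous-iterator");
console.log(firstPage, nextPage);
```

With this helper the first call would read `kv.list<User>({ prefix: keyPart }, pageOptions(LIMIT))`, and later calls would add the cursor taken from the previous iterator.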

2) A newly created list iterator does not have a cursor attribute that can be referenced; it is only assigned after the first fetch:

export type User = {
    id: number;
    name: string;
    age: number;
}

// Generate 100 users with ages cycling through a fixed range (not random)
const users: User[] = [];
for (let i = 1; i <= 100; i++) {
    users.push({
        id: i,
        name: `John_${i}`,
        age: 20 + (i % 30) // Example age between 20 and 49
    });
}

const kv = await Deno.openKv("./db.sqlite3");

// Drain the iterator and return its items together with the cursor it ended on.
async function fetchBatch<T>(
  iterator: Deno.KvListIterator<T>,
): Promise<{ cursor: string; items: T[] }> {
  let cursor = "";
  const items: T[] = [];
  let result = await iterator.next();
  while (!result.done) {
    // The cursor is only readable once the first entry has been fetched.
    cursor = iterator.cursor;
    // result.value is the full KvEntry object; keep only its value.
    items.push(result.value.value);
    result = await iterator.next();
  }
  return { cursor, items };
}

// Populate the KV store with the users.
// kv.delete() removes a single key, so clear old entries by listing each prefix.
for (const prefix of [["user"], ["user_by_age"]]) {
  for await (const entry of kv.list({ prefix })) {
    await kv.delete(entry.key);
  }
}
for (const user of users) {
  const result = await kv.atomic()
    .set(["user", user.id], user)
    .set(["user_by_age", user.age, user.id], user)
    .commit();
  if (!result.ok) {
    throw new Error(`Problem persisting user ${user.name}`);
  }
}


const itor = kv.list<User>({ prefix: ["user"] }, { limit: 5 });
let pageNum = 1;

// const cursor = itor.cursor; // Reading the cursor here, before the first fetch, fails.
const batch = await fetchBatch<User>(itor);

console.log(`-----------------------\nPage ${pageNum}:`);
for (const u of batch.items) {
    console.log(`${u.name} ${u.age}`);
}

// Now the cursor can be read and passed to a second iterator
const cursor = itor.cursor;
const itor2 = kv.list<User>({ prefix: ["user"] }, { limit: 5, cursor: cursor });


const batch2 = await fetchBatch<User>(itor2);
console.log(`-----------------------\nPage ${++pageNum}:`);
for (const u of batch2.items) {
    console.log(`${u.name} ${u.age}`);
}

As the code shows, it is possible to create a second iterator that looks up the first and will fetch the next five rows, so the result is:

-----------------------
Page 1:
John_1 21
John_2 22
John_3 23
John_4 24
John_5 25
-----------------------
Page 2:
John_6 26
John_7 27
John_8 28
John_9 29
John_10 30

3) This functionality also works across processes, indicating that these iterators are somehow tracked at a lower level. I stored the cursor name in the KV database, fetched it from an independent process (same database), created a new list iterator using the fetched cursor name, and voilà: the result was the same as above, with Page 1 produced by the first process and Page 2 by the second. In my opinion this is absolutely brilliant: a shared iterator in a distributed environment.
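
One possible explanation for the cross-process behavior, offered as an assumption rather than confirmed behavior, is that the cursor is not a name registered anywhere but an opaque encoding of the position the iterator reached. If so, any process holding the string could resume without shared iterator state. The sketch below models that idea with an in-memory sorted list; `encodeCursor`, `decodeCursor`, and `listFrom` are hypothetical helpers, and the JSON encoding is a stand-in, not the real cursor format.

```typescript
type Row = { key: number; name: string };

// Stand-in data set with positive keys, sorted like a prefix scan would be.
const rows: Row[] = [];
for (let i = 1; i <= 12; i++) rows.push({ key: i, name: `John_${i}` });

// Hypothetical encoding: the cursor carries only the last key returned.
function encodeCursor(lastKey: number): string {
  return JSON.stringify({ last: lastKey });
}
function decodeCursor(cursor: string): number {
  return (JSON.parse(cursor) as { last: number }).last;
}

// Return up to `limit` rows after the position encoded in `cursor`,
// together with a cursor for the position reached.
function listFrom(limit: number, cursor?: string): { items: Row[]; cursor: string } {
  const after = cursor === undefined ? 0 : decodeCursor(cursor);
  const items = rows.filter((r) => r.key > after).slice(0, limit);
  const last = items.length > 0 ? items[items.length - 1].key : after;
  return { items, cursor: encodeCursor(last) };
}

// "Process 1" reads a page and publishes only the cursor string.
const page1 = listFrom(5);
const shared: string = page1.cursor;

// "Process 2" knows nothing about process 1 except that string.
const page2 = listFrom(5, shared);
console.log(page1.items.map((r) => r.name), page2.items.map((r) => r.name));
```

In this model the second reader needs nothing but the string itself, which would be consistent with the two-process result described above.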

4) However, there was a problem in both the dual- and single-process scenarios. The first iterator has a private attribute #count that is only updated when it initially runs. The first derived named iterator seems to know this count and picks up at the right row. However, the #count is not subsequently tracked and updated, so it only works the first time.
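
For comparison, here is a sketch of paging past the second page, under the assumption that each new iterator is given the cursor of the iterator immediately before it rather than the cursor of the first one. `FakeListIterator` is a hypothetical in-memory stand-in for the KV list iterator so the loop can run without a database; it copies only the two behaviors described above (a `limit` that applies to the iterator, and a `cursor` that cannot be read before the first fetch). Its cursor format is invented for the example.

```typescript
type Entry = { key: number; value: string };

const data: Entry[] = [];
for (let i = 1; i <= 12; i++) data.push({ key: i, value: `John_${i}` });

// Hypothetical stand-in for a KV list iterator over `data`.
class FakeListIterator {
  private pos: number;
  private limit: number;
  private count = 0;
  private started = false;
  constructor(limit: number, cursor?: string) {
    this.limit = limit;
    this.pos = cursor === undefined ? 0 : Number(cursor);
  }
  // As in the issue, the cursor is not readable before the first fetch.
  get cursor(): string {
    if (!this.started) throw new Error("no cursor before first fetch");
    return String(this.pos);
  }
  async next(): Promise<IteratorResult<Entry, undefined>> {
    this.started = true;
    if (this.count >= this.limit || this.pos >= data.length) {
      return { done: true, value: undefined };
    }
    this.count++;
    return { done: false, value: data[this.pos++] };
  }
}

// Drain one iterator and return its items plus the cursor it ended on.
async function fetchPage(it: FakeListIterator): Promise<{ items: string[]; cursor: string }> {
  const items: string[] = [];
  for (let r = await it.next(); !r.done; r = await it.next()) items.push(r.value.value);
  return { items, cursor: it.cursor };
}

// One new iterator per page; the cursor always comes from the latest page.
async function allPages(limit: number): Promise<string[][]> {
  const pages: string[][] = [];
  let cursor: string | undefined;
  while (true) {
    const page = await fetchPage(new FakeListIterator(limit, cursor));
    if (page.items.length === 0) break;
    pages.push(page.items);
    cursor = page.cursor;
  }
  return pages;
}

allPages(5).then((pages) => console.log(pages));
```

If the real iterators behave the same way, reusing the first iterator's cursor for every later page would return the second page again each time, so it may be worth checking which cursor is passed on the third fetch.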

I am aware that I am probably squeezing the lemon here, but I would love it if what I described above is actually how it is supposed to work and only the #count needs fixing.

I have attached some sample code:

Iterators.zip
