Make framing less error-prone and not require buffered IO #2

daniel-j-h · 2023-01-13T19:22:17Z

At the moment we implement framing by size-prefixing graph chunk messages with a varint.

file header  # fixed size
graph header  # fixed size
size0
chunk0
size1
chunk1
..

This is the recommended way to implement framing; there is just one issue right now:

We do not hard-code the number of chunks, neither in the file header nor in the graph header.
To check whether there are more chunks in the file, we peek ahead four bytes; the idea was that we either hit EOF and are done, or find a varint that tells us the size of the next chunk to decode.
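
For illustration, the write side of the layout above could look like the following minimal sketch. It assumes a protobuf-style base-128 varint, which this issue does not confirm; `encode_varint` and `write_chunks` are hypothetical names, and the two fixed-size headers are left out.

```python
import io


def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (assumed encoding)."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # last byte, continuation bit clear
            return bytes(out)


def write_chunks(stream, chunks) -> None:
    """Write each chunk prefixed by its size as a varint."""
    for chunk in chunks:
        stream.write(encode_varint(len(chunk)))
        stream.write(chunk)


buf = io.BytesIO()
write_chunks(buf, [b"abc", b""])
framed = buf.getvalue()  # an empty chunk contributes a single 0x00 byte
```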

This approach has the following downsides:

If the file contains an empty chunk, its varint size prefix is a single byte. If such a chunk sits at the very end of the file, the four-byte peek hits EOF even though a chunk remains.
We need buffered IO (e.g. a file or BufferedReader) to support peeking without consuming bytes; it would be great if we could avoid that.
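
The first downside can be reproduced with a small sketch. It assumes the reader treats a peek shorter than four bytes as EOF and that sizes are base-128 varints; `read_chunks_peeking` is a hypothetical name, not the library's actual reader.

```python
import io


def read_chunks_peeking(reader: io.BufferedReader) -> list:
    """Hypothetical peek-based loop: a peek shorter than 4 bytes means EOF."""
    chunks = []
    while True:
        # peek() may return more bytes than requested, so slice to 4.
        if len(reader.peek(4)[:4]) < 4:
            break
        # Decode the varint size prefix one byte at a time.
        size = shift = 0
        while True:
            byte = reader.read(1)[0]
            size |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        chunks.append(reader.read(size))
    return chunks


# Two chunks: b"abc" (size 3) followed by an empty chunk (size 0).
data = b"\x03abc\x00"
chunks = read_chunks_peeking(io.BufferedReader(io.BytesIO(data)))
# Only one byte is left when the second peek happens, so the loop stops
# early and the trailing empty chunk is silently dropped.
```

Note that the sketch also only works on a `BufferedReader`, since a raw stream has no `peek()`.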

We need to figure out how to change our approach and iterate some more. This really only came up when implementing the Python lib. Maybe we just encode the number of chunks in the graph header, or we use a fixed-size prefix instead of a varint.
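
Both ideas can be sketched without any peeking, using plain `read()` calls only. The 4-byte little-endian prefix, the base-128 varint, and all function names here are assumptions for illustration; neither format has been decided.

```python
import io
import struct


def read_exact(stream, n: int) -> bytes:
    """Read n bytes, returning fewer only if the stream ends first."""
    buf = bytearray()
    while len(buf) < n:
        part = stream.read(n - len(buf))
        if not part:
            break
        buf += part
    return bytes(buf)


def read_chunks_fixed_prefix(stream):
    """Option A (assumed format): 4-byte little-endian size before each chunk."""
    while True:
        prefix = read_exact(stream, 4)
        if not prefix:
            return  # clean EOF on a frame boundary
        if len(prefix) < 4:
            raise EOFError("truncated size prefix")
        (size,) = struct.unpack("<I", prefix)
        chunk = read_exact(stream, size)
        if len(chunk) < size:
            raise EOFError("truncated chunk")
        yield chunk


def read_varint(stream) -> int:
    """Decode a base-128 varint byte by byte."""
    result = shift = 0
    while True:
        byte = read_exact(stream, 1)
        if not byte:
            raise EOFError("truncated varint")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return result
        shift += 7


def read_chunks_counted(stream, num_chunks: int):
    """Option B: chunk count known up front, e.g. from the graph header."""
    for _ in range(num_chunks):
        size = read_varint(stream)
        chunk = read_exact(stream, size)
        if len(chunk) < size:
            raise EOFError("truncated chunk")
        yield chunk


fixed = struct.pack("<I", 3) + b"abc" + struct.pack("<I", 0)
from_fixed = list(read_chunks_fixed_prefix(io.BytesIO(fixed)))
from_counted = list(read_chunks_counted(io.BytesIO(b"\x03abc\x00"), 2))
```

With the fixed-size prefix, a zero-length read marks a clean end of file and a prefix of one to three bytes is a truncated file, so an empty final chunk is no longer ambiguous. With the chunk count, the reader never has to probe for EOF at all.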
