Stream Stores: An inspiration issue of py2store
This py2store issue from September 2019 is reproduced below because it expresses the need for creek tools.
Streams are another common source and target of data, so we'd like to create tools to do the usual py2store stuff: rein in different interfaces to a common one, and adapt to various data particularities.
The relevant standard library module is io: https://docs.python.org/3/library/io.html.
What makes a data source or target a "stream" as opposed to other forms, such as a key-value or sequence (list-like) store? We need to draw relationships between the different interfaces, and between the words used to describe the functionality within each.
The stream interface seems to be built around the concept of a file's content, along with the possibility of an unbounded source of data.
- Streams have concepts like `read` and `write`, which we also have in key-value or sequence constructs. For streams, the read is more of an iterator over content, and the write more of an append to content.
- Streams have `readline` and `writelines`, pointing to the fact that content is assumed to be structured (above the atomic byte or bit, but below the source's location, metadata, etc.).
- Streams have a concept of open, flush, and close, which is not needed in the key-value or sequence interfaces (why? because actions are assumed to be effective immediately?).
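To make these stream concepts concrete, here is a minimal sketch using `io.BytesIO` as a stand-in for a file's content. The sample bytes and variable names are illustrative assumptions, not part of py2store or creek.

```python
import io

# An in-memory binary stream standing in for a file's content.
stream = io.BytesIO()

# write behaves like an append at the current cursor position.
stream.write(b"first line\n")
stream.write(b"second line\n")
stream.flush()  # a no-op for BytesIO; it matters for buffered files and sockets

# Rewind, then read: each call consumes content and advances the cursor.
stream.seek(0)
head = stream.read(5)              # the first five bytes
rest_of_line = stream.readline()   # everything up to and including the newline

# Iterating over a stream yields the remaining lines, one at a time.
remaining = list(stream)

stream.close()
is_closed = stream.closed
```

Note that reading is stateful: the same `read` call returns different content depending on where the cursor is, which is what makes it feel like an iterator rather than a keyed lookup.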
So we still get many key-value or sequence perspectives. A stream is a sequence of bytes, or lines. Seek navigates the sequence, tell gives us a key (a position) of the seek cursor, etc.
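One way to picture the key-value perspective is a read-only mapping whose keys are `tell` positions and whose values are the lines found there. The sketch below assumes a finite, seekable binary stream; `LineOffsets` is a hypothetical name used only for illustration, not an existing py2store or creek class.

```python
import io
from collections.abc import Mapping


class LineOffsets(Mapping):
    """Hypothetical read-only mapping: byte offset of a line -> the line itself."""

    def __init__(self, stream):
        self._stream = stream
        self._offsets = []
        stream.seek(0)
        while True:
            pos = stream.tell()        # key: the current cursor position
            if not stream.readline():  # empty bytes means end of stream
                break
            self._offsets.append(pos)

    def __getitem__(self, offset):
        if offset not in self._offsets:
            raise KeyError(offset)
        self._stream.seek(offset)       # navigate to the key
        return self._stream.readline()  # read the value stored there

    def __iter__(self):
        return iter(self._offsets)

    def __len__(self):
        return len(self._offsets)


lines = LineOffsets(io.BytesIO(b"alpha\nbeta\ngamma\n"))
keys = list(lines)
second = lines[keys[1]]
```

Indexing up front only works when the stream is bounded and seekable; an unbounded source would need a different strategy.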
Hierarchical groupings
Since streams have a strong legacy in "file contents", they carry this hierarchy of data:
bit --> byte --> [byte_word] --> [line] --> file
(the "[]" means optional here)
Byte words are (usually) fixed-size byte sequences that should be taken as atomic when decoding. For example:
- Bytes of audio (you've seen PCM16 and PCM32, for example).
- Text encodings: often 2 or 4 bytes per code unit as well (UTF-16 and UTF-32, for example).
Lines: often of different sizes, but when there are lines, their byte size is usually a multiple of the word size.
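As a rough illustration of byte words, the sketch below splits raw bytes into fixed-size words and decodes them. It assumes little-endian signed 16-bit samples for the PCM16 case; `iter_words` is a hypothetical helper written for this example, not a library function.

```python
import struct

# Three 16-bit signed samples, little-endian: each sample is one 2-byte word.
samples = [0, 1000, -1000]
pcm16 = struct.pack("<3h", *samples)


def iter_words(data, word_size):
    """Yield consecutive fixed-size byte words (hypothetical helper)."""
    if len(data) % word_size:
        raise ValueError("data length is not a multiple of the word size")
    for i in range(0, len(data), word_size):
        yield data[i:i + word_size]


# Each word must be decoded as a whole; a single byte of it means nothing alone.
words = list(iter_words(pcm16, 2))
decoded = [struct.unpack("<h", w)[0] for w in words]

# Fixed-width text encodings behave the same way: UTF-32 uses 4 bytes per code point.
text_words = list(iter_words("hi".encode("utf-32-le"), 4))
```

The length check reflects the point about multiples of the word size: a chunk that does not divide evenly cannot be decoded word by word.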