Can I use TinyDB effectively with files which are larger than the memory available on my computer? #507
Replies: 2 comments
-
This is the read function of the default storage used (JSONStorage):

    def read(self) -> Optional[Dict[str, Dict[str, Any]]]:
        # Get the file size by moving the cursor to the file end and reading
        # its location
        self._handle.seek(0, os.SEEK_END)
        size = self._handle.tell()

        if not size:
            # File is empty, so we return ``None`` so TinyDB can properly
            # initialize the database
            return None
        else:
            # Return the cursor to the beginning of the file
            self._handle.seek(0)

            # Load the JSON contents of the file
            return json.load(self._handle)

In the end it just does json.load on the whole file, so the entire database is parsed into memory on every read. You could create your own Storage class for TinyDB which uses something like ijson.
-
Hey @patrick-nicodemus, I fear TinyDB might not be a good fit for your use case. Reading an 8 GB JSON file will be a challenge in almost every programming language. Even just naively reading the file without parsing would require 8 GB of RAM, so one would need to resort to some sort of streaming JSON parser. Implementing something like this is beyond the scope of TinyDB in my view. It would be technically possible to implement something like this, but at that point we'd definitely be beyond "Tiny" 🙂
-
It looks like, with the standard JSON storage backend, all of TinyDB's operations work on the in-memory object returned by json.load. I do not know the exact semantics of this. If I have an 8 GB JSON file and I want to get a single document from it by ID, will the standard get() function try to read the entire file into memory? Or, if I use the built-in iterator for tables, will it read lazily from the file?
I have similar questions about writing to the end of the file. I understand the whole file has to be scanned, but my question is whether this data is kept in memory.
My primary intended use case is using TinyDB to store excess data during computations that require more memory than I have available, so it matters whether TinyDB itself uses a lot of memory.
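One way to check the memory question empirically is to measure what a json.load-based read costs when only one document is wanted. The sketch below uses only the standard library and imitates a storage that parses the whole file on every read; it does not call TinyDB itself. `load_and_get` and the `_default` table layout are assumptions for illustration.

```python
import json
import os
import tempfile
import tracemalloc


def load_and_get(path, doc_id, table="_default"):
    """Parse the whole file with json.load, then pick a single document.

    Returns the document and the peak traced memory in bytes.
    """
    tracemalloc.start()
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)  # the entire file becomes Python objects
    document = data[table][str(doc_id)]
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return document, peak


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "db.json")
    # 5000 small documents in the assumed {table: {doc_id: document}} layout
    documents = {str(i): {"value": "x" * 100} for i in range(1, 5001)}
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"_default": documents}, handle)

    file_size = os.path.getsize(path)
    document, peak = load_and_get(path, 42)

print(f"file: {file_size} bytes, peak traced memory: {peak} bytes")
```

Even for a single-document lookup, the peak is at least the size of the file, because json.load reads the full text before parsing it. With this read pattern, memory use scales with the file, not with the number of documents requested.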