Figure out strategy for managing memory and disk space #20
Comments
It appears as though we run into memory issues if the directory being watched for new images accumulates too many files.
Instead of deleting images outright, we could copy them to another directory that's not being watched (and then remove them from the watched directory). That would solve the memory issue but eventually exhaust disk space.
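To make that idea concrete, here is a minimal sketch, not the project's actual code, of watching an ingest directory with chokidar and moving each image into an unwatched backup directory once it has been uploaded, so the watcher's in-memory bookkeeping stays small. The directory paths and the `uploadToS3()` helper are placeholders.

```ts
import chokidar from 'chokidar';
import { promises as fs } from 'fs';
import path from 'path';

const WATCHED_DIR = '/data/incoming';  // hypothetical directory watched for new images
const ARCHIVE_DIR = '/data/uploaded';  // hypothetical unwatched backup directory

async function uploadToS3(filePath: string): Promise<void> {
  // placeholder for the real S3 upload logic
}

const watcher = chokidar.watch(WATCHED_DIR, { ignoreInitial: false });

watcher.on('add', async (filePath) => {
  try {
    await uploadToS3(filePath);
    // Move the file out of the watched tree so chokidar stops tracking it.
    // (fs.rename assumes both directories live on the same filesystem.)
    await fs.mkdir(ARCHIVE_DIR, { recursive: true });
    await fs.rename(filePath, path.join(ARCHIVE_DIR, path.basename(filePath)));
  } catch (err) {
    console.error(`failed to upload/move ${filePath}`, err);
  }
});
```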
That is an interesting problem, and it raises some questions.
Good questions @postfalk. The only purpose for retaining them would be for backup, I suppose, and if we are certain they made it to S3, I don't know how important that is. I like the idea of randomly deleting images above some threshold, but if we accept an incomplete backup and we are confident that any images queued for deletion are already in Animl, it's a little hard to picture a scenario in which those backup images would come in handy. A more plausible scenario is that a base station goes offline, so it can't upload images for a long time but is still receiving them. In that case we would want to make sure there is ample disk space and memory to handle a long internet outage, so I guess maintaining a lot of headroom on both counts would be the best strategy.
Agreed. However, a good design would be one that does something sensible when we hit the boundary. In that case it is really a decision: do we want an increasingly blurry picture of the past, or do we just throw it out in favor of new incoming data?
Ok, so given that we're going to keep the threshold pretty low (say 25k images), I'm leaning towards just deleting the oldest images once we reach that threshold. If we were to randomly remove 1 out of every 5 images, we could retain a somewhat longer record of the data (keeping 4 out of 5 images means the same 25k slots cover roughly 25% more time) at the cost of that data being 20% blurrier, right? I don't have strong feelings, but making a hard cutoff and retaining an accurate backup of the 25k most recent images that were successfully uploaded seems simplest and is a pretty reasonable strategy. What do you think?
Sure. One useful consideration in the math might be that we usually shoot more than one image of the same animal, so the information we retain would still be more precise than if the images were entirely random. But I think deleting the oldest ones is sensible as well.
Ok, after a bit more thought, I think this is the path forward I am going to pursue. One important thing to note is that there are two separate but related problems here: the first being that chokidar consumes a lot of memory as the number of watched files grows, and the second being how to manage available disk space during normal operation (in which images are getting uploaded but we may want to retain a backup of uploaded images) and during internet outages (in which images will pile up on the drive and eventually exhaust the disk space). I think the following solution would address both:
Basically, have some fixed amount of disk space shared between the un-uploaded images in the watched directory and the backup of images that have already been uploaded.
We could either (a) delete images immediately once they're uploaded to S3, or (b) set a storage threshold and, once that's reached, delete the oldest files as new ones come in.
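As a rough sketch of option (b), assuming a backup directory and the 25k-image cap discussed above (both placeholders, not settled values), the pruning step could look something like this:

```ts
import { promises as fs } from 'fs';
import path from 'path';

const ARCHIVE_DIR = '/data/uploaded';  // hypothetical backup directory
const MAX_BACKUP_IMAGES = 25_000;      // threshold discussed above

async function pruneOldestBackups(): Promise<void> {
  const names = await fs.readdir(ARCHIVE_DIR);

  // Collect each file's modification time so we can sort oldest-first.
  const files = await Promise.all(
    names.map(async (name) => {
      const fullPath = path.join(ARCHIVE_DIR, name);
      const stats = await fs.stat(fullPath);
      return { fullPath, mtimeMs: stats.mtimeMs };
    })
  );

  const excess = files.length - MAX_BACKUP_IMAGES;
  if (excess <= 0) return;

  files.sort((a, b) => a.mtimeMs - b.mtimeMs); // oldest first
  for (const file of files.slice(0, excess)) {
    await fs.unlink(file.fullPath);
  }
}
```

This could run after each upload (or on a timer), and the same idea works with a byte budget instead of an image count by summing file sizes before deciding how much to delete.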