Missing data in file update stream #19
It looks like there is corrupted binary data at the block boundaries. This could account for the missing data.
Related to this, when the file size increases we try to pull the last block and ship its data out (as it may never be written again). In some cases we appear to pull the wrong block or data.
Also note that we might need to pull more than one block, not just the final one, to see the new "valid" data. This is most visible when we cross block boundaries: the final block changes to a new one, so we might miss data at the end of the previous final block. We may need to go back through an arbitrary number of blocks to capture data that is newly associated with a file after a file size update.
This code needs to change to properly look up the blocks that are included in the file size update: https://github.com/cmusatyalab/gammaray/blob/master/src/gray-inferencer/deep_inspection.c#L1435
There appear to be missing data bytes in the file update stream when running tests that track a log file (syslog, with embedded timestamps).
First reported by @hsj0660.