
Missing data in file update stream #19

Open
theonewolf opened this issue Mar 20, 2014 · 4 comments

@theonewolf
Contributor

Data bytes appear to be missing from the file update stream in tests tracking a log file (syslog, with timestamps embedded).

First reported by @hsj0660.

@theonewolf
Contributor Author

It looks like there is corrupted binary data at block boundaries. This could account for the missing data.

@theonewolf added the ext4 label May 3, 2014
@theonewolf
Contributor Author

Related to this: when the file size increases, we try to pull the last block and ship its data out (as it may never be written again). In some cases we appear to be pulling the wrong block/data:

-------------------
Message on Channel: blizzard:485bc61a-e7f3-4a1b-9383-e420046d969b:/home/wolf/scratch
        field           :       file.size
        new             :       4120
        old             :       4083
        transa          :       8940
        type            :       metadata


-------------------
Message on Channel: blizzard:485bc61a-e7f3-4a1b-9383-e420046d969b:/home/wolf/scratch
        end             :       4120
        start           :       4096
        transa          :       8940
        type            :       data
        write           :       #!/bin/bash
sudo apt-get
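
Assuming the default 4096-byte ext4 block size, the numbers above show the boundary case: the old size (4083) puts the old EOF inside block 0 (bytes 0-4095), while the data message only covers bytes 4096-4120 of block 1 for the new size (4120). Bytes 4083-4095 at the tail of block 0 are therefore never shipped, and the "write" payload looks like unrelated on-disk content, consistent with pulling the wrong block.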

@theonewolf
Contributor Author

Also note that we might need to pull more than one block to see the new "valid" data, not just the final block.

This is most visible when a write crosses a block boundary: the final block changes to a new one, so we can miss data at the end of the "previous" final block. We may need to walk back through an arbitrary number of blocks to capture all data newly associated with a file by a size update; the sketch below shows the arithmetic.
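
A minimal sketch of the block-range arithmetic, assuming a fixed 4096-byte ext4 block size (the real code would take the block size from the superblock; the names here are illustrative, not gammaray's API):

    #include <stdint.h>
    #include <stdio.h>

    #define BLOCK_SIZE 4096ULL  /* assumed ext4 block size */

    /* Inclusive range of file blocks that may hold newly valid data after
     * a size change: the old final block can gain bytes past the old EOF,
     * and every block up to the new final block may be entirely new. */
    static void blocks_to_pull(uint64_t old_size, uint64_t new_size,
                               uint64_t *first, uint64_t *last)
    {
        *first = old_size / BLOCK_SIZE;
        *last  = new_size ? (new_size - 1) / BLOCK_SIZE : 0;
    }

    int main(void)
    {
        uint64_t first, last;
        blocks_to_pull(4083, 4120, &first, &last);  /* sizes from the log above */
        printf("pull blocks %llu through %llu\n",
               (unsigned long long) first, (unsigned long long) last);
        return 0;
    }

With the sizes from the log above this yields blocks 0 through 1, i.e. the tail of the old final block must be pulled as well as the new final block.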

@theonewolf
Contributor Author

This code needs to change to properly look up the blocks that are included in the file size update:

https://github.com/cmusatyalab/gammaray/blob/master/src/gray-inferencer/deep_inspection.c#L1435
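
A minimal sketch of what the corrected handler might look like, assuming a hypothetical pull_and_publish_block() helper that reads one file block and emits its data message (a placeholder, not gammaray's actual API):

    #include <stdint.h>

    #define BLOCK_SIZE 4096ULL  /* assumed ext4 block size */

    void pull_and_publish_block(uint64_t block);  /* hypothetical helper */

    /* On a file.size metadata update, republish every block from the old
     * EOF's block through the new EOF's block, not just the final block.
     * For a shrink (new_size < old_size), first > last and nothing is pulled. */
    static void handle_file_size_update(uint64_t old_size, uint64_t new_size)
    {
        uint64_t first = old_size / BLOCK_SIZE;
        uint64_t last  = new_size ? (new_size - 1) / BLOCK_SIZE : 0;

        for (uint64_t block = first; block <= last; block++)
            pull_and_publish_block(block);
    }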
