Add parser for unsaved Windows Notepad tabs #540
Refactored the code to work with cstruct v3 after fox-it/dissect.cstruct#73 has been merged. |
Heyo! Are you guys still working on this? I have gotten quite a bit labeled in my tabstate util. I found that you had mentioned this on John Hammond's tweet about the video going over this format. I know there is a CRC32 after the weird timestamp, but I have not implemented that. I was wondering because, looking at the code you have so far, there might be a few things you are missing. I don't really understand how this package interprets the C code, but if it's just like normal C structures, I think the magic and header start might be off here. This format seems to change a lot depending on the "state" it is in (new tab, unsaved tab, and saved tab). I was wondering if maybe @joost-j might have some more work done on this, and maybe we can collaborate to fill out the rest? Here is my tabstate util for Rust. |
Hi @Nordgaren, I think it's a great idea to collaborate on this! I have observed different storage formats as well, but could not yet link them to any action with regard to the "state", so I'm really interested in that as well. Let's try to fill out the rest indeed! As mentioned in your repo, I'll contact you on Discord. |
Hi, I recently caught on to the Notepad tab hype and this plugin seems premature, IMO. It appears that Notepad tab files are a single format with three states:
The file extension can also slightly change depending on whether the file is open somewhere else or not ( As it's written, this plugin will attempt to parse unsaved and saved files and will miss a lot of things, even in the unsaved state. For example, these structs can't be right at all:
struct multi_block_entry {
uint16 offset;
uleb128 len;
wchar data[len];
char crc32[4];
};
struct single_block_entry {
uint16 offset;
uleb128 len;
wchar data[len];
char unk1;
char crc32[4];
};
In a large file, the offset cannot be a uint16 (65535 max size) because files can have more than 65535 characters. It's more than likely an Where 02 is the offset, 00 is "unknown" (not really unknown, read below), 01 is the uleb128 length, But if I remove that What this all equates to is that if the second With all that said, I've actually managed to figure out the Notepad tab format (with the exception of a single u8 field) in all three states. I'll be posting about it soon and I'd recommend waiting before adding this plugin. |
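The point about field widths can be illustrated with a small decoder. This is a minimal sketch of unsigned LEB128 decoding, not the parser from this PR; `decode_uleb128` is a hypothetical helper name and the sample bytes are made up for illustration.

```python
from __future__ import annotations


def decode_uleb128(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one unsigned LEB128 integer from buf, starting at pos.

    Returns (value, next_pos). Each byte contributes its low 7 bits,
    least significant group first; a set high bit means more bytes follow.
    """
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated uleb128")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


# A uint16 tops out at 65535; a uleb128 just grows by one byte per 7 bits.
small, _ = decode_uleb128(b"\x02")             # fits in a single byte
large, end = decode_uleb128(b"\x80\x80\x04")   # one past the uint16 limit
```

Because the encoded width varies, any field that follows a uleb128 has no fixed position, which is one reason a struct with a fixed `uint16 offset` drifts out of sync on larger files.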
@daddycocoaman will you be contributing your findings? |
I gave all of this data to Joost myself a while ago. I will be making a full write-up shortly. I didn't have time when I told him I would, but everything listed here is in my util, except the CRC. I just haven't taken the time to put it in yet. Unexpected life events, haha. So you guys should be able to implement what I have so far, and the markdown file should follow this weekend. Maybe even a .bt file if I can manage! |
I can confirm most of what you said here. It is uleb128, as Joost identified. You got the saved states, pretty much. There is the new tab state (which I believe you are calling unsaved) and additionally something like a "soft save", which happens when Notepad closes but the tab stays open without a filepath. This will write all of the buffer contents to the file, instead of the weird keystroke meme that it seems to be in the new tab state. Did you get the entire metadata structure in the saved file state? That is pretty much the only data I haven't figured out all the way, but I know its size. You can check out the tabstate-util crate on my GitHub if you want to cross-reference your findings. Might be good. There are some weird curveballs in this format. I believe there still might be some structures that only appear in special conditions. Those are kinda the hardest to work out. I think a good idea now would be to consolidate test files so we can get all of the possible structures available for testing/parsing. |
There is also this issue on my repo which has a lot of good information |
That is a great thread. To answer your question, yes, I have all the fields of the saved state, including the cursor locations, timestamp, and a few other things in the format. There are no special delimiters in the format. Everything has a specific value. To answer @Schamper, I hadn't actually heard of dissect before the comments on the John Hammond video. I think I might just put out the format and let people adopt it however they like. Personally, as a red teamer, I have my own reasons. 😂 |
What did you find the single byte, and the 4-byte int that mirrors it, around the cursor start and end to be? |
Heya, fellow red teamer here: Could you maybe elaborate on this part? I'm genuinely curious why you wouldn't just share the information if you have it. Dissect isn't just a blue-team tool; I use it as a red teamer myself all the time, as it ships with some pretty sick file parsing capabilities (especially nice if you don't have 4TB of bandwidth to spare when all you need is like 200 bytes or something). Hell, I'd like to see this plugin get implemented as well; I can think of some pretty cool stuff you could do with it. If it's losing out on credit you're worried about, external contributors are always credited (as I was here and here) :) |
My It's missing new tab state files (the files with keystrokes instead of just characters) and tabstate files with extra buffers after the main one, for now, but it should be otherwise accurate. |
I'm putting out a blog post through my employer (hopefully reviewed and posted this week once I submit it today). It's less about credit on this repo and more like making sure the work I put in to RE the format is written up and distributed more formally.
Sorry, I misspoke when I said "all" (I was trying to be careful about not saying that). At any rate, the bytes after the cursor end are not a single field, but 4 separate byte fields that represent different boolean options. I've labeled them as The byte before the cursor start always appears to be `01` and is the field I mentioned earlier that I hadn't figured out. There are a couple of other boolean options related specifically to the Notepad UWF app but I have no idea how to configure them in either Notepad or the registry (like GhostFile or ClassicEditor, which I thought would mean using the old Notepad). I even tried turning off the integrated Copilot 🥲 |
Ah! More Notepad options... I hadn't even thought to test those yet. I hope you'll link the blog post when it's published. This has been an interesting exercise and learning experience for me. Not to change topics, but have you looked at the Windowstate files? I started to take a stab at those to stop myself from fixating on the Tabstate files too much. They store window size and position, as one would guess. I've only started trying to figure out the rest of the file. https://github.com/ogmini/Notepad-Windowstate-Buffer |
Amazing. Thank you! So now we just need the byte before the cursors and the byte at the end of those bools, then! Thank you for sharing! I will go label the 3 bools. The last byte could also be padding, maybe? Also, did you find anything for the |
I need to reconfirm but I think that first one is a null terminator for the SHA256. I remember seeing a comparison for 0x00 when it was comparing hashes but that could have been the data. Also, unfortunately, it's not the only thing left. The third byte in the format after "NP" is not part of the magic header. It's used by the .0 and .1 temp state files for unsaved files. |
I've tagged that as the sequence number for the .0 and .1 files. It goes up incrementally. It is also a uLEB128. https://github.com/ogmini/Notepad-Tabstate-Buffer?tab=readme-ov-file#0bin--1bin *Edit: Oh, I think I misread your post. You already know about the .0 and .1 files. I've still assumed it to be a sequence number for the bin file. It's just that it always appears to be 0x00. |
I have actually noticed that the 4th byte in the file, which is supposed to be the saved state, is also the count of remaining characters in a file, I think? It could be a uleb128, too, as you mentioned. The 3rd byte in the file does seem to change, and I think I have seen it change with the correct magic as well. I can't remember for certain. Will try to dig a bit shortly! |
It would be weird for it to be a null terminator for the SHA-256 hash, though, as it is a fixed-size hash in the file: 32 bytes, or 256 bits. But I also wouldn't put it past Microsoft. It could also be padding. One thing I find weird is that they are using a varint for a fixed-size int. I assume that is for future-proofing, though. So maybe they just have a fixed-size varint (lol) and some extra padding to compensate? That also seems like an out-there idea, but I'm just throwing it out there in case it helps anyone else see something different. So far it doesn't look like they care about padding or even alignment, tbh, although I haven't actually sat down to check. Plus, they could be reading the individual bytes for the SHA-256 hash anyhow, so there would be no real alignment issues. It should only be an issue if they are reading the type as multiple u32/u64/u128s or a single u256. |
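To make the header discussion above concrete, here is a rough sketch of how the first few bytes could be read if the guesses in this thread hold. It assumes a 2-byte `NP` magic, followed by a uleb128 sequence number, followed by a single byte tentatively treated as the saved-state flag (which, as noted, may turn out to be something else). `TabHeader`, `read_uleb128` and `parse_header` are hypothetical names, and the sample bytes are synthetic.

```python
from __future__ import annotations

from typing import NamedTuple


class TabHeader(NamedTuple):
    sequence: int      # assumed: uleb128 sequence number after the magic
    saved_flag: int    # assumed: one byte, tentatively the saved state
    body_offset: int   # index of the first byte after this header


def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one unsigned LEB128 value; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated uleb128")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_header(buf: bytes) -> TabHeader:
    """Read the magic, sequence number and the following flag byte."""
    if buf[:2] != b"NP":
        raise ValueError("missing NP magic")
    sequence, pos = read_uleb128(buf, 2)
    if pos >= len(buf):
        raise ValueError("truncated header")
    return TabHeader(sequence, buf[pos], pos + 1)


# Synthetic example: sequence number 0, flag byte 0x01.
header = parse_header(b"NP\x00\x01rest-of-file")
```

Reading the sequence number as a varint rather than a fixed byte matters once it exceeds 127, since every later field then shifts by one position.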
Hi @ogmini, @daddycocoaman, @JustArion, thanks for all the suggestions and tips! The PR was aimed at getting initial support for unsaved tabs into dissect. Then the John Hammond video was published, after which more and more about this file format was being researched by multiple people. I had indeed not yet uncovered/reversed all of the fields and structures that were suggested above, which also take into account, for example, the state of the application (closed/opened). It looks like I can include some more test files covering all the different states, to make this plugin more complete. Obviously, I will also incorporate the suggestions that were made in this thread into my code. Unfortunately, I have not been able to work on this project for the past few weeks, but I'm planning to pick up the research again in the near future. |
…mestamp,saved_path,sha256) in the fields
I added the |
Although the file format is still changing and we might not have uncovered 100% of it, I feel this is a great example of online collaboration to get things done. I therefore submitted a proposal for a talk at the SANS summit in Prague in September 2024, which got accepted last week. 🎉 The goal of the talk is not to explain the file format in full detail, nor to take full credit for uncovering the file format, but to encourage and inspire people to take a (deep) dive into unknown file formats and share the findings with the entire DFIR community! Of course I'll be sure to mention your contributions @Nordgaren @daddycocoaman @ogmini @JustArion, as well as John Hammond's video. And of course, if one of you happens to be there in person, let's discuss this irl! |
Very cool, I hope you can share your slides or talk after the summit. I'm also going to be submitting a talk to BSides-NYC mainly talking about my experience, what I've learned, and my process. Hopefully it is accepted. I'll also make sure to mention you all. I haven't had much of a chance to update this thread since this isn't remotely my day job. I've been working on trying to automate generation of test state files with known parameters to more easily detect changes from future updates. I've also been poking at the application hive which is stored as the I continue to update my notes here: https://github.com/ogmini/Notepad-State-Library |
@daddycocoaman did you make any progress in the meantime, or do you have an estimate for a possible release? I'm eager to see how we can improve the implementation further! |
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #540 +/- ##
==========================================
+ Coverage 75.46% 75.54% +0.07%
==========================================
Files 303 305 +2
Lines 26229 26331 +102
==========================================
+ Hits 19794 19891 +97
- Misses 6435 6440 +5
…sect.target into feature/windows_notepad_tabs
The test files of this PR should be LFS objects. Could someone fix that? :)
|
Just like Notepad++, the Windows Notepad application for Windows 11 is now able to restore unsaved tabs when you re-open the application. This blog explains where it is stored and that you should be able to somehow view the contents.
It turned out that there was a bit more to it than running `strings` or `grep` on it. The application stores the tabs in different formats, depending on the size and some other unknown factors. I've even encountered a file where 26 characters of text were encoded as 34(?!) separate blocks; each block containing a length field, a single character and a CRC32 checksum of that very small block. Using `grep` or `strings` on that file would not have yielded any results. Information stored in these Notepad tabs may be helpful during forensic investigations and/or incident response cases.
The file format uses LEB128 variable-integer encoding for the block sizes, which is not yet present in the `dissect` framework. Therefore, this PR depends on fox-it/dissect.cstruct#69, so in the end it depends on a new major release of `dissect.cstruct`.
The `dissect/target/plugins/aps/texteditor` folder was created, with corresponding new record types, so unsaved tabs from other text editors can also be added in the future.
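A small sketch of why plain string searches come up empty on the one-character-per-block files. It is not the plugin code: it assumes each block is a uleb128 offset, a uleb128 character count, UTF-16LE text, and 4 opaque checksum bytes, which is a simplification based on the structs discussed in this PR. All helper names are hypothetical, and the checksum bytes are placeholders rather than real CRC32 values.

```python
from __future__ import annotations


def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)
            return bytes(out)


def read_uleb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one unsigned LEB128 value; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated uleb128")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def make_block(offset: int, text: str, checksum: bytes) -> bytes:
    """Build one synthetic block in the assumed layout."""
    data = text.encode("utf-16-le")
    return encode_uleb128(offset) + encode_uleb128(len(data) // 2) + data + checksum


def parse_blocks(buf: bytes) -> list[tuple[int, str, bytes]]:
    """Walk consecutive blocks; return (offset, text, checksum) per block."""
    blocks = []
    pos = 0
    while pos < len(buf):
        offset, pos = read_uleb128(buf, pos)
        length, pos = read_uleb128(buf, pos)
        end = pos + 2 * length              # wchar = 2 bytes each
        checksum = buf[end:end + 4]
        if len(checksum) != 4:
            raise ValueError("truncated block")
        blocks.append((offset, buf[pos:end].decode("utf-16-le"), checksum))
        pos = end + 4
    return blocks


# One block per typed character, with placeholder checksum bytes.
raw = b"".join(make_block(i, ch, b"\xaa\xbb\xcc\xdd") for i, ch in enumerate("hello"))
recovered = "".join(text for _, text, _ in parse_blocks(raw))
```

In `raw`, neither the ASCII nor the UTF-16LE form of the word appears as a contiguous byte run, because every character is separated from the next by a checksum and the following block's offset and length. The sketch also simply concatenates blocks in file order and ignores the offsets, so it would not reproduce edits made in the middle of the text.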