Towards SMAZ decompression #1
Hi - I'm very impressed with the puzzling you've done here. I've spent quite a lot of time working on this, but I've been quite busy lately so it's been hard to concentrate, and I ended up banging my head on a brick wall. It's going to take me a few days to release the next part of the video, but I'll mention what I discovered...
When the system boots, it uses a built-in boot ROM to load firmware from the flash memory. The ROM passes the firmware through a hardware unit called the DPU, which decompresses the SMAZ data and computes the checksum. I was able to write some software that loads SMAZ data into RAM, triggers the DPU, and prints the results. I used this method to fuzz the DPU with a Python script. This is quite slow: I couldn't figure out how to reset the DPU without a hardware reset, so I have to hard-reset the system for every test, and each test takes about 20 seconds. The script logs the results in msgpack format, which is easy to load in Python.
From the test data, I started building a [Python SMAZ decoder](https://github.com/jhol/otl-lkv373a-tools/blob/master/smedia/smaz.py). Have you seen this? To help me do this, I have a script
If the DPU hits any error decoding the SMAZ, it will abort, but the data it copies back to RAM may not include the last few bytes before the error. Therefore, in my testing, I added some sync chunks like this:
By this method, I was able to make a corpus of >14000 tests. Of these, 1457 were decoded successfully by the DPU, and my script is currently able to decode 800. But as you may have seen, my decoder is a complete mess - I keep finding more and more unexpected behaviour, and there are several patterns of behaviour that I don't understand correctly. Here is my collection of test results:
Would you be interested in hacking on the SMAZ decoder? I am happy to answer any questions. Also, if it would help, I can give you remote access to my rig so that you can run your own tests. Thanks for looking at this.
I was hoping someone would help me figure it out. Joel |
Hi folks, In the past I tried to break the compression scheme of the SMAZ structure, unfortunately without much progress compared to you. However, my approach was quite different, as I had no access to the hardware decompressor for fuzzing it. Instead, I found a place in the compressed buffer where compression worked worst because of a lot of non-repeating input, and the data was plain text, so I could easily identify which bytes are data and which are control stream. That way I was able to decompress a few hundred bytes and guess the simplest control commands. If someone is interested in the results, I can share my notes (most of them are on paper). I used this kind of black-box approach before, on another unrelated compression scheme (also an LZSS variation, like here), and I think it is possible to finish it that way. Especially useful might be getting the completely decompressed SMAZ structures that are contained in firmware updates, which is now possible thanks to jhol's work. This would allow us to confirm the guesswork required to make progress. By the way, you guys, both, did a really great job! Regards, |
Well, let's try to spare your head (and the wall) further damage from this compression scheme.
I have seen it, and it's what pointed me to the backref offset being the high seven bits, which might have taken me a bit longer to figure out otherwise. At the same time, you were/are clearly struggling to produce something without having a coherent model of the overall compression scheme. An essentially-stateless approach to an inherently-stateful control scheme. Looking at it again now, it seems that you have implemented some things that I haven't worked out how to trigger yet, so I'll need to give it a closer reading now that I know more of what I'm looking at.
Unfortunately, brute force and ignorance doesn't always work, and not having the time to sit down and work out the details doesn't help.
Thank you, that gives me a much easier angle than the multi-file differential analysis that I had been considering for the two mostly-ASCII regions that I had found. From about a day of looking at the corpus data, it appears that the decoder has at least three states, and there's both a backreference offset storage register and a copy length shift register. Looks like you found one of them and brute-forced the other?
I'm more likely to attempt to write my own decoder, but we'll see how things go.
Let's hold off on remote access for the moment. I have enough material to keep me busy for a while.
You're welcome. It's been an interesting challenge so far. |
Someone managed to find the SDK: It contains a tool mkrom: and mkrom.exe: There are two kinds of compression supported: Jedi (old style, what we have been decoding) and P9850. I think this code describes the decompression algorithm:
|
@v3l0c1r4pt0r the CPU core is OpenRISC |
Man, this is some breakthrough! |
Good find!
Seems very likely. And that leads to http://www.oberhumer.com/opensource/ucl/ which contains another version. I was already suspecting that there was a drastic simplification to be had with my current model, but I wasn't expecting this. It's bits, until it's suddenly possibly-unaligned dibits, until it's suddenly "let's read a byte and use that". I'm not finding documentation on the actual compression, though, just source code. |
@jhol Wow, what a find! Sounds like you're going to make another video on this one? |
Wait, what? I have seen that library in the past. It seems that I was a lot closer to figuring this out than I thought. I can't believe it was that simple. Never mind. Now this sounds like a lot of new information. I have to start updating my wiki... |
I attempted decompressing with the original UCL library, and for now it is a moderate success. I am definitely getting some data that is not garbage, but no SMAZ was decompressed completely: |
Ok guys, I have a feeling that it would be feasible to create a fully open firmware with cool features. Would you consider creating a team, splitting the work, and starting a crowdfunding campaign? |
I started this because I was interested in the possibility of building a cheapo HDMI->RTSP/RTMP server. I suspect there is quite a bit about the SDK that is highly sketchy - which is typical for Chinese SDKs. Therefore, my approach would be to start building a from-scratch OpenRISC FreeRTOS build and get the major chip features working one by one. I would also like to know more about MU1 vs U2. Are they both the same chip? Why does the LKV board have two ITE processors on it? This SDK is legally tricky. Since we needed to read the code to do this work, we can no longer make this a clean-room project. It would be dependent on ITE headers. I guess a user might be able to download the SDK separately, but there are still legal issues to consider. Also, there's a lot of open source software mixed with ITE proprietary information. IANAL, so I don't know what is and isn't allowed. Also, how do folks feel about this platform? It's an interesting device to play with, but is it really interesting as a development platform? The platform is now quite old. I don't think it can fully capture 1080p. I don't think it can do 60FPS, and I'm sure it can't handle 4K. So I would caution against sinking lots of time into something that may soon go obsolete, unless you are 100% convinced it is a worthwhile long-term effort. |
I doubt that this stupid decompression algorithm is hard-coded in the DPU. It is much more likely that the microcode at 0x70 contains the instructions for the DPU with this decompression algorithm, written to the shared Memory Management Port (MMP), rather than for some video processing algorithm in a Multimedia Pipeline (MMP). If one of the 0xac0 microcode bytes was altered slightly, I bet it would affect how the DPU decompresses the data. |
@v3l0c1r4pt0r , I took the original code (https://gist.github.com/danielkucera/6ed4266d8095795adf3e225b266f3b75) and tried to decompress the same file:
so I guess the difference is only in the bug handler in |
(and it's giving the same results) |
Found two other repos with different versions of the SDK: |
@patapovich Interesting! It seems to have more implementations of both the SMAZ compressor and decompressor: https://github.com/KennyOP2/JEDI_9910_Fengyun/blob/master/sdk/example/decompress/decompresstest/main.c#L700 |
It's also interesting to see what other devices use this platform: https://github.com/KennyOP2/JEDI_9910_Fengyun_FY280/blob/master/project/hw_grabber/it_usbd_pcgrabber.cpp#L84 |
I am almost sure that I've seen OpenRISC when trying to identify the ISA. It does not seem to be the same as this. In the SDK I found a reference to a different architecture called sm32: http://219.87.84.106/blob/~bsp2%2Fbr06p.git/br06p/ITE_Castor3_SDK%2Fopenrtos%2Ftoolchain.cmake#L35 By the way, this file also mentions the ARM architecture, so chances are the blobs guarded by SMAZ are ordinary ARM code. |
@v3l0c1r4pt0r I was referencing this document for or1k: https://raw.githubusercontent.com/openrisc/doc/master/openrisc-arch-1.3-rev1.pdf . I got a hint from someone in my Twitter DMs. The similarity of the 0x38 ALU instruction is particularly noticeable. |
Ok, this is indeed the same architecture as ours. I have just compiled objdump for or1k architecture. Should be possible to disassemble the firmware code that way. |
Getting back to the topic, I tweaked my code a little bit and it seems that I have a complete SMAZ decompressed. A strings call seems to confirm that:
Yesterday I forgot that SMAZ is split into multiple compressed chunks, so I was getting only the first chunk. Adding a loop seems to solve the problem :) I updated the code on the gist. I have to rewrite the code from scratch, though. Right now it is a mess. |
@v3l0c1r4pt0r Amazing! - from the SDK, it looks like the checksum is a CRC32 based on the decoded data. It should be possible to determine the polynomial now that you have decompressed data |
@jhol, I wasn't able to confirm this. Maybe my output is corrupted somehow. The fact that it seems to be fine does not mean it is. But I tried to check the CRC of the whole ITEPKG and I had more luck with that one. The last 4 bytes of ITEPKG are the CRC as calculated by the ugCalcBufferCrc function. Here is the output of a very simple program calling it:
Note the little-endian byte order. Edit: I have to check what they did wrong, since a normal CRC returns a different value. The lookup tables are the same, so it should work, but it doesn't. |
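For anyone repeating this check, a minimal Python sketch of pulling the stored value out of a package image follows. The helper name `split_trailing_crc` is made up for illustration, and it assumes the checksum covers everything before the final four bytes, which the thread has not confirmed. It does not reimplement ugCalcBufferCrc.

```python
import struct


def split_trailing_crc(blob):
    """Split a buffer into (payload, stored_crc).

    Assumption: the final 4 bytes hold the CRC as a little-endian
    unsigned 32-bit integer, as observed for ITEPKG in this thread.
    """
    if len(blob) < 4:
        raise ValueError("buffer too short to contain a trailing CRC")
    payload, tail = blob[:-4], blob[-4:]
    # "<I" = little-endian unsigned 32-bit
    (stored_crc,) = struct.unpack("<I", tail)
    return payload, stored_crc
```

The returned `stored_crc` can then be compared against whatever candidate CRC routine is being tried on `payload`.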
The CRC field is located before the SMAZ signature. It goes:
...SMAZ Compressed chunks
Looking at the SDK, I think it's a CRC32 of the decompressed data. Though I'm not sure about the bit-order, endianness, or polynomial of the CRC. |
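Since the bit order and polynomial are the open questions, one way to narrow them down is to run a parameterised CRC-32 over a decompressed chunk and compare against the stored field. The sketch below is generic and not taken from the SDK; the parameter names (`poly`, `init`, `refin`, `refout`, `xorout`) follow the usual CRC catalogue convention, and the defaults reproduce the common zlib CRC-32.

```python
import zlib


def reflect(value, width):
    """Reverse the lowest `width` bits of `value`."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def crc32_generic(data, poly=0x04C11DB7, init=0xFFFFFFFF,
                  refin=True, refout=True, xorout=0xFFFFFFFF):
    """Bitwise CRC-32 with selectable polynomial and bit ordering."""
    crc = init
    for byte in data:
        if refin:
            byte = reflect(byte, 8)
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ poly) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    if refout:
        crc = reflect(crc, 32)
    return crc ^ xorout


# Sanity check: the default parameters agree with zlib's CRC-32.
assert crc32_generic(b"123456789") == zlib.crc32(b"123456789")
```

Trying the few combinations of `refin`/`refout`, `init`, and `xorout` against one known-good chunk is cheap; it is slow for large buffers, so a table-driven version would be the next step once the parameters are known.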
I was able to get the specification for the IT985x family. https://drive.google.com/open?id=1FqdPdRaIyqWa7r1fb6J9qPavFpsRdjQd |
Either way, I am pretty sure the device is using an IT978, but sadly there isn't any public information or leaked documents with information about it. |
@FFY00 The description in this document matches more or less what I've seen inside packages. There is indeed an ARM core somewhere, and one of the SMAZ blobs contains some ARM program (at least decompiling it returns reasonable results). The audio core is, I think, on the encoder (the uncompressed one), which is weird, because the documentation mentions three cores on the application(?) SoC. Maybe both processors are in fact very similar, but with different programs? By the way, I made a tool to extract an ITEPKG image into chunks (or actually finished what I started years ago). For now it handles only ITEPKG itself, not SMEDIA or SMAZ, but it eases things significantly. I see that I started some work on SMEDIA and SMAZ also, but I have to review that before pushing to GitHub. It is here: https://github.com/v3l0c1r4pt0r/ittk/tree/devel Any feedback, help, pull requests appreciated. |
The IT97x and IT985x seem very similar but only the IT978 matches the specs (1080p). http://soc.ite.com.tw/index.php/products/it970-series |
Are you sure LKV373A is capable of 1080p? Mine is by default in 720p mode. I know in some firmware there is a switch, but I never tried if it is possible to use it. |
I mean the manufacturer website says so. Maybe it's just a trick in the firmware but it is possible. |
Yeah, I know they say so. Either way, probably I wouldn't buy it. 720p is a bit too low quality these days. I hope they didn't lie about that. We'll see, I am not so far from knowing the interfaces. All I need to know now is what is the base address for this ARM blob and RISC ones. Then it should be possible to learn what communication channels we have inside. |
Do you have the datasheet? That is not from the document I sent. |
@v3l0c1r4pt0r - I just tried decompressing some SMAZ with your gist. I seem to get plenty of plausible strings, but also lots of errors:
Is it confirmed that the compression scheme is 100% compatible with the UCL decompressor? |
What does the error 0xffffff36 mean? |
Probably not exactly: http://219.87.84.106/blob/~bsp2%2Fbr06p.git/br06p/ITE_Castor3_SDK%2Fsdk%2Fdriver%2Fxcpu_master%2Fdecompress.c#L248 @Robinson-George I have no idea. I looked into the nrv2e code and I don't see where it comes from
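One possible reading, assuming the value is a signed 32-bit return code that was printed as unsigned (the printing code is not shown in this thread, so this is a guess): converting it to a signed integer gives a small negative number that can be looked up among the negative `UCL_E_*` error constants in the UCL headers. The helper name `to_signed32` below is made up for illustration.

```python
def to_signed32(value):
    """Interpret an unsigned 32-bit value as two's-complement signed."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


print(to_signed32(0xFFFFFF36))  # -202
```

Whether the gist actually passes a UCL return code through unchanged would need to be checked in its source.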
@v3l0c1r4pt0r |
I registered #lkv373a on freenode, if anyone wants to join you are welcome. |
I just published my own take on an SMAZ decoder based on code borrowed from the SDK: https://github.com/jhol/otl-lkv373a-tools/tree/master/smazdec |
The SDK link seems to be down. Did anyone manage to save a copy, and would you be willing to share? |
After four days of attempting to figure out this compression scheme without reference to actual hardware, this is what I have:
The compressed data stream consists of control bytes and data bytes, interleaved, with the first byte being a control byte. Interpreting each control byte consumes some number of data bytes, after which the next byte will be another control byte.
The decompressor can be modeled as a state machine with two states, currently called the "default" state and the "alternate" state, starting in the default state (hence the names).
Control bytes are divided into four "dibits" (two-bit units) from most significant to least significant, and the interpretation of these dibits is state-dependent. To be explicit about the "dibit" mapping, the byte 0x1b would be dibits "00", "01", "10", and "11", in order.
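To make the dibit ordering concrete, here is a minimal Python sketch that splits a control byte into its four dibits, most significant first. The function name `split_dibits` is made up for illustration; it only shows the bit slicing, not what the dibits mean in either state.

```python
def split_dibits(control_byte):
    """Return the four 2-bit fields of a byte, most significant dibit first."""
    if not 0 <= control_byte <= 0xFF:
        raise ValueError("control byte must fit in 8 bits")
    return [(control_byte >> shift) & 0b11 for shift in (6, 4, 2, 0)]


# The example from the text: 0x1b -> "00", "01", "10", "11"
print([format(d, "02b") for d in split_dibits(0x1B)])
```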
In the default state, the dibits appear to have the following meaning:
In the alternate state, the dibits appear to have the following meaning:
Backreference bytes have two (or three?) interpretations. If the low bit is 0, the high seven bits indicate an earlier position in the output stream from which to copy two bytes (specifically, shift the backreference byte right by one bit, add one, and subtract this value from the current position). If the low bit is 1, the same applies, but copy three bytes. There may be a special semantic if both the low and high bits are 1, or there may not.
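The two confirmed interpretations can be sketched in Python as below. This is only the model as described in the paragraph above, not verified against hardware; the name `apply_backref` is hypothetical, and the possible special case for a byte with both the high and low bits set is deliberately not modelled.

```python
def apply_backref(output, backref_byte):
    """Append the bytes selected by one backreference byte to `output`.

    Model from the text: distance = (byte >> 1) + 1 back from the current
    position; copy 2 bytes if the low bit is 0, 3 bytes if it is 1.
    """
    distance = (backref_byte >> 1) + 1
    length = 3 if backref_byte & 1 else 2
    start = len(output) - distance
    if start < 0:
        raise ValueError("backreference points before start of output")
    # Copy one byte at a time so an overlapping copy (distance < length)
    # re-reads bytes it has just written. That is how LZSS-style decoders
    # usually behave, but it is an assumption for this scheme.
    for i in range(length):
        output.append(output[start + i])
    return output


print(apply_backref(bytearray(b"abcd"), 0x02))  # distance 2, length 2
```

With output `abcd`, byte 0x02 gives distance 2 and length 2, so the call appends `cd`.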
Extended operations are signaled by a 00 dibit in the control stream and consume a second dibit to select an operation to perform (technically, this means that the state machine has three or four states, especially since the extended operations occasionally seem to be split across a control-byte boundary).
The extended operations are currently believed to be:
Overall, the two-state-and-dibit model seems to be fairly solid, while the specific extended operations are considerably sketchier.