PPM format
Animations created within Flipnote Studio are stored in the PPM format. The file extension comes from Flipnote Studio's original working title, Para Para Manga Koubou (translated as "Flipbook Workshop").
Offset | Type | Details |
---|---|---|
0x0 | char[4] | File magic (PARA) |
0x4 | uint32 | Animation data size |
0x8 | uint32 | Sound data size |
0xC | uint16 | Frame count |
0xE | uint16 | Format version - always 0x24 |
The app checks that (version & 0xf0) >> 4 != 0 to decide whether it should read the metadata section. Since the version is always 0x24, this check always passes, and it's likely a leftover from development.
Note: The frame count in this header is 0 indexed, so you must add 1 in order to get the actual frame count.
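As a rough illustration, the header table above could be read with Python's `struct` module as sketched below. The names `PPMHeader` and `parse_ppm_header` are made up for this example, and little-endian integers are assumed.

```python
import struct
from typing import NamedTuple


class PPMHeader(NamedTuple):
    magic: bytes
    animation_data_size: int
    sound_data_size: int
    frame_count: int  # real frame count (stored value + 1)
    format_version: int


def parse_ppm_header(data: bytes) -> PPMHeader:
    """Parse the 16-byte header at the start of a PPM file."""
    # assumed little-endian layout: char[4], uint32, uint32, uint16, uint16
    magic, anim_size, sound_size, stored_frames, version = struct.unpack_from(
        "<4sIIHH", data, 0
    )
    if magic != b"PARA":
        raise ValueError("bad file magic, expected PARA")
    # the stored frame count is 0 indexed, so add 1
    return PPMHeader(magic, anim_size, sound_size, stored_frames + 1, version)
```

A caller would pass the raw file contents, for example `parse_ppm_header(open(path, "rb").read())`.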
The metadata section follows directly after the header, starting at 0x10:
Offset | Type | Details |
---|---|---|
0x0 | uint16 | Lock, 0 if unlocked, 1 if locked |
0x2 | uint16 | Thumbnail frame index |
0x4 | wchar[11] | Root author name |
0x1A | wchar[11] | Parent author name |
0x30 | wchar[11] | Current author name |
0x46 | byte[8] | Parent author ID |
0x4E | byte[8] | Current author ID |
0x56 | byte[18] | Parent filename |
0x68 | byte[18] | Current filename |
0x7A | byte[8] | Root author ID |
0x82 | byte[8] | Root filename fragment |
0x8A | uint32 | Last modified timestamp |
0x8E | uint16 | Padding, always null |
Frame count starts at 0, and should be incremented by 1 when displayed.
Author names are null-padded UTF-16 LE strings. Author IDs are also stored in little-endian byte order, so you may need to reverse them.
Parent and current filenames are stored as:
- 3 bytes representing the last 6 digits of the console's MAC address
- 13-character string
- uint16 edit counter
The root filename fragment is stored as:
- 3 bytes representing the last 6 digits of the console's MAC address
- 5 bytes representing the first 10 characters of the 13-character string
Filenames are formatted as <3-byte MAC as hex>_<13-character string>_<edit counter as a zero-padded 3 digit number>, e.g. F78DA8_14768882B56B8_030. See this page for more information on the ID and filename formats.
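The filename layout described above can be sketched in Python as follows. The helper name `format_filename` is hypothetical, and the sketch assumes the 13-character string is ASCII and the edit counter is little-endian.

```python
import struct


def format_filename(raw: bytes) -> str:
    """Format an 18-byte parent or current filename field as text."""
    # 3-byte MAC fragment, 13-character string, uint16 edit counter
    # (assumptions: ASCII string, little-endian counter)
    mac, name, edit_count = struct.unpack("<3s13sH", raw)
    return f"{mac.hex().upper()}_{name.decode('ascii')}_{edit_count:03d}"
```

Here `raw` would be the 18 bytes read from metadata offset 0x56 or 0x68.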
Timestamps are stored as the number of seconds since midnight on the 1st of January, 2000.
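A minimal sketch of the timestamp conversion in Python. The function name is illustrative, and since no time zone is given here the result is a naive `datetime`.

```python
from datetime import datetime, timedelta

# midnight on the 1st of January, 2000
PPM_EPOCH = datetime(2000, 1, 1)


def ppm_timestamp_to_datetime(seconds: int) -> datetime:
    """Convert a PPM timestamp to a naive datetime."""
    return PPM_EPOCH + timedelta(seconds=seconds)
```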
The Flipnote thumbnail starts at 0xA0 and is 1536 bytes long.
Thumbnail images are 64 x 48 and arranged in a series of 8 x 8 tiles. Pixels are stored as 4-bit palette indices, referencing a hardcoded color palette.
Pseudocode:
# create an image that is 64 pixels wide and 48 pixels high
image = Image(64, 48)
# read thumbnail data
data = file.read_bytes(1536)
data_offset = 0
# tiles are stored one after another, each holding 8 lines of 8 pixels
for tile_y = 0; tile_y < 48; tile_y += 8:
    for tile_x = 0; tile_x < 64; tile_x += 8:
        for line = 0; line < 8; line += 1:
            # each byte holds two pixels: low nibble first, then high nibble
            for pixel = 0; pixel < 8; pixel += 2:
                x = tile_x + pixel
                y = tile_y + line
                image.SetPixel(x, y, data[data_offset] & 0x0F)
                image.SetPixel(x + 1, y, (data[data_offset] >> 4) & 0x0F)
                data_offset += 1
Index | Hex color |
---|---|
0 | #FFFFFF |
1 | #525252 |
2 | #FFFFFF |
3 | #9C9C9C |
4 | #FF4844 |
5 | #C8514F |
6 | #FFADAC |
7 | #00FF00 |
8 | #4840FF |
9 | #514FB8 |
10 | #ADABFF |
11 | #00FF00 |
12 | #B657B7 |
13 | #00FF00 |
14 | #00FF00 |
15 | #00FF00 |
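For readers who want something runnable, here is one possible Python version of the thumbnail pseudocode that uses the palette table above. The names `THUMBNAIL_PALETTE` and `decode_thumbnail` are made up for this sketch, and the image is returned as plain nested lists rather than through an imaging library.

```python
from typing import List

# palette copied from the table above, indexed by the 4-bit pixel value
THUMBNAIL_PALETTE = [
    "#FFFFFF", "#525252", "#FFFFFF", "#9C9C9C",
    "#FF4844", "#C8514F", "#FFADAC", "#00FF00",
    "#4840FF", "#514FB8", "#ADABFF", "#00FF00",
    "#B657B7", "#00FF00", "#00FF00", "#00FF00",
]


def decode_thumbnail(data: bytes) -> List[List[int]]:
    """Decode 1536 bytes of thumbnail data into 48 rows of 64 palette indices."""
    if len(data) != 1536:
        raise ValueError("thumbnail data must be 1536 bytes long")
    image = [[0] * 64 for _ in range(48)]
    offset = 0
    for tile_y in range(0, 48, 8):
        for tile_x in range(0, 64, 8):
            for line in range(8):
                for pixel in range(0, 8, 2):
                    byte = data[offset]
                    # low nibble is the left pixel, high nibble the right one
                    image[tile_y + line][tile_x + pixel] = byte & 0x0F
                    image[tile_y + line][tile_x + pixel + 1] = (byte >> 4) & 0x0F
                    offset += 1
    return image
```

The color of a pixel can then be looked up with `THUMBNAIL_PALETTE[image[y][x]]`.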
The animation header starts at 0x06A0.
Type | Details |
---|---|
uint16 | Size of the frame offset table |
uint32 | Unknown, always seen as 0 |
uint16 | Flags |
Following the animation header is a table of uint32 offsets for each frame. These offsets are relative to the start of the animation data section.
Bitmask | Details |
---|---|
flags & 0x1 | Unknown, potentially used |
flags & 0x2 | Loop Flipnote playback if set |
flags & 0x4 | Unknown, potentially used |
flags & 0x8 | Unknown, potentially used |
flags & 0x10 | Hide layer 1 if set |
flags & 0x20 | Hide layer 2 if set |
flags & 0x40 | Always set |
The animation data begins at 0x06A8 + the size of the frame offset table. Frames are not necessarily stored in playback sequence, and can sometimes share the same offset.
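Putting the animation header, the offset table and the data start together, a reader might resolve absolute frame positions roughly like this. `read_frame_offsets` is a hypothetical helper, and little-endian integers are assumed.

```python
import struct
from typing import List, Tuple

ANIMATION_HEADER_OFFSET = 0x06A0


def read_frame_offsets(data: bytes) -> Tuple[int, List[int]]:
    """Return (flags, absolute file offset of every frame)."""
    # uint16 offset table size, uint32 unknown, uint16 flags (assumed little-endian)
    table_size, _unknown, flags = struct.unpack_from(
        "<HIH", data, ANIMATION_HEADER_OFFSET
    )
    table_start = ANIMATION_HEADER_OFFSET + 8  # 0x06A8
    frame_count = table_size // 4  # one uint32 per frame
    relative = struct.unpack_from(f"<{frame_count}I", data, table_start)
    # offsets are relative to the start of the animation data
    data_start = table_start + table_size
    return flags, [data_start + offset for offset in relative]
```

The returned flags can be tested against the bitmasks in the table above, for instance `bool(flags & 0x2)` for looping.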
Frames are 256 x 192 pixels and consist of two image layers plus a "paper" background. The paper is either black or white, and layers can be red, blue, or the inverse of the paper color.
Each layer is a 1-bit monochrome bitmap with some basic compression done on a (horizontal) line-by-line basis to make the file more space-efficient.
Every frame begins with at least a one-byte header:
Data | Details |
---|---|
(header >> 7) & 0x1 | Frame type |
(header >> 5) & 0x3 | Frame translate flag |
(header >> 3) & 0x3 | Pen color for layer 2 |
(header >> 1) & 0x3 | Pen color for layer 1 |
header & 0x1 | Paper color |
If the frame type is 0, then frame diffing is used on this frame. If the frame translate flag is also set to anything besides 0, the header byte is followed by 2 int8 values which give the x and y position of the previous frame relative to the current one. This is covered in more detail in the frame diffing section.
Color index | Name | Hex code |
---|---|---|
0 | black | #0e0e0e |
1 | white | #ffffff |
Color index | Name | Hex code |
---|---|---|
0 | inverse of paper (not used under normal circumstances) | - |
1 | inverse of paper | - |
2 | red | #ff2a2a |
3 | blue | #0a39ff |
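The frame header fields above could be unpacked along these lines. This is only a sketch: `parse_frame_header` and the dictionary keys are names invented for the example, and it follows the rule that the two translation bytes are present only when the frame type is 0 and the translate flag is non-zero.

```python
import struct
from typing import Dict, Tuple


def parse_frame_header(data: bytes, offset: int) -> Tuple[Dict[str, int], int]:
    """Return (header fields, offset of the first byte after the header)."""
    header = data[offset]
    offset += 1
    fields = {
        "frame_type": (header >> 7) & 0x1,
        "translate_flag": (header >> 5) & 0x3,
        "layer_2_pen": (header >> 3) & 0x3,
        "layer_1_pen": (header >> 1) & 0x3,
        "paper": header & 0x1,
        "translate_x": 0,
        "translate_y": 0,
    }
    # two signed bytes follow only for translated diff frames
    if fields["frame_type"] == 0 and fields["translate_flag"] != 0:
        fields["translate_x"], fields["translate_y"] = struct.unpack_from(
            "<bb", data, offset
        )
        offset += 2
    return fields, offset
```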
After the frame header comes a series of 2-bit values that represent the encoding method used for every line in each layer. The following pseudocode will unpack these line encoding values -- this should be done for both layers:
# array type should be uint8
line_encoding = Array(192)
line_index = 0
# unpack 48 bytes into 192 2-bit line encoding types
for byte_offset = 0; byte_offset < 48; byte_offset += 1:
    byte = file.read_uint8()
    # each line's encoding type is stored as a 2-bit value, lowest bits first
    for bit_offset = 0; bit_offset < 8; bit_offset += 2:
        line_encoding[line_index] = (byte >> bit_offset) & 0x03
        line_index += 1
Following the layer encoding values (48 bytes per layer) is the compressed frame data itself. It begins with the top layer followed by the bottom layer.
Layers are compressed horizontally, line-by-line. The layer encoding values indicate the type of compression used for each line.
Once decompressed, a line is represented as a list of 256 pixels. A pixel's value will be 0 if it's transparent, or 1 if it uses the layer's pen color. The layer's pen color can be found from the frame's header.
Line type 0: no data is stored for this line. It is empty and can be skipped.
Line type 1: this line is compressed. Compression works by splitting each line into 8-pixel 'chunks' (32 in total) with bitflags to indicate whether a particular chunk is used or not. The line data begins with 32 bits for the chunk flags, followed by the chunk data. If a chunk flag is 1 then you read one byte from the chunk data, otherwise you can skip ahead 8 pixels and try the next chunk flag.
Pseudocode:
line = Array(256)
pixel = 0
# read chunk flags
# they're easier to work with if read as a single big-endian uint32
chunk_flags = file.read_uint32(bigendian=true)
# keep going while any chunk flag is still set (flags are kept to 32 bits)
while chunk_flags & 0xFFFFFFFF:
    # check whether the highest chunk flag is set
    if chunk_flags & 0x80000000:
        chunk = file.read_uint8()
        # unpack each bit of the chunk, lowest bit first
        for bit = 0; bit < 8; bit += 1:
            line[pixel] = (chunk >> bit) & 0x1
            pixel += 1
    else:
        # skip -- no data is stored for this chunk
        pixel += 8
    chunk_flags <<= 1
Line type 2: the same as type 1, except the pixels in this line are first set to 1 before decoding.
Pseudocode:
line = Array(256)
pixel = 0
for i = 0; i < 256; i += 1:
    line[i] = 1
# ... continue reading the line the same way as line type 1
Line type 3: like line type 1, except every chunk is used, so there's no need for the chunk flags.
Pseudocode:
line = Array(256)
pixel = 0
while pixel < 256:
    chunk = file.read_uint8()
    # unpack each bit of the chunk, lowest bit first
    for bit = 0; bit < 8; bit += 1:
        line[pixel] = (chunk >> bit) & 0x1
        pixel += 1
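The four line encodings can be combined into a single routine. The following Python sketch is one way to do that. `decode_line` is a hypothetical helper that works on an in-memory `bytes` object instead of a file handle, and it returns the decoded pixels together with the offset of the next unread byte.

```python
from typing import List, Tuple


def decode_line(data: bytes, offset: int, line_type: int) -> Tuple[List[int], int]:
    """Decode one 256-pixel line; return (pixels, offset after the line data)."""
    if line_type == 0:
        # empty line, nothing stored
        return [0] * 256, offset
    if line_type == 3:
        # raw line: 32 chunks, no chunk flags
        line = []
        for chunk in data[offset:offset + 32]:
            line.extend((chunk >> bit) & 0x1 for bit in range(8))
        return line, offset + 32
    if line_type not in (1, 2):
        raise ValueError("line type must be between 0 and 3")
    # types 1 and 2 share the chunk-flag scheme; type 2 starts with all pixels set
    line = [1 if line_type == 2 else 0] * 256
    chunk_flags = int.from_bytes(data[offset:offset + 4], "big")
    offset += 4
    for chunk_index in range(32):
        # the highest bit corresponds to the first chunk
        if chunk_flags & (0x80000000 >> chunk_index):
            chunk = data[offset]
            offset += 1
            for bit in range(8):
                line[chunk_index * 8 + bit] = (chunk >> bit) & 0x1
    return line, offset
```

Calling it once per line, 192 times per layer, with the values from the line encoding table yields a full layer.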
If the frame type flag in the frame header is set to 0, then this frame only stores the difference from the previous frame. To produce a complete image, the current frame has to be merged with the previous frame by XORing each layer with the one from the previous frame.
Pseudocode to do this:
# loop through lines
for y = 0; y < 192; y += 1:
    # skip to next line if this one falls off the top edge of the screen
    if y - translation_y < 0:
        continue
    # stop once the bottom screen edge has been reached
    if y - translation_y >= 192:
        break
    # loop through pixels
    for x = 0; x < 256; x += 1:
        # skip to the next pixel if this one falls off the left edge of the screen
        if x - translation_x < 0:
            continue
        # stop diffing this line once the right screen edge has been reached
        if x - translation_x >= 256:
            break
        # merge pixels with a binary XOR
        # assumes each layer is a 2d array
        # translation_x and translation_y should be read from the frame header if the translation flag is set,
        # else they default to 0
        layer_1[y][x] ^= prev_layer_1[y - translation_y][x - translation_x]
        layer_2[y][x] ^= prev_layer_2[y - translation_y][x - translation_x]
The sound effect flags begin at 0x6A0 + the animation data size, and are one byte per frame indicating if each sound effect is played on that frame.
Data | Details |
---|---|
flags & 0x1 | SE1 played |
flags & 0x2 | SE2 played |
flags & 0x4 | SE3 played |
The sound header offset can be calculated as 0x6A0 + the animation data size + the number of frames, rounded up to the nearest multiple of 4.
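The offset calculation, including the rounding step, might look like this in Python. The function name is illustrative, and `frame_count` is taken to be the real number of frames (the header value plus 1).

```python
def sound_header_offset(animation_data_size: int, frame_count: int) -> int:
    """Return the file offset of the sound header."""
    offset = 0x6A0 + animation_data_size + frame_count
    # round up to the nearest multiple of 4
    return (offset + 3) & ~3
```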
Type | Details |
---|---|
uint32 | BGM track size |
uint32 | SE1 track size |
uint32 | SE2 track size |
uint32 | SE3 track size |
uint8 | Frame playback speed |
uint8 | Frame playback speed when recording bgm |
char[14] | Null padding |
Frame speed values are stored reversed: you must subtract them from 8 to get the real frame speed.
Value | Frames per second |
---|---|
1 | 1 / 2 |
2 | 1 / 1 |
3 | 2 / 1 |
4 | 4 / 1 |
5 | 6 / 1 |
6 | 12 / 1 |
7 | 20 / 1 |
8 | 30 / 1 |
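As a small sketch, the conversion from a stored speed byte to a frame rate could be written as below. It assumes the table above is indexed by the real frame speed, that is, after subtracting the stored value from 8. The names are made up for the example.

```python
# frames per second for each real frame speed, copied from the table above
FRAMES_PER_SECOND = {1: 0.5, 2: 1, 3: 2, 4: 4, 5: 6, 6: 12, 7: 20, 8: 30}


def frame_rate(stored_speed: int) -> float:
    """Convert a speed byte from the sound header to frames per second."""
    real_speed = 8 - stored_speed
    return FRAMES_PER_SECOND[real_speed]
```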
Sound tracks are stored in the order of BGM, SE1, SE2 then SE3.
Each track is mono IMA ADPCM audio sampled at 8192 Hz with the nibbles reversed. Since each sample takes 4 bits, 1 second of audio is about 4096 bytes long.
The BGM track can be no longer than 60 seconds. Each sound effect track can be no longer than 2 seconds.
If the playback speed at the time of recording is different to the current playback speed, the BGM sample rate must be adjusted using the following formula:
8192 * (1 / recording frames per second) / (1 / current frames per second)
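The formula simplifies to 8192 multiplied by the current frame rate and divided by the recording frame rate. A minimal Python sketch, with a made-up function name:

```python
def adjusted_bgm_sample_rate(recording_fps: float, current_fps: float) -> float:
    """Return the BGM sample rate adjusted for a changed playback speed."""
    # algebraically the same as 8192 * (1 / recording_fps) / (1 / current_fps)
    return 8192 * current_fps / recording_fps
```

For example, a BGM track recorded at 6 frames per second and played back at 12 frames per second would be resampled from 16384 Hz.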
You can decode raw Flipnote audio using sox:
sox -t ima -N -r 8192 [input.adpcm] [output.wav]
Or encode it:
sox [input.wav] -t ima -N -r 8192 [output.adpcm]
The last 144 bytes of a PPM file consist of a 128-byte RSA-1024 SHA-1 signature over the rest of the file, followed by 16 bytes of null padding.
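To separate these parts when reading a file, something like the following Python sketch would work. The helper name is hypothetical, and the 128-byte signature length follows from the RSA-1024 key size. Verifying the signature is out of scope here.

```python
from typing import Tuple


def split_signature(data: bytes) -> Tuple[bytes, bytes, bytes]:
    """Split a PPM file into (signed body, signature, padding)."""
    if len(data) < 144:
        raise ValueError("file is too short to contain a signature")
    body = data[:-144]          # everything the signature covers
    signature = data[-144:-16]  # 128-byte RSA-1024 signature
    padding = data[-16:]        # 16 bytes of null padding
    return body, signature, padding
```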