diff --git a/UNIXFS.md b/UNIXFS.md
index a53c7af2c..eb5492a97 100644
--- a/UNIXFS.md
+++ b/UNIXFS.md
@@ -74,15 +74,139 @@ message UnixTime {
 }
 ```
 
-This `Data` object is used for all non-leaf nodes in Unixfs.
+### IPLD `dag-pb`
 
-For files that are comprised of more than a single block, the 'Type' field will be set to 'File', the 'filesize' field will be set to the total number of bytes in the file (not the graph structure) represented by this node, and 'blocksizes' will contain a list of the filesizes of each child node.
+A very important spec for UnixFS is the `dag-pb` IPLD spec: https://ipld.io/specs/codecs/dag-pb/spec/
 
-This data is serialized and placed inside the 'Data' field of the outer merkledag protobuf, which also contains the actual links to the child nodes of this object.
+```protobuf
+message PBLink {
+  // binary CID (with no multibase prefix) of the target object
+  optional bytes Hash = 1;
+
+  // UTF-8 string name
+  optional string Name = 2;
+
+  // cumulative size of target object
+  optional uint64 Tsize = 3; // also known as dagsize
+}
+
+message PBNode {
+  // refs to other objects
+  repeated PBLink Links = 2;
+
+  // opaque user data
+  optional bytes Data = 1;
+}
+```
+
+The two schemas play together, and it is important to understand their different roles:
+- `dag-pb`, also named `PBNode`, is the "outside" protobuf message; it is the first one you decode. It contains the list of links.
+- `Message` is the "inside" protobuf message. It is decoded by first decoding the `PBNode` object and then decoding `Message` from the `PBNode.Data` field; it contains all the remaining information.
+
+This means we deal with a protobuf inside a protobuf.
+
+## How to read a File
+
+First you get some CID out-of-band; this is what we will be trying to decode.
+
+This CID MUST include:
+1. A [multicodec](https://github.com/multiformats/multicodec), also called codec.
+2. A [Multihash](https://github.com/multiformats/multihash) (which specifies a hashing algorithm, its parameters, and a digest).
+
+### Get the block
+
+The first step is to get the block; by "get the block" we mean get the actual bytes which, when hashed with the function indicated by the multihash, give you the same multihash back.
+This step can be achieved in many ways (bitswap, downloading a CAR file, ...); all we care about is that you got the bytes and confirmed that they are correct using the hashing function specified in the CID.
+
+This step is repeated for every block and is thus implicitly assumed to be done whenever a block is downloaded.
+
+### Start decoding the bytes
+
+With UnixFS we deal with two codecs, which are decoded differently:
+- `Raw`, single-block files
+- `Dag-PB`, possibly multi-block files (a single block is limited to 2MiB, but it may point to children, joining them)
+
+#### `Raw` files
+
+The simplest file is a `Raw` file.
+
+They can be recognised because their CIDs use the `Raw` codec.
+
+Their content is purely the block body.
+
+They never have any children, and thus are also known as single-block files.
+
+Their size (both `dagsize` and `blocksize`) is the length of the block body.
+
+##### `Raw` Example
+
+Let's build a `Raw` file whose content is `test`.
+
+1. First hash the data:
+```console
+$ echo -n "test" | sha256sum
+9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08 -
+```
+
+2. Add the CID header:
+```
+f this is the multibase prefix; we need it because we are working with a hex CID (it is omitted for binary CIDs)
+ 01 the CID version, here 1
+ 55 the codec, here we MUST use Raw because this is a Raw file
+ 12 the hashing function used, here sha256
+ 20 the digest length, 32 bytes
+ 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08 the digest we computed earlier
+```
+
+3. Profit
+Assuming we stored this block in an implementation of our choice which makes it accessible to our client, we can try to decode it:
+```console
+$ ipfs cat f015512209f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
+test
+```
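+
+The steps above can also be done programmatically. Here is a minimal Go sketch of the same construction, using only the standard library (a real implementation would typically use a CID library such as `github.com/ipfs/go-cid`); the version and codec bytes are technically varints, but both fit in a single byte here:
+```go
+package main
+
+import (
+	"crypto/sha256"
+	"encoding/hex"
+	"fmt"
+)
+
+func main() {
+	// 1. Hash the file content.
+	digest := sha256.Sum256([]byte("test"))
+
+	// 2. Add the CID header in front of the digest:
+	//    0x01 CID version 1, 0x55 the Raw codec,
+	//    0x12 sha256, 0x20 the digest length (32 bytes).
+	cid := append([]byte{0x01, 0x55, 0x12, 0x20}, digest[:]...)
+
+	// 3. Prefix the "f" multibase indicator (lowercase hex) to the hex-encoded CID.
+	fmt.Println("f" + hex.EncodeToString(cid))
+	// Prints: f015512209f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
+}
+```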
+
+#### `Dag-PB` nodes
+
+These nodes support many different types (found in `decodeMessage(PBNode.Data).Type`); `File` is the only type allowed when decoding files.
 
-For files comprised of a single block, the 'Type' field will be set to 'File', 'filesize' will be set to the total number of bytes in the file and the file data will be stored in the 'Data' field.
+##### The sister-lists `PBNode.Links` and `decodeMessage(PBNode.Data).blocksizes`
 
-## Metadata
+The sister-lists are the key to why `dag-pb` is important.
+
+They allow us to concatenate files together.
+
+Linked files might be loaded recursively with the same process.
+
+Children may be any file (so `Dag-PB` whose type is `File`, or `Raw`).
+
+For example, consider this pseudo-JSON block:
+```json
+{
+  "Links": [{"Hash":"Qmfoo"}, {"Hash":"Qmbar"}],
+  "Data": {
+    "blocksizes": [20, 30]
+  }
+}
+```
+
+This indicates that this file is the concatenation of the `Qmfoo` and `Qmbar` files.
+
+When reading this file, the `blocksizes` array gives us the size in bytes of each child file; each index in `blocksizes` gives the size of the link at the same index in `Links`.
+
+This allows fast indexing into the file. For example, if someone is trying to read bytes 25 to 35, we can compute an offset list by summing all previous values in `blocksizes`, then search for the indexes that cover the range we are interested in; see the sketch below.
+
+Here the offset list would be `[0, 20]`, and thus we know we only need to download `Qmbar` to get the range we are interested in.
+
+If `blocksizes` and `Links` are not of the same length, the block is invalid.
+
+##### `decodeMessage(PBNode.Data).Data`
+
+This field is an array of bytes; it is also file content, and it comes before the content of the links.
+
+This must be taken into account when doing offset calculations (the length of the `Data.Data` field acts as an implicit first element of `blocksizes` when computing offsets).
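+
+To illustrate the offset calculation described above, here is a hypothetical Go sketch; `dataLen` stands for the length of the inner `Data` field and `blocksizes` for the decoded list (both names are illustrative, not part of the spec):
+```go
+package main
+
+import "fmt"
+
+// offsets returns the byte offset at which each child starts.
+// The length of the inner Data field acts as an implicit first
+// element of blocksizes, shifting every child forward.
+func offsets(dataLen uint64, blocksizes []uint64) []uint64 {
+	offs := make([]uint64, len(blocksizes))
+	cursor := dataLen
+	for i, size := range blocksizes {
+		offs[i] = cursor
+		cursor += size
+	}
+	return offs
+}
+
+// childrenForRange returns the indexes of the links that must be
+// fetched to read the byte range [start, end).
+func childrenForRange(dataLen uint64, blocksizes []uint64, start, end uint64) []int {
+	var needed []int
+	for i, off := range offsets(dataLen, blocksizes) {
+		if off < end && off+blocksizes[i] > start {
+			needed = append(needed, i)
+		}
+	}
+	return needed
+}
+
+func main() {
+	// The example block above: no inner Data, children of 20 and 30 bytes.
+	fmt.Println(offsets(0, []uint64{20, 30}))                  // [0 20]
+	fmt.Println(childrenForRange(0, []uint64{20, 30}, 25, 35)) // [1] -> only Qmbar
+}
+```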
+
+### Metadata
 
 UnixFS currently supports two optional metadata fields:
 
@@ -112,42 +236,6 @@ UnixFS currently supports two optional metadata fields:
 - When no `mtime` is specified or the resulting `UnixTime` is negative: implementations must assume `0`/`1970-01-01T00:00:00Z` ( note that such values are not merely academic: e.g. the OpenVMS epoch is `1858-11-17T00:00:00Z` )
 - When the resulting `UnixTime` is larger than the targets range ( e.g. 32bit vs 64bit mismatch ) implementations must assume the highest possible value in the targets range ( in most cases that would be `2038-01-19T03:14:07Z` )
 
-### Deduplication and inlining
-
-Where the file data is small it would normally be stored in the `Data` field of the UnixFS `File` node. 
-
-To aid in deduplication of data even for small files, file data can be stored in a separate node linked to from the `File` node in order for the data to have a constant [CID] regardless of the metadata associated with it.
-
-As a further optimization, if the `File` node's serialized size is small, it may be inlined into its v1 [CID] by using the [`identity`](https://github.com/multiformats/multicodec/blob/master/table.csv) [multihash].
-
-## Importing
-
-Importing a file into unixfs is split up into two parts. The first is chunking, the second is layout.
-
-### Chunking
-
-Chunking has two main parameters, chunking strategy and leaf format.
-
-Leaf format should always be set to 'raw', this is mainly configurable for backwards compatibility with earlier formats that used a Unixfs Data object with type 'Raw'. Raw leaves means that the nodes output from chunking will be just raw data from the file with a CID type of 'raw'.
-
-Chunking strategy currently has two different options, 'fixed size' and 'rabin'. Fixed size chunking will chunk the input data into pieces of a given size. Rabin chunking will chunk the input data using rabin fingerprinting to determine the boundaries between chunks.
-
-
-### Layout
-
-Layout defines the shape of the tree that gets built from the chunks of the input file.
-
-There are currently two options for layout, balanced, and trickle.
-Additionally, a 'max width' must be specified. The default max width is 174.
-
-The balanced layout creates a balanced tree of width 'max width'. The tree is formed by taking up to 'max width' chunks from the chunk stream, and creating a unixfs file node that links to all of them. This is repeated until 'max width' unixfs file nodes are created, at which point a unixfs file node is created to hold all of those nodes, recursively. The root node of the resultant tree is returned as the handle to the newly imported file.
-
-If there is only a single chunk, no intermediate unixfs file nodes are created, and the single chunk is returned as the handle to the file.
-
-## Exporting
-
-To read the file data out of the unixfs graph, perform an in order traversal, emitting the data contained in each of the leaves.
-
 ## Design decision rationale
 
 ### Metadata