IPIP-431: Opt-in Extensible CAR Metadata on Trustless Gateway #431
Signed-off-by: Miroslav Bajtoš <[email protected]>
Signed-off-by: Miroslav Bajtoš <[email protected]>
@lidel @willscott Here is my proposal for allowing gateway clients to request the response to include a metadata block at the end. This is my first IPIP. Please let me know what and how to improve, where to add more details, etc. Feel free to edit the text directly if you like (edits by maintainers are allowed).
Thank you for submitting this @bajtos.
I like the framing of this as an extensible opt-in CAR manifest. Quick thought:
- Best case, it will solve multiple problems (retrieval attestation, interrupted streams) without inventing anything new (reusing CARv1, DAG-CBOR, CAR content type parameters from Trustless Gateway spec).
- Worst case, will have niche utility, but will create a standard for the ecosystem on how random metadata can be passed between paid HTTP services, allowing CAR-aware clients to identify it and strip it out before storing in caches.
Made some editorial tweaks + quick first-pass feedback in comments inline.
- `b3h` - Blake3 hash (checksum) of the CAR data (excluding the metadata block).
- `b3h_sig` - A signature over `<len><b3h><request>` using server's Ed25519 identity.
  - `len` is encoded as `varint`,
  - `b3h` is encoded as 32 bytes,
  - The effective query as executed by the gateway. This query is the request url - path and query string arguments.
These `b3*` fields are specific to SPARK retrieval attestation and should not be listed in the trustless gateway spec as a MUST. These may be mandatory for SPARK, but are optional for the rest of IPFS ecosystem.
Please move them to "User benefit" section of the IPIP document and explain how `meta=eof` enables SPARK use case by allowing for these custom signatures to be passed along with the data. It makes a good example of extensibility that does not require PL's permission.
ps. I know other services like dagHouse use different hash functions for getting "CAR CID", putting all bets on Blake3 feels like an unnecessary divergence.
Perhaps this could be made a bit more future-proof and generic if blake3 is represented as Multihash wrapped in CIDv1+car codec (0x0202)? Just an idea, fine to ignore, given these are specific to SPARK.
Either way, this belongs to the "userland benefiting from metadata extensibility" story.
- `b3h` - Blake3 hash (checksum) of the CAR data (excluding the metadata block).
- `b3h_sig` - A signature over `<len><b3h><request>` using server's Ed25519 identity.
  - `len` is encoded as `varint`,
  - `b3h` is encoded as 32 bytes,
  - The effective query as executed by the gateway. This query is the request url - path and query string arguments.
I agree that the Spark use case belongs in Userland section. However, the individual keys of the metadata section, and what the servers must do to implement them, feel like something that should be in the trustless gateway spec.
Keys like `car_bytes`, `data_bytes`, `block_count` will be used by Spark but also may be used by others, and the definition of a key (i.e. what does the server actually return as the value for each key) must be the same for each use case. E.g. if one use case sets `data_bytes` to be the total byte length of blocks and another use case sets it to be the total byte length of the CAR stream then trustless gateway implementers will need to implement different logic for each different use case.
What's more, for the Spark use case, we do not want gateway operators to know that they are serving a Spark request and not some other request. Since the Spark ones will be incentivised and other requests may not be, servers may simply provide a good retrieval service to Spark clients and a poor service to other clients.
`car_bytes`, `data_bytes`, `block_count` seem generic enough. The troublesome one is then the Blake3Hash and signature.
Perhaps what we need to do is leave this IPIP to concern the metadata block being appended without any constraints on what can be included in it. Then in a separate place, we define a canonical way to include a key value object in the metadata block and how the server should implement certain useful keys such as `car_bytes` et al.
I think it would be ok to suggest a key name convention for generic things like `car_bytes`, `data_bytes` and `block_count` in the section that describes the `meta` parameter, as long as it is:
- scoped to JSON (perhaps list under explicit `meta=eof[+json]`?)
- changed from MUST to SHOULD (convention, not a hard requirement)

I think it would be also ok to have a documented convention for passing a hash of the CAR stream (aka CAR CID) – maybe name it `car_cid` and use CIDv1 with 0x0202 codec – this convention is already used by .storage folks, no need to invent anything new.
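For readers unfamiliar with the "CAR CID" convention mentioned above, here is a rough Python sketch of how such a value could be assembled: a CIDv1 with the CAR codec (0x0202) wrapping a multihash of the CAR bytes, printed as base32 multibase. It is an illustration only. It assumes sha2-256 (multihash code 0x12), because Python's standard library has no Blake3, and the helper names `encode_varint` and `car_cid` are made up for this example.

```python
import base64
import hashlib


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


CAR_CODEC = 0x0202  # multicodec code for CAR, as mentioned in the thread
SHA2_256 = 0x12     # multihash code for sha2-256 (assumption: Blake3 is not used here)


def car_cid(car_bytes: bytes) -> str:
    """Hypothetical helper: CIDv1 + CAR codec + sha2-256 multihash, base32 multibase."""
    digest = hashlib.sha256(car_bytes).digest()
    multihash = encode_varint(SHA2_256) + encode_varint(len(digest)) + digest
    cid = encode_varint(1) + encode_varint(CAR_CODEC) + multihash
    # Multibase base32 (lowercase, no padding) uses the prefix "b".
    return "b" + base64.b32encode(cid).decode("ascii").lower().rstrip("=")


if __name__ == "__main__":
    print(car_cid(b"example CAR stream bytes"))
```

With sha2-256 the resulting string always starts with `bagbaiera`, because the version, codec and multihash header bytes are fixed. A Blake3 variant would swap in a different multihash code and digest function.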
> I think it would be also ok to have a documented convention for passing a hash of the CAR stream (aka CAR CID) – maybe name it `car_cid` and use CIDv1 with 0x0202 codec – this convention is already used by .storage folks, no need to invent anything new.

+1 to document `car_cid`.
For SPARK, we specifically want a Blake3 hash so that we can use inclusion proofs. That's why we want to use a dedicated field `b3checksum` instead of a more generic `car_cid`.
- `b3h` - Blake3 hash (checksum) of the CAR data (excluding the metadata block).
- `b3h_sig` - A signature over `<len><b3h><request>` using server's Ed25519 identity.
  - `len` is encoded as `varint`,
  - `b3h` is encoded as 32 bytes,
  - The effective query as executed by the gateway. This query is the request url - path and query string arguments.
Including content path and CAR export parameters feels generic enough to keep in the spec, but we should not mix content path with car and url parameters as it leads to bugs around things like percent-encoding especially where `?` or `/` is involved (cc ipfs/gateway-conformance#115).
These should be three separate fields:
- `content_path` - requested content path
- `dag_params` - map with DAG params like `dag-scope`, `entity-bytes` from IPIP-402
- `car_params` - map with CAR content type params like `order` and `dups` from IPIP-412
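As a rough illustration of the three-field split suggested above, the following Python sketch derives `content_path`, `dag_params` and `car_params` from a request URL and an `Accept` header. The function name `describe_request`, the example host and the exact output shape are assumptions made for this example, not part of the proposal.

```python
from urllib.parse import parse_qsl, urlsplit


def describe_request(url: str, accept: str) -> dict:
    """Hypothetical helper: split a gateway request into three separate fields."""
    parts = urlsplit(url)

    # DAG params arrive in the URL query string (e.g. dag-scope, entity-bytes).
    dag_params = dict(parse_qsl(parts.query))

    # CAR params arrive as content type parameters in the Accept header.
    _media_type, *raw_params = [p.strip() for p in accept.split(";")]
    car_params = dict(p.split("=", 1) for p in raw_params if "=" in p)

    return {
        # The path is kept exactly as received (still percent-encoded),
        # so a literal "?" or "/" inside a file name cannot leak into the params.
        "content_path": parts.path,
        "dag_params": dag_params,
        "car_params": car_params,
    }


if __name__ == "__main__":
    print(describe_request(
        "https://gateway.example/ipfs/bafy1234/cat%3F.jpg?dag-scope=entity&entity-bytes=0:1023",
        "application/vnd.ipld.car; version=1; order=dfs; dups=y",
    ))
```

Keeping the path separate from both parameter maps means no consumer has to re-parse a combined string and guess where the path ends.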
@lidel what is the reasoning behind splitting up `dag_params` and `car_params`? Could we instead go for `content_path` and `query_params` to keep it simple and generic (allowing for other query params)?
I believe that was a conscious design choice, to avoid mixing data selector with details of the transport format (not everything is in URL query params):
- `dag_params` are about what user data was selected, not tied to any specific transport (could be applied to something other than CAR)
  - these are things that land in URL query
- `car_params` are specific to CAR container format, they do not change the user data that was selected, only the way it is represented when sent as CAR
  - these are things that land in Accept/Content-Type headers
- people read "query" and assume URL query :)

@patrickwoodhead that being said, if you want to simplify, this IPIP could go with a single dag-json map named `response_params` (to avoid confusion with URL query params, and account for the fact that server may ignore some of request params when producing a response)
How about using `retrieval_params` instead of `response_params`? In my mind, response parameters don't describe "what user data was selected".
What is our motivation in SPARK:
We want the gateway to describe what exactly the client is retrieving (CID, subpath, dag params, car params) and provide a signature over that.
If two SPARK checker clients submit a metadata block with the same retrieval parameters (CID, subpath, dag params, car params) then we want:
- To be able to verify that the clients retrieved the DAG (sub)tree they were expected to retrieve.
- Confidence that both clients made the same request to the gateway (semantically?) and were supposed to receive exactly the same response from the gateway.
> `content_path` - requested content path

When the client requests `GET /ipfs/bafy1234/cat.jpg`, what is `content_path`?
- `/ipfs/bafy1234/cat.jpg`
- `bafy1234/cat.jpg`
- `/cat.jpg`
- something else?

We need the metadata block to describe both the CID requested (`bafy1234`) and the resource subpath if specified (`/cat.jpg`).
I don't think we're going to be able to do a new CID codec code for this (
If you supply Unfortunately this option mixes up the metadata in the payload it's trying to describe, but at least we have mechanisms to signal the special nature of it.
If we want to avoid the metadata being within the CARv1 payload. The One problem is that if you
CARv2 isn't really designed as a transport format, that's why we're using CARv1, but we may be able to squish it into a usable form to help with this problem. We have a "characteristics" field we can play with, and there's an index section we can also play with. We can define a new index format very easily, and the CAR decoders will just complain if they can't read the index, which is fine for this purpose. We could define an index as a "trustless gateway metadata index"; which doesn't get the location data, but we can use it to put whatever we want into the trailer of the CARv2 stream—we could just encode well-defined IPLD block strictly conforming to a schema, to present this metadata and anything else we want. The main problem is that CARv2 requires a "DataSize" in the header to tell us the length of the CARv1 payload, which we don't have up-front, and an "IndexOffset" to tell us where the index starts, which we don't have for the same reasons. We've used We get to leave the CARv1 intact, in the same form that you would get it if you didn't turn As an aside, we could use any of these options to do our error signalling, which I'm pretty keen on having. A schema for this metadata block could be a union type of the metadata presented here or an |
The keys / schema presented here, I think, should be considered an example. I would hope that it would be treated as an arbitrary key-value map of metadata objects, and that these key-values could be used to signal an error, could be used to signal an 'eof' signal, and/or could be used to provide additional checksum attestation as described in the current text.
I'm prototyping a form of option 1 above with Frisbii and Lassie, will let you know how it goes.
…m http fetches Ref: ipfs/specs#431 Ref: ipld/frisbii#15
filecoin-project/lassie#378 and ipld/frisbii#15 demonstrate an approximation of the option 1 I presented above.
I'll acknowledge that it may be better to just go with the plain map approach, without a schema, as Will suggested. That would even let us do novel things for specific situations like having Lassie tell Saturn about retrieval clients and their timings (currently can only use Server-Timings header which ends when the data starts). But there's a bit of a can of worms that we open up that I wondered if we could avoid by having a strong schema, at least for the first version. All of the things that http headers have to deal with - like what to do with duplicate keys, what limits we need to put on the sizes of things to avoid abuse, etc. Constraining within the bounds of dag-json, which itself is a bit strict, and having a schema, lets us be very clear about rules and avoid abuse.
@rvagg prefixing metadata with Existing CARv1 implementations will error without explicit support for This thing starts looking like a new content type, changing the scope of this IPIP to something similar to Not saying it is bad, maybe a separate content type for streaming CARs is the right call here. It mitigates risks around mixing regular CAR responses with ones that include metadata trailer and causing issues on clients that don't support But i'm worried about duplicated effort across teams and project in light of CARv3, which (iiuc) also needs to happen some time in the next ~12-24 months and might have overlapping scope, solving similar problems. @willscott @bajtos is this IPIP something we intend to expose on all gateways and support forever in the IPFS ecosystem, even when we have CARv3? Would this be intended for wrapping CARv2 and v3 too? Or is this just a stop-gap for Rhea/Boost internally until we have CARv3 with built-in metadata/eof support? |
Co-authored-by: Miroslav Bajtoš <[email protected]>
meta=eof+json update
Signed-off-by: Miroslav Bajtoš <[email protected]>
Signed-off-by: Miroslav Bajtoš <[email protected]>
Hello folks; thank you for your patience! Together with @patrickwoodhead, we incorporated your feedback and updated both the proposal and the spec. We are ready for the next round of reviews. 🙏🏻
"content_path": {
  "description": "The url path in the request as executed by the gateway, e.g. `/ipfs/bafy1234/cat.jpg`. The query string MUST BE stripped from the path.",
  "type": "string"
},
Discussion point leading back to #431 (comment):
How do we represent the information about what content was requested?
- The CID
- An optional path to a file inside UnixFS

"data": {
  "type": "object",
  "description": "Properties of the response"
},
Discussion point:
In the current proposal, the top-level "data" object combines fields about "what was requested" (e.g. CAR & DAG params) with "what was returned" (e.g. CARv1 length in bytes).
I'd like to discuss an alternative: split `data` into two fields `req` and `res`. The first will describe what the client requested, the second will describe what the server returned.
Such division would allow us to shorten field names, e.g. `data.car_params.dup` can become `req.dups`.
Splitting into `req` and `res` sgtm, improves clarity
When the parameter is not set or does not equal `eof+json`, the server SHOULD NOT add any extra blocks to the response, neither the 0x00 byte nor any metadata.

When `meta=eof+json`, the JSON object SHOULD conform to the following [JSON schema](https://json-schema.org/).
Discussion points:
- In the current spec & IPIP, we are formatting metadata as JSON. Should we say DAG-JSON instead?
- Do we want to serialise the metadata as a CAR block, prefixing the JSON data with `varint | CID` header?
@willscott @rvagg thoughts? Value added in DAG-JSON prefixed with own CID is that it allows client to detect truncation beyond `0x00` byte.
I believe clients can already easily detect truncation of the metadata block.
- The block is a DAG-JSON object, it must start with `{` and end with a matching `}`.
- If the block is truncated, it will not end with the matching `}` and the JSON parser will throw an error.
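To illustrate the argument above, here is a minimal Python sketch of a client splitting a `meta=eof+json` response body into the CARv1 payload and the metadata object. It assumes the layout discussed in this thread (varint-length-prefixed CAR sections, then a single `0x00` byte, then a JSON object) and uses plain `json` rather than a real DAG-JSON codec. The function names are hypothetical.

```python
import json
from typing import Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 varint at buf[pos:], return (value, next_pos)."""
    value = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def split_car_and_metadata(body: bytes) -> Tuple[bytes, dict]:
    """Hypothetical helper: return (car_payload, metadata) or raise ValueError."""
    pos = 0
    while pos < len(body):
        section_start = pos
        length, pos = read_varint(body, pos)
        if length == 0:
            # Zero-length section: everything after the 0x00 byte is metadata.
            # A truncated object fails to parse (JSONDecodeError is a ValueError).
            metadata = json.loads(body[pos:].decode("utf-8"))
            if not isinstance(metadata, dict):
                raise ValueError("metadata is not a JSON object")
            return body[:section_start], metadata
        if pos + length > len(body):
            raise ValueError("truncated CAR section")
        pos += length  # skip the header or block section
    raise ValueError("stream ended without the 0x00 metadata marker")
```

In this sketch a stream cut off anywhere (inside a CAR section, before the `0x00` marker, or inside the JSON object) surfaces as a `ValueError`, which is the behaviour the comment above relies on.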
TBD

Using one CID, request the CAR data using various combinations of content type parameters.
Flagging this TODO to show in the PR discussion.
- native truncation detection and standardized error handling and passing during streaming
- support for things like [Large Blocks](https://discuss.ipfs.tech/t/supporting-large-ipld-blocks/15093/)

TODO: link to some public artifact about CARv3
Flagging this TODO to show in the PR discussion.
Any suggestions for the artefacts I can link to?
@aschmahmann do we have anything on GH?
Previews of the current version:
@lidel @rvagg @willscott Ping 👋🏻 What's the best way to move this proposal forward?
- The metadata `sig` field SHOULD also be populated, returning a signature, using the server's Ed25519 identity, over the metadata properties object. This allows gateway clients to submit the metadata block as an attestation of retrieval that 3rd parties can verify.

### Compatibility
@bajtos Let's go the extra mile here and elaborate on what happens when a CAR response with a `0x00`-prefixed suffix is parsed by existing CAR software.
My suggestion is to add some clear statement about expected interop, like "libraries and implementations SHOULD ignore the suffix after 0x00", otherwise we will create a bad UX/DX, where a developer tries to debug things with existing tooling and the tooling errors.
I imagine we don't want things to fail due to the `0x00` suffix, bare minimum being:
- >80% of Amino DHT IPFS network (including IPFS Desktop and Brave) is Kubo
  - `ipfs dag import` should ignore suffix
- reference CAR libraries ignore 0x00 by default
- CLI tools we recommend to developers, they will try to use these for debugging CAR responses with the suffix:
  - ipfs-car (JS CLI)
  - car (CLI)
  - go-fixtureplate (CLI)
> Let's go extra mile here and elaborate what happens when CAR response with 0x00-prefixed suffix is parsed by existing CAR software.

It's a great idea to think about compatibility with existing & future tooling and clearly describe our thinking. 👍🏻
The most important aspect is avoiding the "0x00 insertion attack" vector. You can find more details in the section Zero-length-block insertion attacks (including the Filecoin-specific logic). I am cross-posting the mitigation I proposed:
Our proposal avoids this attack vector:
- It does not change the current semantics of CARv1. Zero-length blocks remain invalid.
- Instead, we treat the response body as a new container format combining the CARv1 file with additional data.
- Clients must explicitly request this new container format. Existing clients not aware of the new metadata will not receive responses in the new format.
When developers use existing tooling, they will never receive a CAR file with the `0x00` suffix.
There are two major ways a CAR with a `0x00` suffix can emerge:
1. Somebody makes an HTTP request to a Trustless Gateway, explicitly asks to receive CAR with `meta=eof+json`, saves the response body to a `.car` file and forgets to extract the CAR payload from the container (remove the `\x00{metadata}` trailer).
2. Somebody uses a tool that is aware of `meta=eof+json`. The tool opts into this new feature when requesting content from a Trustless Gateway, but does not extract the CAR payload from the container in the response body before returning the content back to the user.

I am arguing that (2) is a bug in the tooling, introduced by the change that modified Trustless Gateway requests to opt into `meta=eof+json`, and therefore, the maintainers of that tool should fix that bug - make the tool adhere to spec.
Regarding (1): do you think this will happen frequently enough to justify the effort required to change all libraries you mentioned to start ignoring the `0x00` byte?
Maybe it's actually a good thing that the tooling reports an error because it tells the user they are using the new `meta=eof+json` feature incorrectly.
As an alternative to silently stripping the `0x00` suffix, the tooling can detect the situation where `0x00` is followed by a valid DAG-JSON object and report a more helpful error message to the user, advising them to either change the "accept" header in the request to the Trustless Gateway or else remove the `0x00` suffix (unpack CARv1 from the container format).
Thoughts?
`go-car/cmd/car/inspect.go` seems to always treat `0x00` as EOF, if I am reading the source code correctly:
https://github.com/ipld/go-car/blob/5c5d432d582564f88fd2124f2fce4f2f3e47a654/cmd/car/inspect.go#L26

    rd, err := carv2.NewReader(inStream, carv2.ZeroLengthSectionAsEOF(true))

`js-car` seems to always reject zero-length blocks:
https://github.com/ipld/js-car/blob/562c39266edda8422e471b7f83eadc8b7362ea0c/src/decoder.js#L94-L97

    let length = decodeVarint(await reader.upTo(8), reader)
    if (length === 0) {
      throw new Error('Invalid CAR section (zero length)')
    }

I guess I can test how existing tooling handles zero-length blocks and document this behaviour in the IPIP, so that we better understand the current landscape.
"b3checksum": {
  "description": "A Blake3 hash (checksum) of the CAR stream (excluding the 0x00 byte and the metadata block). The value should be serialized as a multihash with multibase prefix, preferably using Base58 encoding.",
  "type": "string"
},
@bajtos What is the difference between `car_cid` and this field?
Hardcoding Blake3 in field name and description makes no sense if you use Multihash. It could use functions other than blake3 in the future.
To reduce future confusion, could this be renamed to `car_checksum`? (and remove `car_cid` since it is redundant?)
Here is your description of `car_cid`, see #431 (comment):
> I think it would be also ok to have a documented convention for passing a hash of the CAR stream (aka CAR CID) – maybe name it `car_cid` and use CIDv1 with 0x0202 codec – this convention is already used by .storage folks, no need to invent anything new.

Regarding `b3checksum`:
For SPARK, we specifically need the Blake3 hash of the CAR stream, and we need gateways to always return this hash. In particular, clients cannot ask the server to use Blake3 for the CAR checksum because the server could use this information to detect SPARK clients vs. other clients and provide different quality of service.
I agree it's confusing to have both `car_cid` and `b3checksum`, but I don't see a better solution. Do you?
"data_bytes": {
  "description": "Total byte length of the flat file before it was encoded into a CAR file",
  "type": "integer"
},
@bajtos what happens when returned CAR is for:
- HAMT-sharded UnixFS directory?
- a single file under some sub-path of HAMT-sharded UnixFS directory?
Is the semantic meaning here to be "raw bytes of all files, ignoring UnixFS directory metadata", or something else?
Great questions! TBH, I don't know the answers. We don't need `data_bytes` for SPARK. I think this field was added based on the discussion in this proposal, but I could not find the specific comment requesting it.
I am proposing to remove `data_bytes` from the spec. We can introduce it later if there is a clear need. We will better understand the desired semantics at that point.
"sig": {
  "type": "string",
  "description": "A signature, using the server's Ed25519 identity, over the `data` object serialized as JSON."
- HTTP Gateways have no concept of "server ED25519" introduced here. How does one verify the signature without knowing the pubkey?
  - One way to avoid being prescriptive about key type or its location is to have `sig_key` with CID-encoded public `libp2p-key` that can be used for signature verification.
    - The nice thing about this is that Gateway/client implementation will already have relevant code/library as we use these in IPNS and libp2p.
- If you sign JSON, you want it to be a deterministic variant like DAG-JSON, otherwise someone will run into bugs when they use a less strict JSON library in different languages.

"sig": {
  "type": "string",
  "description": "A signature, using the server's Ed25519 identity, over the `data` object serialized as JSON."
"sig_pubkey": {
  "type": "string",
  "description": "A libp2p-key used for signing"
},
"sig": {
  "type": "string",
  "description": "A signature, using the `sig_pubkey`, over the `data` object serialized as DAG-JSON."
Here is our use case:
- An untrusted/permissionless client makes a retrieval request to the Storage Provider's booster-http address advertised in IPNI.
- The client submits the measurement to the SPARK orchestration layer.
- Later, SPARK's evaluation service wants to verify that the client contacted the SP.
To do so, we must not accept signatures from any identity, only the signature from the identity advertised by SP to IPNI.
I am arguing this is true for everybody else who wants to use the signature to verify that a metadata block submitted by an untrusted party was indeed produced by the expected Trustless Gateway.
Consider a simple attack vector: the attacker takes the metadata block produced by the origin gateway and replaces the signature with one created using the attacker's identity. Clients verifying the signature against the `sig_pubkey` field in the metadata will not notice the attack.
Now I can see how including `sig_pubkey` can simplify troubleshooting:
- If `sig_pubkey` does not match the pubkey we expected, then we know the metadata block was signed by somebody else.
- If `sig_pubkey` matches but the signature does not, then we know the metadata block was modified from the original.

Compare that with my proposal:
- If the signature is not valid, then either the metadata block was tampered with or it was signed by a different identity.
IMO, this improvement is not worth the cost of increasing metadata block size and, thus, egress traffic for Trustless Gateways.
Do you have any other use case for the signature in your mind?
IMO, the clients making retrieval requests don't need this signature for validating the metadata block, as they can rely on guarantees provided by the underlying transport - HTTPS.
> - HTTP Gateways have no concept of "server ED25519" introduced here.

Good point. We don't require all Gateways to sign the metadata block, SPARK needs the signature only from Storage Providers' servers handling retrieval (`booster-http`).
Let's update the spec to explicitly mention the signature is an optional field.
> How one verifies the signature without knowing the pubkey?
> - One way to avoid being prescriptive about key type or its location, is to have `sig_key` with CID-encoded public `libp2p-key` that can be used for signature verification.

As I wrote above, if you don't know the expected server identity, then the signature is not useful for you.
Having said that, I like the idea of adding more details about the identity/public key to the spec.
The proposed format "CID-encoded public `libp2p-key`" seems like a good candidate, although AFAICT, that's not the format advertised to IPNI. In IPNI, I see identities in the format that can be used in multiaddr's `/p2p/{id}` part:
12D3KooWAWHEbCQy22d45mKbKSewoB1xksDDhR7o5S4mDrSNKXNk
12D3KooWAy5kaLtHf5uS7PZVLjSYd8sGqJ6fn7bxMjqLLZ1uULp9
12D3KooWEiPRcfjXJVehty8okJGJpBZP8zM5UBoCK5yw2MXfx98x
12D3KooWFpv7LP1MUmjfQ8sAUXgJXG5FRMJLnqnJyR32fVboqspB
12D3KooWHKeaNCnYByQUMS2n5PAZ1KZ9xKXqsb4bhpxVJ6bBJg5V
12D3KooWNHwmwNRkMEP6VqDCpjSZkqripoJgN7eWruvXXqC2kG9f
12D3KooWSfsqUahHLCmiENT8oN4FkVtz5pSCxKtNEb7wrR1rrRjk
> - If you sign JSON, you want it to be deterministic variant like DAG-JSON, otherwise someone will run into bugs when they use less strict JSON library in different languages.

Makes sense; I'll update the spec to require the metadata to be a DAG-JSON.
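The determinism concern can be shown with a small Python sketch. It only approximates the property that matters for signing (sorted keys, no insignificant whitespace) using the standard `json` module. A real DAG-JSON codec has further rules (for example for bytes and links), so treat `canonical_json` as a hypothetical stand-in, and the field values as made-up examples.

```python
import json


def canonical_json(value) -> bytes:
    """Serialize with sorted keys and no extra whitespace so equal maps give equal bytes."""
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


# Two logically identical metadata objects built in a different key order.
a = {"car_bytes": 1024, "block_count": 3}
b = {"block_count": 3, "car_bytes": 1024}

# Default serialization preserves insertion order, so the signed bytes would differ.
assert json.dumps(a) != json.dumps(b)

# The canonical form is identical, so a signature over it verifies on both sides.
assert canonical_json(a) == canonical_json(b) == b'{"block_count":3,"car_bytes":1024}'
```

Whatever signature scheme is chosen, signer and verifier must both run the same canonicalization before signing or checking, otherwise a semantically unchanged metadata block can fail verification.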
Define an optional enhancement of the CARv1 stream that allows a Gateway server to provide additional metadata about the CARv1 response. Introduce a new content type that allows the client and the server to signal or negotiate the inclusion of extra metadata.
The PR discussing a new multicodec `car-metadata`: multiformats/multicodec#334