-
Notifications
You must be signed in to change notification settings - Fork 440
add RPC call Metadata requesting sector IDs #2584
base: master
Are you sure you want to change the base?
Conversation
75c8e7a
to
da928fc
Compare
This is now a high priority pull request. Finally :P |
NegotiateMetadataMaxSliceSize = build.Select(build.Var{ | ||
Dev: uint64(1 << 17), | ||
Standard: uint64(1 << 17), | ||
Testing: uint64(1 << 4), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these numbers are kind of magic; better would be SectorSize / crypto.HashSize
.
func verifyRecentRevision(conn net.Conn, contract contractHeader, hostVersion string) error { | ||
// getRecentRevision downloads the current revision from the host | ||
// and checks signatures. | ||
func getRecentRevision(conn net.Conn, fcid types.FileContractID, sk crypto.SecretKey, windowStart types.BlockHeight, hostVersion string) (types.FileContractRevision, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why it's necessary to change verifyRecentRevision
. It doesn't look like you're using lastRevision
in GetMetadata
, so why do you need to return it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The caller of GetMetadata
can calculate merkle root of sector IDs returned by GetMetadata
and compare it with merkle root from lastRevision. This is a way to verify that sector IDs are correct for some revision (host still can return old revision).
@@ -219,6 +219,7 @@ type RenterContract struct { | |||
ID types.FileContractID | |||
HostPublicKey types.SiaPublicKey | |||
Transaction types.Transaction | |||
SecretKey crypto.SecretKey |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is used in TestIntegrationMetadata
to pass it to GetMetadata
IIUC this only allows you to fetch the first 2^17 = 131072 sector roots, i.e. 500 GiB of contract data. The RPC is also free. I suggest that instead, we allow requesting an arbitrary number of roots, but the renter must pay for them. One interesting way to implement this would be to reuse almost all of the One tricky thing here is that the renter needs to know how many roots there are. But it can get this by looking at the |
I think I am not in favor of re-using the download rpc to do this, though I am in favor of having the renter pay to download the roots. I'm not sure the best way to mesh them to avoid code duplication, but using special values for the desired merkle root just feels like a bad plan to me. I am also in favor of limiting the number of roots that can be downloaded at once to 4 MB of roots (131,000 roots), and having the downloader be able to specify an offset and length. |
Some code duplication is probably okay. We don't want to intermingle the metadata RPC with the download RPC too much because we're going to be splitting download+upload into a separate RPC at some point anyway. type SectorRootsRequest struct {
StartIndex uint64
EndIndex uint64
} and of course changing the retrieval code to fetch roots rather than data. But everything else should stay: doing repeated requests in a loop, exchanging new host settings on each iteration, etc. Actually, while we're at it, we could consider tightening up the protocol a little. One thing that stands out to me in particular is that we could eliminate a round trip. The current download protocol is:
We can eliminate a round trip by merging steps 5 and 7: there is no reason for the renter to wait until the host has accepted or rejected the request before sending its signature. If we want to go further, we could modify the protocol to send only the parts of the revision that changed, instead of the full new revision. Currently the only values that change are the amounts in the proof outputs, and the revision number. So all we really need to send is a single |
To speed up the protocol, we can go even further:
Just one round-trip - the fastest possible way. The only concern is that a malicious host can keep the half-signature and not send the data, but it can do exactly this in the current protocol as well. |
I think we're describing the same thing. The new protocol would be:
We can get this down to a single round trip if we remove the settings exchange. Which I think is feasible; host settings probably shouldn't change during the protocol. If the host really wants to change their prices, they can still do so by terminating the protocol and waiting for the renter to "call them back," and then establishing the new settings. So then the protocol is:
|
I propose not to make settings exchange in the beginning, because the renter can cache those settings and they change really rare. The only part of settings that matters are prices. If settings change and the renter sends insufficient or excess payment, the host drops the request with the message "please update settings". In this case the renter makes RPCSettings and redoes the request. The latency of such protocol is just one round trip like of HTTP. |
Yeah, after some thought, that sounds okay to me. It's like how you sometimes need to make multiple HTTP requests to figure out what content you want, but once you start receiving data it's a tight single-round-trip loop. |
To avoid code duplication, I suggest to write the following private function:
The function returns the payload: sector data or an array of sector IDs (in This function can be used by existing RPCDownload and by RPCMetadata. We can also add new RPCDownload2 which will just call this function (like RPCMetadata). As an optimization, new RPCs don't need to pass whole paymentRevision on wire, but can pass just |
Some hosts provide downloads for free. In this case we can just keep previous contract version instead of issuing a new one and writing it to disk. For highly loaded hosts it can be a significant optimization (worth providing download for free). Can we take it into account in this new RPC? |
That's an interesting thought. Perhaps we can. |
Should we just have 1 RPC between the renter and the host? There are multiple things you can do:
When the renter opens the first communication, the renter can probably send a request that includes the thing that the renter wants (upload/download/etc), a partially signed contract to pay for the request (omitted if the request doesn't require a new signature), and whatever other data is needed for the host to provide everything necessary. Then the host can respond in a way that completes the request, and allows the renter to make a new request on the next communication using the same connection. That way basically everything happens in 1 round trip. The tricky part is if the host rejects whatever proposal the renter makes. If the renter makes a bad proposal (example: host has increased download price and renter doesn't know), the host can return some settings indicating the new settings / requirements and then the renter can try again on the same connection. The one place this probably doesn't work is creating contracts, because as a renter you definitely want to see the most recent settings before you send a propsal contract, that way you don't risk over-paying. |
I think I agree with starius' argument that it can be the renter's responsibility to fetch up-to-date settings. Especially if the host can return errors like "renter didn't pay enough", it's pretty clear that the renter should respond by fetching the most recent settings and then trying again. Contract formation/renewal are pretty rare operations and don't need to be highly performant, so I'm less concerned about reducing round trips and such for those actions. I'd probably keep them distinct from the upload/download/metadata RPC, although I don't have super strong feelings about this. I will say that form/renew share a LOT of code, so it might make sense to formally merge them into one RPC. The other important thing to keep in mind is that actions like upload and download are stateful. Our current loop-based RPCs are a good fit for this because, after initial setup, the renter and host can focus on just exchanging the minimum amount of data as fast as possible. For example, currently the renter has to prove to the host that it can sign with the renter key before any transfers take place -- but this only has to happen once. Similarly, for performance reasons, the host will want to load as much into memory as it can during the loop (such as the contract obligation) so that it doesn't need to fetch that info from disk for each request. If we move to a stateless single request/single response protocol, this stuff becomes harder; the host would have to maintain a LRU cache of obligations or something to that effect. The loop model isn't perfect though; for small transfers, we could definitely do better. In particular, we want to design the protocol such that users can download small (sub-sector) files as quickly as possible. For sub-sector downloads, latency becomes much more important, to the point where even the TCP connection handshake starts to feel like a burden. David and I have kicked around the idea of a UDP renter-host protocol; I think that, for downloads that can fit in a small number of UDP packets, it could make sense. But my current feeling is that sub-sector downloads have such different requirements from the other RPCs that it may be worth devoting a separate RPC to them, distinct from the standard multi-sector download RPC. The advantage of this is that it allows us to add a sub-sector download RPC later, as an optimization, instead of trying to design a download RPC that's highly optimized for both use cases from the beginning. The standard download RPC can stick with the tried-and-tested TCP-based request/response loop (although of course we'll strive to eliminate as many round-trips as possible). That way, we're free to experiment with a highly-optimized download protocol without blocking progress on the new set of RPCs. |
I also think upload/download RPCs should not be mixed with new/renew RPCs. Actually I am not sure about upload RPCs. To organize upload RPC in one round trip way, a renter needs to send the data with everything else. But if the server rejects the request, it is still bombarded with 4M of uploaded data. On the other hand, upload requests are not as latency sensitive as download requests are, so they won't benefit much from being one round trip. So I think we should not change upload at this point.
If renter does not prove that he can sign with the renter key, then the only extra information that can be sent once in the beginning is contract ID. We can create an RPC that starts with sending contract ID once and then the renter can send download/upload/metadata requests, sending only needed data. The renter doesn't wait after sending contract ID, so it still one round trip even for the first sub-request. Stateful handlers can keep some state in memory, but they still have to sync with disk each new recent revision. I like the idea with UDP. I had a slightly different idea of stateful download requests, which can be combined with UDP: pay first, download next. I.e. you paid for downloading of 1G and the host gives you a token, which you can use to download that 1G later (probably with some expiration time). Here you trust the host not to cheat. But you do it now as well, just for smaller size (one sector). If you know that the host works well, you can increase the amount you pay for in advance. Hosts can keep the table with download tokens in memory and sync it in background (if they crash, then some tokens will have slightly more download space then they paid for, which is fine). The same tokens can be reused with UDP: you can send the token and the target sector and coordinates in it in every UDP packet. Hosts should reward renters downloading pieces from the same sector rather than from different sectors. There can be two UDP RPCs: (1) download piece and (2) download piece and keep that sector in memory for some time. Both of them charge less if the sector is already in memory. But I think I am going into details too deep.
In https://godoc.org/github.com/NebulousLabs/Sia/modules#DownloadAction I see Offset and Length parameters. Don't they already provide sub-sector downloads? |
Function getRecentRevision downloads recent revision from the host and checks its signatures. It takes the minimal data sufficient for this operation: contract ID, secret key, window start and host version. Function verifyRecentRevision calls getRecentRevision and compares the recent revision from host with locally known recent revision. It takes a private type (contractHeader) used by other code in the package.
This is second attempt of NebulousLabs#2321 The whole list of IDs can be expensive for a host to provide for free, but a short list of sector IDs (up to 2^17) is cheap. At the same time it opens doors for stateless clients, i.e. recovering everything from seed only.
I addressed comments of @lukechampine in #2321. This pull request adds light version of Metadata RPC which returns just a first sector ID - it is cheap for host (so can be provided for free), but is sufficient to build a stateless client, recovering everything from seed only. (IDs of other sectors can be stored in the first sector.)