Skip to content
This repository has been archived by the owner on Nov 2, 2018. It is now read-only.

add RPC call Metadata requesting sector IDs #2584

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

starius
Copy link
Contributor

@starius starius commented Dec 26, 2017

I addressed comments of @lukechampine in #2321. This pull request adds light version of Metadata RPC which returns just a first sector ID - it is cheap for host (so can be provided for free), but is sufficient to build a stateless client, recovering everything from seed only. (IDs of other sectors can be stored in the first sector.)

@tbenz9 tbenz9 requested a review from lukechampine January 2, 2018 19:25
@starius starius changed the title add RPC call Metadata requesting first sector ID add RPC call Metadata requesting sector IDs Feb 5, 2018
@DavidVorick
Copy link
Member

This is now a high priority pull request.

Finally :P

NegotiateMetadataMaxSliceSize = build.Select(build.Var{
Dev: uint64(1 << 17),
Standard: uint64(1 << 17),
Testing: uint64(1 << 4),
Copy link
Member

@lukechampine lukechampine Jun 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these numbers are kind of magic; better would be SectorSize / crypto.HashSize.

func verifyRecentRevision(conn net.Conn, contract contractHeader, hostVersion string) error {
// getRecentRevision downloads the current revision from the host
// and checks signatures.
func getRecentRevision(conn net.Conn, fcid types.FileContractID, sk crypto.SecretKey, windowStart types.BlockHeight, hostVersion string) (types.FileContractRevision, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why it's necessary to change verifyRecentRevision. It doesn't look like you're using lastRevision in GetMetadata, so why do you need to return it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caller of GetMetadata can calculate merkle root of sector IDs returned by GetMetadata and compare it with merkle root from lastRevision. This is a way to verify that sector IDs are correct for some revision (host still can return old revision).

@@ -219,6 +219,7 @@ type RenterContract struct {
ID types.FileContractID
HostPublicKey types.SiaPublicKey
Transaction types.Transaction
SecretKey crypto.SecretKey
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used in TestIntegrationMetadata to pass it to GetMetadata

@lukechampine
Copy link
Member

IIUC this only allows you to fetch the first 2^17 = 131072 sector roots, i.e. 500 GiB of contract data. The RPC is also free. I suggest that instead, we allow requesting an arbitrary number of roots, but the renter must pay for them.

One interesting way to implement this would be to reuse almost all of the RPCDownload code, but with sentinel sector roots. Requesting crypto.Hash{0} would return the first 2^17 roots, crypto.Hash{1} would be the next 2^17, etc. (Actually, we'd probably want to use a special prefix, e.g. crypto.Hash{'s','e','c','t','o','r', 0, 1, 2, 3, 4, 5, 6, 7, 8}, where 0-8 is a little-endian uint64.) All the host needs to do is detect these roots and translate them into indices. We could even reuse the Offset and Length fields of modules.DownloadAction to enable requesting arbitrary subsets of roots.

One tricky thing here is that the renter needs to know how many roots there are. But it can get this by looking at the FileContractRevision reported by the host: the number of roots will be revision.NewFilesize / modules.SectorSize.

@DavidVorick
Copy link
Member

I think I am not in favor of re-using the download rpc to do this, though I am in favor of having the renter pay to download the roots. I'm not sure the best way to mesh them to avoid code duplication, but using special values for the desired merkle root just feels like a bad plan to me.

I am also in favor of limiting the number of roots that can be downloaded at once to 4 MB of roots (131,000 roots), and having the downloader be able to specify an offset and length.

@lukechampine
Copy link
Member

Some code duplication is probably okay. We don't want to intermingle the metadata RPC with the download RPC too much because we're going to be splitting download+upload into a separate RPC at some point anyway.
I'd be fine with copying the download RPC almost entirely, changing only the DownloadAction to:

type SectorRootsRequest struct {
    StartIndex uint64
    EndIndex   uint64
}

and of course changing the retrieval code to fetch roots rather than data. But everything else should stay: doing repeated requests in a loop, exchanging new host settings on each iteration, etc.

Actually, while we're at it, we could consider tightening up the protocol a little. One thing that stands out to me in particular is that we could eliminate a round trip. The current download protocol is:

  1. Renter initiates request
  2. Enter loop
  3. Host sends settings
  4. Renter sends accept/reject
  5. Renter sends download request and unsigned revision
  6. Host sends accept/reject
  7. Renter sends signature
  8. Host sends accept/reject
  9. Host sends signature and data
  10. Go to 3

We can eliminate a round trip by merging steps 5 and 7: there is no reason for the renter to wait until the host has accepted or rejected the request before sending its signature.

If we want to go further, we could modify the protocol to send only the parts of the revision that changed, instead of the full new revision. Currently the only values that change are the amounts in the proof outputs, and the revision number. So all we really need to send is a single types.Currency value: the amount that the renter is paying the host for the download. The host can use this to calculate the new proof outputs, and increment the revision number by 1. This would save about 500 bytes per iteration. However, we probably don't want to be so restrictive; for example, the renter should be able to set the revision number to math.MaxUint64 if it wants to, so we should explicitly send the revision number instead of always incrementing by 1. There may be other degrees of freedom that we want to permit as well. So I'm only suggesting this optimization tentatively.

@starius
Copy link
Contributor Author

starius commented Jun 12, 2018

To speed up the protocol, we can go even further:

  1. the renter sends new revision and a partial signature.
  2. Host sends back the full signature + data (IDs or sector data) or sends "reject" and deletes the half-signature.

Just one round-trip - the fastest possible way.

The only concern is that a malicious host can keep the half-signature and not send the data, but it can do exactly this in the current protocol as well.

@lukechampine
Copy link
Member

lukechampine commented Jun 12, 2018

I think we're describing the same thing. The new protocol would be:

  1. Renter initiates request
  2. Enter loop
  3. Host sends settings
  4. Renter sends accept/reject
  5. Renter sends download request, revision, and signature
  6. Host sends accept/reject, signature, and data
  7. Go to 3

We can get this down to a single round trip if we remove the settings exchange. Which I think is feasible; host settings probably shouldn't change during the protocol. If the host really wants to change their prices, they can still do so by terminating the protocol and waiting for the renter to "call them back," and then establishing the new settings. So then the protocol is:

  1. Renter initiates request
  2. Host sends settings
  3. Renter sends accept/reject
  4. Enter loop
  5. Renter sends download request, revision, and signature
  6. Host sends accept/reject, signature, and data
  7. Go to 5

@starius
Copy link
Contributor Author

starius commented Jun 12, 2018

I propose not to make settings exchange in the beginning, because the renter can cache those settings and they change really rare. The only part of settings that matters are prices. If settings change and the renter sends insufficient or excess payment, the host drops the request with the message "please update settings". In this case the renter makes RPCSettings and redoes the request. The latency of such protocol is just one round trip like of HTTP.

@lukechampine
Copy link
Member

Yeah, after some thought, that sounds okay to me. It's like how you sometimes need to make multiple HTTP requests to figure out what content you want, but once you start receiving data it's a tight single-round-trip loop.
The renter is at risk of overpaying if they don't use the most recent settings, but I don't think that's a huge problem; they can always manually request before each transfer if they want to, and regardless if a price is too high they should just refuse to pay it in the first place.

@starius
Copy link
Contributor Author

starius commented Jun 13, 2018

To avoid code duplication, I suggest to write the following private function:

type DownloadTarget int
const (
  DOWNLOAD_DATA DownloadTarget = iota
  DOWNLOAD_ID
)

func download(fcid types.FileContractID, target DownloadTarget, sectorID crypto.Hash,
  startIndex, endIndex uint64, paymentRevision types.FileContractRevision) ([]byte, error)

The function returns the payload: sector data or an array of sector IDs (in encoding format, so it can be written directly to conn.

This function can be used by existing RPCDownload and by RPCMetadata. We can also add new RPCDownload2 which will just call this function (like RPCMetadata). As an optimization, new RPCs don't need to pass whole paymentRevision on wire, but can pass just NewRevisionNumber and paymentRevision.NewValidProofOutputs[1] (this data is sufficient to build paymentRevision).

@starius
Copy link
Contributor Author

starius commented Jun 16, 2018

Some hosts provide downloads for free. In this case we can just keep previous contract version instead of issuing a new one and writing it to disk. For highly loaded hosts it can be a significant optimization (worth providing download for free). Can we take it into account in this new RPC?

@DavidVorick
Copy link
Member

That's an interesting thought. Perhaps we can.

@DavidVorick
Copy link
Member

Should we just have 1 RPC between the renter and the host?

There are multiple things you can do:

  • upload
  • download
  • renew
  • fetch sector ids
  • maybe something else?

When the renter opens the first communication, the renter can probably send a request that includes the thing that the renter wants (upload/download/etc), a partially signed contract to pay for the request (omitted if the request doesn't require a new signature), and whatever other data is needed for the host to provide everything necessary.

Then the host can respond in a way that completes the request, and allows the renter to make a new request on the next communication using the same connection.

That way basically everything happens in 1 round trip.

The tricky part is if the host rejects whatever proposal the renter makes. If the renter makes a bad proposal (example: host has increased download price and renter doesn't know), the host can return some settings indicating the new settings / requirements and then the renter can try again on the same connection.

The one place this probably doesn't work is creating contracts, because as a renter you definitely want to see the most recent settings before you send a propsal contract, that way you don't risk over-paying.

@lukechampine
Copy link
Member

I think I agree with starius' argument that it can be the renter's responsibility to fetch up-to-date settings. Especially if the host can return errors like "renter didn't pay enough", it's pretty clear that the renter should respond by fetching the most recent settings and then trying again.

Contract formation/renewal are pretty rare operations and don't need to be highly performant, so I'm less concerned about reducing round trips and such for those actions. I'd probably keep them distinct from the upload/download/metadata RPC, although I don't have super strong feelings about this. I will say that form/renew share a LOT of code, so it might make sense to formally merge them into one RPC.

The other important thing to keep in mind is that actions like upload and download are stateful. Our current loop-based RPCs are a good fit for this because, after initial setup, the renter and host can focus on just exchanging the minimum amount of data as fast as possible. For example, currently the renter has to prove to the host that it can sign with the renter key before any transfers take place -- but this only has to happen once. Similarly, for performance reasons, the host will want to load as much into memory as it can during the loop (such as the contract obligation) so that it doesn't need to fetch that info from disk for each request. If we move to a stateless single request/single response protocol, this stuff becomes harder; the host would have to maintain a LRU cache of obligations or something to that effect.

The loop model isn't perfect though; for small transfers, we could definitely do better. In particular, we want to design the protocol such that users can download small (sub-sector) files as quickly as possible. For sub-sector downloads, latency becomes much more important, to the point where even the TCP connection handshake starts to feel like a burden. David and I have kicked around the idea of a UDP renter-host protocol; I think that, for downloads that can fit in a small number of UDP packets, it could make sense. But my current feeling is that sub-sector downloads have such different requirements from the other RPCs that it may be worth devoting a separate RPC to them, distinct from the standard multi-sector download RPC. The advantage of this is that it allows us to add a sub-sector download RPC later, as an optimization, instead of trying to design a download RPC that's highly optimized for both use cases from the beginning. The standard download RPC can stick with the tried-and-tested TCP-based request/response loop (although of course we'll strive to eliminate as many round-trips as possible). That way, we're free to experiment with a highly-optimized download protocol without blocking progress on the new set of RPCs.

@starius
Copy link
Contributor Author

starius commented Jun 18, 2018

I also think upload/download RPCs should not be mixed with new/renew RPCs. Actually I am not sure about upload RPCs. To organize upload RPC in one round trip way, a renter needs to send the data with everything else. But if the server rejects the request, it is still bombarded with 4M of uploaded data. On the other hand, upload requests are not as latency sensitive as download requests are, so they won't benefit much from being one round trip. So I think we should not change upload at this point.

Our current loop-based RPCs are a good fit for this because, after initial setup, the renter and host can focus on just exchanging the minimum amount of data as fast as possible. For example, currently the renter has to prove to the host that it can sign with the renter key before any transfers take place -- but this only has to happen once.

If renter does not prove that he can sign with the renter key, then the only extra information that can be sent once in the beginning is contract ID. We can create an RPC that starts with sending contract ID once and then the renter can send download/upload/metadata requests, sending only needed data. The renter doesn't wait after sending contract ID, so it still one round trip even for the first sub-request.

Stateful handlers can keep some state in memory, but they still have to sync with disk each new recent revision.

I like the idea with UDP. I had a slightly different idea of stateful download requests, which can be combined with UDP: pay first, download next. I.e. you paid for downloading of 1G and the host gives you a token, which you can use to download that 1G later (probably with some expiration time). Here you trust the host not to cheat. But you do it now as well, just for smaller size (one sector). If you know that the host works well, you can increase the amount you pay for in advance. Hosts can keep the table with download tokens in memory and sync it in background (if they crash, then some tokens will have slightly more download space then they paid for, which is fine). The same tokens can be reused with UDP: you can send the token and the target sector and coordinates in it in every UDP packet. Hosts should reward renters downloading pieces from the same sector rather than from different sectors. There can be two UDP RPCs: (1) download piece and (2) download piece and keep that sector in memory for some time. Both of them charge less if the sector is already in memory. But I think I am going into details too deep.

sub-sector downloads have such different requirements from the other RPCs that it may be worth devoting a separate RPC to them

In https://godoc.org/github.com/NebulousLabs/Sia/modules#DownloadAction I see Offset and Length parameters. Don't they already provide sub-sector downloads?

@MSevey MSevey added this to the v1.3.4 milestone Jun 25, 2018
@MSevey MSevey added v1.4.0 and removed v1.3.4 labels Jun 28, 2018
Boris Nagaev added 3 commits July 15, 2018 12:34
Function getRecentRevision downloads recent revision from the host
and checks its signatures. It takes the minimal data sufficient for
this operation: contract ID, secret key, window start and host version.

Function verifyRecentRevision calls getRecentRevision and compares
the recent revision from host with locally known recent revision.
It takes a private type (contractHeader) used by other code in the package.
This is second attempt of NebulousLabs#2321

The whole list of IDs can be expensive for a host to provide for free,
but a short list of sector IDs (up to 2^17) is cheap. At the same time
it opens doors for stateless clients, i.e. recovering everything from
seed only.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants