-
Notifications
You must be signed in to change notification settings - Fork 440
Sia file format #3113
base: master
Are you sure you want to change the base?
Sia file format #3113
Conversation
efd41bb
to
f842b24
Compare
modules/renter/siafile/siafile.go
Outdated
func (sf *SiaFile) ErasureCode() modules.ErasureCoder { | ||
sf.mu.RLock() | ||
sf.mu.RUnlock() | ||
return sf.erasureCode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be a defer here (also missing on a few other functions)
modules/renter/siafile/siafile.go
Outdated
siaPath: siaPath, | ||
}, | ||
erasureCode: erasureCode, | ||
uid: base32.StdEncoding.EncodeToString(fastrand.Bytes(20))[:20], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this used for? Why is it base 32? Why do you slice it to the first 20 bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. It's just the way we have done it before :P
Previously it was done by a method called RandomSuffix
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the uid is not persisted ever, we can just use fastrand.Bytes, there's no need to encode it.
I've been wondering whether we should persist the uid, but I think that could cause problems.
And, if we aren't going to persist the uid but instead keep everything in memory, another approach would be to use a uid counter and just count up each time that we add a file, because all we are really looking for is uniqueness, and it doesn't need any sort of collision resistance or security from an external party.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the downside to a counter of course is that you need to keep a counter somewhere.
modules/renter/siafile/siafile.go
Outdated
// Get the index of the host in the public key table. | ||
tableIndex := -1 | ||
for i, hpk := range sf.pubKeyTable { | ||
if reflect.DeepEqual(hpk, pk) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reflection is Overkill and always makes me uneasy; I'd prefer hpk.Algorithm == pk.Algorithm && bytes.Equal(hpk.Key, pk.Key)
.
modules/renter/siafile/siafile.go
Outdated
defer sf.mu.RUnlock() | ||
if chunkIndex >= uint64(len(sf.chunks)) { | ||
return nil, fmt.Errorf("index %v out of bounds (%v)", | ||
chunkIndex, len(sf.chunks)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems reasonable to panic here, since it's similar to a slice out of bounds
modules/renter/siafile/metadata.go
Outdated
fileSize int64 // total size of the file | ||
masterKey crypto.TwofishKey // masterkey used to encrypt pieces | ||
pieceSize uint64 // size of a single piece of the file | ||
trackingPath string // file to the local copy of the file used for repairing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably better to use a name like localPath
modules/renter/siafile/metadata.go
Outdated
masterKey crypto.TwofishKey // masterkey used to encrypt pieces | ||
pieceSize uint64 // size of a single piece of the file | ||
trackingPath string // file to the local copy of the file used for repairing | ||
siaPath string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth having a comment here too, someone completely new to the codebase is not going to understand immediately the difference between the siapath and the localpath.
// The following fields are the usual unix timestamps of files. | ||
modTime time.Time // time of last content modification | ||
changeTime time.Time // time of last metadata modification | ||
accessTime time.Time // time of last access |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we intending to have the accessTime and the downloadTime be the same?
My guess is not, does that mean that we should add a downloadTime as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just having the accessTime
is fine since that is also the way Unix does it. modTime
is bascially a subset of accessTime
since it only includes uploads.
If you just want to know the last time you manually downloaded a file, you can still take a look at the download history.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm.
Doesn't a download count as an access though? You did technically access the file if you performed a download.
And, are we treating these only as applies to the logical data? Because that means that repairing would not update any of these files at all (which I think is correct)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, a download would be an access. Once we support updating the file we would also reset modTime
when we do so.
I'm not sure about repairing since it also downloads the file. Would make sense for it not to change the timestamp though.
modules/renter/siafile/metadata.go
Outdated
// chunkHeaderSize is the size of each of the following chunk's metadata. | ||
chunkHeaderSize uint64 | ||
// chunkBodySize is the size of each of the following chunk's bodies. | ||
chunkBodySize uint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these two fields and comments aren't perfectly clear.
IF I understand correctly, the chunkHeaderSize is the amount of data in the siafile that is allocated to each chunk, which tells you how to perform random access within the siafile to read and modify chunk metadata.
And then chunkBodySize is the size of the logical data within the chunk so that you know what part of the logical file each chunk index maps to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I'm actually not sure what does fields were about anymore :P I don't think we need both of them. We can always figure out the logical size of a chunk using the erasure coding and the pieceSize.
modules/renter/siafile/metadata.go
Outdated
// need to resize later on. | ||
// | ||
// chunkOffset is the offset of the first chunk, forced to be a factor of | ||
// 4096, default 16kib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you sure that a default of 4096 is not good enough? A default of 16kib means that there's going to be a lot of metadata overhead for tiny files.
So much actually that it makes sense to have really small files (<64kb) just to be stored entirely instead of the chunks, which I think is probably the best solution here, heh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is just a possible value we talked. We can also do 4kib instead.
modules/renter/siafile/metadata.go
Outdated
// chunkOffset is the offset of the first chunk, forced to be a factor of | ||
// 4096, default 16kib | ||
// | ||
// pubKeyTableOffset is the office of the publicKeyTable within the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can I run for office of the publicKeyTable
? :p
modules/renter/siafile/metadata.go
Outdated
) | ||
|
||
// Available indicates whether the file is ready to be downloaded. | ||
func (sf *SiaFile) Available(offline map[string]bool) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function probably does not belong in the metadata file
modules/renter/siafile/siafile.go
Outdated
// size of the metadata on disk should always be a multiple of 4kib. | ||
// The metadata is also the only part of the file that is JSON encoded | ||
// and can therefore be easily extended. | ||
metadata Metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lukechampine @ChrisSchinnerl what do you think about embedding the Metadata
field into the SiaFile struct, that way every method and field of the metadata can be accessed directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why I didn't do that is because the Metadata
will be JSON encoded. That way we can just pass the Metadata
field directly to the encoder. Otherwise we need a separate struct for that.
modules/renter/siafile/siafile.go
Outdated
|
||
// utility fields. These are not persisted. | ||
deleted bool | ||
erasureCode modules.ErasureCoder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the erasure code be a field of the chunk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why this is a field of the file when each chunk has its own erasure coding settings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I'll change it so that it always uses siafile.chunks[0].erasureCode
but we need to think about how we would like to report the redundancy on a file that has different settings for each chunk.
modules/renter/siafile/siafile.go
Outdated
// erasureCodeType. Currently params will be parsed as follows: | ||
// Reed Solomon Code - 4 bytes dataPieces / 4 bytes parityPieces | ||
// | ||
erasureCodeType [4]byte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need a full 4 bytes to specify the erasure code type? :p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not. It's more of a placeholder right now though so we can change that in the persistence PR together with other persistence related fields.
modules/renter/siafile/siafile.go
Outdated
|
||
// AddPiece adds an uploaded piece to the file. It also updates the host table | ||
// if the public key of the host is not aleady known. | ||
func (sf *SiaFile) AddPiece(pk types.SiaPublicKey, chunkIndex, pieceIndex uint64, merkleRoot crypto.Hash) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should add a TODO here so that we remember to add persistence changes when we move the persistence over.
// contains all the necessary information to recover a file from its hosts and | ||
// allows for easy constant-time updates of the file without having to read or | ||
// write the whole file. | ||
SiaFile struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should be able to make a bunch of the fields static for both the SiaFile and the Metadata, right? then we can minimize lock contention when working with the files, something that for some situations may end up being fairly serious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I wanted to do that once everything else in this PR looks good to you and won't be changed anymore.
modules/renter/siafile/siafile.go
Outdated
KeyNonce [4]byte // nonce used for encrypting the piece | ||
HostPubKey types.SiaPublicKey // public key of the host | ||
MerkleRoot crypto.Hash // merkle root of the piece | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, this is something we aren't going to be able to do soon, but I do think that we are going to at some point need a mapping in the contractor of all of the pieces that are in each contract. it'll probably need to be on disk, but it's the only way I can think of to be sure that we actually have the data still in the contract, especially once we build in the rpc to allow hosts to tell us that they've lost some pieces (in the event of a corrupted disk, for example) but not all of them in the contract.
Once we have that mapping, we genuinely won't need 32 bytes of the merkle root per piece, we'll be able to safely trim it down to about 10 bytes and still have high confidence of correctness, because every piece in our contracts was of our own choosing, and if there's ever a conflict / collision we can just change the key nonce for the piece and try again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the note of key nonces, a disadvantage of this strategy is that you can't modify only part of a chunk, if you are going to modify a file on Sia you have to modify the whole chunk to stay secure. I'm not sure if there are encryption strategies that allow you to perform safe in-place modification without needing a ton of nonce overhead though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't XTS let you do that without a nonce? I think it is used by Microsoft in Bitlocker.
06f6642
to
755650e
Compare
0da8060
to
a5f0b70
Compare
…eating unimplemented methods
1d58da8
to
6181e0a
Compare
func (r *Renter) saveFile(f *file) error { | ||
if f.deleted { | ||
func (r *Renter) saveFile(f *siafile.SiaFile) error { | ||
if f.Deleted() { // TODO: violation of locking convention |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm. I'm not sure there's an easy path to fix this.
Typically you'd just turn saveFile
into managedSaveFile
, but I think that there are places where we need to keep holding the lock before and after we call saveFile
to be safe.
modules/renter/siafile/siafile.go
Outdated
|
||
// Piece represents a single piece of a chunk on disk | ||
Piece struct { | ||
KeyNonce [4]byte // nonce used for encrypting the piece |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need the KeyNonce anymore now that we have decided to switch to threefish
modules/renter/siafile/metadata.go
Outdated
staticVersion [16]byte // version of the sia file format used | ||
staticFileSize int64 // total size of the file | ||
staticMasterKey crypto.TwofishKey // masterkey used to encrypt pieces | ||
staticPieceSize uint64 // size of a single piece of the file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need 2 more fields for encryption, the first field needs to indicate the type of encryption, and the second field needs to be a backup field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*a backup key. Siafiles will have 2 keys, one that you give out when you share the file and one that you don't give out. The second key prevents hosts from deduplicating files that you've shared, because they can't decrypt the pieces you didn't give out the second key for.
Then I think at the chunk level we'll need a byte indicating what piece index starts using the backup key instead of the master key for that chunk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to have partial file sharing?
modules/renter/siafile/siafile.go
Outdated
// never go below 1. | ||
if minRedundancy < 1 { | ||
return minRedundancyNoRenew | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if minRedundancy < 1 && minRedundancyNoRenew >= 1 {
return 1
} else if minRedunancy < 1 {
return minRedundancyNoRenew
}
return minRedundancy
modules/renter/upload.go
Outdated
) | ||
|
||
var ( | ||
// errUploadDirectory is returned if the user tries to upload a directory. | ||
errUploadDirectory = errors.New("cannot upload directory") | ||
) | ||
|
||
// newFile is a helper to more easily create a new Siafile for testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this comment correct? Is this function in the right file? This isn't a testing file.
modules/renter/uploadheap.go
Outdated
// Iterate through the pieces of the file and mark which hosts are already | ||
// in use for the chunk. As you delete hosts from the 'unusedHosts' map, | ||
// also increment the 'piecesCompleted' value. | ||
for chunkIndex := uint64(0); chunkIndex < f.NumChunks(); chunkIndex++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is no longer correct, since the file format has changed.
modules/renter/uploadheap.go
Outdated
// This host has a piece, but it is the same piece another host | ||
// has. We should still remove the host from the unusedHosts | ||
// since one host having multiple pieces of a chunk might lead | ||
// to unexpected issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you expand this comment to explain that if we leave the host in the set of unused hosts, it's possible that one host gets multiple pieces, then the other redundant hosts on those pieces go offline, and then you have one host providing false redundancy?
modules/renter/siafile/siafile.go
Outdated
// Piece represents a single piece of a chunk on disk | ||
Piece struct { | ||
KeyNonce [4]byte // nonce used for encrypting the piece | ||
HostPubKey types.SiaPublicKey // public key of the host |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switch this out for an index in the pubKeyTable (2 bytes is enough)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change that in the PR that changes the persistence? Right now it would only make the code in compat.go
more complicated since the table doesn't have a fixed ordering.
Overall this is looking pretty good I think. |
f453955
to
b240acb
Compare
This is still missing a few things but I think it's ready for a first round of reviews. It should be pretty much done except for:
Something we should think about is how to make sure the redundancy/availablility reporting remains correct after changing the file format. Since we no longer store the file contract but only the public key of the host and since we don't remove the hosts from the sia file's host table, we don't know if a piece is supposed to be stored on a host. All we know is, that at some point in time a piece was stored on a host.
Another thing I have not yet decided on is when to update the 4 timestamps (
modTime
,changeTime
,accessTime
,createTime
). Right now my guess is, thatcreateTime
,modTime
, andchangeTime
would mostly stay the same until we add modifying files as a feature.accessTime
would probably updated every time we download a file.