We currently have two places where we can store information about an attachment: In an attachment record, which is part of an observation, or using the arbitrary JSON file metadata supported by hyperdrive.
Currently in the attachment record we store:
- Attachment name & hyperdrive id (which identify the linked file(s) in hyperdrive)
- Hash of the original attachment
- Type of the attachment (photo, video, etc.)
Previously we stored just the mimeType in hyperdrive metadata, but now we are storing some photo metadata in there too.
I think there is a difference between "an attachment" and "a file/blob". We generate multiple versions of some attachments (e.g. photos), so there is more than one file per attachment.
It feels like the "correct" thing to do is put information about the file in the hyperdrive metadata, and information about the attachment in the attachment record, although I'm not sure there is a clear logical distinction between these two.
The advantage of attachment records is that they are stored with protobuf, so we have some guarantees about the structure / type of the data. The metadata from hyperdrive could be arbitrary JSON, so we need to treat it as `unknown` and validate it to get what we want.
Another advantage of attachment records is that the information is available with the observation, so it does not require additional requests.
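To illustrate what "treat it as unknown and validate it" could look like, here is a minimal sketch. The `BlobMetadata` shape and its `width` / `height` fields are assumptions for illustration only, not the actual metadata we write today; only `mimeType` is mentioned above.

```typescript
// Hypothetical per-blob metadata shape. Only `mimeType` comes from the
// discussion; `width` and `height` are illustrative assumptions.
interface BlobMetadata {
  mimeType: string
  width?: number
  height?: number
}

// Narrow an unknown value to a plain object with string keys.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Returns validated metadata, or undefined if the JSON is not usable.
// Unknown extra fields are ignored rather than rejected.
function parseBlobMetadata(raw: unknown): BlobMetadata | undefined {
  if (!isRecord(raw)) return undefined
  const { mimeType, width, height } = raw
  if (typeof mimeType !== 'string') return undefined
  const result: BlobMetadata = { mimeType }
  if (typeof width === 'number') result.width = width
  if (typeof height === 'number') result.height = height
  return result
}
```

Every reader of the metadata would need a guard like this, whereas a protobuf-backed attachment record gives us the typed shape directly.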
Keeping information in metadata also requires a separate approach for accessing a history of the information in there and validating signatures.
For me it feels like most additional information should be in the attachment record, although I don't feel able to make a strong argument for that. Plus we are currently putting additional metadata into the hyperdrive metadata records... so...
I would welcome feedback and opinions on this! I think it's early enough that we could move what we currently have in hyperdrive metadata into attachments and create a basic fallback.
Advantages of putting metadata on the attachments property of observations:
- Structured data with Protobuf (as you say)
- Fewer database lookups (as you say)
- Metadata is available when the blob isn't available
- Easier querying (e.g., "give me all observations with a thumbnail")

Disadvantages:
- Slightly more difficult in a future where attachments could be part of multiple data types, e.g. tracks (probably not that bad)
- This refactor requires additional work
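As a rough sketch of the "easier querying" point: if the attachment record carried the relevant metadata, the thumbnail query becomes a plain filter over observations with no blob lookups. The types and the `hasThumbnail` field below are hypothetical, not our current schema.

```typescript
// Hypothetical attachment shape; `hasThumbnail` is an assumed field
// used only to illustrate querying without touching hyperdrive.
interface Attachment {
  name: string
  driveId: string
  hash: string
  type: 'photo' | 'video' | 'audio'
  hasThumbnail?: boolean
}

interface Observation {
  docId: string
  attachments: Attachment[]
}

// "Give me all observations with a thumbnail"
function observationsWithThumbnail(observations: Observation[]): Observation[] {
  return observations.filter((obs) =>
    obs.attachments.some((attachment) => attachment.hasThumbnail === true)
  )
}
```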
I would personally opt to put all blob metadata onto attachments, even the MIME type, because (1) it's a bit simpler to have all the data in one place, and (2) you could infer the attachment type from the MIME type. But I don't feel strongly about this detail.
I'm not sure we have time to implement this, but if we decide it's a priority, I think we should put metadata on attachments.
In any case, I think #901 is a step in the right direction there.
> I would personally opt to put all blob metadata onto attachments, even the MIME type
As discussed, MIME type is better in hyperdrive metadata than in the attachment, because different variants could have different MIME types, e.g. an audio file preview could be in a more compressed format like .ogg or .3gp, and a thumbnail could be a waveform image. Anything that could differ by variant should be in the hyperdrive metadata, since it's per blob, and there's a one-to-many relationship between an attachment and blobs.
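A minimal sketch of that one-to-many split, to make the distinction concrete. The type names, the variant names, and the specific MIME types in the example are assumptions for illustration; only the attachment fields (name, hyperdrive id, hash, type) come from the issue description.

```typescript
// Assumed variant names; the thread mentions previews and thumbnails.
type Variant = 'original' | 'preview' | 'thumbnail'

// Attachment-level data: identical for every variant of the attachment.
interface AttachmentRecord {
  name: string
  driveId: string
  hash: string
  type: 'photo' | 'video' | 'audio'
}

// Blob-level data: may differ per variant, so it lives with the blob.
interface BlobInfo {
  variant: Variant
  mimeType: string
}

interface AttachmentWithBlobs {
  attachment: AttachmentRecord
  blobs: BlobInfo[]
}

// Look up the MIME type of one variant; undefined if it was never generated.
function mimeTypeFor(item: AttachmentWithBlobs, variant: Variant): string | undefined {
  return item.blobs.find((blob) => blob.variant === variant)?.mimeType
}

// One audio attachment whose three blobs each have a different MIME type.
const recording: AttachmentWithBlobs = {
  attachment: { name: 'interview', driveId: 'drive-1', hash: 'abc123', type: 'audio' },
  blobs: [
    { variant: 'original', mimeType: 'audio/wav' },
    { variant: 'preview', mimeType: 'audio/ogg' },
    { variant: 'thumbnail', mimeType: 'image/png' }, // e.g. a waveform image
  ],
}
```

With this shape, a single `mimeType` on the attachment record could only describe one of the three blobs, which is why per-variant fields belong with the blob.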