Question: Where to store attachment metadata? #903

Open
gmaclennan opened this issue Oct 16, 2024 · 2 comments

Comments

@gmaclennan
Member

We currently have two places where we can store information about an attachment: in an attachment record, which is part of an observation, or in the arbitrary JSON file metadata supported by hyperdrive.

Currently in the attachment record we store:

  • Attachment name & hyperdrive id (which identify the linked file(s) in hyperdrive)
  • Hash of the original attachment
  • Type of the attachment (photo, video, etc.)

Previously we stored just the mimeType in hyperdrive metadata, but now we are storing some photo metadata in there too.

I think there is a difference between "an attachment" and "a file/blob". We generate multiple versions of some attachments (e.g. photos), so there is more than one file per attachment.

It feels like the "correct" thing to do is put information about the file in the hyperdrive metadata, and information about the attachment in the attachment record, although I'm not sure there is a clear logical distinction between these two.

The advantage of attachment records is that they are stored with protobuf and we have some guarantees about the structure / type of the data. The metadata from hyperdrive could be arbitrary JSON, so we kind of need to treat it as unknown and validate it to get what we want.
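To illustrate what "treat it as unknown and validate it" could look like, here is a minimal sketch. The `BlobMetadata` shape and the `parseBlobMetadata` helper are hypothetical names made up for this example; only `mimeType` is known from this thread to be stored in hyperdrive metadata, and `width`/`height` are assumed stand-ins for "some photo metadata".

```typescript
// Hypothetical per-blob metadata shape. Only mimeType comes from this thread;
// width and height are illustrative assumptions.
interface BlobMetadata {
  mimeType: string;
  width?: number;
  height?: number;
}

// Type guard: true for plain (non-null, non-array) objects.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Narrow arbitrary JSON to BlobMetadata. Returns undefined when the required
// field is missing; optional fields with the wrong type are dropped.
function parseBlobMetadata(value: unknown): BlobMetadata | undefined {
  if (!isRecord(value)) return undefined;
  const { mimeType, width, height } = value;
  if (typeof mimeType !== "string") return undefined;
  const result: BlobMetadata = { mimeType };
  if (typeof width === "number" && Number.isFinite(width)) result.width = width;
  if (typeof height === "number" && Number.isFinite(height)) result.height = height;
  return result;
}
```

With protobuf-backed attachment records this narrowing step would largely come for free from the generated types, which is the trade-off described above.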

Another advantage of attachment records is that the information is available with the observation; it does not require additional requests.

Keeping information in hyperdrive metadata also requires a separate approach for accessing the history of that information and for validating signatures.

For me it feels like most additional information should be in the attachment record, although I don't feel able to make a strong argument for that. Plus we are currently putting additional metadata into the hyperdrive metadata records... so...

I would welcome feedback and opinions on this! I think it's early enough that we could move what we currently have in hyperdrive metadata into attachments and create a basic fallback.
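A "basic fallback" might look roughly like the sketch below: prefer the value on the attachment record, and only consult the older hyperdrive metadata when the record does not have it. Everything here is an assumption for illustration: the `AttachmentRecord` and `PhotoInfo` shapes, the `resolvePhotoInfo` name, and the choice of `width`/`height` as the photo metadata in question.

```typescript
// Hypothetical photo metadata; the actual fields stored today are not listed here.
interface PhotoInfo {
  width: number;
  height: number;
}

// Hypothetical attachment record with the photo metadata moved onto it.
interface AttachmentRecord {
  name: string;
  photo?: PhotoInfo;
}

// Prefer the attachment record; fall back to legacy hyperdrive metadata,
// which is arbitrary JSON and therefore checked before use.
function resolvePhotoInfo(
  attachment: AttachmentRecord,
  legacyMetadata: unknown
): PhotoInfo | undefined {
  if (attachment.photo) return attachment.photo;
  if (typeof legacyMetadata === "object" && legacyMetadata !== null) {
    const { width, height } = legacyMetadata as Record<string, unknown>;
    if (typeof width === "number" && typeof height === "number") {
      return { width, height };
    }
  }
  return undefined;
}
```

Records written after the move would take the first branch; older data would keep working through the second.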

@EvanHahn
Contributor

I agree.

Advantages of putting metadata on the attachments property of observations:

  • Structured data with Protobuf (as you say)
  • Fewer database lookups (as you say)
  • Metadata is available when the blob isn't available
  • Easier querying (e.g., "give me all observations with a thumbnail")

Disadvantages:

  • Slightly more difficult in a future where attachments could be part of multiple data types, e.g. tracks (probably not that bad)
  • This refactor requires additional work

I would personally opt to put all blob metadata onto attachments, even the MIME type, because (1) it's a bit simpler to have all the data in one place, and (2) you could infer the attachment type from the MIME type. But I don't feel strongly about this detail.
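As a rough sketch of point (2), inferring an attachment type from a MIME type could be as simple as looking at the top-level media type. The `AttachmentType` union and the `inferAttachmentType` function are hypothetical; the real set of attachment types may differ.

```typescript
// Hypothetical set of attachment types; "unknown" covers anything unmapped.
type AttachmentType = "photo" | "video" | "audio" | "unknown";

// Map the top-level media type (the part before "/") to an attachment type.
function inferAttachmentType(mimeType: string): AttachmentType {
  const topLevel = mimeType.trim().toLowerCase().split("/")[0];
  switch (topLevel) {
    case "image":
      return "photo";
    case "video":
      return "video";
    case "audio":
      return "audio";
    default:
      return "unknown";
  }
}
```

This only makes sense when applied to the MIME type of the original file. A derived file such as a thumbnail may have a different media type from the attachment it belongs to.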

I'm not sure we have time to implement this, but if we decide it's a priority, I think we should put metadata on attachments.

In any case, I think #901 is a step in the right direction.

@gmaclennan
Member Author

> I would personally opt to put all blob metadata onto attachments, even the MIME type

As discussed, MIME type is better in hyperdrive metadata, not the attachment, because different variants could have different MIME types. For example, an audio file preview could be in a more compressed format like .ogg or .3gp, and a thumbnail could be a waveform image. Anything that could differ by variant should be in the hyperdrive metadata, since it's per blob, and there's a one-to-many relationship between an attachment and its blobs.
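The one-to-many split could be pictured roughly like this. All type and field names below are hypothetical, as are the variant names and the example MIME types; the sketch only shows attachment-level fields living on the record and per-variant fields living with each blob.

```typescript
// Hypothetical variant names for the files generated from one attachment.
type Variant = "original" | "preview" | "thumbnail";

// Attachment-level information (one per attachment, stored with the observation).
interface Attachment {
  driveId: string;
  name: string;
  hash: string; // hash of the original attachment
  type: "photo" | "video" | "audio";
}

// Per-blob information (one per variant, stored in hyperdrive metadata).
interface BlobEntry {
  variant: Variant;
  mimeType: string;
}

// Look up the MIME type of a given variant, if that variant exists.
function mimeTypeFor(blobs: BlobEntry[], variant: Variant): string | undefined {
  for (const blob of blobs) {
    if (blob.variant === variant) return blob.mimeType;
  }
  return undefined;
}

// One audio attachment whose variants have three different MIME types.
const recording: Attachment = {
  driveId: "example-drive-id",
  name: "example-recording",
  hash: "example-hash",
  type: "audio",
};

const recordingBlobs: BlobEntry[] = [
  { variant: "original", mimeType: "audio/mp4" },
  { variant: "preview", mimeType: "audio/ogg" },
  { variant: "thumbnail", mimeType: "image/png" }, // e.g. a waveform image
];
```

Here `recording.type` stays `"audio"` regardless of variant, while `mimeTypeFor(recordingBlobs, "thumbnail")` yields an image type, which is why a single MIME type on the attachment record would not be enough.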
