Add metadata #19
So https://x.com/i_steves/status/1017569725340151809 also Otherwise, I'd need to always read the
So metadata can actually be saved in Parquet files, so I'd still go for that as our storage format. Now the question should rather be: What metadata should we support by default?
Maybe we should try to add an Operations category too, so that the operations performed on the data automatically get saved with the data (e.g. how the data is filtered/smoothed, what the average confidence is, etc.). Maybe a kind of data integrity category.
For the record, this is how it is added, accessed, saved and read:
library(arrow)
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#> timestamp
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
attributes(df)$units <- c("seconds", "minutes")
attributes(df)$metadata <- list(
  one_thing = "here",
  another = "there",
  a_vector = c(1, 2, 3, 4, 5),
  a_double_vector = c(1.22, 1.55)
)
attributes(df)$metadata
#> $one_thing
#> [1] "here"
#>
#> $another
#> [1] "there"
#>
#> $a_vector
#> [1] 1 2 3 4 5
#>
#> $a_double_vector
#> [1] 1.22 1.55
write_parquet(df, "test.parquet")
a <- read_parquet("test.parquet", as_data_frame = TRUE)
attributes(a)
#> $names
#> [1] "col1" "col2"
#>
#> $row.names
#> [1] 1 2 3
#>
#> $class
#> [1] "tbl_df" "tbl" "data.frame"
#>
#> $units
#> [1] "seconds" "minutes"
#>
#> $metadata
#> $metadata$one_thing
#> [1] "here"
#>
#> $metadata$another
#> [1] "there"
#>
#> $metadata$a_vector
#> [1] 1 2 3 4 5
#>
#> $metadata$a_double_vector
#> [1] 1.22 1.55
Created on 2024-11-06 with reprex v2.1.1
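For the Operations idea, here is a rough sketch of how a processing log could ride along in the same `metadata` attribute. The helper `log_operation()`, the `operations` entry, and the record fields `name` and `params` are all made up for illustration; none of this is an agreed convention. It uses base R only, and whether a nested list like this survives `write_parquet()`/`read_parquet()` the same way the flat list in the reprex does is not tested here.

```r
# Hypothetical helper: append one operation record to attr(data, "metadata")$operations.
log_operation <- function(data, name, params = list()) {
  meta <- attr(data, "metadata")
  if (is.null(meta)) meta <- list()
  record <- list(name = name, params = params)
  # c(NULL, list(record)) gives a one-element list, so the first call works too.
  meta$operations <- c(meta$operations, list(record))
  attr(data, "metadata") <- meta
  data
}

df <- data.frame(x = c(1, 2, 3, 4), confidence = c(0.9, 0.2, 0.8, 0.95))
threshold <- 0.5

# Keep the metadata aside and re-attach it after filtering, so the sketch
# does not rely on whether subsetting preserves custom attributes.
meta_before <- attr(df, "metadata")
filtered <- df[df$confidence >= threshold, ]
attr(filtered, "metadata") <- meta_before

# Record what was done, plus a simple integrity summary.
filtered <- log_operation(filtered, "filter_confidence", list(threshold = threshold))
filtered <- log_operation(
  filtered, "summary",
  list(mean_confidence = mean(filtered$confidence))
)

str(attr(filtered, "metadata")$operations)
```

Each processing function would then only need to call the helper once before returning, and the order of the list doubles as the processing history.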
I've restructured this issue in light of metadata being an option in Parquet files - anything prior is found below the line.
So metadata can actually be saved in Parquet files, so I'd still go for that as our storage format. Now the question should rather be: What metadata should we support by default?
movement - or any of the tracking software if they do this)
datetime timestamp, see Time and timestamp conversion #66)
difftime, see Time and timestamp conversion #66)
normal?, cv for computer vision where increasing y goes "down", other)

Would be a good way to e.g. keep track of units (frame/s or dots/pixels/cm). See https://stackoverflow.com/a/68675903. Also to the dataframe itself (through attr), which could e.g. be the starting time stamp when present - that way we can always dig it out despite having converted to seconds since start (so we can convert back and forth between absolute and relative time).
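A small sketch of the two ideas above: units stored per column through `attr`, and a start timestamp stored on the data frame so relative time can be converted back to absolute time. The attribute names `unit` and `start_datetime` and the example values are assumptions for illustration, not a settled naming scheme. Base R only.

```r
# Toy tracking data: time in seconds since start, x position in pixels.
df <- data.frame(time = c(0, 0.5, 1.0), x = c(10, 12, 15))

# Column-level units (attribute name "unit" is an assumption).
attr(df$time, "unit") <- "s"
attr(df$x, "unit") <- "pixels"

# Data-frame-level start timestamp (attribute name is an assumption).
attr(df, "start_datetime") <- as.POSIXct("2024-11-06 12:00:00", tz = "UTC")

# Relative -> absolute: add seconds since start to the stored timestamp.
# as.numeric() strips the "unit" attribute before the arithmetic.
start <- attr(df, "start_datetime")
absolute <- start + as.numeric(df$time)

# Absolute -> relative: seconds elapsed since the stored timestamp.
relative <- as.numeric(difftime(absolute, start, units = "secs"))

attr(df$x, "unit")
format(absolute, "%Y-%m-%d %H:%M:%OS1", tz = "UTC")
relative
```

Keeping the start timestamp on the data frame rather than in a column means the time column can stay purely relative while the absolute reference is still recoverable.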