-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Br v4.0.0 #103
Br v4.0.0 #103
Conversation
05a50f1
to
a0aa58e
Compare
We were having trouble commenting the diff in #103 and have no clue why it's not working. Maybe this will fix it?
Note I did manage to re-indent with If someone has better ideas let me know! |
/diff |
Here are your diffs for this pull request
|
I think we should just bend to the formatter. One thing we could do to avoid this is to run 3.0.1 through the formatter in this PR as well so that the diff appears the same. The formatter will not change any structural information of the JSON, just its format. From there, we can add guidance for future additions to run it through the formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the update, I just added a question about the derived_tasks_ids
.
v4.0.0/tasks-schema.json
Outdated
"derived_task_ids": { | ||
"description": "Names of derived task IDs, i.e. task IDs whose values are derived from (and therefore dependent on) the values of other variables.", | ||
"type": [ | ||
"array", | ||
"null" | ||
], | ||
"uniqueItems": true, | ||
"items": { | ||
"type": "string" | ||
} | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be wrong here but should it not be at the "round" level, as depending on the round, you might have so really different task IDs ? (or is it already?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like it's a hub property overall in that I can't see many situations where a task ID is derived in one round and not derived in another. The chances of it being stable are far greater than the property changing by round so it would be annoying to have to re-define it again in every round.
I'll make sure however that if a new derived task ID is added to a new round and the task id name added to derived_task_ids
, it will not affect validation of older rounds, i.e. if the task ID is not present in the data, derived_task_ids
will just be ignored.
The is the potential to allow for a derived_task_ids
property at the round level that overrides the overall property but I think we can wait and see if anyone requests that as it feels like an extreme edge case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the additional information.
You make a really good point, however I am worrying a bit about the scenario hubs. They tend to have lot of rounds and have different tasks id for some round as for the need of a specific round or scenario you might need to create new column that are not replicated in the next round.
I am not sure how frequent this new columns will be tagged as derived tasks ids, but even saying that it looks to me a little bit weird to have the tasks id as hub level as they are not define at hub level but at round level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. The problem though does not arise from a derived task id not being in all rounds. The problem is if say you create a task id as a derived taks id and then later one change that (i.e. the task ID is no longer derived. That's the only situation where this could be problematic but seems to me not good practice within a hub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely don't want everyone to have to respecify this property at each round as for the majority of hubs this is very stable so we will for sure keep the overall high level specification. As mentioned there is the option for overriding at the round but I do not want to implement unless its actually necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem though does not arise from a derived task id not being in all rounds
I misunderstood and I was afraid that would be an issue but if not, that solve some of my problems! Apologize for that!
I agree that changing the behavior of a column is bad practice and should not be supported.
I definitely don't want everyone to have to respecify this property at each round as for the majority of hubs this is very stable so we will for sure keep the overall high level specification.
I also agree, that's to much to ask for something that might not be used often.
And the optional overriding seems like a good idea, and I think a hub can do their own wrapper/function to deal with it if necessary.
So to summarize, thank you for the additional information, I change my mind and I am ok with it being at hub level!
Good option! See #106
I opened the following issue to discuss approach: #107 |
/diff |
1 similar comment
/diff |
Merge branch 'main' into br-v4.0.0 # Please enter a commit message to explain why this merge is necessary, # especially if it merges an updated upstream into a topic branch. # # Lines starting with '#' will be ignored, and an empty message aborts # the commit.
Hey @LucieContamin ! I added a round level property now too so you can now use that to override the global property. Glad you asked for this feature as going back to introduce it I found that the global |
/diff |
Add `target_keys` property schema
/diff |
point estimatesSomething that was brought up in response to reichlab/variant-nowcast-hub#117 (comment) is that the Now that we are using
This is what I think it would look like in the schema:
DemoHere's a demo that shows that hub_con <- hubData::connect_hub(
system.file("testhubs/simple", package = "hubUtils")
)
hp <- attributes(hub_con)$hub_path
# Get the tasks file which has a `mean` output type
tasks <- fs::path(hp, "hub-config", "tasks.json")
# copy the file to a new temp file
tmp <- withr::local_tempfile()
fs::file_copy(tasks, tmp, overwrite = TRUE)
# The two files are identical
unname(tools::md5sum(tmp) == tools::md5sum(tasks))
#> [1] TRUE
cfg <- readLines(tmp)
writeLines(cfg[231:240]) # optional is NA
#> "output_type": {
#> "mean": {
#> "output_type_id": {
#> "required": null,
#> "optional": ["NA"]
#> },
#> "value": {
#> "type": "integer",
#> "minimum": 0
#> }
# Change '["NA"]' to '[null]'
nullcfg <- sub('["NA"]', '[null]', cfg, fixed = TRUE)
writeLines(nullcfg[231:240]) # optional is now null
#> "output_type": {
#> "mean": {
#> "output_type_id": {
#> "required": null,
#> "optional": [null]
#> },
#> "value": {
#> "type": "integer",
#> "minimum": 0
#> }
writeLines(nullcfg, con = tmp) # changing to null
# The two files are no longer identical
unname(tools::md5sum(tmp) == tools::md5sum(tasks))
#> [1] FALSE
waldo::compare(cfg, nullcfg)
#> old[81:87] vs new[81:87]
#> " \"mean\": {"
#> " \"output_type_id\": {"
#> " \"required\": null,"
#> - " \"optional\": [\"NA\"]"
#> + " \"optional\": [null]"
#> " },"
#> " \"value\": {"
#> " \"type\": \"integer\","
#>
#> old[232:238] vs new[232:238]
#> " \"mean\": {"
#> " \"output_type_id\": {"
#> " \"required\": null,"
#> - " \"optional\": [\"NA\"]"
#> + " \"optional\": [null]"
#> " },"
#> " \"value\": {"
#> " \"type\": \"integer\","
# The resulting R object after reading in JSON are identical
identical(
jsonlite::fromJSON(tasks, simplifyDataFrame = FALSE),
jsonlite::fromJSON(tmp, simplifyDataFrame = FALSE)
)
#> [1] TRUE Created on 2024-10-18 with reprex v2.1.1 |
Co-authored-by: Evan Ray <[email protected]>
Co-authored-by: Evan Ray <[email protected]>
…ut-type-ids/109 Encode point estimate output type IDs with null.
…/114 v4 - Do not allow additional properties in lower level objects
/diff |
This PR:
is_required
boolean property at theoutput_type
level to configure whether the output type is required for submissions to be considered valid (introduceis_required
field for all output_types #99).optional
property inoutput_type_id
objects. As such, when a given output type is submitted, values for all output type IDs much be submitted (disallow optional values in pmf output_type specifications #100,disallow optional values in cdf output_type specifications #101, disallow optional values in quantile output_type specifications #102).derived_task_ids
property to enable hub administrators to define derived task IDs (i.e. task IDs whose values depend on the values of other task IDs) at a hub level. This allows primarily validation functionality to ignore such task IDs when appropriate which can significantly improve validation efficency (Create property to record derived task IDs #96 ). For more information seehubValidations
documentation on ignoring derived task IDs.hubDocs