Merge pull request #194 from hubverse-org/znk/inst-schema/193
Ship released schemas with hubUtils
zkamvar authored Jan 13, 2025
2 parents b4ff26a + 4a7f1b7 commit 8af0884
Showing 24 changed files with 16,808 additions and 0 deletions.
226 changes: 226 additions & 0 deletions .github/CONTRIBUTING.md
@@ -41,6 +41,232 @@ Our procedures for contributing bigger changes, code in particular, generally fo
- We use [testthat](https://cran.r-project.org/package=testthat) for unit tests.
Contributions with test cases included are easier to accept.

## Synchronizing with `hubverse-org/schemas`

The official home for the hubverse schemas is
https://github.com/hubverse-org/schemas. These schemas are copied into this
repository under the `inst/schemas` folder, which allows offline validation for
hubs.

**If you are developing against an in-development version of the hubverse
schemas, you must ensure that the schemas in this repository are synchronized.**

### Synchronization script

The script that synchronizes the schemas is in
[data-raw/schemas.R](https://github.com/hubverse-org/hubUtils/blob/main/data-raw/schemas.R).
It can be run from within R, as a standalone script, or as a git hook. It reads
one optional environment variable, `HUBUTILS_SCHEMA_BRANCH`. **If the
environment variable is unset, the branch information from
`inst/schemas/update.json` is used.**
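
The fallback rule can be sketched in a few lines of R. This is only an
illustration, not the code in `data-raw/schemas.R`: the helper name
`resolve_schema_branch()` is hypothetical, its `fallback` argument stands in
for the branch recorded in `inst/schemas/update.json`, and treating an empty
value as unset is an assumption.

```r
# Hypothetical helper (not part of hubUtils): choose the schema branch.
# `fallback` stands in for the branch stored in inst/schemas/update.json.
resolve_schema_branch <- function(fallback) {
  # Assumption: an empty string is treated the same as an unset variable
  branch <- Sys.getenv("HUBUTILS_SCHEMA_BRANCH", unset = "")
  if (nzchar(branch)) branch else fallback
}

# Returns the environment variable's value if it is set, otherwise "main"
resolve_schema_branch(fallback = "main")
```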

#### Usage: within R

```r
source("data-raw/schemas.R")
```

#### Usage: from BASH

```bash
Rscript data-raw/schemas.R
```

#### Usage: commit hook

```bash
git push
```

See [Installing the Git Hook](#installing-the-git-hook). A Git hook is a way to
run a local script before or after you do something in Git. For example, a
pre-push hook (the one we use here) will run every time before you push to the
remote repository.

#### Details

By default, this script makes a single call to the GitHub API to determine
the status of the most recent commit on the branch listed in
`inst/schemas/update.json`. If the sha and branch match and the timestamp is
ahead of the most recent commit, then you are good to go!
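
As a rough sketch of that comparison, assuming both the local record and the
GitHub response have been reduced to lists with `branch`, `sha`, and
`timestamp` fields (the field names follow the script output shown in this
document; the function name `is_synchronized()` and the use of `>=` for "ahead
of" are assumptions, not the script's actual code):

```r
# Hypothetical check (not the implementation in data-raw/schemas.R).
# `local` mimics inst/schemas/update.json; `remote` mimics the latest commit.
is_synchronized <- function(local, remote) {
  # Parse ISO 8601 UTC timestamps such as "2024-12-19T16:40:16Z"
  to_time <- function(x) {
    as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
  }
  isTRUE(
    identical(local$branch, remote$branch) &&
      identical(local$sha, remote$sha) &&
      # Assumption: "ahead of" means the local timestamp is not older
      to_time(local$timestamp) >= to_time(remote$timestamp)
  )
}

local <- list(
  branch = "main",
  sha = "0163a89cc38ba3846cd829545f6d65c1e40501a6",
  timestamp = "2024-12-19T16:56:13Z"
)
remote <- local
is_synchronized(local, remote)
```

Wrapping the result in `isTRUE()` means a missing or malformed timestamp is
treated as "not synchronized" rather than raising an error.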

If an update is needed, then your system git is used to clone the branch and
copy it over to `inst/schemas`.

When this runs as a non-interactive script, the tests are re-run whenever a
schema update happens.


### Synchronizing a development branch

To synchronize a development branch, set a temporary environment variable
called `HUBUTILS_SCHEMA_BRANCH` to the name of the branch. This can be done
only from an interactive R session or from BASH.

#### Via R

```r
Sys.setenv("HUBUTILS_SCHEMA_BRANCH" = "br-v4.0.1")
source("data-raw/schemas.R")
#> ✔ removing /path/to/hubUtils/inst/schemas
#> ✔ Creating inst/schemas/.
#> ℹ Fetching the latest version of the schemas from GitHub
#> Cloning into '/path/to/temp/folder'...
#> ✔ Copying v4.0.1, v4.0.0, v3.0.1, v3.0.0, v2.0.1, v2.0.0, v1.0.0, v0.0.1, v0.0.0.9,
#> and NEWS.md to inst/schemas
#> [ ... snip ... ]
#> ✔ Done
#> ✔ Schemas up-to-date!
#> ℹ branch: "br-v4.0.1"
#> ℹ sha: "43b2c8aceb3a316b7a1929dbe8d8ead2711d4e84"
#> ℹ timestamp: "2024-12-19T16:40:16Z"
Sys.unsetenv("HUBUTILS_SCHEMA_BRANCH")
```

#### Via BASH

When run via script (both manually and via git hook), if any synchronization
happens, tests are automatically run:

```bash
# The variable is set only for this one command, so no cleanup is needed
HUBUTILS_SCHEMA_BRANCH=br-v4.0.1 Rscript data-raw/schemas.R
#> ✔ removing /path/to/hubUtils/inst/schemas
#> ✔ Creating inst/schemas/.
#> ℹ Fetching the latest version of the schemas from GitHub
#> Cloning into '/path/to/temp/folder'...
#> ✔ Copying v4.0.1, v4.0.0, v3.0.1, v3.0.0, v2.0.1, v2.0.0, v1.0.0, v0.0.1, v0.0.0.9,
#> and NEWS.md to inst/schemas
#> [ ... snip ... ]
#> ✔ Done
#> ✔ Schemas up-to-date!
#> ℹ branch: "br-v4.0.1"
#> ℹ sha: "43b2c8aceb3a316b7a1929dbe8d8ead2711d4e84"
#> ℹ timestamp: "2024-12-19T16:40:16Z"
#>
#> ── ⚠ schema updated ──
#>
#> ! Re-running tests.
#> ℹ Testing hubUtils
#> ✔ | F W S OK | Context
#> ✔ | 7 | as_config
#> ✔ | 9 | as_model_out_tbl
#> ✔ | 5 | check_deprecated_schema
#> ✔ | 17 | model_id_merge
#> ✔ | 7 | read_config [6.4s]
#> ✔ | 7 | utils-get_hub
#> ✔ | 16 | utils-model_out_tbl
#> ✔ | 14 | utils-round_ids
#> ✔ | 8 | utils-round-config
#> ✔ | 39 | utils-schema-versions
#> ✔ | 14 | utils-schema [1.2s]
#> ✔ | 3 | utils-task_ids
#> ✔ | 7 | v3-schema-utils
#>
#> ══ Results ════════════════════════════
#> Duration: 9.0 s
#>
#> [ FAIL 0 | WARN 0 | SKIP 0 | PASS 153 ]
#> ✔ OK
```

### Installing the Git Hook

It is optional, but recommended, to use this script as a pre-push hook so that
the schemas are checked for updates before each push.

```r
usethis::use_git_hook("pre-push", readLines(usethis::proj_path("data-raw/schemas.R")))
```

This will create or overwrite `.git/hooks/pre-push`.

**If you want to uninstall the git hook, remove the `.git/hooks/pre-push`
file.**

In addition to checking that the schemas in `inst/schemas` are synchronized, [as
demonstrated above](#via-bash), this hook will also check:

1. the local hook is up-to-date
2. the `inst/schemas` folder contents are all committed
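
The first check can be pictured as a comparison between the installed hook
and the script it was created from. The sketch below is a guess at the idea,
not the `check_hook()` function in `data-raw/schemas.R`: the helper name
`hook_is_current()` is hypothetical, and it assumes the hook is a verbatim
copy of the script.

```r
# Hypothetical helper (not the script's check_hook()): is the installed hook
# identical, line for line, to the current script?
hook_is_current <- function(hook_path, script_path) {
  file.exists(hook_path) &&
    identical(readLines(hook_path), readLines(script_path))
}

# Example with the paths used in this repository:
# hook_is_current(".git/hooks/pre-push", "data-raw/schemas.R")
```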


When you install this as a git hook, you will get a message before every
successful push:

```
$ git push
#>
#> ── pre-push: schema synchronization ───────────────────────────────────────
#>
#> ── pre-push: checking that the hook is up-to-date ──
#>
#> ✔ Setting active project to "/path/to/hubUtils".
#> ✔ OK
#>
#> ── pre-push: checking that schemas are up-to-date ──
#>
#> → branch: "main"
#> → sha: "0163a89cc38ba3846cd829545f6d65c1e40501a6"
#> → timestamp: "2024-12-19T16:56:13Z"
#> ✔ OK
#>
#> ── pre-push: checking for changes in inst/schemas ──
#>
#> ✔ OK
```

#### When the schema updates

If the schemas are updated but not committed, this hook will prevent you from
pushing until the updated schemas are committed:

```
$ git push
#> [ ... snip ... ]
#>
#> ── ⚠ schema updated ──
#>
#> [ ... snip ... ]
#> ✔ OK
#>
#> ── pre-push: checking for changes in inst/schemas ──
#>
#> Error in `check_status()`:
#> ! New schemas must be committed before pushing.
#> Backtrace:
#> ▆
#> 1. └─global check_status(usethis::proj_path())
#> 2. └─cli::cli_abort(c("New schemas must be committed before pushing."))
#> 3. └─rlang::abort(...)
#> Execution halted
#> error: failed to push some refs to 'https://github.com/hubverse-org/hubUtils.git'
```

#### When the script changes

If the git hook script changes, you will be given instructions to update:

```
$ git push
#>
#> ── pre-push: schema synchronization ───────────────────────────────────────
#>
#> ── pre-push: checking that the hook is up-to-date ──
#>
#> ✔ Setting active project to "/path/to/hubUtils".
#> Error in `check_hook()`:
#> ! git hook outdated
#> ℹ Use `usethis::use_git_hook("pre-push", readLines(usethis::proj_path("data-raw/schemas.R")))`
#> to update your hook.
#> Backtrace:
#> ▆
#> 1. └─global check_hook(usethis::proj_path())
#> 2. └─cli::cli_abort(c("git hook outdated", i = "Use {.code {cmd}} to update your hook."))
#> 3. └─rlang::abort(...)
#> Execution halted
#> error: failed to push some refs to 'https://github.com/hubverse-org/hubUtils.git'
```

## Code of Conduct

Please note that the hubUtils project is released with a
2 changes: 2 additions & 0 deletions NEWS.md
@@ -1,5 +1,7 @@
# hubUtils (development version)

* Released schemas are now shipped with the package, so an internet connection
is no longer necessary for local validation. Released versions of `hubUtils` contain only released schema versions, while development versions of `hubUtils` (installed from GitHub) may contain schema versions under active development.
* Added `subset_task_id_names()` function to subset task ID names from a character vector of column names (#149).
* Added functions `subset_task_id_cols()` and `subset_std_cols()` to subset a `model_out_tbl` or submission `tbl` to task ID or standard (non-task ID) columns respectively (#149).

41 changes: 41 additions & 0 deletions R/utils-schema.R
@@ -39,6 +39,10 @@ get_schema_url <- function(config = c("tasks", "admin", "model"),
#' @examplesIf asNamespace("hubUtils")$not_rcmd_check()
#' get_schema_valid_versions()
get_schema_valid_versions <- function(branch = "main") {
  if (branch == "main") {
    schema_path <- system.file("schemas", package = "hubUtils")
    return(list.files(schema_path, pattern = "^v"))
  }
  branches <- gh(
    "GET /repos/hubverse-org/schemas/branches"
  ) %>%
@@ -75,6 +79,21 @@ get_schema_valid_versions <- function(branch = "main") {
#' schema_url <- get_schema_url(config = "tasks", version = "v0.0.0.9")
#' get_schema(schema_url)
get_schema <- function(schema_url) {
  # If the branch is "main", then we can use the stored schemas inside the
  # package.
  pieces <- extract_schema_info(schema_url)
  if (pieces$branch[1] == "main") {
    version <- pieces$version
    config <- pieces$config
    path <- system.file("schemas", version, config, package = "hubUtils")
    if (fs::file_exists(path)) {
      return(jsonlite::prettify(readLines(path)))
    } else {
      cli::cli_alert_warning("{.file {version}/{config}} not found.
        This could mean your version of hubUtils is outdated.
        Attempting to connect to GitHub.")
    }
  }
  response <- try(curl_fetch_memory(schema_url), silent = TRUE)

  if (inherits(response, "try-error")) {
@@ -96,6 +115,28 @@ get_schema <- function(schema_url) {
  }
}

#' Given a vector of URLs, this will extract the branch version and config for
#' each
#'
#' @param id a url for a given hubverse schema file
#' @return a data frame with three columns: branch, version, and config
#'
#' @noRd
#' @examples
#' urls <- c(
#'   "https://raw.githubusercontent.com/hubverse-org/schemas/main/v3.0.1/tasks-schema.json",
#'   "https://raw.githubusercontent.com/hubverse-org/schemas/main/v2.0.0/admin-schema.json",
#'   "https://raw.githubusercontent.com/hubverse-org/schemas/br-v4.0.0/v4.0.0/tasks-schema.json"
#' )
#' extract_schema_info(urls)
extract_schema_info <- function(id) {
  lead <- "^https[:][/][/]raw.githubusercontent.com[/]hubverse-org[/]schemas[/]"
  good_stuff <- "(.+?)[/](v[0-9.]+?)[/]([a-z]+?-schema.json)$"
  pattern <- paste0(lead, good_stuff)
  proto <- setNames(character(3), c("branch", "version", "config"))
  utils::strcapture(pattern, id, proto)
}

#' Get the latest schema version
#'
#' Get the latest schema version from the schema repository if "latest" requested