Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an API to fetch repositories and list their branches. #12

Merged
merged 3 commits into from
Jan 14, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ Authors@R: c(person("Rich", "FitzJohn", role = c("aut", "cre"),
Description: Small HTTP server for running orderly reports.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.1
RoxygenNote: 7.3.2
Roxygen: list(markdown = TRUE, roclets = c("rd", "namespace", "porcelain::porcelain_roclet"))
URL: https://github.com/mrc-ide/orderly.runner
BugReports: https://github.com/mrc-ide/orderly.runner/issues
Expand All @@ -21,6 +21,7 @@ Imports:
ids,
jsonlite,
orderly2 (>= 1.99.13),
openssl,
porcelain,
R6,
redux,
Expand Down
46 changes: 44 additions & 2 deletions R/api.R
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,9 @@
##'
##' @param root Orderly root
##'
##' @param repositories_base_path Path in which Git repositories are
##' cloned.
##'
##' @param validate Logical, indicating if validation should be done
##' on responses. This should be `FALSE` in production
##' environments. See [porcelain::porcelain] for details
Expand All @@ -19,7 +22,8 @@
##'
##' @export
api <- function(
root, validate = NULL, log_level = "info",
root, repositories_base_path,
validate = NULL, log_level = "info",
skip_queue_creation = FALSE) {
logger <- porcelain::porcelain_logger(log_level)

Expand All @@ -31,7 +35,10 @@ api <- function(
}

api <- porcelain::porcelain$new(validate = validate, logger = logger)
api$include_package_endpoints(state = list(root = root, queue = queue))
api$include_package_endpoints(state = list(
root = root,
repositories_base_path = repositories_base_path,
queue = queue))
api
}

Expand All @@ -46,6 +53,41 @@ root <- function() {
}


##' @porcelain POST /repository/fetch => json(repository_fetch_response)
##' state repositories_base_path :: repositories_base_path
##' body data :: json(repository_fetch_request)
repository_fetch <- function(repositories_base_path, data) {
data <- jsonlite::parse_json(data)
r <- git_sync(repositories_base_path, data$url)

empty_object()
}


##' @porcelain GET /repository/branches => json(repository_branches)
##' state repositories_base_path :: repositories_base_path
##' query url :: string
repository_branches <- function(repositories_base_path, url) {
repo <- repository_path(repositories_base_path, url)
branches <- git_remote_list_branches(repo)
message <- vcapply(branches$commit, function(commit) {
gert::git_commit_info(repo = repo, ref = commit)$message
})

branches$message <- message
list(
default_branch = scalar(git_remote_default_branch_name(repo)),
branches = data.frame(
name = branches$name,
commit_hash = branches$commit,
time = as.numeric(branches$updated),
message = message,
row.names = NULL
)
Comment on lines +80 to +86
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a suggestion, just curious as to why you used data.frame instead of list, is it because it fills in the data frame and serialises each row and you dont have to wrap each branch name in scalar for example?

also are we not worrying about the new line characters? rich was worrying about them for some reason in the past, i cant remember what we decided on that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a suggestion, just curious as to why you used data.frame instead of list, is it because it fills in the data frame and serialises each row and you dont have to wrap each branch name in scalar for example?

Err, I don't know. It just works. I don't even know how what the easy way to turn this into a list would be.

If I just replace data.frame by list and do no other transformation then the result is radically different. It would be one object with fields name, commit_hash, ... where each entry is an array of strings, which is not idiomatic JSON at all. Using data.frame gives the expected result because jsonlite has special support for it. Orderly2 does this all the time when serializing, eg. the list of files in a packet.

The other way would be to do the pivot myself and generate a list of lists, where each nested list contains a bunch of scalars. Doable but way more tedious and verbose.

also are we not worrying about the new line characters? rich was worrying about them for some reason in the past, i cant remember what we decided on that

I never understood this concern. JSON can serialize newline characters just fine. But yeah you are right this breaks with the outpack_server API. I can go back to using a list of strings if you prefer, but it makes no sense to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yh i mean feel free to include newline characters, we can always fix it pretty easily if needed, do you think its worth having the type as an array of 1 though? just thinking about the fact that we may have to migrate the database to be just string and then if we implement a newline char change in the future itll have to be migrated again to string arrays, idm either way

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just thinking about the fact that we may have to migrate the database to be just string and then if we implement a newline char change in the future itll have to be migrated again to string arrays

This doesn't show up in the database does it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe not, im not 100% sure but it shows up in the data class for the response, should be easy enough to change but might wanna make a quick ticket for it so we dont forget!

)
}


##' @porcelain
##' GET /report/list => json(report_list)
##' query ref :: string
Expand Down
59 changes: 57 additions & 2 deletions R/git.R
Original file line number Diff line number Diff line change
@@ -1,3 +1,48 @@
#' Get the storage path for a repository.
#'
#' Repositories are all stored as subdirectories of a common base path,
#' using a hash of their URL as the directory name.
#'
#' @param base the base directory in which all repositories are stored.
#' @param url the URL of the remote repository.
#' @param check if TRUE and the repository does not exist locally yet, an error
#' is raised.
#' @return the path to the repository.
repository_path <- function(base, url, check = TRUE) {
hash <- openssl::sha1(url)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont mind this, but also makes it less easy for us to poke around, you said its to help with weird characters and stuff in the url, whats a character that would break it that github allows?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I do get the less pokeable argument and was a bit worried about it. It is still possible to get the URL from a cloned repo though by doing git remote get-url origin, although of course it's not as obvious at a glance than just looking at directory names.

The "weird character" comes in large part from the fact that I did not want to assume anything about GitHub here, so we need to encode the full URL, not just org name and repo name. That include : and @, eg. [email protected]:org/repo (technically Linux allows any characters but NUL and / in filenames, but not really a game I want to play). We would need to deal with slashes (creat nested directories?), including leading slashes if the URL is actually a file path (which the tests do a lot). You also have to consider what the effect of . and .. in the URL has on the filesystem (should http://foo/bar/.. and http://foo map to the same repository in the cache?).

I suppose the alternative is to do a percent encoding of the URL, which would work to avoid all the special characters (but it would do nothing about . and ..), but you still have a problem if the URL is longer than 256 characters, which is the limit for filenames on Linux. Not a likely scenario, but a possible one.

Using a hash of the URL was a very easy to workaround all these problems by guaranteeing a fixed length, unique name using only safe characters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah okay i was just thinking a very simple directory structure like org/repo wasnt thinking about the whole url although is there any reason we even want the whole url? and github repo are pretty clean, they only allow the following:
image
and also . and .. are reserved names so i think thats chill too

but yh youre right about it being us relying on github

i do think in practice however, none of these issues will ever affect us with the org/repo directory structure but the hash with the whole url doesnt care about where the repo actually is so more general i guess

happy to leave it to your judgement!

if we do it with the hashes though, i think it would be useful to do a tiny script in another pr that prints a mapping of hashes to origin url by doing a simple dir walk so that we can easily debug

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any reason we even want the whole url?

Just being able to use a file path as URL in unit tests is a pretty good use case, instead of having to mock GitHub or have some tedious fake setup

if we do it with the hashes though, i think it would be useful to do a tiny script in another pr that prints a mapping of hashes to origin url by doing a simple dir walk so that we can easily debug

Sure can do

path <- file.path(base, hash)
if (check && !fs::dir_exists(path)) {
porcelain::porcelain_stop(
message = "Repository does not exist",
code = "NOT_FOUND",
status_code = 404)
}
path
}


#' Clone or fetch a remote repository.
#'
#' @param base the base directory in which all repositories are stored.
#' @param url the URL of the remote repository.
#' @return the path to the local clone of the repository.
git_sync <- function(base, url) {
repo <- repository_path(base, url, check = FALSE)
if (!fs::dir_exists(repo)) {
gert::git_clone(url = url, path = repo, bare = TRUE, verbose = FALSE)
} else {
gert::git_fetch(repo = repo, prune = TRUE, verbose = FALSE)
}
repo
}

git_remote_list_branches <- function(repo) {
branches <- gert::git_branch_list(repo, local = FALSE)
branches$name <- gsub("^refs/remotes/origin/", "", branches$ref)
branches <- branches[branches$name != "HEAD", ]
branches
}

git_run <- function(args, repo = NULL, check = FALSE) {
git <- sys_which("git")
if (!is.null(repo)) {
Expand All @@ -12,7 +57,7 @@ git_run <- function(args, repo = NULL, check = FALSE) {
}


git_get_default_branch <- function(repo = NULL) {
git_remote_default_branch_ref <- function(repo) {
# This is assuming remote origin exists. We'll get an error if it
# doesn't. But this should be safe for us as we'll always have cloned
# this from GitHub.
Expand All @@ -21,10 +66,20 @@ git_get_default_branch <- function(repo = NULL) {
}


git_remote_default_branch_name <- function(repo) {
ref <- git_remote_default_branch_ref(repo)
if (!is.null(ref)) {
gsub("^refs/remotes/origin/", "", ref)
} else {
NULL
}
}


git_get_modified <- function(ref, base = NULL,
relative_dir = NULL, repo = NULL) {
if (is.null(base)) {
base <- git_get_default_branch(repo)
base <- git_remote_default_branch_ref(repo)
}
if (is.null(relative_dir)) {
relative <- ""
Expand Down
5 changes: 3 additions & 2 deletions R/main.R
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
parse_main <- function(args = commandArgs(TRUE)) {
usage <- "Usage:
orderly.runner.server [options] <path>
orderly.runner.server [options] <path> <repositories>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too right?

Suggested change
orderly.runner.server [options] <path> <repositories>
orderly.runner.server [options] <path> <repositories-base-path>


Options:
--log-level=LEVEL Log-level (off, info, all) [default: info]
Expand All @@ -12,12 +12,13 @@ Options:
validate = dat$validate,
port = as.integer(dat$port),
path = dat$path,
repositories = dat$repositories,
host = dat$host)
}

main <- function(args = commandArgs(TRUE)) {
dat <- parse_main(args)
api_obj <- api(dat$path, dat$validate, dat$log_level)
api_obj <- api(dat$path, dat$repositories, dat$validate, dat$log_level)
api_obj$run(host = dat$host, port = dat$port)
}

Expand Down
20 changes: 20 additions & 0 deletions R/porcelain.R

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 9 additions & 0 deletions R/util.R
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,15 @@ scalar <- function(x) {
jsonlite::unbox(x)
}

empty_object <- function(x) {
# This is needed to get an empty JSON object.
# list() is `[]` and NULL may be `null` depending on the options passed to
# toJSON.
x <- list()
names(x) <- character(0)
x
}


package_version_string <- function(name) {
as.character(utils::packageVersion(name))
Expand Down
3 changes: 2 additions & 1 deletion docker/test/run-test
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,8 @@ docker run --rm -d --pull=always \
-v $ORDERLY_VOLUME:$CONTAINER_ORDERLY_ROOT_PATH \
-v $ORDERLY_LOGS_VOLUME:$LOGS_DIR \
$ORDERLY_RUNNER_IMAGE \
$CONTAINER_ORDERLY_ROOT_PATH
$CONTAINER_ORDERLY_ROOT_PATH \
/repositories

docker run --rm -d --pull=always \
--net=$NETWORK \
Expand Down
39 changes: 39 additions & 0 deletions inst/schema/repository_branches.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"branches": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"commit_hash": {
"type": "string"
},
"message": {
"type": "string"
},
"time": {
"type": "number"
plietar marked this conversation as resolved.
Show resolved Hide resolved
}
},
"required": [
"name",
"commit_hash",
"message",
"time"
]
}
},
"default_branch": {
"type": "string"
}
},
"required": [
"branches",
"default_branch"
]
}
8 changes: 8 additions & 0 deletions inst/schema/repository_fetch_request.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"url": { "type": "string" }
},
"required": [ "url" ]
}
4 changes: 4 additions & 0 deletions inst/schema/repository_fetch_response.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object"
plietar marked this conversation as resolved.
Show resolved Hide resolved
}
11 changes: 10 additions & 1 deletion man/api.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

19 changes: 19 additions & 0 deletions man/git_sync.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

23 changes: 23 additions & 0 deletions man/repository_path.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading
Loading