Add an API to fetch repositories and list their branches. #12

plietar · 2025-01-09T18:13:00Z

This API is a replacement for the one currently provided by outpack_server.

A single instance of orderly.runner API is able to operate over an arbitrary number of Git repositories. The first time a URL is fetched, a git clone is performed. Subsequent fetches are incremental using git fetch.

Repositories are each stored in their own directory, named using the hash of the URL. Using the hash makes it easy to support arbitrary upstreams, without having to worry about encoding the URL characters as file names.

The existing endpoints still operate on the "root" repository, which is assumed to exist before the API is launched. Eventually all those APIs will be migrated to have a url argument.

inst/schema/repository_branches.json

inst/schema/repository_fetch_response.json

tests/testthat/helper-orderly-runner.R

docker/test/run-test

M-Kusumgar

Got a couple of comments but on the whole looks good!

R/api.R

M-Kusumgar · 2025-01-10T13:33:53Z

R/git.R

+#'   is raised.
+#' @return the path to the repository.
+repository_path <- function(base, url, check = TRUE) {
+  hash <- openssl::sha1(url)


i dont mind this, but also makes it less easy for us to poke around, you said its to help with weird characters and stuff in the url, whats a character that would break it that github allows?

Yeah I do get the less pokeable argument and was a bit worried about it. It is still possible to get the URL from a cloned repo though by doing git remote get-url origin, although of course it's not as obvious at a glance than just looking at directory names.

The "weird character" comes in large part from the fact that I did not want to assume anything about GitHub here, so we need to encode the full URL, not just org name and repo name. That include : and @, eg. [email protected]:org/repo (technically Linux allows any characters but NUL and / in filenames, but not really a game I want to play). We would need to deal with slashes (creat nested directories?), including leading slashes if the URL is actually a file path (which the tests do a lot). You also have to consider what the effect of . and .. in the URL has on the filesystem (should http://foo/bar/.. and http://foo map to the same repository in the cache?).

I suppose the alternative is to do a percent encoding of the URL, which would work to avoid all the special characters (but it would do nothing about . and ..), but you still have a problem if the URL is longer than 256 characters, which is the limit for filenames on Linux. Not a likely scenario, but a possible one.

Using a hash of the URL was a very easy to workaround all these problems by guaranteeing a fixed length, unique name using only safe characters.

ah okay i was just thinking a very simple directory structure like org/repo wasnt thinking about the whole url although is there any reason we even want the whole url? and github repo are pretty clean, they only allow the following:

and also . and .. are reserved names so i think thats chill too

but yh youre right about it being us relying on github

i do think in practice however, none of these issues will ever affect us with the org/repo directory structure but the hash with the whole url doesnt care about where the repo actually is so more general i guess

happy to leave it to your judgement!

if we do it with the hashes though, i think it would be useful to do a tiny script in another pr that prints a mapping of hashes to origin url by doing a simple dir walk so that we can easily debug

is there any reason we even want the whole url?

Just being able to use a file path as URL in unit tests is a pretty good use case, instead of having to mock GitHub or have some tedious fake setup

if we do it with the hashes though, i think it would be useful to do a tiny script in another pr that prints a mapping of hashes to origin url by doing a simple dir walk so that we can easily debug

Sure can do

M-Kusumgar · 2025-01-10T13:38:08Z

R/api.R

+    branches = data.frame(
+      name = branches$name,
+      commit_hash = branches$commit,
+      time = as.numeric(branches$updated),
+      message = message,
+      row.names = NULL
+    )


not a suggestion, just curious as to why you used data.frame instead of list, is it because it fills in the data frame and serialises each row and you dont have to wrap each branch name in scalar for example?

also are we not worrying about the new line characters? rich was worrying about them for some reason in the past, i cant remember what we decided on that

not a suggestion, just curious as to why you used data.frame instead of list, is it because it fills in the data frame and serialises each row and you dont have to wrap each branch name in scalar for example?

Err, I don't know. It just works. I don't even know how what the easy way to turn this into a list would be.

If I just replace data.frame by list and do no other transformation then the result is radically different. It would be one object with fields name, commit_hash, ... where each entry is an array of strings, which is not idiomatic JSON at all. Using data.frame gives the expected result because jsonlite has special support for it. Orderly2 does this all the time when serializing, eg. the list of files in a packet.

The other way would be to do the pivot myself and generate a list of lists, where each nested list contains a bunch of scalars. Doable but way more tedious and verbose.

also are we not worrying about the new line characters? rich was worrying about them for some reason in the past, i cant remember what we decided on that

I never understood this concern. JSON can serialize newline characters just fine. But yeah you are right this breaks with the outpack_server API. I can go back to using a list of strings if you prefer, but it makes no sense to me.

yh i mean feel free to include newline characters, we can always fix it pretty easily if needed, do you think its worth having the type as an array of 1 though? just thinking about the fact that we may have to migrate the database to be just string and then if we implement a newline char change in the future itll have to be migrated again to string arrays, idm either way

just thinking about the fact that we may have to migrate the database to be just string and then if we implement a newline char change in the future itll have to be migrated again to string arrays

This doesn't show up in the database does it?

maybe not, im not 100% sure but it shows up in the data class for the response, should be easy enough to change but might wanna make a quick ticket for it so we dont forget!

inst/schema/repository_branches.json

This API is a replacement for the one currently provided by outpack_server. A single instance of orderly.runner API is able to operate over an arbitrary number of Git repositories. The first time a URL is fetched, a `git clone` is performed. Subsequent fetches are incremental using `git fetch`. Repositories are each stored in their own directory, named using the hash of the URL. Using the hash makes it easy to support arbitrary upstreams, without having to worry about encoding the URL characters as file names. The existing endpoints still operate on the "root" repository, which is assumed to exist before the API is launched. Eventually all those APIs will be migrated to have a `url` argument.

M-Kusumgar

I approve

M-Kusumgar · 2025-01-14T11:30:21Z

R/main.R

@@ -1,6 +1,6 @@
 parse_main <- function(args = commandArgs(TRUE)) {
  usage <- "Usage:
-orderly.runner.server [options] <path>
+orderly.runner.server [options] <path> <repositories>


here too right?

Suggested change

orderly.runner.server [options] <path> <repositories>

orderly.runner.server [options] <path> <repositories-base-path>

plietar force-pushed the mrc-6120 branch 7 times, most recently from f0d9b3a to 9da5588 Compare January 10, 2025 09:22

plietar commented Jan 10, 2025

View reviewed changes

plietar force-pushed the mrc-6120 branch 3 times, most recently from d601784 to fa82fc8 Compare January 10, 2025 09:41

plietar commented Jan 10, 2025

View reviewed changes

docker/test/run-test Outdated Show resolved Hide resolved

plietar requested a review from M-Kusumgar January 10, 2025 09:43

plietar marked this pull request as ready for review January 10, 2025 09:43

plietar force-pushed the mrc-6120 branch from fa82fc8 to 883012f Compare January 10, 2025 09:47

M-Kusumgar reviewed Jan 10, 2025

View reviewed changes

plietar added 2 commits January 13, 2025 15:43

Code review

9353f30

plietar force-pushed the mrc-6120 branch from 597ae91 to 9353f30 Compare January 13, 2025 15:45

Use random queue ID

dfcdb13

plietar requested a review from M-Kusumgar January 13, 2025 17:02

M-Kusumgar approved these changes Jan 14, 2025

View reviewed changes

plietar merged commit b467256 into main Jan 14, 2025
7 checks passed

plietar deleted the mrc-6120 branch January 15, 2025 15:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add an API to fetch repositories and list their branches. #12

Add an API to fetch repositories and list their branches. #12

plietar commented Jan 9, 2025

M-Kusumgar left a comment

M-Kusumgar Jan 10, 2025

plietar Jan 11, 2025

M-Kusumgar Jan 13, 2025

plietar Jan 13, 2025

M-Kusumgar Jan 10, 2025

plietar Jan 11, 2025

M-Kusumgar Jan 13, 2025

plietar Jan 13, 2025

M-Kusumgar Jan 14, 2025

M-Kusumgar left a comment

M-Kusumgar Jan 14, 2025

	orderly.runner.server [options] <path> <repositories>
	orderly.runner.server [options] <path> <repositories-base-path>

Add an API to fetch repositories and list their branches. #12

Add an API to fetch repositories and list their branches. #12

Conversation

plietar commented Jan 9, 2025

M-Kusumgar left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

M-Kusumgar left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment