Commit

Commit for that issue thing.
LTLA committed Jun 8, 2024
1 parent dd7edff commit 9ae8184
Showing 3 changed files with 52 additions and 13 deletions.
20 changes: 10 additions & 10 deletions README.md
@@ -14,14 +14,20 @@ we do not impose schemas on the metadata format,
and we re-use the existing storage facilities on the HPC cluster.
SewerRat can be considered a much more relaxed version of the [Gobbler](https://github.com/ArtifactDB/gobbler) that federates the storage across users.

For convenience, we'll assume that the URL to the SewerRat API is present in an environment variable named `SEWER_RAT_URL`.
Readers should obtain an appropriate URL for their SewerRat deployment before trying the code examples below.
Alternatively, readers can spin up their own instance on `localhost` by running the binaries [here](https://github.com/ArtifactDB/SewerRat/releases)
or building the executable from source with the usual `go build .` command.

## Registering a directory

### Step-by-step

Any directory can be indexed as long as (i) the requesting user has write access to it and (ii) the account running the SewerRat service has read access to it.
To demonstrate, let's make a directory containing JSON-formatted metadata files.
Other files may be present, of course, but SewerRat only cares about the metadata.
Metadata files can be anywhere in this directory (including within subdirectories) and they can have any base name (here, `A.json` and `B.json`).
If subdirectories are present, these will be searched recursively for metadata files, though any subdirectory starting with `.` will not be searched.
The base names of the metadata files are left to the user's discretion - here, `A.json` and `B.json`.

```shell
mkdir test
@@ -31,12 +37,6 @@ echo '{ "authors": { "first": "Aaron", "last": "Lun" } }' > test/sub/A.json
echo '{ "foo": "bar", "gunk": [ "stuff", "blah" ] }' > test/sub/B.json
```

For convenience, we'll store the SewerRat API in an environment variable.

```shell
export SEWER_RAT_URL=<INSERT URL HERE> # get this from your SewerRat admin.
```

To start the registration process, we make a POST request to the `/register/start` endpoint.
The request should have a JSON-encoded body containing `path`, the absolute path to the directory that we want to register.

@@ -54,8 +54,8 @@ curl -X POST -L ${SEWER_RAT_URL}/register/start \
On success, this returns a `PENDING` status with a verification code.
The caller is expected to verify that they have write access to the specified directory by creating a file with the same name as the verification code (i.e., `.sewer_XXX`) inside that directory.
Once this is done, we call the `/register/finish` endpoint with a request body that contains the same directory `path`.
The body may also contain `base`, an array of strings containing the names of the files to register within the directory -
if this is not provided, only files named `metadata.json` will be registered.
The body may also contain `base`, an array of strings containing the names of the metadata files in the directory to be indexed -
if this is not provided, only files named `metadata.json` will be indexed.

```shell
curl -X POST -L ${SEWER_RAT_URL}/register/finish \
@@ -67,7 +67,7 @@ curl -X POST -L ${SEWER_RAT_URL}/register/finish \
## }
```

On success, the files in the specified directory will be registered in the SQLite index.
On success, the metadata files in the specified directory will be incorporated into the SQLite index.
We can then [search on the contents of these files](#querying-the-index) or [fetch the contents of any file](#fetching-file-contents) in the registered directory.
On error, the response usually has the `application/json` content type, where the body encodes a JSON object with an `ERROR` status and a `reason` string property explaining the failure.
Note that some error types (e.g., 404, 405) may instead return a `text/plain` content type with the reason directly in the response body.
8 changes: 7 additions & 1 deletion list.go
@@ -5,6 +5,7 @@ import (
"io/fs"
"os"
"path/filepath"
"strings"
)

func listFiles(dir string, recursive bool) ([]string, error) {
@@ -59,7 +60,12 @@ func listMetadata(dir string, base_names []string) (map[string]fs.FileInfo, []st
}

if d.IsDir() {
return nil
base := filepath.Base(path)
if strings.HasPrefix(base, ".") {
return fs.SkipDir
} else {
return nil
}
}

if _, ok := curnames[filepath.Base(path)]; !ok {
37 changes: 35 additions & 2 deletions list_test.go
@@ -188,13 +188,13 @@ func TestListMetadata(t *testing.T) {
t.Fatal(err)
}

err = os.Symlink(subdir, filepath.Join(dir, "bar.json"))
err = os.Symlink(subdir, filepath.Join(dir, "symlinked"))
if err != nil {
t.Fatal(err)
}

t.Run("symlink", func(t *testing.T) {
found, fails, err := listMetadata(dir, []string{ "foo.json", "bar.json" })
found, fails, err := listMetadata(dir, []string{ "foo.json", "B.json" })
if err != nil {
t.Fatal(err)
}
@@ -203,6 +203,7 @@ func TestListMetadata(t *testing.T) {
t.Fatal("unexpected failures")
}

// B.json in the linked directory should be ignored as we don't recurse into them.
if len(found) != 1 {
t.Fatal("expected exactly one file")
}
@@ -215,4 +216,36 @@ func TestListMetadata(t *testing.T) {
t.Fatal("expected file info from link target")
}
})

// Throwing in a hidden directory.
dotsubdir := filepath.Join(dir, ".git")
err = os.Mkdir(dotsubdir, 0755)
if err != nil {
t.Fatalf("failed to create a temporary subdirectory; %v", err)
}

dotsubpath2 := filepath.Join(dotsubdir, "B.json")
err = os.WriteFile(dotsubpath2, []byte(""), 0644)
if err != nil {
t.Fatalf("failed to create a mock file; %v", err)
}

t.Run("hidden", func(t *testing.T) {
found, fails, err := listMetadata(dir, []string{ "B.json" })
if err != nil {
t.Fatal(err)
}
if len(fails) > 0 {
t.Fatal("unexpected failures")
}

// B.json in the hidden .git directory (and in the symlinked directory) should be ignored.
if len(found) != 1 {
t.Fatal("expected exactly one file")
}
_, ok := found[filepath.Join(dir, "sub/B.json")]
if !ok {
t.Fatal("missing file")
}
})
}
