Commit

Reorganized tests and docs to be clearer.
LTLA committed Jun 8, 2024
1 parent 9ae8184 commit 32b8129
Showing 2 changed files with 73 additions and 29 deletions.
51 changes: 31 additions & 20 deletions README.md
@@ -21,13 +21,11 @@ or building the executable from source with the usual `go build .` command.

## Registering a directory

### Step-by-step
### Initialization

Any directory can be indexed as long as (i) the requesting user has write access to it and (ii) the account running the SewerRat service has read access to it.
To demonstrate, let's make a directory containing JSON-formatted metadata files.
Other files may be present, of course, but SewerRat only cares about the metadata.
If subdirectories are present, these will be searched recursively for metadata files, though any subdirectory starting with `.` will not be searched.
The base names of the metadata files are left to the user's discretion - here, `A.json` and `B.json`.

```shell
mkdir test
@@ -53,9 +51,12 @@ curl -X POST -L ${SEWER_RAT_URL}/register/start \

On success, this returns a `PENDING` status with a verification code.
The caller is expected to verify that they have write access to the specified directory by creating a file with the same name as the verification code (i.e., `.sewer_XXX`) inside that directory.
Once this is done, we call the `/register/finish` endpoint with a request body that contains the same directory `path`.
The body may also contain `base`, an array of strings containing the names of the metadata files in the directory to be indexed -
if this is not provided, only files named `metadata.json` will be indexed.

### Verification

Once this is done, we call the `/register/finish` endpoint with a JSON-encoded request body that contains the same directory path in `path`.
The body may also contain `base`, an array of strings containing the names of the metadata files in the directory to be indexed.
If `base` is not provided, only files named `metadata.json` will be indexed.

```shell
curl -X POST -L ${SEWER_RAT_URL}/register/finish \
@@ -67,35 +68,30 @@ curl -X POST -L ${SEWER_RAT_URL}/register/finish \
## }
```

On success, the metadata files in the specified directory will be incorporated into the SQLite index.
We can then [search on the contents of these files](#querying-the-index) or [fetch the contents of any file](#fetching-file-contents) in the registered directory.
On error, the response usually has the `application/json` content type, where the body encodes a JSON object with an `ERROR` status and a `reason` string property explaining the reason for the failure.
Note that some error types (e.g., 404, 405) may instead return a `text/plain` content type with the reason directly in the response body.
In either case, the verification code file is no longer needed after a response is received and can be deleted from the directory to reduce clutter.

We provide some small utility functions from [`scripts/functions.sh`](scripts/functions.sh) to perform the registration from the command line.
The process should still be simple enough to implement equivalent functions in any language.

### Behind the scenes

Once verified in `/register/finish`, SewerRat will walk recursively through the specified directory.
It will identify all files with the specified `base` names (i.e., `A.json` and `B.json` in our example above), parsing them as JSON for indexing.
Upon receiving a valid request, SewerRat will walk recursively through the directory specified in `path`.
It will identify all metadata files with the specified `base` names (i.e., `A.json` and `B.json` in our example above), parsing them as JSON for indexing.
SewerRat will skip any problematic files that cannot be indexed due to, e.g., invalid JSON or insufficient permissions.
The causes of any failures are reported in the `comments` array in the HTTP response.
Subdirectories with names starting with `.` are also skipped during the recursive walk, so any metadata files therein will be ignored.

Symbolic links in the specified directory are treated differently depending on their target.
If the directory contains symbolic links to files, the contents of the target files can be indexed as long as the link has one of the `base` names.
All file information (e.g., modification time, owner) is taken from the link target, not the link itself;
SewerRat effectively treats the symbolic link as a proxy for the target file.
If the directory contains symbolic links to other directories, these will not be recursively traversed.

On success, the metadata files in the specified directory will be incorporated into the SQLite index.
We can then [search on the contents of these files](#querying-the-index) or [fetch the contents of any file](#fetching-file-contents) in the registered directory.

### Automatic updates

SewerRat will periodically update the index by inspecting all of its registered directories for new content.
If we added or modified a file with one of the registered names (e.g., `A.json`), SewerRat will (re-)index that file.
Similarly, if we deleted a file, SewerRat will remove it from the index.
This ensures that the information in the index reflects the directory contents on the filesystem.
Users can also manually update a directory by repeating the process above to re-index the directory's contents.

As an aside: updates and symbolic links can occasionally interact in strange ways.
Updates and symbolic links can occasionally interact in strange ways.
Specifically, updates to the indexed information for symbolic links are based on the modification time of the link target.
One can imagine a pathological case where a symbolic link is changed to a different target with the same modification time as the previous target, which will not be captured by SewerRat.
Currently, this can only be resolved by deleting all affected symbolic links, re-registering the directory, and then restoring the links and re-registering again.
@@ -106,6 +102,21 @@ To remove files from the index, we use the same procedure as above but replacing
The only potential difference is when the caller requests deregistration of a directory that does not exist.
In this case, `/deregister/start` may return a `SUCCESS` status instead of `PENDING`, after which `/deregister/finish` does not need to be called.

### Other comments

If an error is encountered in the `/register/*` or `/deregister/*` endpoints, the response usually has the `application/json` content type.
The body encodes a JSON object with an `ERROR` status and a `reason` string property explaining the reason for the failure.
That said, some error types (e.g., 404, 405) may instead return a `text/plain` content type with the reason directly in the response body.
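A client therefore has to handle both content types. The Go sketch below shows one way to do so; `errorReason` is a hypothetical helper, and the `status` key name is an assumption (the text above only names the `reason` property and an `ERROR` status).

```go
package main

import (
	"encoding/json"
	"fmt"
	"mime"
	"strings"
)

// errorReason (hypothetical) extracts a human-readable failure reason from
// an error response, given its Content-Type header and raw body.
func errorReason(contentType string, body []byte) string {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err == nil && mediaType == "application/json" {
		var parsed struct {
			Status string `json:"status"` // assumed key name
			Reason string `json:"reason"`
		}
		if json.Unmarshal(body, &parsed) == nil && parsed.Reason != "" {
			return parsed.Reason
		}
	}
	// Plain-text responses (e.g., 404, 405) carry the reason directly.
	return strings.TrimSpace(string(body))
}

func main() {
	fmt.Println(errorReason("application/json", []byte(`{"status":"ERROR","reason":"no such directory"}`)))
	fmt.Println(errorReason("text/plain; charset=utf-8", []byte("404 page not found\n")))
}
```

Falling back to the raw body means a malformed or unexpected JSON payload still yields some message for the caller to report.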

Any failure to parse specific JSON files is not considered an error and will only show up in the `comments` of a successful response from `/register/finish`.
This provides some robustness to partial writes or invalid files inside directories with complex internal structure.

Regardless of whether the registration is successful or not, the verification code file is no longer needed after a response is received.
The file can then be deleted from the directory to reduce clutter.

We provide some small utility functions from [`scripts/functions.sh`](scripts/functions.sh) to perform the registration from the command line.
The process should still be simple enough to implement equivalent functions in any language.

## Querying the index

### Making the request
51 changes: 42 additions & 9 deletions list_test.go
@@ -181,14 +181,34 @@ func TestListMetadata(t *testing.T) {
t.Fatal("unexpected file")
}
})
}

func TestListMetadataSymlink(t *testing.T) {
dir, err := os.MkdirTemp("", "")
if err != nil {
t.Fatalf("failed to create a temporary directory; %v", err)
}

path := filepath.Join(dir, "A.json")
err = os.WriteFile(path, []byte(""), 0644)
if err != nil {
t.Fatalf("failed to create a mock file; %v", err)
}

hostdir, err := os.MkdirTemp("", "")
if err != nil {
t.Fatalf("failed to create a temporary directory; %v", err)
}
hostpath := filepath.Join(hostdir, "B.json")
err = os.WriteFile(hostpath, []byte(""), 0644)
if err != nil {
t.Fatalf("failed to create a mock file; %v", err)
}

// Throwing in some symbolic links.
err = os.Symlink(path, filepath.Join(dir, "foo.json"))
if err != nil {
t.Fatal(err)
}

err = os.Symlink(subdir, filepath.Join(dir, "symlinked"))
err = os.Symlink(hostdir, filepath.Join(dir, "symlinked"))
if err != nil {
t.Fatal(err)
}
@@ -216,34 +236,47 @@ func TestListMetadata(t *testing.T) {
t.Fatal("expected file info from link target")
}
})
}

func TestListMetadataDot(t *testing.T) {
dir, err := os.MkdirTemp("", "")
if err != nil {
t.Fatalf("failed to create a temporary directory; %v", err)
}

path := filepath.Join(dir, "A.json")
err = os.WriteFile(path, []byte(""), 0644)
if err != nil {
t.Fatalf("failed to create a mock file; %v", err)
}

// Throwing in a hidden directory.
dotsubdir := filepath.Join(dir, ".git")
err = os.Mkdir(dotsubdir, 0755)
subdir := filepath.Join(dir, ".git")
err = os.Mkdir(subdir, 0755)
if err != nil {
t.Fatalf("failed to create a temporary subdirectory; %v", err)
}

dotsubpath2 := filepath.Join(dotsubdir, "B.json")
err = os.WriteFile(dotsubpath2, []byte(""), 0644)
subpath1 := filepath.Join(subdir, "A.json")
err = os.WriteFile(subpath1, []byte(""), 0644)
if err != nil {
t.Fatalf("failed to create a mock file; %v", err)
}

t.Run("symlink", func(t *testing.T) {
found, fails, err := listMetadata(dir, []string{ "B.json" })
t.Run("dot", func(t *testing.T) {
found, fails, err := listMetadata(dir, []string{ "A.json" })
if err != nil {
t.Fatal(err)
}
if len(fails) > 0 {
t.Fatal("unexpected failures")
}

// B.json in the linked directory should be ignored as we don't recurse into them.
// A.json in the subdirectory should be ignored as we don't recurse into dots.
if len(found) != 1 {
t.Fatal("expected exactly one file")
}
_, ok := found[filepath.Join(dir, "sub/B.json")]
_, ok := found[filepath.Join(dir, "A.json")]
if !ok {
t.Fatal("missing file")
}