Isolate database queries from one-another (#235)
* A version that won't build

* Uncomment

* Newline

* Newline

* Compiles

* No need for only temp

* Compiles AND runs

* Move toward nsjail

* Make room for nsjail, but still as a noop

* Bring in #234

* Works end-to-end (need to implement 'touch' for new DBs)

* Create DB file in create_database

* Move isolated runner into original crate as second binary, dynamically determine path to it

* Remove hosted_db_runner

* Move nsjail builder to scripts dir

* fmt

* tokio typo

* New AybError variants

* Code review part 1

* Update docs, remove binary, add nsjail build step

* Testing docs and fmt

* Fix build command

* nsjail requirements

* More nsjail requirements

* Docs cleanup

* Clippy and code review

* Warn if not fully isolated

* Clean up for clarity
marcua authored Dec 28, 2023
1 parent 7bcd087 commit 785d09f
Showing 18 changed files with 341 additions and 18 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/tests.yml
@@ -37,5 +37,9 @@ jobs:
run: cargo fmt --check
- name: Ensure clippy finds no issues
run: cargo clippy
- name: Install nsjail requirements
run: sudo apt-get install -y libprotobuf-dev protobuf-compiler libnl-route-3-dev
- name: Build nsjail
run: scripts/build_nsjail.sh && mv nsjail tests/
- name: Run tests
run: cargo test --verbose
1 change: 1 addition & 0 deletions .gitignore
@@ -21,3 +21,4 @@ tests/ayb_data_postgres
tests/ayb_data_sqlite
tests/smtp_data_10025
tests/smtp_data_10026
tests/nsjail
13 changes: 11 additions & 2 deletions Cargo.toml
@@ -6,6 +6,7 @@ description = "ayb makes it easy to create, host, and share embedded databases l
homepage = "https://github.com/marcua/ayb"
documentation = "https://github.com/marcua/ayb#readme"
license = "Apache-2.0"
default-run = "ayb"

[dependencies]
actix-web = { version = "4.4.0" }
@@ -19,14 +20,14 @@ fernet = { version = "0.2.1" }
lettre = { version = "0.10.4", features = ["tokio1-native-tls"] }
quoted_printable = { version = "0.5.0" }
reqwest = { version = "0.11.22", features = ["json"] }
rusqlite = { version = "0.27.0", features = ["bundled"] }
rusqlite = { version = "0.27.0", features = ["bundled", "limits"] }
regex = { version = "1.10.2"}
serde = { version = "1.0", features = ["derive"] }
serde_json = { version = "1.0.108" }
serde_repr = { version = "0.1.17" }
sqlx = { version = "0.6.3", features = ["runtime-actix-native-tls", "postgres", "sqlite"] }
toml = { version = "0.8.8" }
tokio = { version = "1.35.1", features = ["macros", "rt"] }
tokio = { version = "1.35.1", features = ["macros", "process", "rt"] }
prefixed-api-key = { version = "0.1.0", features = ["sha2"]}
prettytable-rs = { version = "0.10.0"}
urlencoding = { version = "2.1.3" }
@@ -36,3 +37,11 @@ url = { version = "2.5.0", features = ["serde"] }
[dev-dependencies]
assert_cmd = "2.0"
assert-json-diff = "2.0.2"

[[bin]]
name = "ayb"
path = "src/bin/ayb.rs"

[[bin]]
name = "ayb_isolated_runner"
path = "src/bin/ayb_isolated_runner.rs"
52 changes: 52 additions & 0 deletions README.md
@@ -146,6 +146,58 @@ $ curl -w "\n" -X POST http://127.0.0.1:5433/v1/marcua/test.sqlite/query -H "aut
{"fields":["name","score"],"rows":[["PostgreSQL","10"],["SQLite","9"],["DuckDB","9"]]}
```

### Isolation
`ayb` allows multiple users to run queries against databases that are
stored on the same machine. Isolation prevents one user from
accessing another user's data and lets you restrict the resources
any one user can consume.

By default, `ayb` uses the
[SQLITE_DBCONFIG_DEFENSIVE](https://www.sqlite.org/c3ref/c_dbconfig_defensive.html)
flag and sets
[SQLITE_LIMIT_ATTACHED](https://www.sqlite.org/c3ref/c_limit_attached.html#sqlitelimitattached)
to `0` in order to prevent users from corrupting the database or
attaching to other databases on the filesystem.

For further isolation, `ayb` uses [nsjail](https://nsjail.dev/) to
isolate each query's filesystem access and resources. When this form
of isolation is enabled, `ayb` starts a new `nsjail`-managed process
to execute the query against the database. We have not yet benchmarked
the performance overhead of this approach.

To enable isolation, you must first build `nsjail`, which you can do
through [scripts/build_nsjail.sh](scripts/build_nsjail.sh). Note that
`nsjail` depends on a few other packages (`ayb`'s test workflow
installs `libprotobuf-dev`, `protobuf-compiler`, and
`libnl-route-3-dev` with `apt-get`). If you run into issues
building it, it might be helpful to see its
[Dockerfile](https://github.com/google/nsjail/blob/master/Dockerfile)
to get a sense of those requirements.

Once you have a path to the
`nsjail` binary, add the following to your `ayb.toml`:

```toml
[isolation]
nsjail_path = "path/to/nsjail"
```

## Testing
`ayb` is largely tested through [end-to-end
tests](tests/e2e.rs) that mimic as realistic an environment as
possible. Individual modules may also provide more specific unit
tests. To run the tests, type:

```bash
cargo test --verbose
```

Because the tests cover [isolation](#isolation), an `nsjail` binary is
required for running the end-to-end tests. To build and place `nsjail`
in the appropriate directory, run:

```bash
scripts/build_nsjail.sh && mv nsjail tests/
```

## FAQ

### Who is `ayb` for?
8 changes: 8 additions & 0 deletions scripts/build_nsjail.sh
@@ -0,0 +1,8 @@
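#!/usr/bin/env bash
# Stop at the first failing command so that a failed clone or build
# does not fall through to the later mv and rm steps.
set -euo pipefail
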
git clone https://github.com/google/nsjail.git nsjail-checkout
cd nsjail-checkout
make
mv nsjail ..
cd ..
rm -rf nsjail-checkout
File renamed without changes.
23 changes: 23 additions & 0 deletions src/bin/ayb_isolated_runner.rs
@@ -0,0 +1,23 @@
use ayb::hosted_db::sqlite::query_sqlite;
use std::env;
use std::path::PathBuf;

/// This binary runs a query against a database and returns the
/// result in QueryResults format. To run it, you would type:
/// $ ayb_isolated_runner database.sqlite SELECT xyz FROM ...
///
/// This command is meant to be run inside a sandbox that prevents
/// parallel invocations of the command from accessing each
/// other's data, memory, and resources. That sandbox can be found
/// in src/hosted_db/sandbox.rs.
fn main() -> Result<(), serde_json::Error> {
let args: Vec<String> = env::args().collect();
let db_file = &args[1];
let query = (args[2..]).to_vec();
let result = query_sqlite(&PathBuf::from(db_file), &query.join(" "));
match result {
Ok(result) => println!("{}", serde_json::to_string(&result)?),
Err(error) => eprintln!("{}", serde_json::to_string(&error)?),
}
Ok(())
}
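
Note on the runner above: it indexes `args[1]` directly, so invoking it without a database path panics rather than reporting an error. A minimal sketch of non-panicking argument handling follows. It assumes a hypothetical helper named `parse_runner_args` and a plain `String` error; neither is part of this commit.

```rust
use std::path::PathBuf;

/// Hypothetical helper, not part of the commit: splits the runner's argv
/// into a database path and a query string without panicking on short
/// input. `args` is expected to look like `env::args().collect::<Vec<_>>()`.
pub fn parse_runner_args(args: &[String]) -> Result<(PathBuf, String), String> {
    const USAGE: &str = "usage: ayb_isolated_runner <database> <query...>";

    // args[0] is the program name, args[1] the database file.
    let db_file = args.get(1).ok_or_else(|| USAGE.to_string())?;

    // Everything after the database file is joined back into one query,
    // mirroring the `query.join(" ")` call in the runner. The slice is
    // safe here because `args.get(1)` succeeded, so `args.len() >= 2`.
    let query = args[2..].join(" ");
    if query.trim().is_empty() {
        return Err(USAGE.to_string());
    }

    Ok((PathBuf::from(db_file), query))
}
```

With a helper along these lines, `main` could print the usage string to stderr and exit with a non-zero status when the arguments are incomplete.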
15 changes: 11 additions & 4 deletions src/hosted_db.rs
@@ -1,9 +1,11 @@
pub mod paths;
mod sqlite;
mod sandbox;
pub mod sqlite;

use crate::ayb_db::models::DBType;
use crate::error::AybError;
use crate::hosted_db::sqlite::run_sqlite_query;
use crate::hosted_db::sqlite::potentially_isolated_sqlite_query;
use crate::http::structs::AybConfigIsolation;
use prettytable::{format, Cell, Row, Table};
use serde::{Deserialize, Serialize};
use std::path::PathBuf;
@@ -53,9 +55,14 @@ impl QueryResult {
}
}

pub fn run_query(path: &PathBuf, query: &str, db_type: &DBType) -> Result<QueryResult, AybError> {
pub async fn run_query(
path: &PathBuf,
query: &str,
db_type: &DBType,
isolation: &Option<AybConfigIsolation>,
) -> Result<QueryResult, AybError> {
match db_type {
DBType::Sqlite => Ok(run_sqlite_query(path, query)?),
DBType::Sqlite => Ok(potentially_isolated_sqlite_query(path, query, isolation).await?),
_ => Err(AybError::Other {
message: "Unsupported DB type".to_string(),
}),
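
Note on the `isolation` parameter: the body of `potentially_isolated_sqlite_query` is not shown in this part of the diff, so the following is only a sketch of how an optional `[isolation]` setting might select an execution path. `IsolationConfig`, `ExecutionMode`, and `choose_execution_mode` are hypothetical stand-ins; the only detail taken from the commit is the `nsjail_path` key documented in the README.

```rust
use std::path::PathBuf;

/// Stand-in for the `[isolation]` table in `ayb.toml`. The real type is
/// `AybConfigIsolation`; only the `nsjail_path` field is assumed here.
#[derive(Debug, Clone, PartialEq)]
pub struct IsolationConfig {
    pub nsjail_path: String,
}

/// Hypothetical description of where a query would run.
#[derive(Debug, PartialEq)]
pub enum ExecutionMode {
    /// Run in a separate process managed by the nsjail binary at this path.
    Sandboxed(PathBuf),
    /// Run in-process, relying only on SQLite-level protections.
    InProcess,
}

/// Choose the execution path from the optional isolation configuration.
pub fn choose_execution_mode(isolation: &Option<IsolationConfig>) -> ExecutionMode {
    match isolation {
        Some(config) => ExecutionMode::Sandboxed(PathBuf::from(&config.nsjail_path)),
        None => ExecutionMode::InProcess,
    }
}
```

Passing the configuration as `&Option<...>` keeps isolation opt-in: a deployment without an `[isolation]` table would take the `InProcess` branch in this sketch.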
17 changes: 13 additions & 4 deletions src/hosted_db/paths.rs
@@ -6,13 +6,22 @@ pub fn database_path(
entity_slug: &str,
database_slug: &str,
data_path: &str,
create_database: bool,
) -> Result<PathBuf, AybError> {
let mut path: PathBuf = [data_path, entity_slug].iter().collect();
if let Err(e) = fs::create_dir_all(&path) {
return Err(AybError::Other {
message: format!("Unable to create entity path for {}: {}", entity_slug, e),
});
if create_database {
if let Err(e) = fs::create_dir_all(&path) {
return Err(AybError::Other {
message: format!("Unable to create entity path for {}: {}", entity_slug, e),
});
}
}

path.push(database_slug);

if create_database && !path.exists() {
fs::File::create(path.clone())?;
}

Ok(path)
}
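
Note on the new `create_database` flag: its effect can be sketched with the standard library alone. The version below is a simplified illustration, not the committed function. It is renamed `database_path_sketch` and swaps `AybError` for `std::io::Error` so that it stays self-contained.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Simplified illustration of `database_path`. Assumption: errors are
/// reported as `io::Error` instead of `AybError`.
pub fn database_path_sketch(
    entity_slug: &str,
    database_slug: &str,
    data_path: &str,
    create_database: bool,
) -> io::Result<PathBuf> {
    let mut path: PathBuf = [data_path, entity_slug].iter().collect();

    // Only touch the filesystem when a database is being created.
    if create_database {
        fs::create_dir_all(&path)?;
    }

    path.push(database_slug);

    // Create an empty file so the path exists before it is bind-mounted
    // into the sandbox.
    if create_database && !path.exists() {
        fs::File::create(&path)?;
    }

    Ok(path)
}
```

Called with `create_database` set to `false`, the function only computes the path, so looking up a database no longer creates the entity directory as a side effect.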
146 changes: 146 additions & 0 deletions src/hosted_db/sandbox.rs
@@ -0,0 +1,146 @@
/* Retrieved and modified from
https://raw.githubusercontent.com/Defelo/sandkasten/83f629175d02ebc70fbb16b8b9e05663ea67ccc7/src/sandbox.rs
On December 6, 2023.
Original license:
MIT License
Copyright (c) 2023 Defelo
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
*/

use crate::error::AybError;
use serde::{Deserialize, Serialize};
use std::env::current_exe;
use std::fs::canonicalize;
use std::{
path::{Path, PathBuf},
process::Stdio,
};
use tokio::io::{AsyncReadExt, BufReader};

#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct RunResult {
/// The exit code of the process.
pub status: i32,
/// The stdout output the process produced.
pub stdout: String,
/// The stderr output the process produced.
pub stderr: String,
}

pub async fn run_in_sandbox(
nsjail: &Path,
db_path: &PathBuf,
query: &str,
) -> Result<RunResult, AybError> {
let mut cmd = tokio::process::Command::new(nsjail);

cmd.arg("--really_quiet") // log fatal messages only
.arg("--iface_no_lo")
.args(["--mode", "o"]) // run once
.args(["--hostname", "ayb"])
.args(["--bindmount_ro", "/lib:/lib"])
.args(["--bindmount_ro", "/lib64:/lib64"])
.args(["--bindmount_ro", "/usr:/usr"]);

// Set resource limits for the process. In the future, we will
// allow entities to control the resources they dedicate to
// different databases/queries.
cmd.args(["--mount", "none:/tmp:tmpfs:size=100000000"]) // 100 MB (~95 MiB) tmpfs
.args(["--max_cpus", "1"]) // One CPU
.args(["--rlimit_as", "64"]) // 64 MB memory limit
.args(["--time_limit", "10"]) // 10 second maximum run
.args(["--rlimit_fsize", "75"]) // 75 MB file size limit
.args(["--rlimit_nofile", "10"]) // 10 files maximum
.args(["--rlimit_nproc", "2"]); // 2 processes maximum

// Generate a /local/path/to/file:/tmp/file mapping.
let absolute_db_path = canonicalize(db_path)?;
let db_file_name = absolute_db_path
.file_name()
.ok_or(AybError::Other {
message: format!(
"Could not parse file name from path: {}",
absolute_db_path.display()
),
})?
.to_str()
.ok_or(AybError::Other {
message: format!(
"Could not convert path to string: {}",
absolute_db_path.display()
),
})?;
let tmp_db_path = Path::new("/tmp").join(db_file_name);
let db_file_mapping = format!("{}:{}", absolute_db_path.display(), tmp_db_path.display());
cmd.args(["--bindmount", &db_file_mapping]);

// Generate a /local/path/to/ayb_isolated_runner:/tmp/ayb_isolated_runner mapping.
// We assume `ayb` and `ayb_isolated_runner` will always be in the same directory,
// so we see what the path to the current `ayb` executable is to build the path.
let ayb_path = current_exe()?;
let isolated_runner_path = ayb_path
.parent()
.ok_or(AybError::Other {
message: format!(
"Unable to find parent directory of ayb from {}",
ayb_path.display()
),
})?
.join("ayb_isolated_runner");
cmd.args([
"--bindmount_ro",
&format!(
"{}:/tmp/ayb_isolated_runner",
isolated_runner_path.display()
),
]);

let mut child = cmd
.arg("--")
.arg("/tmp/ayb_isolated_runner")
.arg(tmp_db_path)
.arg(query)
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.spawn()?;

let mut stdout_reader = BufReader::new(child.stdout.take().unwrap());
let mut stderr_reader = BufReader::new(child.stderr.take().unwrap());

// Drain stdout and stderr while waiting for the process to exit. Reading
// only after the wait could stall if the child fills a pipe buffer.
let mut stdout = Vec::new();
let mut stderr = Vec::new();
let (status, stdout_read, stderr_read) = tokio::join!(
child.wait(),
stdout_reader.read_to_end(&mut stdout),
stderr_reader.read_to_end(&mut stderr),
);
let status = status?;
stdout_read?;
stderr_read?;
let stdout = String::from_utf8_lossy(&stdout).into_owned();
let stderr = String::from_utf8_lossy(&stderr).into_owned();

Ok(RunResult {
status: status.code().ok_or(AybError::Other {
message: "Process exited with signal".to_string(),
})?,
stdout,
stderr,
})
}
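
Note on the bind-mount mapping: `run_in_sandbox` exposes the database inside the jail under `/tmp` using a `host_path:jail_path` argument. That step can be isolated as a small pure function, sketched below. `bindmount_mapping` is a hypothetical helper name, and the sketch assumes the host path has already been made absolute (the commit does this with `canonicalize`).

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: given an absolute host path to a database file,
/// return the path the file will have inside the jail and the
/// `host:jail` string passed to `--bindmount`.
/// Returns `None` when the path has no final file name component.
pub fn bindmount_mapping(absolute_db_path: &Path) -> Option<(PathBuf, String)> {
    let file_name = absolute_db_path.file_name()?;

    // The database always appears directly under /tmp inside the jail.
    let jail_path = Path::new("/tmp").join(file_name);
    let mapping = format!("{}:{}", absolute_db_path.display(), jail_path.display());

    Some((jail_path, mapping))
}
```

Keeping this logic free of process spawning makes it straightforward to unit test without an `nsjail` binary.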