Commit

Initial work
pavlospt authored and nikoshet committed Mar 14, 2024
1 parent a99ea00 commit fbcce3f
Showing 39 changed files with 2,877 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .github/CODEOWNERS
@@ -0,0 +1,2 @@
* @pavlospt
* @nikoshet
51 changes: 51 additions & 0 deletions .github/workflows/ci.yaml
@@ -0,0 +1,51 @@
name: CI Pipeline

on:
  push:
  pull_request:
    branches:
      - main

concurrency:
  group: '${{ github.workflow }} @ ${{ github.head_ref || github.ref }}'
  cancel-in-progress: true

jobs:
  build-library:
    name: build library
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions-rust-lang/setup-rust-toolchain@v1
      - name: Cargo Build
        run: |
          cargo build
  build-client:
    name: build client
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions-rust-lang/setup-rust-toolchain@v1
      - name: Cargo Build
        working-directory: "rust-pgdatadiff-client"
        run: |
          cargo build
  test:
    name: cargo test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions-rust-lang/setup-rust-toolchain@v1
      - run: cargo test --all
  format:
    name: cargo fmt && cargo clippy
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions-rust-lang/setup-rust-toolchain@v1
        with:
          components: rustfmt, clippy
      - name: Rustfmt Check
        uses: actions-rust-lang/rustfmt@v1
      - name: Lint with Clippy
        run: cargo clippy --all
9 changes: 9 additions & 0 deletions .gitignore
@@ -12,3 +12,12 @@ Cargo.lock

# MSVC Windows builds of rustc generate these, which store debugging information
*.pdb


# Added by cargo

/target
.idea/
.DS_Store
postgres-data1/
postgres-data2/
39 changes: 39 additions & 0 deletions Cargo.toml
@@ -0,0 +1,39 @@
workspace = { members = ["rust-pgdatadiff-client"] }
[package]
name = "rust-pgdatadiff"
version = "0.1.1"
edition = "2021"
license = "MIT"
description = "Rust library for comparing two PostgreSQL databases"
readme = "README.md"
homepage = "https://github.com/pavlospt/rust-pgdatadiff"
repository = "https://github.com/pavlospt/rust-pgdatadiff"
keywords = ["postgres", "postgresql", "diff"]
documentation = "https://docs.rs/rust-pgdatadiff"

[dependencies]
anyhow = "1.0.81"
tokio = { version = "1.36.0", features = ["full"] }
sqlx = { version = "0.7", features = ["runtime-tokio", "tls-native-tls", "postgres"] }
colored = "2.1.0"
futures = { version = "0.3.30", default-features = true, features = ["async-await"] }
env_logger = "0.11.3"
log = "0.4.21"
async-trait = "0.1.77"
pretty_assertions = "1.4.0"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }

[dependencies.clap]
version = "4.5.2"
features = ["derive"]

[dev-dependencies]
mockall = "0.12.1"
tokio = { version = "1.36.0", features = ["rt-multi-thread", "macros"] }

[lib]
test = true
edition = "2021"
crate-type = ["lib"]
name = "rust_pgdatadiff"
133 changes: 133 additions & 0 deletions README.md
@@ -0,0 +1,133 @@
# Rust PGDataDiff

`rust-pgdatadiff` is a rewrite of the Python tool [pgdatadiff](https://github.com/dmarkey/pgdatadiff).

## What makes it different?

* It is schema-aware from the start: when we used the original `pgdatadiff`,
we needed to run checks across several different schemas.

* It runs DB operations in parallel, making it at least 3x faster than
the original `pgdatadiff`, which performs its checks sequentially.

* It is written in Rust, so it is memory-safe and has very low overhead.

* It provides both a library and a client, so you can use it as a standalone tool
or in your own projects.
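
To illustrate the parallelism point above, here is a minimal sketch that compares several tables concurrently. It is not the crate's implementation: it uses only `std::thread::scope` and in-memory data as stand-ins (the crate itself builds on `tokio` and `sqlx`), comparing tables by hash is an assumption, and the names `TableSnapshot`, `table_hash` and `diff_tables_parallel` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::thread;

/// Hypothetical in-memory stand-in for the rows of one table.
pub struct TableSnapshot {
    pub name: String,
    pub rows: Vec<String>,
}

/// Hash all rows of a table in order, so two snapshots can be compared cheaply.
/// (Assumption: hashing is used here only for illustration.)
fn table_hash(table: &TableSnapshot) -> u64 {
    let mut hasher = DefaultHasher::new();
    for row in &table.rows {
        row.hash(&mut hasher);
    }
    hasher.finish()
}

/// Compare each pair of tables on its own thread and return the names of
/// the tables whose contents differ, in input order.
pub fn diff_tables_parallel(pairs: &[(TableSnapshot, TableSnapshot)]) -> Vec<String> {
    thread::scope(|scope| {
        // Spawn one worker per table pair; all workers run concurrently.
        let handles: Vec<_> = pairs
            .iter()
            .map(|(first, second)| {
                scope.spawn(move || {
                    (first.name.clone(), table_hash(first) != table_hash(second))
                })
            })
            .collect();

        // Join in spawn order and keep only the tables that differ.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("comparison thread panicked"))
            .filter(|(_, differs)| *differs)
            .map(|(name, _)| name)
            .collect::<Vec<String>>()
    })
}
```

A sequential version would run the same comparisons in a plain loop; the gain from the concurrent version depends on how much of each comparison is spent waiting on the database.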

### Python (sequential)
![python-timings](images/python.png)

### Rust (parallel)
![rust-timings](images/rust.png)

## Installation (Client)

To use it as a client, install it through `cargo`:

```shell
cargo install rust-pgdatadiff-client
```

## Installation (Library)

To use it as a library, add it to your `Cargo.toml`:

```shell
cargo add rust-pgdatadiff
```

or

```toml
[dependencies]
rust-pgdatadiff = "0.1.0"
```

## Usage (Client)

```
Usage: rust-pgdatadiff-client diff [OPTIONS] <FIRST_DB> <SECOND_DB>

Arguments:
  <FIRST_DB>   postgres://postgres:postgres@localhost:5438/example
  <SECOND_DB>  postgres://postgres:postgres@localhost:5439/example

Options:
      --only-data                               Only compare data, exclude sequences
      --only-sequences                          Only compare sequences, exclude data
      --only-count                              Do a quick test based on counts alone
      --chunk-size <CHUNK_SIZE>                 The chunk size when comparing data [default: 10000]
      --max-connections <MAX_CONNECTIONS>       Max connections for Postgres pool [default: 100]
  -i, --included-tables [<INCLUDED_TABLES>...]  Tables included in the comparison
      --schema-name <SCHEMA_NAME>               Schema name [default: public]
  -h, --help                                    Print help
  -V, --version                                 Print version
```
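
The `--chunk-size` option suggests that table data is compared in fixed-size pages. As a rough sketch of that idea, and not the crate's actual implementation, the hypothetical function `chunk_windows` below splits a row count into `(offset, limit)` windows that a comparison loop could walk through. How the crate really pages through rows is not shown in this README.

```rust
/// Split `row_count` rows into consecutive `(offset, limit)` windows of at
/// most `chunk_size` rows each. Hypothetical helper, for illustration only.
pub fn chunk_windows(row_count: u64, chunk_size: u64) -> Vec<(u64, u64)> {
    assert!(chunk_size > 0, "chunk size must be positive");

    let mut windows = Vec::new();
    let mut offset = 0;
    while offset < row_count {
        // The last window may be shorter than `chunk_size`.
        let limit = chunk_size.min(row_count - offset);
        windows.push((offset, limit));
        offset += limit;
    }
    windows
}
```

With the default chunk size of 10000, a table of 25000 rows would be covered by three windows, the last one holding 5000 rows. A larger chunk size means fewer round trips but more data per query.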

## Usage (Library)

```rust
use anyhow::Result;
use rust_pgdatadiff::diff::diff_ops::Differ;
use rust_pgdatadiff::diff::diff_payload::DiffPayload;

#[tokio::main]
async fn main() -> Result<()> {
    let first_db = "postgres://postgres:postgres@localhost:5438/example";
    let second_db = "postgres://postgres:postgres@localhost:5439/example";

    // Illustrative values mirroring the client defaults; adjust them as needed.
    let payload = DiffPayload::new(
        first_db.to_owned(),
        second_db.to_owned(),
        false,                // only_data
        false,                // only_sequences
        false,                // only_count
        10_000,               // chunk_size
        100,                  // max_connections
        Vec::<String>::new(), // included_tables
        "public".to_owned(),  // schema_name
    );
    let diff_result = Differ::diff_dbs(payload).await;
    // Handle `diff_result` in any way it fits your use case
    Ok(())
}
```

## Examples

You can spin up two databases already prefilled with data through Docker Compose.

```shell
docker compose up --build
```

The prefilled databases contain a considerable number of rows, so you can benchmark the tool against them.
To see it report differences, modify some of the generated data in one of the databases.

You can find an example of using it as a library in the [`examples`](./examples) directory.

Run the example with the following command, after Docker Compose has started:

```shell
cargo run --example example_diff diff \
"postgresql://localhost:5438?dbname=example&user=postgres&password=postgres" \
"postgresql://localhost:5439?dbname=example&user=postgres&password=postgres"
```
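
Note that the command above passes the database name and credentials as query parameters, while the usage sections put the credentials in the authority part and the database name in the path. The sketch below builds both forms from the same parts so the correspondence is easy to see. `DbTarget` and its methods are hypothetical helpers, not part of the crate, and the sketch assumes values that need no percent-encoding.

```rust
/// Hypothetical helper type holding the parts of a PostgreSQL connection URL.
pub struct DbTarget<'a> {
    pub host: &'a str,
    pub port: u16,
    pub dbname: &'a str,
    pub user: &'a str,
    pub password: &'a str,
}

impl DbTarget<'_> {
    /// Form used in the usage sections: credentials in the authority part,
    /// database name in the path. No percent-encoding is applied.
    pub fn authority_url(&self) -> String {
        format!(
            "postgres://{}:{}@{}:{}/{}",
            self.user, self.password, self.host, self.port, self.dbname
        )
    }

    /// Form used in the example command: everything except host and port
    /// is passed as a query parameter. No percent-encoding is applied.
    pub fn query_url(&self) -> String {
        format!(
            "postgresql://{}:{}?dbname={}&user={}&password={}",
            self.host, self.port, self.dbname, self.user, self.password
        )
    }
}
```

For the first Docker Compose database (host `localhost`, port `5438`, database `example`, user and password `postgres`), the two methods produce the two URL styles shown in this README.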

You can also enable Rust-related logs by exporting the following:

```shell
export RUST_LOG=rust_pgdatadiff=info
```

Switching from `info` to `debug` gives you more detailed logs. Since the library uses
`sqlx` under the hood, you can also enable `sqlx` logs by exporting the following:

```shell
export RUST_LOG=rust_pgdatadiff=info,sqlx=debug
```

## Authors

* [Pavlos-Petros Tournaris](https://github.com/pavlospt)
* [Nikolaos Nikitas](https://github.com/nikoshet)