Initial work #1

Merged 1 commit on Mar 14, 2024
2 changes: 2 additions & 0 deletions .github/CODEOWNERS
@@ -0,0 +1,2 @@
* @pavlospt
* @nikoshet
48 changes: 48 additions & 0 deletions .github/workflows/ci.yaml
@@ -0,0 +1,48 @@
name: CI Pipeline

on:
  pull_request:
    branches:
      - main

concurrency:
  group: '${{ github.workflow }} @ ${{ github.head_ref || github.ref }}'
  cancel-in-progress: true

jobs:
  build:
    name: cargo build
    runs-on: ubuntu-latest
    strategy:
      fail-fast: true
      matrix:
        include:
          - name: "library"
            path: "."
          - name: "client"
            path: "rust-pgdatadiff-client"
    steps:
      - uses: actions/checkout@v4
      - uses: actions-rust-lang/setup-rust-toolchain@v1
      - name: Cargo Build ${{ matrix.name }}
        run: cargo build
        working-directory: ${{ matrix.path }}
  test:
    name: cargo test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions-rust-lang/setup-rust-toolchain@v1
      - run: cargo test --all
  format-and-clippy:
    name: Cargo format & Clippy
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions-rust-lang/setup-rust-toolchain@v1
        with:
          components: rustfmt, clippy
      - name: Rustfmt Check
        uses: actions-rust-lang/rustfmt@v1
      - name: Lint with Clippy
        run: cargo clippy --all
30 changes: 30 additions & 0 deletions .github/workflows/git.yaml
@@ -0,0 +1,30 @@
name: Git Checks

on: [pull_request]

jobs:
  block-fixup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Block Fixup Commit Merge
        uses: alexkappa/block-fixup-merge-action@v2
  add-assignee:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/github-script@v7
        with:
          script: |
            const issue = await github.rest.issues.get({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number
            });
            if (issue.data.assignees.length === 0) {
              await github.rest.issues.addAssignees({
                owner: context.repo.owner,
                repo: context.repo.repo,
                issue_number: context.issue.number,
                assignees: [context.actor]
              });
            }
9 changes: 9 additions & 0 deletions .gitignore
@@ -12,3 +12,12 @@ Cargo.lock

# MSVC Windows builds of rustc generate these, which store debugging information
*.pdb


# Added by cargo

/target
.idea/
.DS_Store
postgres-data1/
postgres-data2/
39 changes: 39 additions & 0 deletions Cargo.toml
@@ -0,0 +1,39 @@
[package]
name = "rust-pgdatadiff"
version = "0.1.2"
edition = "2021"
license = "MIT"
description = "Rust library for comparing two PostgreSQL databases"
readme = "README.md"
homepage = "https://github.com/pavlospt/rust-pgdatadiff"
repository = "https://github.com/pavlospt/rust-pgdatadiff"
keywords = ["postgres", "postgresql", "diff"]
documentation = "https://docs.rs/rust-pgdatadiff"

[dependencies]
anyhow = "1.0.81"
tokio = { version = "1.36.0", features = ["full"] }
sqlx = { version = "0.7", features = ["runtime-tokio", "tls-native-tls", "postgres"] }
colored = "2.1.0"
futures = { version = "0.3.30", default-features = true, features = ["async-await"] }
env_logger = "0.11.3"
log = "0.4.21"
async-trait = "0.1.77"
pretty_assertions = "1.4.0"

[dependencies.clap]
version = "4.5.2"
features = ["derive"]

[dev-dependencies]
mockall = "0.12.1"
tokio = { version = "1.36.0", features = ["rt-multi-thread", "macros"] }

[lib]
test = true
edition = "2021"
crate-type = ["lib"]
name = "rust_pgdatadiff"

[workspace]
members = ["rust-pgdatadiff-client"]
136 changes: 136 additions & 0 deletions README.md
@@ -0,0 +1,136 @@
# Rust PGDataDiff

`rust-pgdatadiff` is a rewrite of the Python tool [pgdatadiff](https://github.com/dmarkey/pgdatadiff).

## What makes it different?

* It is schema-aware from the start: when we used the original `pgdatadiff`,
we had several schemas that we needed to run checks on.

* It runs DB operations in parallel,
making it at least 3x faster than the original `pgdatadiff`, which performs the checks sequentially.

* It is written in Rust, which means that it is memory-safe and has very low overhead.

* It provides both a library and a client, so you can use it as a standalone tool
or in your own projects.
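
To illustrate why running per-table checks in parallel helps, here is a minimal sketch using only the standard library. It is not the crate's implementation (which is async and built on `tokio` and `sqlx`); `check_table`, `check_sequential` and `check_parallel` are hypothetical names, and the 50 ms sleep stands in for a database round trip.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for one per-table check.
/// A real check would query both databases and compare the results.
fn check_table(table: &str) -> (String, bool) {
    // Simulate the latency of a database round trip.
    thread::sleep(Duration::from_millis(50));
    (table.to_string(), true)
}

/// Runs the checks one after another; total time grows with the table count.
fn check_sequential(tables: &[&str]) -> Vec<(String, bool)> {
    tables.iter().map(|&t| check_table(t)).collect()
}

/// Runs each check on its own scoped thread and returns results in input order.
fn check_parallel(tables: &[&str]) -> Vec<(String, bool)> {
    thread::scope(|scope| {
        // Spawn all checks first so they overlap in time.
        let handles: Vec<_> = tables
            .iter()
            .map(|&t| scope.spawn(move || check_table(t)))
            .collect();
        // Join in spawn order to keep the output order stable.
        handles
            .into_iter()
            .map(|h| h.join().expect("check thread panicked"))
            .collect()
    })
}
```

Both functions return the same results for the same input; with these simulated delays the parallel variant takes roughly the time of one check instead of the sum of all of them.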

_The benchmarks below are based on DBs with 5 tables and 1M rows each. The results are as follows:_

## Python (sequential)
![python-timings](images/python.png)

## Rust (parallel)
![rust-timings](images/rust.png)

# Installation (Client)

To use it as a client, install it with `cargo`:

```shell
cargo install rust-pgdatadiff-client
```

# Installation (Library)

To use it as a library, add it as a dependency:

```shell
cargo add rust-pgdatadiff
```

or

```toml
[dependencies]
rust-pgdatadiff = "0.1.0"
```

# Usage (Client)

```
Usage: rust-pgdatadiff-client diff [OPTIONS] <FIRST_DB> <SECOND_DB>

Arguments:
  <FIRST_DB>   postgres://postgres:postgres@localhost:5438/example
  <SECOND_DB>  postgres://postgres:postgres@localhost:5439/example

Options:
      --only-tables                           Only compare data, exclude sequences
      --only-sequences                        Only compare sequences, exclude data
      --only-count                            Do a quick test based on counts alone
      --chunk-size <CHUNK_SIZE>               The chunk size when comparing data [default: 10000]
      --max-connections <MAX_CONNECTIONS>     Max connections for Postgres pool [default: 100]
  -i, --include-tables [<INCLUDE_TABLES>...]  Tables included in the comparison
  -e, --exclude-tables [<EXCLUDE_TABLES>...]  Tables excluded from the comparison
      --schema-name <SCHEMA_NAME>             Schema name [default: public]
  -h, --help                                  Print help
  -V, --version                               Print version
```
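
As a rough illustration of what `--chunk-size` controls, the sketch below splits a table's rows into `(offset, limit)` windows. This is an assumption about how chunked comparison could be organized, not the crate's actual query strategy; `chunk_windows` is a hypothetical helper.

```rust
/// Splits `row_count` rows into `(offset, limit)` windows
/// of at most `chunk_size` rows each.
fn chunk_windows(row_count: u64, chunk_size: u64) -> Vec<(u64, u64)> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut windows = Vec::new();
    let mut offset = 0;
    while offset < row_count {
        // The last window may be shorter than `chunk_size`.
        let limit = chunk_size.min(row_count - offset);
        windows.push((offset, limit));
        offset += limit;
    }
    windows
}
```

With the default chunk size of 10000, a table of 25000 rows would be covered by three windows: two full ones and a final window of 5000 rows. Smaller chunks mean more, lighter queries; larger chunks mean fewer, heavier ones.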

# Usage (Library)

```rust
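// Assumption: `Result` is anyhow's alias, so `anyhow` must be a dependency of your crate.
use anyhow::Result;
use rust_pgdatadiff::diff::diff_ops::Differ;
use rust_pgdatadiff::diff::diff_payload::DiffPayload;

#[tokio::main]
async fn main() -> Result<()> {
    let first_db = "postgres://postgres:postgres@localhost:5438/example";
    let second_db = "postgres://postgres:postgres@localhost:5439/example";

    // `only_data`, `only_sequences`, `only_count`, `chunk_size`, `max_connections`,
    // `included_tables` and `schema_name` are not defined in this snippet;
    // supply them from your own configuration (for example, parsed CLI arguments).
    let payload = DiffPayload::new(
        first_db.to_owned(),
        second_db.to_owned(),
        *only_data,
        *only_sequences,
        *only_count,
        *chunk_size,
        *max_connections,
        included_tables.to_vec(),
        schema_name.clone(),
    );
    let diff_result = Differ::diff_dbs(payload).await;
    // Handle `diff_result` in any way it fits your use case
    Ok(())
}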
```

# Examples

You can spin up two databases already prefilled with data through Docker Compose.

```shell
docker compose up --build
```

The prefilled databases contain a considerable number of rows, so you can run benchmarks against them to check the
tool's performance. You can modify some of the generated data to see the diff in action.

You can find an example of using it as a library in the [`examples`](./examples) directory.

Run the example with the following command, after Docker Compose has started:

```shell
cargo run --example example_diff diff \
"postgresql://localhost:5438?dbname=example&user=postgres&password=postgres" \
"postgresql://localhost:5439?dbname=example&user=postgres&password=postgres"
```

You can also enable Rust-related logs by exporting the following:

```shell
export RUST_LOG=rust_pgdatadiff=info
```

Switching from `info` to `debug` gives you more detailed logs. Also, since we use
`sqlx` under the hood, you can enable `sqlx` logs by exporting the following:

```shell
export RUST_LOG=rust_pgdatadiff=info,sqlx=debug
```
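
A `RUST_LOG` value like the one above is a comma-separated list of `target=level` entries. The sketch below shows how such a string decomposes; it is a simplified illustration, not `env_logger`'s parser (the real syntax has more cases, such as a trailing `/regex` filter), and `parse_directives` is a hypothetical helper.

```rust
/// Splits a RUST_LOG-style value such as "rust_pgdatadiff=info,sqlx=debug"
/// into (target, level) pairs. An entry without `=` is returned as-is
/// with no target.
fn parse_directives(spec: &str) -> Vec<(Option<String>, String)> {
    spec.split(',')
        .map(str::trim)
        .filter(|part| !part.is_empty())
        .map(|part| match part.split_once('=') {
            Some((target, level)) => (Some(target.to_string()), level.to_string()),
            None => (None, part.to_string()),
        })
        .collect()
}
```

For `rust_pgdatadiff=info,sqlx=debug` this yields two entries: the `rust_pgdatadiff` target at `info` and the `sqlx` target at `debug`.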

# Authors

* [Pavlos-Petros Tournaris](https://github.com/pavlospt)
* [Nikolaos Nikitas](https://github.com/nikoshet)