refactor(rules): config structure #47

Merged 20 commits on Aug 8, 2024
49 changes: 15 additions & 34 deletions docs/tutorial.md
@@ -23,11 +23,7 @@ There are three possible ways to pseudonymize RDF triples:
2. Pseudonymize values for specific subject-predicate combinations.
3. Pseudonymize any value for a given predicate.

-By using all three ways together, we're able to get an RDF file with sensitive
-information:

-<details>
-<summary><b>Click to show input</b></summary>
+By combining these, we can process an RDF file with sensitive information:

```ntriples
<http://example.org/Alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
@@ -40,15 +36,12 @@ information:
<http://example.org/Bank> <http://schema.org/name> "Bank" .
```

-</details>

-And pseudonymize the sensitive information such as people's names, personal and
-secret information while keeping the rest as is:
+into a pseudonymized file where the sensitive information such as people's names, personal and
+secret information is hashed to protect privacy:

-<details>
-<summary><b>Click to show output</b></summary>

```
```ntriples
<http://example.org/af321bbc> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/af321bbc> <http://xmlns.com/foaf/0.1/holdsAccount> <http://example.org/bs2313bc> .
<http://example.org/bs2313bc> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/OnlineAccount> .
@@ -59,55 +52,49 @@ secret information while keeping the rest as is:
<http://example.org/Bank> <http://schema.org/name> "Bank" .
```

-</details>

The next subsections break down each of the three pseudonymization approaches to
better understand how they operate.

### 1. Pseudonymize the URI of nodes with `rdf:type`

-<details>
-<summary><b>Click to show</b></summary>

Given the following config:

```yaml
-replace_uri_of_nodes_with_type:
+subjects:
+of_type:
- "http://xmlns.com/foaf/0.1/Person"
```

The goal is to pseudonymize all instances of `rdf:type` Person. The following
input file:

```
```ntriples
<http://example.org/Alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
```

Would become:

```
```ntriples
<http://example.org/af321bbc> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
```
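
A minimal sketch of this rule in Rust, under stated assumptions: `pseudo`, `pseudo_uri`, and `apply_type_rule` are hypothetical names, and the standard library's `DefaultHasher` stands in for the project's real (keyed) pseudonymizer, which this diff does not show. It is not cryptographically secure. The sketch keeps the namespace of the URI and replaces only the last path segment, which mirrors the shape of the example output above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the real pseudonymizer: NOT cryptographic.
/// Returns 8 lowercase hex characters derived from `value`.
fn pseudo(value: &str) -> String {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    format!("{:08x}", hasher.finish() as u32)
}

/// Replace the last path segment of a URI, keeping the namespace prefix.
fn pseudo_uri(uri: &str) -> String {
    match uri.rfind('/') {
        Some(i) => format!("{}{}", &uri[..=i], pseudo(&uri[i + 1..])),
        None => pseudo(uri),
    }
}

/// Pseudonymize `subject` only if its rdf:type (looked up in `type_map`)
/// is listed in `of_type`; otherwise return it unchanged.
fn apply_type_rule(
    subject: &str,
    type_map: &HashMap<String, String>,
    of_type: &HashSet<String>,
) -> String {
    match type_map.get(subject) {
        Some(t) if of_type.contains(t) => pseudo_uri(subject),
        _ => subject.to_string(),
    }
}
```

Here `type_map` plays the role of the node-to-type index built in the first pass, so the rule can be applied to a triple without looking at any other triple.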

-</details>

### 2. Pseudonymize values for specific subject-predicate combinations

-<details>
-<summary><b>Click to show</b></summary>

Given the following config:

```yaml
-replace_values_of_subject_predicate:
-"http://xmlns.com/foaf/0.1/Person":
+objects:
+on_type_predicate:
+"http://xmlns.com/foaf/0.1/Person":
- "http://schema.org/name"
```

The goal is to pseudonymize only the instances of names when they're associated
to Person. The following input file:

```
```ntriples
<http://example.org/Alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/Alice> <http://schema.org/name> "Alice" .
<http://example.org/Bank> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Organization> .
@@ -116,19 +103,15 @@ to Person. The following input file:

Would become:

```
```ntriples
<http://example.org/Alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/Alice> <http://schema.org/name> "af321bbc" .
<http://example.org/Bank> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Organization> .
<http://example.org/Bank> <http://schema.org/name> "Bank" .
```
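
The matching step for this rule can be sketched as a two-level lookup, assuming the `on_type_predicate` config is loaded as a map from a type URI to a set of predicate URIs. The function name `matches_type_predicate` and the exact container types are assumptions for illustration; the project's `rules` module may represent this differently.

```rust
use std::collections::{HashMap, HashSet};

/// Return true when the object of a triple should be pseudonymized:
/// the subject's type must be a key of `on_type_predicate`, and the
/// triple's predicate must be listed under that type.
fn matches_type_predicate(
    subject: &str,
    predicate: &str,
    type_map: &HashMap<String, String>,
    on_type_predicate: &HashMap<String, HashSet<String>>,
) -> bool {
    type_map
        .get(subject) // rdf:type of the subject, if known
        .and_then(|t| on_type_predicate.get(t)) // predicates configured for that type
        .map_or(false, |preds| preds.contains(predicate))
}
```

With the config above, a `schema:name` triple about Alice (a Person) matches, while the same predicate on Bank (an Organization) does not, which is why `"Bank"` is left untouched in the output.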

-</details>

### 3. Pseudonymize any value for a given predicate

-<details>
-<summary><b>Click to show</b></summary>

Given the following config:

@@ -140,7 +123,7 @@ replace_value_of_predicate:
The goal is to pseudonymize any values associated to name. The following input
file:

```
```ntriples
<http://example.org/Alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/Alice> <http://schema.org/name> "Alice" .
<http://example.org/Bank> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Organization> .
@@ -149,11 +132,9 @@ file:

Would become:

```
```ntriples
<http://example.org/Alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://example.org/Alice> <http://schema.org/name> "af321bbc" .
<http://example.org/Bank> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Organization> .
<http://example.org/Bank> <http://schema.org/name> "38a3dd71" .
```
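
A minimal sketch of the predicate-only rule, assuming the configured predicates are available as a set. The names `apply_predicate_rule` and `on_predicate` are hypothetical (the new config key for this rule is not visible in this hunk), and `DefaultHasher` again stands in for the real pseudonymizer, so it is not cryptographically secure.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the real pseudonymizer: NOT cryptographic.
fn pseudo(value: &str) -> String {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    format!("{:08x}", hasher.finish() as u32)
}

/// Pseudonymize the object of every (subject, predicate, object) triple
/// whose predicate is in `on_predicate`, regardless of the subject's type.
fn apply_predicate_rule(
    triples: &[(&str, &str, &str)],
    on_predicate: &HashSet<String>,
) -> Vec<(String, String, String)> {
    triples
        .iter()
        .map(|&(s, p, o)| {
            let object = if on_predicate.contains(p) {
                pseudo(o)
            } else {
                o.to_string()
            };
            (s.to_string(), p.to_string(), object)
        })
        .collect()
}
```

Unlike the subject-predicate rule, no type lookup is involved, so both `"Alice"` and `"Bank"` are replaced when `schema:name` is configured.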

-</details>
File renamed without changes.
14 changes: 4 additions & 10 deletions src/main.rs
@@ -1,18 +1,18 @@
// Define the module.
mod crypto;
+mod index;
mod io;
mod log;
mod model;
-mod pass_first;
-mod pass_second;
+mod pseudo;
mod rdf_types;
mod rules;

// Define the imports.
use crate::{
+index::create_type_map,
log::{create_logger, info},
-pass_first::create_type_map,
-pass_second::pseudonymize_graph,
+pseudo::pseudonymize_graph,
};

use clap::{Args, Parser, Subcommand};
@@ -51,11 +51,6 @@ struct PseudoArgs {
#[arg(default_value = "-")]
input: PathBuf,

-/// Invert the matching rules for the subject and the object.
-/// Disabled by default
-#[arg(short = 'v', long)]
-invert_match: bool,

/// The config file descriptor to use for defining RDF elements to pseudonymize.
/// Format: yaml
#[arg(short, long)]
@@ -103,7 +98,6 @@ fn main() {
&args.output,
&args.index,
&args.secret,
-&args.invert_match,
)
}
}
34 changes: 2 additions & 32 deletions src/pass_second.rs → src/pseudo.rs
@@ -10,35 +10,10 @@ use crate::{
crypto::{new_pseudonymizer, Pseudonymize},
io,
log::Logger,
-model::TripleMask,
rdf_types::*,
-rules::{
-match_predicate_rule, match_subject_predicate_rule, match_type_rule_object,
-match_type_rule_subject, Rules,
-},
+rules::{match_rules, Rules},
};

-fn match_rules(
-triple: Triple,
-rules: &Rules,
-type_map: &HashMap<String, String>,
-invert_match: &bool,
-) -> TripleMask {
-// Check each field of the triple against the rules
-let mut mask = TripleMask::default();

-mask = match_type_rule_subject(&triple.subject, mask, type_map, rules);
-mask = match_type_rule_object(&triple.object, mask, type_map, rules);
-mask = match_predicate_rule(&triple.predicate, mask, rules);
-mask = match_subject_predicate_rule(&triple.subject, &triple.predicate, mask, type_map, rules);

-if *invert_match {
-mask = mask.invert();
-}

-return mask;
-}

// mask and encode input triple
// NOTE: This will need the type-map to perform masking
fn process_triple(
@@ -47,9 +22,8 @@ fn process_triple(
node_to_type: &HashMap<String, String>,
out: &mut impl Write,
hasher: &dyn Pseudonymize,
-invert_match: &bool,
) {
-let mask = match_rules(triple.clone(), rules_config, node_to_type, invert_match);
+let mask = match_rules(&triple, rules_config, node_to_type);

let r = || -> std::io::Result<()> {
out.write_all(hasher.pseudo_triple(&triple, mask).to_string().as_bytes())?;
@@ -86,7 +60,6 @@ pub fn pseudonymize_graph(
output: &Path,
index: &Path,
secret_path: &Option<PathBuf>,
-invert_match: &bool,
) {
let buf_input = io::get_reader(input);
let buf_index = io::get_reader(index);
@@ -110,7 +83,6 @@ pub fn pseudonymize_graph(
&node_to_type,
&mut buf_output,
&pseudonymizer,
-invert_match,
);
Result::<(), TurtleError>::Ok(())
})
@@ -139,15 +111,13 @@ mod tests {
let output_path = dir.path().join("output.nt");
let type_map_path = Path::new("tests/data/type_map.nt");
let key = None;
-let invert_match = false;
pseudonymize_graph(
&logger,
&input_path,
&config_path,
&output_path,
&type_map_path,
&key,
-&invert_match,
);
}
}
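
After this change, `process_triple` calls `rules::match_rules` with a borrowed triple and no `invert_match` flag. The sketch below illustrates how independent rule checks could be folded into a single mask, in the spirit of the helper removed from this file. Everything in it is an assumption for illustration: this `TripleMask` is a hypothetical bit mask, `match_rules_sketch` is an invented name, and the project's real `model::TripleMask` and `rules::match_rules` are not shown in this diff and may differ.

```rust
/// Hypothetical bit mask marking which parts of a triple to pseudonymize.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct TripleMask(u8);

impl TripleMask {
    const SUBJECT: TripleMask = TripleMask(0b100);
    const OBJECT: TripleMask = TripleMask(0b001);

    /// Combine two masks: a part is flagged if either mask flags it.
    fn union(self, other: TripleMask) -> TripleMask {
        TripleMask(self.0 | other.0)
    }

    /// Check whether every bit of `flag` is set in this mask.
    fn is_set(self, flag: TripleMask) -> bool {
        (self.0 & flag.0) == flag.0
    }
}

/// Fold the outcome of each rule check into one mask. The boolean inputs
/// stand for the results of the individual rule lookups.
fn match_rules_sketch(
    subject_type_hit: bool,
    object_type_hit: bool,
    predicate_hit: bool,
) -> TripleMask {
    let mut mask = TripleMask::default();
    if subject_type_hit {
        mask = mask.union(TripleMask::SUBJECT);
    }
    if object_type_hit || predicate_hit {
        mask = mask.union(TripleMask::OBJECT);
    }
    mask
}
```

Because each check only ever adds bits, the order in which rules are evaluated does not affect the resulting mask.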