perf(index): reduce memory usage #55

Closed
cmdoret opened this issue Aug 19, 2024 · 2 comments · Fixed by #57
Labels: enhancement (New feature or request)

Comments

@cmdoret
Member

cmdoret commented Aug 19, 2024

The index consumes too much memory under high load. It is currently a plain HashMap<String, String> mapping instance -> type. This is highly redundant because #instances >> #types: the same type string is stored again for every instance of that type.

Objective: change the index structure to reduce memory usage while keeping O(1) lookup time.

Proposal:

File format:

types: [Person, Account, Organization]
map:
  hash(urn:alice): 0
  hash(urn:acme):  2
  hash(urn:test-account): 1
  hash(urn:bob): 0
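A minimal sketch of how the types/map layout above could be built from (instance, type) pairs, interning each type name so it is stored only once. It assumes a 64-bit DefaultHasher hash for instance keys; `build_index`, `hash_instance` and `lookup` are hypothetical names, not the project's API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: 64-bit hash of an instance URI.
/// Assumption: DefaultHasher's algorithm is unspecified and may change
/// between Rust releases, so a persisted index would need a fixed hash.
fn hash_instance(instance: &str) -> u64 {
    let mut h = DefaultHasher::new();
    instance.hash(&mut h);
    h.finish()
}

/// Build the compact index: each distinct type name is pushed to `types`
/// once, and every instance hash maps to that type's position in `types`.
fn build_index(pairs: &[(&str, &str)]) -> (Vec<String>, HashMap<u64, usize>) {
    let mut types: Vec<String> = Vec::new();
    let mut type_pos: HashMap<&str, usize> = HashMap::new(); // type name -> index in `types`
    let mut map: HashMap<u64, usize> = HashMap::new();
    for &(instance, ty) in pairs {
        // Intern the type name: reuse its index if already seen.
        let idx = *type_pos.entry(ty).or_insert_with(|| {
            types.push(ty.to_string());
            types.len() - 1
        });
        map.insert(hash_instance(instance), idx);
    }
    (types, map)
}

/// O(1) lookup: hash the instance, then index into `types`.
fn lookup<'a>(types: &'a [String], map: &HashMap<u64, usize>, instance: &str) -> Option<&'a str> {
    map.get(&hash_instance(instance))
        .map(move |&i| types[i].as_str())
}

fn main() {
    let pairs = [
        ("urn:alice", "Person"),
        ("urn:acme", "Organization"),
        ("urn:test-account", "Account"),
        ("urn:bob", "Person"),
    ];
    let (types, map) = build_index(&pairs);
    println!("{:?}", types); // ["Person", "Organization", "Account"]
    println!("{:?}", lookup(&types, &map, "urn:bob")); // Some("Person")
}
```

Here the type order follows first appearance, so it can differ from the example file; any order works as long as the indices in `map` agree with `types`. Keying by hash alone also means two instances whose hashes collide would share one entry.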

Struct:

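// Rust sketch; the 64-bit DefaultHasher below is an assumed stand-in for hash()
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

fn hash(s: &str) -> u64 {
  let mut h = DefaultHasher::new();
  s.hash(&mut h);
  h.finish()
}

struct Index {
  types: Vec<String>,       // each type name stored once
  map: HashMap<u64, usize>, // hash(instance) -> position in `types`
}

impl Index {
  // Returns None if the instance is not in the index
  fn get(&self, instance: &str) -> Option<&str> {
    let type_idx = *self.map.get(&hash(instance))?;
    self.types.get(type_idx).map(String::as_str)
  }
}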
@cmdoret
Member Author

cmdoret commented Aug 19, 2024

Further optimizations are possible:

  • Allow usize types in Rules, so that types are compared as integers instead of Strings
    • e.g. by using an enum Type {Str(String), Int(usize)}
  • Use a binary index format
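A sketch of the enum Type idea from the list above: once type names are interned, a rule can hold the integer index, and matching an instance becomes an integer comparison. The `Rule` struct and its `resolve` and `matches` methods are hypothetical illustrations, not the project's actual Rules API.

```rust
/// A type reference in a rule: the original name, or its interned index.
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Str(String),
    Int(usize),
}

/// Hypothetical rule targeting a single type.
struct Rule {
    target: Type,
}

impl Rule {
    /// Replace a string target with its index in `types` (if present),
    /// so later matches compare integers instead of strings.
    fn resolve(&mut self, types: &[String]) {
        let pos = match &self.target {
            Type::Str(name) => types.iter().position(|t| t == name),
            Type::Int(_) => None,
        };
        if let Some(pos) = pos {
            self.target = Type::Int(pos);
        }
    }

    /// Check whether an instance's type index matches this rule.
    fn matches(&self, type_idx: usize, types: &[String]) -> bool {
        match &self.target {
            Type::Int(i) => *i == type_idx, // cheap integer comparison
            Type::Str(name) => types.get(type_idx) == Some(name), // fallback: string comparison
        }
    }
}

fn main() {
    let types: Vec<String> = ["Person", "Account", "Organization"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let mut rule = Rule { target: Type::Str("Organization".to_string()) };
    rule.resolve(&types); // target becomes Type::Int(2)
    println!("{:?} {}", rule.target, rule.matches(2, &types));
}
```

A name that is not in the index stays as Type::Str, so such a rule still works, just via the slower string comparison.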

@cmdoret cmdoret self-assigned this Aug 20, 2024
@cmdoret cmdoret added the enhancement New feature or request label Aug 20, 2024
@cmdoret
Member Author

cmdoret commented Aug 20, 2024

Note: the map needs to be hash -> enum {usize, Vec}.
We can also filter out unneeded types from the index at the pseudonymization step.
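One possible reading of the note above, sketched under the assumption that an instance can carry several types: the map value is either a single type index or a list of them. `TypeIdx`, `types_of` and `filter_types` are hypothetical names, the u64 keys in `main` are placeholder hashes, and the filter only illustrates what dropping types no rule needs might look like.

```rust
use std::collections::{HashMap, HashSet};

/// Map value: a single type index in the common case, a Vec only when needed.
#[derive(Debug, Clone, PartialEq)]
enum TypeIdx {
    One(usize),
    Many(Vec<usize>),
}

impl TypeIdx {
    /// View either variant as a slice of type indices.
    fn as_slice(&self) -> &[usize] {
        match self {
            TypeIdx::One(i) => std::slice::from_ref(i),
            TypeIdx::Many(v) => v.as_slice(),
        }
    }
}

/// Resolve all type names stored for an instance hash (empty if absent).
fn types_of<'a>(types: &'a [String], map: &HashMap<u64, TypeIdx>, key: u64) -> Vec<&'a str> {
    map.get(&key)
        .map(move |t| t.as_slice().iter().map(move |&i| types[i].as_str()).collect::<Vec<&str>>())
        .unwrap_or_default()
}

/// Keep only the types listed in `needed`. Entries left with no needed type
/// are dropped, and a list reduced to one element collapses back to `One`.
fn filter_types(types: &[String], map: &mut HashMap<u64, TypeIdx>, needed: &HashSet<&str>) {
    map.retain(|_, t| {
        let kept: Vec<usize> = t
            .as_slice()
            .iter()
            .copied()
            .filter(|&i| needed.contains(types[i].as_str()))
            .collect();
        *t = match kept.len() {
            0 => return false, // no needed type left: remove the entry
            1 => TypeIdx::One(kept[0]),
            _ => TypeIdx::Many(kept),
        };
        true
    });
}

fn main() {
    let types: Vec<String> = ["Person", "Account", "Organization"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let mut map: HashMap<u64, TypeIdx> = HashMap::new();
    map.insert(1, TypeIdx::One(0)); // placeholder hash; single type: Person
    map.insert(2, TypeIdx::Many(vec![1, 2])); // placeholder hash; Account + Organization
    println!("{:?}", types_of(&types, &map, 2));
    let needed: HashSet<&str> = ["Person", "Account"].iter().copied().collect();
    filter_types(&types, &mut map, &needed);
    println!("{:?}", map.get(&2)); // Some(One(1))
}
```

Keeping `One(usize)` as a separate variant avoids allocating a Vec for the common single-type case.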

@cmdoret cmdoret linked a pull request Aug 26, 2024 that will close this issue