Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace BloomFilter underlying BitSet with cats-collections immutable BitSet #774

Closed
wants to merge 5 commits into from

Conversation

regadas
Copy link
Collaborator

@regadas regadas commented Jan 19, 2020

See #716 for context.

Update: #716 (comment)

@regadas regadas requested a review from johnynek May 7, 2020 20:21
@johnynek
Copy link
Collaborator

johnynek commented May 7, 2020

If we are going to make a new implementation, I'd like to explore a pattern @non and I have found for parameterized monoids:

final class BloomFilter[A](hashes: Int, width: Int)(implicit val hasher: Hasher[A]) {
  sealed class BF[A] {
    def mightContain(a: A): Boolean
  }

  final case class BFInstance[A](toValue: A) extends BF[A]
  final case class BFSet(toBitSet: BitSet) extends BF[A]
  // ...

  def apply(a: A): BF[A] = ...

  implicit val monoid: Monoid[BF[A]] = ...
}

Then you use it with:

val bloomFilter: BloomFilter[A] = new BloomFilter(4, 1024)

val itemA1: bloomFilter.BF[A] = bloomFilter(a1)
val itemA2: bloomFilter.BF[A] = bloomFilter(a2)

val a3 = Monoid.plus(itemA1, itemA2)

Now, the type system prevents us from confusing bloom filter values from different parameter sets. Also, the serializers can be inside BloomFilter which knows the parmeters, so you don't need to serialize the parameters with each value.

@johnynek
Copy link
Collaborator

johnynek commented May 7, 2020

what do you think?

@regadas
Copy link
Collaborator Author

regadas commented May 7, 2020

Sounds good! ... I've been meaning to pick this up again! I'll explore that!

A few notes around this PR:

  • I moved the current impl to a legacy package ... I'll probably move it back (keep things as is) and add the new impl in a new package. (it will be opt in)

  • I really think the BitSet impl as potential and as I mentioned in the comment, perf wise is on par and slightly better in some cases. When it comes to createBloomFilterUsingFold and createBloomFilterAggregator from More benchmarks for creating bloom filters #672 I saw significant improvements as well.

  • I think you and @non already did some ground work in Improvements to BitSet performance and benchmarks typelevel/cats-collections#184 some things in this impl are a mix of that with the impl from master. Not sure what the move is ... if we cleanup that impl and merge it (we would depend on cats-collections) or we copy the impl (this gives us more freedom to tap into private mutable methods when needed to)

Copy link
Collaborator Author

@regadas regadas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closing this in favour of #840

@regadas regadas closed this Jul 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants