-
Notifications
You must be signed in to change notification settings - Fork 320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Counted Set #132
base: main
Are you sure you want to change the base?
Counted Set #132
Changes from 1 commit
7ec883e
88524bf
4375f7c
e43e7d2
0c5fd75
a379f8c
1c567ea
89fd08f
e77d82c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
The custom implementation of count adds the multiplicities of the elements together, which is far more efficient than the default approach of iterating through the set manually. underestimatedCount has also been customized: it exposes the count of the underlying dictionary, which is guaranteed to be less than or equal to the cardinality of the counted set itself. This may be used to avoid trapping for sets with very large multiplicities.
- Loading branch information
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -75,6 +75,15 @@ extension CountedSet: Collection { | |
return keyPair.key | ||
} | ||
|
||
/// The number of elements in the set. | ||
/// | ||
/// - Complexity: O(*k*), where *k* is the number of unique elements in the | ||
/// set. | ||
@inlinable | ||
public var count: Int { | ||
Int(rawValue.values.reduce(.zero, +)) | ||
} | ||
|
||
@inlinable | ||
public var startIndex: Index { | ||
Index(storageIndex: rawValue.startIndex, position: 0) | ||
|
@@ -84,4 +93,13 @@ extension CountedSet: Collection { | |
public var endIndex: Index { | ||
Index(storageIndex: rawValue.endIndex, position: 0) | ||
} | ||
|
||
/// A value equal to the number of unique elements in the set. | ||
/// | ||
/// - Complexity: O(*k*), where *k* is the number of unique elements in the | ||
/// set. | ||
@inlinable | ||
public var underestimatedCount: Int { | ||
rawValue.count | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is neat, but if we decide to add a running total instead of using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
If we change it to reference By the way, would it be possible to add a more efficient conditional implementation of Swift Algorithms’ Unique methods for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This is a good idea in general, and
The way to do this right now would require Algorithms to import Collections, which would be a bad idea. However, Swift does have support for cross-module overlays, which are modules that automatically get loaded whenever some module A and module B are both imported to a source file. (Apple's SDKs use this to add, e.g., MapKit-related functionality to SwiftUI when a file imports both of these modules.) If this functionality was cleaned up & promoted to a full-blown language feature, including SwiftPM support for defining cross-import modules, then we could use that to provide uniquing overloads that take CountedSets. Another alternative is to define a set of additional collection protocols in a separate package, and have swift-collections and swift-algorithms both depend on that. This would let swift-algorithms provide generic overloads for specific algorithms to speed them up when possible. I don't think we need to do this for CountedMultiset right now, but it's something to think about for later. (FWIW, I think we'll eventually want to define some sort of a dictionary protocol at least, and possibly a UniqueCollection protocol.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, the easiest way is to simply provide the uniquing methods directly on the counted multiset type, without importing Algorithms. This would work, but perhaps it would be even better to provide a (I'm not sure we need to do this though -- we are already exposing the storage dictionary, after all, so folks can simply access |
||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine per the Collection protocol, but it seems like a missed opportunity.
What if we kept a running count of items in this set, updating it on every mutation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Storing the count would mean this is no longer an implicit data structure. I’m not really sure that’s a worthwhile tradeoff.
Is there precedent for doing that with similar data structures?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh definitely --
Set
andDictionary
do it, for example, to prevent having to scan their bitmap to figure out their element count.I don't think having to store one extra integer matters much in this case -- the hash table in the underlying dictionary already comes with way too much memory overhead to call this a truly implicit data structure. 🙈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking it was an implicit data structure for a dictionary, but okay then.