Merge new find_with_marked intp perf v2 #7385
Also faster on my machine, feel free to merge.
…:realm/realm-core into fsa/experimental-find-first-optimization
@@ -791,7 +790,11 @@ constexpr uint32_t inverse_width[65] = {

inline int first_field_marked(int width, uint64_t vector)
{
#if REALM_WINDOWS
    int lz = (int)_tzcnt_u64(vector); // TODO: not clear if this is ok on all platforms
@finnschiermer this is a tmp fix, just to please the builders.
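For reference on the TODO above, here is a minimal sketch of a trailing-zero count that does not depend on `_tzcnt_u64`. It is not the code in this PR: `count_trailing_zeros64` and `first_field_marked_sketch` are hypothetical names, and the sketch assumes the marker bits sit inside the field they mark, so that dividing the bit index by `width` yields the field index.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hypothetical helper, not part of realm-core: index of the lowest set bit.
// Returns 64 for a zero input, which is what the TZCNT instruction reports.
inline int count_trailing_zeros64(uint64_t v)
{
    if (v == 0)
        return 64;
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_ARM64))
    unsigned long index;
    _BitScanForward64(&index, v); // defined because v != 0 here
    return static_cast<int>(index);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_ctzll(v); // undefined for 0, hence the guard above
#else
    int n = 0;
    while ((v & 1) == 0) { // plain fallback for other compilers
        v >>= 1;
        ++n;
    }
    return n;
#endif
}

// Sketch only: index of the first field that contains a set marker bit.
// For vector == 0 this returns 64 / width, i.e. one past the last complete field.
inline int first_field_marked_sketch(int width, uint64_t vector)
{
    return count_trailing_zeros64(vector) / width;
}
```

The explicit zero check keeps the result well defined on every branch, at the cost of one extra comparison; whether that matters for the inner loop would need measuring.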
There is a small bug to address in the object store for frozen objects, but we can merge this and I will fix it later. Perf-wise we are closer to master (worst path ~+50%, best path -20%), but we are not using a valid data set, nor are we implementing any heuristic.
* idea: subword parallel search
* better subword search
* better naming
* new methods for reading unaligned word from array of bitfields
* perf work on array with find based on parallel values comparison
* major cleanup of bitfield scanning
* de-templatified bit field search
* more tests and code generalization
* more tests
* new iterator optimized for linear scan
* eliminated last use of templates in subword parallel search
* optimization of some subword search methods
* working EQ cmp with parallel subword check
* fix in all_fields_NE
* make populate handle negative values
* commented out bypass which disabled subword search
* fix in fix of populate()
* bugfix and direct methods for signed GT and GE
* fix for GT condition
* enabled array perf tests (outside debug mode)
* fixed inner search loop
* made some perf tests non concurrent and silenced warnings
* moved call to match() into inner loop in subword parallel search
* Perf v2, find_with_marked for packed integer arrays (#7385)
* made find_first_marked() branch free
* various optimizations of find_first_marked, best one selected
* for some reason this is much bettergit add .
* no warnings
* made search method selection more explicit and clear
* bunch of fixes..
* restore subword loop
* fix object store tests + use subword cmp always (which is faster on my machine)
---------
Co-authored-by: Finn Schiermer Andersen <[email protected]>
* Perf work for array flex (still missing timestamps) (#7397)
* WIP perf work for array flex
* more small stuff, nothing important
* parallel subword for eq and neq
* move find parallel inside loop for eq and neq
* LT parallel subword cmp
* GT find for array flex
* Int equality as good as Packed
* code review
---------
Co-authored-by: Finn Schiermer Andersen <[email protected]>
Co-authored-by: Finn Schiermer Andersen <[email protected]>
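To illustrate the "parallel subword check" for EQ that the commits above mention, here is a rough sketch of the general technique: all fields of a 64-bit word are compared against a value at once, and matching fields are marked in their top bit. This is not the realm-core implementation. The names `replicate`, `lowest_set_bit` and `find_first_equal` are hypothetical, and the sketch assumes `1 <= width < 64` with fields packed from bit 0 upwards.

```cpp
#include <cstdint>

// Hypothetical helper: copy the low `width` bits of `pattern` into every
// complete field of a 64-bit word (leftover top bits stay zero).
inline uint64_t replicate(int width, uint64_t pattern)
{
    uint64_t out = 0;
    for (int shift = 0; shift + width <= 64; shift += width)
        out |= pattern << shift;
    return out;
}

// Hypothetical helper: index of the lowest set bit; v must be non-zero.
inline int lowest_set_bit(uint64_t v)
{
    int n = 0;
    while ((v & 1) == 0) {
        v >>= 1;
        ++n;
    }
    return n;
}

// Sketch: index of the first field in `word` equal to `value`, or -1 if none.
inline int find_first_equal(int width, uint64_t word, uint64_t value)
{
    const uint64_t field_mask = (uint64_t(1) << width) - 1;
    const uint64_t msbs = replicate(width, uint64_t(1) << (width - 1)); // top bit of each field
    const uint64_t lows = replicate(width, field_mask >> 1);            // all other bits of each field

    // XOR turns every field equal to `value` into an all-zero field.
    const uint64_t x = word ^ replicate(width, value & field_mask);

    // Adding `lows` carries into a field's top bit iff its low bits are non-zero;
    // OR-ing with x also covers fields where only the top bit is set.
    // The per-field sum never overflows into the next field.
    const uint64_t nonzero = ((x & lows) + lows) | x;

    // A field is a match iff its top bit is still clear.
    const uint64_t marked = ~nonzero & msbs;
    if (marked == 0)
        return -1;
    return lowest_set_bit(marked) / width;
}
```

For example, with `width == 8` and `word == 0x0000000000AA05FF`, searching for `0x05` marks bit 15 and so reports field 1. A search over a whole array would run this once per 64-bit word and only inspect individual fields when `marked` is non-zero.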
No description provided.