<pre class='metadata'>
Title: Precalculated hash values in lookup
Shortname: P0920
Revision: 2
Status: P
Date: 2019-02-22
Group: WG21
Audience: LWG
URL: http://wg21.link/P0920r2
Editor: Mateusz Pusz, Epam Systems http://www.epam.com, [email protected], https://www.train-it.eu
Abstract: This proposal extends the interface of unordered containers with member function overloads that
  take one additional argument: a precalculated hash value for the value being queried.
Repository: mpusz/wg21_papers
!Source: <a href="https://github.com/mpusz/wg21_papers/blob/master/src/0920_precalculated_hash_values_in_lookup.bs">github.com/mpusz/wg21_papers/blob/master/src/0920_precalculated_hash_values_in_lookup.bs</a>
Markup Shorthands: markdown on
</pre>
Motivation and Scope {#motivation-and-scope}
============================================
In business scenarios we often have to look up the same key in more than one container at a time.
Today this is expensive, because the hash value is recalculated on every lookup.
With the changes proposed in this paper, the following code calculates the hash value only once per call
to `update()`:
```cpp
std::array<std::unordered_map<std::string, int>, array_size> maps;

void update(const std::string& user)
{
  const auto hash = maps.front().hash_function()(user);
  for(auto& m : maps) {
    auto it = m.find(user, hash);
    // ...
  }
}
```
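
The cost described above can be observed with today's interface. The following is only a sketch: `counting_hash`, `hash_calls()` and `hash_calls_for_lookup()` are hypothetical names introduced for illustration and are not part of the proposal. It counts how often the hasher runs when the same key is looked up in several maps using the existing `find(key)`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical global counter of hash computations (illustration only).
inline std::size_t& hash_calls()
{
  static std::size_t calls = 0;
  return calls;
}

// Hypothetical hasher: forwards to std::hash and counts every invocation.
struct counting_hash {
  std::size_t operator()(const std::string& s) const
  {
    ++hash_calls();
    return std::hash<std::string>{}(s);
  }
};

using counted_map = std::unordered_map<std::string, int, counting_hash>;

// Looks `user` up in every map with the existing find(key) and returns how
// many hash computations those lookups triggered.
template<std::size_t N>
std::size_t hash_calls_for_lookup(const std::array<counted_map, N>& maps, const std::string& user)
{
  const std::size_t before = hash_calls();
  for(const auto& m : maps) {
    assert(!m.empty());  // an implementation may skip hashing for an empty table
    const auto it = m.find(user);
    (void)it;            // only the hashing cost matters in this sketch
  }
  return hash_calls() - before;
}
```

On the implementations we are aware of, each `find()` on a populated map hashes the key once, so the returned count grows with the number of maps. With the overloads proposed here the key would be hashed once, outside the loop.
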
Prior Work {#prior-work}
========================
The proposed feature is implemented in [tsl::hopscotch_map](https://github.com/Tessil/hopscotch-map), where it has
been shown to deliver significant performance improvements.
Impact On The Standard {#impact}
================================
This proposal modifies the unordered associative containers in `<unordered_map>` and `<unordered_set>` by
overloading the lookup member functions with member function templates having one additional parameter.
There are no language changes.
All existing C++17 code is unaffected.
Considered Alternatives {#alternatives}
=======================================
Stateful hash object {#stateful-hash}
-------------------------------------
Similar, although slightly slower, behavior can be obtained with a stateful hash object, which introduces an additional
branch on every lookup:
```cpp
template<typename Key, typename Hash>
struct hash_cache {
  inline static std::pair<Key, std::size_t> cache;
  size_t operator()(const Key& k) const
  {
    std::size_t val{};
    if (k != cache.first) {
      cache.first = k;
      cache.second = Hash()(k);
    }
    val = cache.second;
    return val;
  }
};
```
However, the situation becomes more complicated in a multithreaded environment, where synchronization has to be
added, as in the following `hash_cache_sync` helper class:
```cpp
template<typename Key, typename Hash>
struct hash_cache_sync {
  inline static std::mutex m;
  inline static std::pair<Key, std::size_t> cache;
  size_t operator()(const Key& k) const
  {
    std::size_t val{};
    {
      std::scoped_lock lock(m);
      if (k != cache.first) {
        cache.first = k;
        cache.second = Hash()(k);
      }
      val = cache.second;
    }
    return val;
  }
};
```
Such synchronization negates nearly all the benefits of having a cache.
Another problem with this solution arises with the heterogeneous lookup introduced by [[!p0919r3]]:
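
The behavior of the unsynchronized `hash_cache` can be checked in isolation. The sketch below repeats that class and wraps a hypothetical `counting_string_hash`. The helper names `inner_hash_calls()` and `wrapped_hash_runs()` are assumptions made for this illustration. It reports how often the wrapped hasher really runs for a given sequence of keys.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical counter of calls to the wrapped hash (illustration only).
inline std::size_t& inner_hash_calls()
{
  static std::size_t calls = 0;
  return calls;
}

// Hypothetical wrapped hasher: forwards to std::hash and counts invocations.
struct counting_string_hash {
  std::size_t operator()(const std::string& s) const
  {
    ++inner_hash_calls();
    return std::hash<std::string>{}(s);
  }
};

// Same single-entry cache as in the text (requires C++17 inline variables).
template<typename Key, typename Hash>
struct hash_cache {
  inline static std::pair<Key, std::size_t> cache;
  std::size_t operator()(const Key& k) const
  {
    if (k != cache.first) {
      cache.first = k;
      cache.second = Hash()(k);
    }
    return cache.second;
  }
};

// Hashes `keys` in order through a freshly reset cache and returns how many
// times the wrapped hasher actually ran.
std::size_t wrapped_hash_runs(const std::vector<std::string>& keys)
{
  using cached_hash = hash_cache<std::string, counting_string_hash>;
  cached_hash::cache = std::pair<std::string, std::size_t>{};  // reset shared state
  const std::size_t before = inner_hash_calls();
  cached_hash h;
  for (const auto& k : keys) {
    assert(!k.empty());  // the reset cache already holds the empty key
    h(k);
  }
  return inner_hash_calls() - before;
}
```

Repeating the same key back to back runs the wrapped hasher once, whereas alternating between two keys runs it on every call, because the cache holds a single entry. This only helps the motivating case as long as lookups of one key are not interleaved with lookups of another.
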
```cpp
struct string_hash {
  using transparent_key_equal = std::equal_to<>;
  std::pair<???, std::size_t> cache;
  std::size_t operator()(std::string_view txt) const;
  std::size_t operator()(const std::string& txt) const;
  std::size_t operator()(const char* txt) const;
};
```
In such a case there is no single good `Key` type to use for storage in the cache. Additional conversions and object
constructions will always be involved, which negates all the benefits of the heterogeneous lookup feature.
Proposed Wording {#wording}
===========================
The proposed changes are relative to the working draft of the standard as of [[!n4791]].
Modify **21.2.7 [unord.req]** paragraph 11 as follows:
Add a new paragraph 11.23 in **21.2.7 [unord.req]**:
<blockquote>
<ins>
- `hk` and `hke` denote values of type `size_t`,
</ins>
</blockquote>
Modify table 70 in section **21.2.7 [unord.req]** as follows:
<blockquote>
<table>
<tr>
<th>Expression</th>
<th>Return type</th>
<th>Assertion/note pre-/post-condition</th>
<th>Complexity</th>
</tr>
<tr>
<td>`b.find(k)`</td>
<td>`iterator`; `const_iterator` for const `b`.</td>
<td><i>Returns:</i> an iterator pointing to an element with key equivalent to `k`, or `b.end()`
if no such element exists.</td>
<td>Average case O(1), worst case O(`b.size()`).</td>
</tr>
<tr>
<td><ins>`b.find(k, hk)`</ins></td>
<td><ins>`iterator`; `const_iterator` for const `b`.</ins></td>
<td><ins><i>Expects:</i> `b.hash_function()(k)` equals `hk`,<br/>
<i>Returns:</i> an iterator pointing to an element with key equivalent to `k`, or `b.end()`
if no such element exists.</ins></td>
<td><ins>Average case O(1), worst case O(`b.size()`).</ins></td>
</tr>
<tr>
<td>`a_tran.find(ke)`</td>
<td>`iterator`; `const_iterator` for const `a_tran`.</td>
<td><i>Returns:</i> an iterator pointing to an element with key equivalent to `ke`, or `a_tran.end()` if no such
element exists.</td>
<td>Average case O(1), worst case O(`a_tran.size()`).</td>
</tr>
<tr>
<td><ins>`a_tran.find(ke, hke)`</ins></td>
<td><ins>`iterator`; `const_iterator` for const `a_tran`.</ins></td>
<td><ins><i>Expects:</i> `a_tran.hash_function()(ke)` equals `hke`,<br/>
<i>Returns:</i> an iterator pointing to an element with key equivalent to `ke`, or `a_tran.end()` if no such
element exists.</ins></td>
<td><ins>Average case O(1), worst case O(`a_tran.size()`).</ins></td>
</tr>
<tr>
<td>`b.count(k)`</td>
<td>`size_type`</td>
<td><i>Returns:</i> the number of elements with key equivalent to `k`.</td>
<td>Average case O(`b.count(k)`), worst case O(`b.size()`).</td>
</tr>
<tr>
<td><ins>`b.count(k, hk)`</ins></td>
<td><ins>`size_type`</ins></td>
<td><ins><i>Expects:</i> `b.hash_function()(k)` equals `hk`,<br/>
<i>Returns:</i> the number of elements with key equivalent to `k`.</ins></td>
<td><ins>Average case O(`b.count(k)`), worst case O(`b.size()`).</ins></td>
</tr>
<tr>
<td>`a_tran.count(ke)`</td>
<td>`size_type`</td>
<td><i>Returns:</i> the number of elements with key equivalent to `ke`.</td>
<td>Average case O(`a_tran.count(ke)`), worst case O(`a_tran.size()`).</td>
</tr>
<tr>
<td><ins>`a_tran.count(ke, hke)`</ins></td>
<td><ins>`size_type`</ins></td>
<td><ins><i>Expects:</i> `a_tran.hash_function()(ke)` equals `hke`,<br/>
<i>Returns:</i> the number of elements with key equivalent to `ke`.</ins></td>
<td><ins>Average case O(`a_tran.count(ke)`), worst case O(`a_tran.size()`).</ins></td>
</tr>
<tr>
<td>`b.contains(k)`</td>
<td>bool</td>
<td><ins><i>Effects:</i></ins> Equivalent to `b.find(k) != b.end()`</td>
<td>Average case O(1), worst case O(`b.size()`)</td>
</tr>
<tr>
<td><ins>`b.contains(k, hk)`</ins></td>
<td><ins>bool</ins></td>
<td><ins><i>Expects:</i> `b.hash_function()(k)` equals `hk`,<br/>
<i>Effects:</i> Equivalent to `b.find(k, hk) != b.end()`</ins></td>
<td><ins>Average case O(1), worst case O(`b.size()`)</ins></td>
</tr>
<tr>
<td>`a_tran.contains(ke)`</td>
<td>bool</td>
<td><ins><i>Effects:</i></ins> Equivalent to `a_tran.find(ke) != a_tran.end()`</td>
<td>Average case O(1), worst case O(`a_tran.size()`)</td>
</tr>
<tr>
<td><ins>`a_tran.contains(ke, hke)`</ins></td>
<td><ins>bool</ins></td>
<td><ins><i>Expects:</i> `a_tran.hash_function()(ke)` equals `hke`,<br/>
<i>Effects:</i> Equivalent to `a_tran.find(ke, hke) != a_tran.end()`</ins></td>
<td><ins>Average case O(1), worst case O(`a_tran.size()`)</ins></td>
</tr>
<tr>
<td>`b.equal_range(k)`</td>
<td>`pair<iterator, iterator>`; `pair<const_iterator, const_iterator>` for const `b`.</td>
<td><i>Returns:</i> a range containing all elements with keys equivalent to `k`. Returns
`make_pair(b.end(), b.end())` if no such elements exist.</td>
<td>Average case O(`b.count(k)`), worst case O(`b.size()`).</td>
</tr>
<tr>
<td><ins>`b.equal_range(k, hk)`</ins></td>
<td><ins>`pair<iterator, iterator>`; `pair<const_iterator, const_iterator>` for const `b`.</ins></td>
<td><ins><i>Expects:</i> `b.hash_function()(k)` equals `hk`,<br/>
<i>Returns:</i> a range containing all elements with keys equivalent to `k`. Returns
`make_pair(b.end(), b.end())` if no such elements exist.</ins></td>
<td><ins>Average case O(`b.count(k)`), worst case O(`b.size()`).</ins></td>
</tr>
<tr>
<td>`a_tran.equal_range(ke)`</td>
<td>`pair<iterator, iterator>`; `pair<const_iterator, const_iterator>` for const `a_tran`.</td>
<td><i>Returns:</i> a range containing all elements with keys equivalent to `ke`. Returns
`make_pair(a_tran.end(), a_tran.end())` if no such elements exist.</td>
<td>Average case O(`a_tran.count(ke)`), worst case O(`a_tran.size()`).</td>
</tr>
<tr>
<td><ins>`a_tran.equal_range(ke, hke)`</ins></td>
<td><ins>`pair<iterator, iterator>`; `pair<const_iterator, const_iterator>` for const `a_tran`.</ins></td>
<td><ins><i>Expects:</i> `a_tran.hash_function()(ke)` equals `hke`,<br/>
<i>Returns:</i> a range containing all elements with keys equivalent to `ke`. Returns
`make_pair(a_tran.end(), a_tran.end())` if no such elements exist.</ins></td>
<td><ins>Average case O(`a_tran.count(ke)`), worst case O(`a_tran.size()`).</ins></td>
</tr>
</table>
</blockquote>
Add the following changes to:
- **21.5.4.1 [unord.map.overview]**
- **21.5.5.1 [unord.multimap.overview]**
- **21.5.6.1 [unord.set.overview]**
- **21.5.7.1 [unord.multiset.overview]**
<blockquote>
<pre>
iterator find(const key_type& k);
const_iterator find(const key_type& k) const;
<ins>iterator find(const key_type& k, size_t hash);
const_iterator find(const key_type& k, size_t hash) const;</ins>
template &lt;class K&gt; iterator find(const K& k);
template &lt;class K&gt; const_iterator find(const K& k) const;
<ins>template &lt;class K&gt; iterator find(const K& k, size_t hash);
template &lt;class K&gt; const_iterator find(const K& k, size_t hash) const;</ins>
size_type count(const key_type& k) const;
<ins>size_type count(const key_type& k, size_t hash) const;</ins>
template &lt;class K&gt; size_type count(const K& k) const;
<ins>template &lt;class K&gt; size_type count(const K& k, size_t hash) const;</ins>
bool contains(const key_type& k) const;
<ins>bool contains(const key_type& k, size_t hash) const;</ins>
template &lt;class K&gt; bool contains(const K& k) const;
<ins>template &lt;class K&gt; bool contains(const K& k, size_t hash) const;</ins>
pair&lt;iterator, iterator&gt; equal_range(const key_type& k);
pair&lt;const_iterator, const_iterator&gt; equal_range(const key_type& k) const;
<ins>pair&lt;iterator, iterator&gt; equal_range(const key_type& k, size_t hash);
pair&lt;const_iterator, const_iterator&gt; equal_range(const key_type& k, size_t hash) const;</ins>
template &lt;class K&gt; pair&lt;iterator, iterator&gt; equal_range(const K& k);
template &lt;class K&gt; pair&lt;const_iterator, const_iterator&gt; equal_range(const K& k) const;
<ins>template &lt;class K&gt; pair&lt;iterator, iterator&gt; equal_range(const K& k, size_t hash);
template &lt;class K&gt; pair&lt;const_iterator, const_iterator&gt; equal_range(const K& k, size_t hash) const;</ins>
</pre>
</blockquote>
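
To illustrate the intended semantics of the added overloads, here is a minimal sketch of a separate-chaining table with a `find(k, hk)` member. `toy_map` and `sum_for_user()` are hypothetical names. The sketch is not how any standard library implements unordered containers and it omits rehashing, iterators and erasure. It only shows that a lookup given `hk` can go straight to the bucket without calling the hasher.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical fixed-size separate-chaining table (illustration only).
template<typename Key, typename T, typename Hash = std::hash<Key>>
class toy_map {
  std::vector<std::vector<std::pair<Key, T>>> buckets_;
  Hash hash_;
public:
  explicit toy_map(std::size_t bucket_count = 16) : buckets_(bucket_count)
  {
    assert(bucket_count > 0);  // the bucket index is computed modulo this count
  }

  Hash hash_function() const { return hash_; }

  // Inserts the mapping unless the key is already present.
  void insert(const Key& k, const T& v)
  {
    auto& bucket = buckets_[hash_(k) % buckets_.size()];
    for (const auto& e : bucket)
      if (e.first == k) return;
    bucket.emplace_back(k, v);
  }

  // Regular lookup: computes the hash itself.
  const T* find(const Key& k) const { return find(k, hash_(k)); }

  // Lookup with a precalculated hash. Expects: hash_function()(k) == hk.
  const T* find(const Key& k, std::size_t hk) const
  {
    const auto& bucket = buckets_[hk % buckets_.size()];
    for (const auto& e : bucket)
      if (e.first == k) return &e.second;
    return nullptr;  // plays the role of end()
  }
};

// Sums the values stored for `user` in all tables, hashing `user` only once.
int sum_for_user(const std::vector<toy_map<std::string, int>>& tables, const std::string& user)
{
  if (tables.empty()) return 0;
  const std::size_t hash = tables.front().hash_function()(user);
  int sum = 0;
  for (const auto& t : tables)
    if (const int* v = t.find(user, hash))
      sum += *v;
  return sum;
}
```

As in the wording above, passing a value of `hk` that does not equal `hash_function()(k)` violates the precondition. In this sketch such a call would simply search the wrong bucket and report the key as missing.
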
Feature Testing {#feature-testing}
==================================
Add the following row to **Table 36** in **16.3.1 [support.limits.general]** paragraph 3:
<table>
<tr>
<th>Macro name</th>
<th>Value</th>
<th>Header(s)</th>
</tr>
<tr>
<td colspan="3">...</td>
</tr>
<tr>
<td>__cpp_lib_generic_associative_lookup</td>
<td>201304L</td>
<td>&lt;map&gt; &lt;set&gt;</td>
</tr>
<tr>
<td>__cpp_lib_generic_unordered_lookup</td>
<td>201811L</td>
<td>&lt;unordered_map&gt; &lt;unordered_set&gt;</td>
</tr>
<tr>
<td><ins>__cpp_lib_generic_unordered_hash_lookup</ins></td>
<td></td>
<td><ins>&lt;unordered_map&gt; &lt;unordered_set&gt;</ins></td>
</tr>
</table>
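
User code could test for the macro roughly as follows. This is a sketch under assumptions: `find_with_hash()` and `value_or_zero()` are hypothetical helpers, and since the macro's value is not specified in the table above, the code only checks whether the macro is defined. Where the library does not define it, the helper falls back to the existing `find(key)`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: uses the proposed overload when the library advertises
// it through the feature-test macro, and the existing find(key) otherwise.
template<typename Map, typename Key>
auto find_with_hash(Map& m, const Key& key, std::size_t hash) -> decltype(m.find(key))
{
  // Debug-only check of the "Expects" precondition from the wording.
  assert(m.hash_function()(key) == hash);
#ifdef __cpp_lib_generic_unordered_hash_lookup
  return m.find(key, hash);  // overload proposed in this paper
#else
  (void)hash;                // fallback: the container hashes the key again
  return m.find(key);
#endif
}

// Hypothetical usage: returns the mapped value, or 0 when the key is absent.
inline int value_or_zero(const std::unordered_map<std::string, int>& m, const std::string& user)
{
  const std::size_t hash = m.hash_function()(user);
  const auto it = find_with_hash(m, user, hash);
  return it != m.end() ? it->second : 0;
}
```

The helper keeps call sites identical on both code paths, so the precalculated hash is simply ignored where the overloads are unavailable.
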
Implementation Experience {#implementation}
===========================================
The changes in this proposal are partially implemented in a [GitHub repo](https://github.com/mpusz/unordered_v2)
against [libc++ 7.0.0](https://libcxx.llvm.org).
Simple performance tests provided there showed nearly:
- 20% performance gain for short text
- 50% performance gain for long text
Revision History {#revision-history}
====================================
r1 ➡ r2 [[diff](https://github.com/mpusz/wg21-papers/commit/c2b056b9ce238eea28c25a4ae2bddbbe468ecefa)] {#r1r2}
-------------------------------------------------------------------------------------------------------------
- Table 70 updated according to [[!p0788r3]]
r0 ➡ r1 [[diff](https://github.com/mpusz/wg21-papers/commit/8a1ba0ea256efaf2ac65c3e136b60b0c8dea7d96)] {#r0r1}
-------------------------------------------------------------------------------------------------------------
- Rebased to [[!n4791]]
- Simplified wording by aggregating rows in a table (where possible) and providing overview wording once
for all the containers
- Feature test macro name changed
Acknowledgements {#acknowledgements}
====================================
Special thanks and recognition go to [Epam Systems](http://www.epam.com) for supporting my
membership in the ISO C++ Committee and the production of this proposal.