-
-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Documenting hashing technique #618
Conversation
I'm just gonna leave this here; it's something from my own implementation but I've changed mine because I needed to ignore tabs and newlines. You should not be aiming for the worse case scenario; which, the worse case scenario of this algorithm is not as bad as you think it is. If you have 10000 character of scheme characters, worse case is not 10000. Worse case would be sum of all of the special schemes which is just 21. I have a benchmark for this as well. /**
* @return 0 if unknown, otherwise return the port
*/
[[nodiscard]] constexpr stl::uint16_t known_port() const noexcept {
// NOLINTBEGIN(*-magic-numbers)
switch (this->size()) {
case 2:
if (this->operator[](0) == 'w' && this->operator[](1) == 's') {
return 80U;
}
break;
case 3:
if (*this == "wss") {
return 443U;
}
if (*this == "ftp") {
return 21U;
}
break;
case 4:
if (*this == "http") {
return 80U;
}
break;
case 5:
if (*this == "https") {
return 443U;
}
break;
default: break;
}
return 0U;
// NOLINTEND(*-magic-numbers)
}
|
@the-moisrex I understand that these functions may be bottlenecks in your applications. In ada, with the current code, identifying the protocol or or the special port, does not even register when profiling. It may be possible to gain 1% or 2% in some tests but we can rule out significant gains from the profiling data alone, at least in the tests that we have done so far. We are still open to revisiting design choices, but this particular component of the code is unlikely to become a priority. I should point out that the code routine you offer is a form of hashing: you use the key length as the hash function and then branch out. There is evidently a wide range of design choices, and I invite you to pursue your investigations in your projects. |
Co-authored-by: Yagiz Nizipli <[email protected]>
As remarked by @the-moisrex in issue 617, Premature optimiztion, the hashing technique to identify the protocol is unnecessary if the protocol string is predictable. By simply testing all six possible strings, starting from the most probable, it is likely that you can often get better performance in many practical cases.
It is a trade-off between best average or best-case performance and worst-case performance. A naive approach, in the worst case, may need to do six different string comparisons which could be potentially expensive. We made the choice to go with an option that is slightly less optimal in the case where we can predict, at compile time, the protocol, but has the benefit of having well understood worst-case performance. It is a good old "O(1)" performance vs "O(N)" performance where N is not too large.
This trade-off can be revisited at any time. It is worth documenting it in the code, however. This is what this PR does.
See also : #619