Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Premature optimiztion #617

Closed
the-moisrex opened this issue Mar 20, 2024 · 3 comments
Closed

Premature optimiztion #617

the-moisrex opened this issue Mar 20, 2024 · 3 comments

Comments

@the-moisrex
Copy link
Contributor

the-moisrex commented Mar 20, 2024

I saw your conference, in this slide you seem to be proud of this code:

constexpr ada::scheme::type get_scheme_type(
    std::string_view const scheme) noexcept {
  if (scheme.empty()) {
    return ada::scheme::NOT_SPECIAL;
  }
  auto const hash_value = static_cast<int>(
      (2 * scheme.size() + static_cast<unsigned>(scheme[0])) & 7U);
  const std::string_view target = details::is_special_list[hash_value];
  if ((target[0] == scheme[0]) && (target.substr(1) == scheme.substr(1))) {
    return static_cast<ada::scheme::type>(hash_value);
  }
  return ada::scheme::NOT_SPECIAL;
}

Here I'm gonna give you a simpler solution that is also faster:

static constexpr std::string_view is_special_list2[] = {
    "http", "https", "ws", "ftp", "wss", "file"};

// a little bit of re-ordering here:
enum class type2 : uint8_t {
  HTTP,
  HTTPS,
  WS,
  FTP,
  WSS,
  FILE,
  NOT_SPECIAL,
};

constexpr ada::scheme::type2
get_simple_scheme_type(std::string_view const scheme) noexcept {
  std::underlying_type_t<ada::scheme::type2> i = 0;
  for (; i != 6; ++i) { // hopefully it'll be unrolled
    auto const scheme_ith = details::is_special_list2[i];
    if (scheme == scheme_ith) {
      break;
    }
  }

  return static_cast<ada::scheme::type2>(i);
}

Why?:

  • loops get unrolled, hopefully (we can do it manually)
  • string_view::size_type is a 64bit integer, and you're multiplying it. Of course that's gonna be slower than comparing 8bit characters.

GCC:

2024-03-20T11:35:18-07:00
Running ./a.out
Run on (32 X 5500 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 2048 KiB (x16)
  L3 Unified 36864 KiB (x1)
Load Average: 0.25, 0.34, 0.53
--------------------------------------------------------
Benchmark              Time             CPU   Iterations
--------------------------------------------------------
Ada                 1.60 ns         1.60 ns    447947046
Simple             0.976 ns        0.975 ns    713186229
LongStrAda           117 ns          116 ns      6137922
LongStrSimple        114 ns          114 ns      6077361

Clang:

2024-03-20T11:36:07-07:00
Running ./a.out
Run on (32 X 5191.05 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 2048 KiB (x16)
  L3 Unified 36864 KiB (x1)
Load Average: 0.50, 0.40, 0.54
--------------------------------------------------------
Benchmark              Time             CPU   Iterations
--------------------------------------------------------
Ada                 1.52 ns         1.52 ns    453872906
Simple             0.897 ns        0.896 ns    765627214
LongStrAda           118 ns          118 ns      5999300
LongStrSimple        120 ns          120 ns      6019850

Unrelated note:

I don't think we really need that enum to be detailed, we just need these values because that's the only ones that affect the algorithms:

    enum struct scheme_type : stl::uint8_t {
        not_special,    // everything else
        special_scheme, // http(s), ws(s), ftp
        file,           // file scheme
    };

Benchmark Soruce

@the-moisrex
Copy link
Contributor Author

I didn't make a PR because I wasn't sure if changing the order of scheme::type will break any other algorithm or not. If it doesn't, let me know.

@lemire
Copy link
Member

lemire commented Mar 20, 2024

It is entirely possible for a branchy approach to do even slightly better than the mostly branchless approach that we use in some tests where the protocol is predictable. E.g., if you know that the protocol will be "http", first checking for "http" might save the hashing cost. However, the hashing approach has the benefit of providing consistent worst case performance: you always do at most one string comparison... as opposed to scanning the whole set in the worst case.

I will add a comment in the code.

@lemire
Copy link
Member

lemire commented Mar 20, 2024

See PR #618 where the design choice gets explained.

@ada-url ada-url locked as resolved and limited conversation to collaborators Mar 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants