Add benchmark results for e6bf71871a6b9f601545dba8a42ce89c6069675c
github-actions committed Nov 9, 2024
1 parent e54fa07 commit 0ed9439
Showing 5 changed files with 951 additions and 5 deletions.
68 changes: 67 additions & 1 deletion analyze.json
@@ -1,5 +1,5 @@
{
"lastUpdate": 1731103069070,
"lastUpdate": 1731112783238,
"repoUrl": "https://github.com/luau-lang/luau",
"entries": {
"luau-analyze": [
@@ -16234,6 +16234,72 @@
"extra": "luau-analyze"
}
]
},
{
"commit": {
"author": {
"email": "[email protected]",
"name": "Arseny Kapoulkine",
"username": "zeux"
},
"committer": {
"email": "[email protected]",
"name": "GitHub",
"username": "web-flow"
},
"distinct": true,
"id": "e6bf71871a6b9f601545dba8a42ce89c6069675c",
"message": "CodeGen: Rewrite dot product lowering using a dedicated IR instruction (#1512)\n\nInstead of doing the dot product related math in scalar IR, we lift the\r\ncomputation into a dedicated IR instruction.\r\n\r\nOn x64, we can use VDPPS which was more or less tailor made for this\r\npurpose. This is better than manual scalar lowering that requires\r\nreloading components from memory; it's not always a strict improvement\r\nover the shuffle+add version (which we never had), but this can now be\r\nadjusted in the IR lowering in an optimal fashion (maybe even based on\r\nCPU vendor, although that'd create issues for offline compilation).\r\n\r\nOn A64, we can either use naive adds or paired adds, as there is no\r\ndedicated vector-wide horizontal instruction until SVE. Both run at\r\nabout the same performance on M2, but paired adds require fewer\r\ninstructions and temporaries.\r\n\r\nI've measured this using mesh-normal-vector benchmark, changing the\r\nbenchmark to just report the time of the second loop inside\r\n`calculate_normals`, testing master vs #1504 vs this PR, also increasing\r\nthe grid size to 400 for more stable timings.\r\n\r\nOn Zen 4 (7950X), this PR is comfortably ~8% faster vs master, while I\r\nsee neutral to negative results in #1504.\r\nOn M2 (base), this PR is ~28% faster vs master, while #1504 is only\r\nabout ~10% faster.\r\n\r\nIf I measure the second loop in `calculate_tangent_space` instead, I\r\nget:\r\n\r\nOn Zen 4 (7950X), this PR is ~12% faster vs master, while #1504 is ~3%\r\nfaster\r\nOn M2 (base), this PR is ~24% faster vs master, while #1504 is only\r\nabout ~13% faster.\r\n\r\nNote that the loops in question are not quite optimal, as they store and\r\nreload various vectors to dictionary values due to inappropriate use of\r\nlocals. The underlying gains in individual functions are thus larger\r\nthan the numbers above; for example, changing the `calculate_normals`\r\nloop to use a local variable to store the normalized vector (but still\r\nsaving the result to dictionary value), I get a ~24% performance\r\nincrease from this PR on Zen4 vs master instead of just 8% (#1504 is\r\n~15% slower in this setup).",
"timestamp": "2024-11-08T16:23:09-08:00",
"tree_id": "c5bf52046b7ad7c495e930780fe8ba3a95d09432",
"url": "https://github.com/luau-lang/luau/commit/e6bf71871a6b9f601545dba8a42ce89c6069675c"
},
"date": 1731112783233,
"tool": "benchmarkluau",
"benches": [
{
"name": "map-nonstrict",
"value": 4.86579,
"unit": "4ms",
"range": "±0%",
"extra": "luau-analyze"
},
{
"name": "map-strict",
"value": 5.92236,
"unit": "5ms",
"range": "±0%",
"extra": "luau-analyze"
},
{
"name": "map-dcr",
"value": 26.9731,
"unit": "ms",
"range": "±0%",
"extra": "luau-analyze"
},
{
"name": "regex-nonstrict",
"value": 8.16724,
"unit": "8ms",
"range": "±0%",
"extra": "luau-analyze"
},
{
"name": "regex-strict",
"value": 10.6295,
"unit": "ms",
"range": "±0%",
"extra": "luau-analyze"
},
{
"name": "regex-dcr",
"value": 7742.64,
"unit": "ms",
"range": "±0%",
"extra": "luau-analyze"
}
]
}
]
}
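
The commit message recorded above contrasts the old scalar lowering of vector dot products with a single dedicated instruction on x64 (VDPPS). As a rough standalone sketch of that contrast, using SSE intrinsics rather than Luau's actual IR lowering and with illustrative function names, the two strategies look like this:

#include <smmintrin.h> // SSE4.1: _mm_dp_ps maps to DPPS/VDPPS

// Scalar-style 3-component dot product: each component is read and combined
// individually, mirroring the per-component work of the old scalar lowering.
float dot3_scalar(const float* a, const float* b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Dedicated-instruction version: mask 0x71 tells DPPS to multiply lanes
// x/y/z, sum them horizontally, and write the result into lane 0.
float dot3_dpps(__m128 a, __m128 b)
{
    return _mm_cvtss_f32(_mm_dp_ps(a, b, 0x71));
}

As the message notes, DPPS is not always a strict improvement over a shuffle+add sequence, which is why making the choice during IR lowering leaves room to adjust the emitted sequence later.
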
264 changes: 263 additions & 1 deletion bench-codegen.json
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
{
"lastUpdate": 1731103068754,
"lastUpdate": 1731112782915,
"repoUrl": "https://github.com/luau-lang/luau",
"entries": {
"callgrind codegen": [
@@ -47856,6 +47856,268 @@
"extra": "luau-codegen"
}
]
},
{
"commit": {
"author": {
"email": "[email protected]",
"name": "Arseny Kapoulkine",
"username": "zeux"
},
"committer": {
"email": "[email protected]",
"name": "GitHub",
"username": "web-flow"
},
"distinct": true,
"id": "e6bf71871a6b9f601545dba8a42ce89c6069675c",
"message": "CodeGen: Rewrite dot product lowering using a dedicated IR instruction (#1512)\n\nInstead of doing the dot product related math in scalar IR, we lift the\r\ncomputation into a dedicated IR instruction.\r\n\r\nOn x64, we can use VDPPS which was more or less tailor made for this\r\npurpose. This is better than manual scalar lowering that requires\r\nreloading components from memory; it's not always a strict improvement\r\nover the shuffle+add version (which we never had), but this can now be\r\nadjusted in the IR lowering in an optimal fashion (maybe even based on\r\nCPU vendor, although that'd create issues for offline compilation).\r\n\r\nOn A64, we can either use naive adds or paired adds, as there is no\r\ndedicated vector-wide horizontal instruction until SVE. Both run at\r\nabout the same performance on M2, but paired adds require fewer\r\ninstructions and temporaries.\r\n\r\nI've measured this using mesh-normal-vector benchmark, changing the\r\nbenchmark to just report the time of the second loop inside\r\n`calculate_normals`, testing master vs #1504 vs this PR, also increasing\r\nthe grid size to 400 for more stable timings.\r\n\r\nOn Zen 4 (7950X), this PR is comfortably ~8% faster vs master, while I\r\nsee neutral to negative results in #1504.\r\nOn M2 (base), this PR is ~28% faster vs master, while #1504 is only\r\nabout ~10% faster.\r\n\r\nIf I measure the second loop in `calculate_tangent_space` instead, I\r\nget:\r\n\r\nOn Zen 4 (7950X), this PR is ~12% faster vs master, while #1504 is ~3%\r\nfaster\r\nOn M2 (base), this PR is ~24% faster vs master, while #1504 is only\r\nabout ~13% faster.\r\n\r\nNote that the loops in question are not quite optimal, as they store and\r\nreload various vectors to dictionary values due to inappropriate use of\r\nlocals. The underlying gains in individual functions are thus larger\r\nthan the numbers above; for example, changing the `calculate_normals`\r\nloop to use a local variable to store the normalized vector (but still\r\nsaving the result to dictionary value), I get a ~24% performance\r\nincrease from this PR on Zen4 vs master instead of just 8% (#1504 is\r\n~15% slower in this setup).",
"timestamp": "2024-11-08T16:23:09-08:00",
"tree_id": "c5bf52046b7ad7c495e930780fe8ba3a95d09432",
"url": "https://github.com/luau-lang/luau/commit/e6bf71871a6b9f601545dba8a42ce89c6069675c"
},
"date": 1731112782905,
"tool": "benchmarkluau",
"benches": [
{
"name": "base64",
"value": 11.54,
"unit": "ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "chess",
"value": 52.008,
"unit": "ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "life",
"value": 23.355,
"unit": "ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "matrixmult",
"value": 9.335,
"unit": "9ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "mesh-normal-scalar",
"value": 13.055,
"unit": "ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "mesh-normal-vector",
"value": 8.109,
"unit": "8ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "pcmmix",
"value": 1.36,
"unit": "1ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "qsort",
"value": 41.507,
"unit": "ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "sha256",
"value": 4.57,
"unit": "4ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "ack",
"value": 40.015,
"unit": "ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "binary-trees",
"value": 20.847,
"unit": "ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "fannkuchen-redux",
"value": 3.892,
"unit": "3ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "fixpoint-fact",
"value": 48.87,
"unit": "ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "heapsort",
"value": 7.716,
"unit": "7ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "mandel",
"value": 40.418,
"unit": "ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "n-body",
"value": 9.707,
"unit": "9ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "qt",
"value": 24.975,
"unit": "ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "queen",
"value": 0.805,
"unit": "0ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "scimark",
"value": 24.636,
"unit": "ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "spectral-norm",
"value": 2.444,
"unit": "2ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "sieve",
"value": 84.552,
"unit": "ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "3d-cube",
"value": 3.732,
"unit": "3ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "3d-morph",
"value": 3.747,
"unit": "3ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "3d-raytrace",
"value": 3.28,
"unit": "3ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "controlflow-recursive",
"value": 3.464,
"unit": "3ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "crypto-aes",
"value": 7.182,
"unit": "7ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "fannkuch",
"value": 6.167,
"unit": "6ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "math-cordic",
"value": 3.768,
"unit": "3ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "math-partial-sums",
"value": 1.917,
"unit": "1ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "n-body-oop",
"value": 13.739,
"unit": "ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "tictactoe",
"value": 62.952,
"unit": "ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "trig",
"value": 6.65,
"unit": "6ms",
"range": "±0.000%",
"extra": "luau-codegen"
},
{
"name": "vector",
"value": null,
"unit": ":",
"range": "±+/-",
"extra": "on"
},
{
"name": "voxelgen",
"value": 27.659,
"unit": "ms",
"range": "±0.000%",
"extra": "luau-codegen"
}
]
}
]
}
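
For the A64 side, the same commit message notes that there is no vector-wide horizontal float add until SVE, so the lowering uses either naive adds or paired adds (FADDP). A hypothetical NEON-intrinsics sketch of the paired-add folding, again outside Luau's actual codegen and assuming a zeroed w lane in both inputs, would be:

#include <arm_neon.h>

// 3-component dot product, assuming the unused w lane of both inputs is zero.
float dot3_paired(float32x4_t a, float32x4_t b)
{
    float32x4_t prod = vmulq_f32(a, b);        // {ax*bx, ay*by, az*bz, 0}
    float32x4_t sum2 = vpaddq_f32(prod, prod); // FADDP: {x+y, z+0, x+y, z+0}
    float32x4_t sum1 = vpaddq_f32(sum2, sum2); // FADDP: {x+y+z, ...}
    return vgetq_lane_f32(sum1, 0);
}

Two paired adds fold the four lanes down to one, which matches the commit's observation that the paired-add form needs fewer instructions and temporaries than summing lanes one by one.
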