Update slides - June 2024 #489

Merged
merged 1 commit into from
Jun 25, 2024
236 changes: 167 additions & 69 deletions docs/slides/intro/cubed-intro.ipynb
@@ -11,7 +11,7 @@
"source": [
"# Cubed: an introduction\n",
"\n",
-"Tom White, November 2023"
+"Tom White, June 2024"
]
},
{
@@ -207,9 +207,9 @@
"source": [
"# Example: `reduction`\n",
"\n",
-"![`reduction`](../../images/reduction.svg)\n",
+"![`reduction`](../../images/reduction_new.svg)\n",
"\n",
-"Implemented using multiple rounds of calls to `blockwise` and `rechunk`."
+"Implemented using multiple rounds of a tree reduce operation followed by a final aggregation."
]
},
{
@@ -239,72 +239,159 @@
{
"data": {
"image/svg+xml": [
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"163pt\" height=\"146pt\" viewBox=\"0.00 0.00 163.00 146.00\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 142)\">\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-142 159,-142 159,4 -4,4\"/>\n",
"<text text-anchor=\"middle\" x=\"77.5\" y=\"-18\" font-family=\"Times,serif\" font-size=\"10.00\">num tasks: 4</text>\n",
"<text text-anchor=\"middle\" x=\"77.5\" y=\"-7\" font-family=\"Times,serif\" font-size=\"10.00\">max projected memory: 100.0 MB</text>\n",
"<!-- array&#45;001 -->\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"230pt\" height=\"319pt\" viewBox=\"0.00 0.00 229.75 318.75\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 314.75)\">\n",
"<polygon fill=\"white\" stroke=\"none\" points=\"-4,4 -4,-314.75 225.75,-314.75 225.75,4 -4,4\"/>\n",
"<text text-anchor=\"start\" x=\"8\" y=\"-39.5\" font-family=\"Times,serif\" font-size=\"10.00\">num tasks: 5</text>\n",
"<text text-anchor=\"start\" x=\"8\" y=\"-28.25\" font-family=\"Times,serif\" font-size=\"10.00\">max projected memory: 100.0 MB</text>\n",
"<text text-anchor=\"start\" x=\"8\" y=\"-17\" font-family=\"Times,serif\" font-size=\"10.00\">total nbytes written: 72 bytes</text>\n",
"<text text-anchor=\"start\" x=\"8\" y=\"-5.75\" font-family=\"Times,serif\" font-size=\"10.00\">optimized: True</text>\n",
"<!-- op&#45;001 -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>array-001</title>\n",
"<g id=\"a_node1\"><a xlink:title=\"shape: (3, 3)\n",
"chunks: (2, 2)\n",
"dtype: int64\n",
"chunk memory: 32 bytes\n",
"\n",
"<title>op-001</title>\n",
"<g id=\"a_node1\"><a xlink:title=\"name: op-001\n",
"op: asarray\n",
"calls: &lt;module&gt; -&gt; asarray\n",
"line: 2 in &lt;module&gt;\">\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"68.5,-138 9.5,-138 9.5,-102 68.5,-102 68.5,-138\"/>\n",
"<text text-anchor=\"middle\" x=\"39\" y=\"-123\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">array-001</text>\n",
"<text text-anchor=\"middle\" x=\"39\" y=\"-112\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">asarray </text>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M43.25,-310.75C43.25,-310.75 13.25,-310.75 13.25,-310.75 7.25,-310.75 1.25,-304.75 1.25,-298.75 1.25,-298.75 1.25,-286.75 1.25,-286.75 1.25,-280.75 7.25,-274.75 13.25,-274.75 13.25,-274.75 43.25,-274.75 43.25,-274.75 49.25,-274.75 55.25,-280.75 55.25,-286.75 55.25,-286.75 55.25,-298.75 55.25,-298.75 55.25,-304.75 49.25,-310.75 43.25,-310.75\"/>\n",
"<text text-anchor=\"middle\" x=\"28.25\" y=\"-294.5\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">op-001</text>\n",
"<text text-anchor=\"middle\" x=\"28.25\" y=\"-283.25\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">asarray</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- array&#45;004 -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>array-004</title>\n",
"<g id=\"a_node3\"><a xlink:title=\"shape: (3, 3)\n",
"<!-- array&#45;001 -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>array-001</title>\n",
"<g id=\"a_node2\"><a xlink:title=\"name: array-001\n",
"variable: a\n",
"shape: (3, 3)\n",
"chunks: (2, 2)\n",
"dtype: int64\n",
"chunk memory: 32 bytes\n",
"\n",
"chunk memory: 32 bytes\">\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"56.5,-238.75 0,-238.75 0,-202.75 56.5,-202.75 56.5,-238.75\"/>\n",
"<text text-anchor=\"middle\" x=\"28.25\" y=\"-222.5\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">array-001</text>\n",
"<text text-anchor=\"middle\" x=\"28.25\" y=\"-211.25\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">a</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- op&#45;001&#45;&gt;array&#45;001 -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>op-001-&gt;array-001</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M28.25,-274.45C28.25,-267.16 28.25,-258.48 28.25,-250.29\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"31.75,-250.37 28.25,-240.37 24.75,-250.37 31.75,-250.37\"/>\n",
"</g>\n",
"<!-- op&#45;004 -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>op-004</title>\n",
"<g id=\"a_node5\"><a xlink:title=\"name: op-004\n",
"op: blockwise\n",
"projected memory: 100.0 MB\n",
"tasks: 4\n",
"num input blocks: (1, 1)\n",
"calls: &lt;module&gt; -&gt; add -&gt; elemwise -&gt; blockwise\n",
"line: 1 in &lt;module&gt;\">\n",
"<polygon fill=\"#dcbeff\" stroke=\"black\" points=\"106.5,-66 47.5,-66 47.5,-30 106.5,-30 106.5,-66\"/>\n",
"<text text-anchor=\"middle\" x=\"77\" y=\"-51\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">array-004</text>\n",
"<text text-anchor=\"middle\" x=\"77\" y=\"-40\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">add (bw)</text>\n",
"<path fill=\"#dcbeff\" stroke=\"black\" d=\"M80.25,-166.75C80.25,-166.75 50.25,-166.75 50.25,-166.75 44.25,-166.75 38.25,-160.75 38.25,-154.75 38.25,-154.75 38.25,-137 38.25,-137 38.25,-131 44.25,-125 50.25,-125 50.25,-125 80.25,-125 80.25,-125 86.25,-125 92.25,-131 92.25,-137 92.25,-137 92.25,-154.75 92.25,-154.75 92.25,-160.75 86.25,-166.75 80.25,-166.75\"/>\n",
"<text text-anchor=\"middle\" x=\"65.25\" y=\"-153.25\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">op-004</text>\n",
"<text text-anchor=\"middle\" x=\"65.25\" y=\"-142\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">add</text>\n",
"<text text-anchor=\"middle\" x=\"65.25\" y=\"-130.75\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">tasks: 4</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- array&#45;001&#45;&gt;array&#45;004 -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>array-001-&gt;array-004</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M48.39,-101.7C52.76,-93.64 58.06,-83.89 62.9,-74.98\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"66.02,-76.56 67.71,-66.1 59.87,-73.22 66.02,-76.56\"/>\n",
"<!-- array&#45;001&#45;&gt;op&#45;004 -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>array-001-&gt;op-004</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M37.02,-202.48C40.85,-194.94 45.45,-185.87 49.82,-177.26\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"52.88,-178.97 54.28,-168.47 46.64,-175.8 52.88,-178.97\"/>\n",
"</g>\n",
"<!-- op&#45;002 -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>op-002</title>\n",
"<g id=\"a_node3\"><a xlink:title=\"name: op-002\n",
"op: asarray\n",
"calls: &lt;module&gt; -&gt; asarray\n",
"line: 1 in &lt;module&gt;\">\n",
"<path fill=\"none\" stroke=\"black\" d=\"M118.25,-310.75C118.25,-310.75 88.25,-310.75 88.25,-310.75 82.25,-310.75 76.25,-304.75 76.25,-298.75 76.25,-298.75 76.25,-286.75 76.25,-286.75 76.25,-280.75 82.25,-274.75 88.25,-274.75 88.25,-274.75 118.25,-274.75 118.25,-274.75 124.25,-274.75 130.25,-280.75 130.25,-286.75 130.25,-286.75 130.25,-298.75 130.25,-298.75 130.25,-304.75 124.25,-310.75 118.25,-310.75\"/>\n",
"<text text-anchor=\"middle\" x=\"103.25\" y=\"-294.5\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">op-002</text>\n",
"<text text-anchor=\"middle\" x=\"103.25\" y=\"-283.25\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">asarray</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- array&#45;002 -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>array-002</title>\n",
"<g id=\"a_node2\"><a xlink:title=\"shape: (3, 3)\n",
"<g id=\"a_node4\"><a xlink:title=\"name: array-002\n",
"variable: b\n",
"shape: (3, 3)\n",
"chunks: (2, 2)\n",
"dtype: int64\n",
"chunk memory: 32 bytes\">\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"131.5,-238.75 75,-238.75 75,-202.75 131.5,-202.75 131.5,-238.75\"/>\n",
"<text text-anchor=\"middle\" x=\"103.25\" y=\"-222.5\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">array-002</text>\n",
"<text text-anchor=\"middle\" x=\"103.25\" y=\"-211.25\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">b</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- op&#45;002&#45;&gt;array&#45;002 -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>op-002-&gt;array-002</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M103.25,-274.45C103.25,-267.16 103.25,-258.48 103.25,-250.29\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"106.75,-250.37 103.25,-240.37 99.75,-250.37 106.75,-250.37\"/>\n",
"</g>\n",
"<!-- array&#45;002&#45;&gt;op&#45;004 -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>array-002-&gt;op-004</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M94.24,-202.48C90.31,-194.94 85.58,-185.87 81.09,-177.26\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"84.24,-175.71 76.51,-168.47 78.03,-178.95 84.24,-175.71\"/>\n",
"</g>\n",
"<!-- array&#45;004 -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>array-004</title>\n",
"<g id=\"a_node6\"><a xlink:title=\"name: array-004\n",
"variable: c\n",
"shape: (3, 3)\n",
"chunks: (2, 2)\n",
"dtype: int64\n",
"chunk memory: 32 bytes\n",
"\n",
"calls: &lt;module&gt; -&gt; asarray\n",
"line: 1 in &lt;module&gt;\">\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"145.5,-138 86.5,-138 86.5,-102 145.5,-102 145.5,-138\"/>\n",
"<text text-anchor=\"middle\" x=\"116\" y=\"-123\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">array-002</text>\n",
"<text text-anchor=\"middle\" x=\"116\" y=\"-112\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">asarray </text>\n",
"nbytes: 72 bytes\">\n",
"<polygon fill=\"#ffd8b1\" stroke=\"black\" points=\"93.5,-89 37,-89 37,-53 93.5,-53 93.5,-89\"/>\n",
"<text text-anchor=\"middle\" x=\"65.25\" y=\"-72.75\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">array-004</text>\n",
"<text text-anchor=\"middle\" x=\"65.25\" y=\"-61.5\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">c</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- array&#45;002&#45;&gt;array&#45;004 -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>array-002-&gt;array-004</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M106.36,-101.7C101.87,-93.64 96.44,-83.89 91.48,-74.98\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"94.45,-73.14 86.53,-66.1 88.34,-76.54 94.45,-73.14\"/>\n",
"<!-- op&#45;004&#45;&gt;array&#45;004 -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>op-004-&gt;array-004</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M65.25,-124.58C65.25,-117.19 65.25,-108.7 65.25,-100.73\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"68.75,-100.74 65.25,-90.74 61.75,-100.74 68.75,-100.74\"/>\n",
"</g>\n",
"<!-- create&#45;arrays -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>create-arrays</title>\n",
"<g id=\"a_node7\"><a xlink:title=\"name: create-arrays\n",
"op: create-arrays\n",
"projected memory: 100.0 MB\n",
"tasks: 1\">\n",
"<path fill=\"none\" stroke=\"black\" d=\"M209.75,-310.75C209.75,-310.75 160.75,-310.75 160.75,-310.75 154.75,-310.75 148.75,-304.75 148.75,-298.75 148.75,-298.75 148.75,-286.75 148.75,-286.75 148.75,-280.75 154.75,-274.75 160.75,-274.75 160.75,-274.75 209.75,-274.75 209.75,-274.75 215.75,-274.75 221.75,-280.75 221.75,-286.75 221.75,-286.75 221.75,-298.75 221.75,-298.75 221.75,-304.75 215.75,-310.75 209.75,-310.75\"/>\n",
"<text text-anchor=\"middle\" x=\"185.25\" y=\"-294.5\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">create-arrays</text>\n",
"<text text-anchor=\"middle\" x=\"185.25\" y=\"-283.25\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">tasks: 1</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- arrays -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>arrays</title>\n",
"<g id=\"a_node8\"><a xlink:title=\"name: arrays\" target=\"None\">\n",
"<polygon fill=\"none\" stroke=\"black\" points=\"212.25,-238.75 158.25,-238.75 158.25,-202.75 212.25,-202.75 212.25,-238.75\"/>\n",
"<text text-anchor=\"middle\" x=\"185.25\" y=\"-216.88\" font-family=\"Helvetica,sans-Serif\" font-size=\"10.00\">arrays</text>\n",
"</a>\n",
"</g>\n",
"</g>\n",
"<!-- create&#45;arrays&#45;&gt;arrays -->\n",
"<g id=\"edge6\" class=\"edge\">\n",
"<title>create-arrays-&gt;arrays</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M185.25,-274.45C185.25,-267.16 185.25,-258.48 185.25,-250.29\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"188.75,-250.37 185.25,-240.37 181.75,-250.37 188.75,-250.37\"/>\n",
"</g>\n",
"</g>\n",
"</svg>"
@@ -346,14 +433,26 @@
"source": [
"# Optimization\n",
"\n",
-"Cubed will optimize the graph before computing it - by fusing blockwise (map) operations.\n",
+"Cubed will automatically optimize the graph before computing it. For example by fusing blockwise (map) operations:\n",
"\n",
"<p float=\"left\">\n",
-" <img src=\"fusion-unoptimized.png\" />\n",
-" <img src=\"fusion.png\" />\n",
-"</p>\n",
+" <img src=\"toy-unoptimized.png\" height=\"600\" />\n",
+" <img src=\"toy-optimized.png\" height=\"600\"/>\n",
+"</p>"
+]
+},
+{
+"cell_type": "markdown",
+"id": "925fff3c-5531-4953-891e-b382583de56b",
+"metadata": {},
+"source": [
+"# Optimization: an advanced example\n",
+"\n",
+"In early 2024 we implemented more optimizations to give a **4.8x** performance improvement on the \"Quadratic Means\" climate workload running on Lithops with AWS Lambda, with a **1.5 TB** workload completing in around **100 seconds**
"\n",
-"This is a simple case - still lots of optimizations left to do."
+"<img src=\"benchmarks-aws.png\" width=\"600\">\n",
+"\n",
+"More details in [Optimizing Cubed](https://medium.com/pangeo/optimizing-cubed-7a0b8f65f5b7)\n"
]
},
{
@@ -452,30 +551,30 @@
},
{
"cell_type": "markdown",
-"id": "b1fb4379",
+"id": "d5a1fddd",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
-"* __Modal__: new serverless platform\n",
-" * Very easy to set up since it builds the runtime automatically\n",
-" * Tested with ~300 workers"
+"* __Lithops__: multi-cloud serverless computing framework\n",
+" * Slightly more work to get started since you have to build a runtime environment first\n",
+" * Tested on AWS Lambda and Google Cloud Functions with ~1000 workers"
]
},
{
"cell_type": "markdown",
-"id": "d5a1fddd",
+"id": "b1fb4379",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
-"* __Lithops__: multi-cloud serverless computing framework\n",
-" * Slightly more work to get started since you have to build a runtime environment first\n",
-" * Tested on AWS Lambda and Google Cloud Functions with ~1000 workers"
+"* __Modal__: new serverless platform\n",
+" * Very easy to set up since it builds the runtime automatically\n",
+" * Tested with ~300 workers"
]
},
{
@@ -525,7 +624,7 @@
"* Retries\n",
" * Each task is tried three times before failing\n",
"* Stragglers\n",
-" * A backup task will be launched if a task is taking significantly longer than average (off by default)"
+" * A backup task will be launched if a task is taking significantly longer than average"
]
},
{
@@ -539,10 +638,10 @@
"source": [
"# Xarray integration\n",
"\n",
-"* Tom Nicholas added [Generalize handling of chunked array types](https://github.com/pydata/xarray/pull/7019) to Xarray\n",
-" * Xarray can use Cubed as its computation engine instead of Dask\n",
-" * Also needs [cubed-xarray](https://github.com/xarray-contrib/cubed-xarray) integration package\n",
-"* Examples at https://github.com/pangeo-data/distributed-array-examples"
+"* Xarray can use Cubed as its computation engine instead of Dask\n",
+" * Just install the [cubed-xarray](https://github.com/xarray-contrib/cubed-xarray) integration package\n",
+"* Cubed can use [Flox](https://flox.readthedocs.io/en/latest/) for `groupby` operations\n",
+" * Examples at https://flox.readthedocs.io/en/latest/user-stories/climatology-hourly-cubed.html"
]
},
{
@@ -554,13 +653,12 @@
}
},
"source": [
-"# Next steps\n",
+"# Try out Cubed!\n",
"\n",
-"* Community\n",
-"* Examples and use cases\n",
-" * Pangeo\n",
-" * sgkit\n",
-"* [Optimizations](https://github.com/tomwhite/cubed/issues?q=is%3Aissue+is%3Aopen+label%3Aoptimization)"
+"* Try it out on your use case\n",
+" * Get started at https://cubed-dev.github.io/cubed/\n",
+"* Some examples from the Pangeo community:\n",
+" * https://github.com/pangeo-data/distributed-array-examples"
]
}
],
69 changes: 42 additions & 27 deletions docs/slides/intro/cubed-intro.slides.html

Large diffs are not rendered by default.

Binary file removed docs/slides/intro/fusion-unoptimized.png
Binary file not shown.
Binary file removed docs/slides/intro/fusion.png
Binary file not shown.
Binary file added docs/slides/intro/toy-optimized.png
Binary file added docs/slides/intro/toy-unoptimized.png