Reduce overhead of function call and deprecate rarely used utilities #1024

ricardoV94 · 2024-10-09T11:51:24Z

We have way too much function call overhead. This PR tries to remove some of it.

I used this simple single-Op function as a benchmark:

import numpy as np
import pytensor
import pytensor.tensor as pt
from pytensor.tensor.random.type import random_generator_type

rng = np.random.default_rng(123)
def numpy_normal(rng):
    # We have an explicit expand dims in PyTensor
    return rng.normal(np.array([0]), np.array([1]), size=(100,)) 

print("Direct numpy call:")
%timeit numpy_normal(rng)

rng_pt = random_generator_type()
x = pt.random.normal(rng=rng_pt, size=(100,))
fn = pytensor.function([pytensor.In(rng_pt, mutable=True)], x)

print("Fn call without trust input:")
%timeit fn(rng)

fn.trust_input = True
print("Fn call with trust input: ")
%timeit fn(rng)

fn.input_storage[0].storage[0] = rng
print("VM call bypassing Fn:")
%timeit fn.vm()

Note that the largest speedup comes from removing the extra checks in RandomVariable.perform, but there is also a minor speedup in the fn call itself, beyond what is observed when calling the VM directly.

Before the PR:

Direct numpy call:
14.1 μs ± 43.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Fn call without trust input:
41 μs ± 377 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Fn call with trust input: 
39.1 μs ± 889 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

VM call bypassing Fn:
30.3 μs ± 885 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

After the PR:

Direct numpy call:
14.5 μs ± 435 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Fn call without trust input:
27.3 μs ± 788 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Fn call with trust input: 
26.4 μs ± 458 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

VM call bypassing Fn:
20.4 μs ± 334 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
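
To put the timings above in perspective, the following sketch subtracts the direct NumPy time from each call path and compares before and after. The numbers are the mean values copied from the output above; the dictionary keys and helper names are made up for this illustration, and the spread of each measurement is ignored.

```python
# Mean timings in microseconds, copied from the benchmark output above.
before = {"numpy": 14.1, "fn": 41.0, "fn_trust_input": 39.1, "vm": 30.3}
after = {"numpy": 14.5, "fn": 27.3, "fn_trust_input": 26.4, "vm": 20.4}


def overhead_over_numpy(timings):
    """Time each call path spends on top of the direct NumPy call."""
    return {
        name: round(value - timings["numpy"], 1)
        for name, value in timings.items()
        if name != "numpy"
    }


def wrapper_overhead(timings):
    """Extra time of the fn call (with trust_input) on top of the bare VM call."""
    return round(timings["fn_trust_input"] - timings["vm"], 1)


# Ratio of the old mean time to the new mean time for each call path.
speedup = {
    name: round(before[name] / after[name], 2) for name in before if name != "numpy"
}

print("Overhead over NumPy before:", overhead_over_numpy(before))
print("Overhead over NumPy after: ", overhead_over_numpy(after))
print("Fn wrapper overhead before/after:", wrapper_overhead(before), wrapper_overhead(after))
print("Speedup per call path:", speedup)
```

Read this way, the part of the call that is not NumPy work roughly halves for the fn paths, and the gap between the fn call and the bare VM call shrinks as well.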

Some of the changes would be more visible in a function with more inputs (but one that is still relatively fast to evaluate).

The PR also deprecates several obscure options, which should allow us to remove a few more conditionals in the future.

Related to #552

📚 Documentation preview 📚: https://pytensor--1024.org.readthedocs.build/en/1024/

ricardoV94 · 2024-10-10T10:32:49Z

pytensor/gradient.py

@@ -128,9 +128,6 @@ def fiter_variable(self, other):
            " a symbolic placeholder."
        )

-    def may_share_memory(a, b):


Removing this method avoids having to check for aliasing at all during the function call.

ricardoV94 · 2024-10-10T10:36:55Z

pytensor/compile/function/types.py

+            # and can't be aliased.
+            if not (
+                isinstance(inp, In)
+                and inp.borrow


If an input is mutable, it must also be borrow (this is asserted in In.__init__), so there is no need to check both as before. The attribute also always exists.
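
The reasoning can be sketched with a stripped-down stand-in for In. ToyIn below is hypothetical and is not the PyTensor class: it only models the two flags, assumes that borrow defaults to the value of mutable when omitted, and enforces the invariant mentioned above. Under those assumptions, checking `mutable or borrow` and checking `borrow` alone always agree.

```python
from itertools import product


class ToyIn:
    """Hypothetical, minimal stand-in for pytensor.In (illustration only)."""

    def __init__(self, mutable=None, borrow=None):
        self.mutable = bool(mutable)
        # Assumption: when borrow is not given, it follows mutable.
        self.borrow = self.mutable if borrow is None else bool(borrow)
        # The invariant: a mutable input must also be borrowed.
        if self.mutable and not self.borrow:
            raise AssertionError("mutable=True requires borrow=True")


def check_both(inp):
    """Older style: defensive lookups of both attributes."""
    return getattr(inp, "mutable", False) or getattr(inp, "borrow", False)


def check_borrow_only(inp):
    """Simplified check: borrow exists on every instance and covers mutable."""
    return inp.borrow


# Try every combination of the two flags and keep the ones the invariant allows.
valid_inputs = []
for mutable, borrow in product([None, False, True], repeat=2):
    try:
        valid_inputs.append(ToyIn(mutable=mutable, borrow=borrow))
    except AssertionError:
        pass  # mutable=True with borrow=False is rejected at construction

assert all(check_both(inp) == check_borrow_only(inp) for inp in valid_inputs)
print(f"{len(valid_inputs)} valid flag combinations, both checks agree on all of them")
```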

ricardoV94 · 2024-10-10T10:38:05Z

pytensor/compile/function/types.py

+                if any(
+                    inp.variable.type.is_super(other_inp.variable.type)
+                    or other_inp.variable.type.is_super(inp.variable.type)
+                    for other_inp in group
+                ):


This is more careful than the previous check, which assumed that a vector(shape=None) and a vector(shape=(1,)) could not be aliased because their types were different, when in fact they could. A new test condition checks for this.
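
A minimal sketch of this grouping idea, using a hypothetical ToyType instead of PyTensor's real tensor types. It assumes that is_super means "same dtype and number of dimensions, and every known static dimension matches", with None standing for an unknown length. The function name and the index-based groups are also made up for the illustration and are not the code in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyType:
    """Hypothetical stand-in for a tensor type with a static shape."""

    dtype: str
    shape: tuple  # None means "length unknown at compile time"

    def is_super(self, other):
        # Assumed semantics: self accepts every value that other accepts.
        return (
            self.dtype == other.dtype
            and len(self.shape) == len(other.shape)
            and all(s is None or s == o for s, o in zip(self.shape, other.shape))
        )


def group_potentially_aliased(types):
    """Group input indexes whose types are related in either direction."""
    groups = []
    for i, typ in enumerate(types):
        for group in groups:
            if any(typ.is_super(types[j]) or types[j].is_super(typ) for j in group):
                group.append(i)
                break
        else:
            # No related type found: this input starts a new group.
            groups.append([i])
    return groups


input_types = [
    ToyType("float64", (None,)),
    ToyType("float64", (1,)),
    ToyType("int64", (None,)),
]
# An equality-based check would put all three inputs in separate groups.
print(group_potentially_aliased(input_types))
```

With these toy types the two float64 vectors end up in the same group even though the types are not equal, which is the case an equality-based comparison would miss.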

codecov · 2024-10-10T10:45:14Z

Codecov Report

Attention: Patch coverage is 91.93548% with 5 lines in your changes missing coverage. Please review.

Project coverage is 81.89%. Comparing base (a377c22) to head (720ecbc).

Files with missing lines	Patch %	Lines
pytensor/compile/function/types.py	90.90%	3 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1024      +/-   ##
==========================================
- Coverage   81.90%   81.89%   -0.01%     
==========================================
  Files         182      182              
  Lines       47879    47840      -39     
  Branches     8617     8608       -9     
==========================================
- Hits        39214    39180      -34     
+ Misses       6492     6487       -5     
  Partials     2173     2173

Files with missing lines	Coverage Δ
pytensor/gradient.py	`77.57% <ø> (+0.07%)`	⬆️
pytensor/graph/null_type.py	`64.70% <ø> (+1.54%)`	⬆️
pytensor/graph/op.py	`88.08% <100.00%> (+0.06%)`	⬆️
pytensor/graph/type.py	`93.65% <100.00%> (-0.20%)`	⬇️
pytensor/scalar/basic.py	`80.52% <ø> (+0.01%)`	⬆️
pytensor/tensor/random/op.py	`93.51% <100.00%> (-0.21%)`	⬇️
pytensor/tensor/type_other.py	`74.41% <ø> (+0.26%)`	⬆️
pytensor/compile/function/types.py	`79.47% <90.90%> (-0.47%)`	⬇️

Armavica · 2024-10-16T08:19:42Z

pytensor/compile/function/types.py

+        # Group indexes of inputs that are potentially aliased to each other
+        # Note: Historically, we only worried about aliasing inputs if they belonged to the same type,
+        #  even though there could be two distinct types that use the same kinds of underlying objects.
+        potential_aliased_input_groups = []


Perhaps this logic could be a utility function? It is reused almost identically below.

ricardoV94 · 2024-10-16

What is reused almost identically? Also note that I want to remove this whole logic soon: #1026

Armavica · 2024-10-16

Making groups from a list of elements based on an equivalence function.

ricardoV94 · 2024-10-16

Maybe, but the goal of the PR is to reduce the overhead of function.__call__, which usually runs in a hot loop. Adding nested function calls may defeat the purpose. Perhaps I will just go ahead with #1026 and shed this whole logic.

I agree it's ugly to repeat the code.
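
The trade-off in this thread can be illustrated with a small, generic timing sketch. It is not PyTensor code: same_kind is a hypothetical equivalence test, and the two loops differ only in whether that test is written inline or called as a helper. In CPython an extra Python-level call per iteration typically costs some time, but the size of the effect depends on the interpreter version and the machine, so no timings are quoted here.

```python
import timeit


def same_kind(a, b):
    """Hypothetical equivalence test, factored out into a helper."""
    return a % 3 == b % 3


def count_pairs_inline(values):
    """Count equivalent pairs with the test written inline."""
    count = 0
    for a in values:
        for b in values:
            if a % 3 == b % 3:
                count += 1
    return count


def count_pairs_helper(values):
    """Same loop, but every comparison goes through a function call."""
    count = 0
    for a in values:
        for b in values:
            if same_kind(a, b):
                count += 1
    return count


values = list(range(30))
# Both versions must agree before their timings are worth comparing.
assert count_pairs_inline(values) == count_pairs_helper(values)

t_inline = timeit.timeit(lambda: count_pairs_inline(values), number=200)
t_helper = timeit.timeit(lambda: count_pairs_helper(values), number=200)
print(f"inline: {t_inline:.4f} s, helper: {t_helper:.4f} s")
```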

ricardoV94 · 2024-10-22T20:59:44Z

Marking as a draft because I want to add a note to the pytensor.In docstring about mutable and input aliasing.

ricardoV94 · 2024-10-25T09:20:28Z

Moved the last commit to a separate PR: #1049

I want to let that one simmer a bit first.

ricardoV94 force-pushed the perf_enh branch 3 times, most recently from 4e0e6d5 to 25147b8 October 10, 2024 08:17

ricardoV94 added maintenance performance labels Oct 10, 2024

ricardoV94 force-pushed the perf_enh branch from 25147b8 to 02cea48 October 10, 2024 08:27

ricardoV94 added the major label Oct 10, 2024

ricardoV94 changed the title ~~Improve overhead of function call~~ Improve overhead of function call and deprecate rarely used utilities Oct 10, 2024

ricardoV94 force-pushed the perf_enh branch from 02cea48 to f4d1e98 October 10, 2024 10:22

ricardoV94 commented Oct 10, 2024

ricardoV94 marked this pull request as ready for review October 10, 2024 10:40

ricardoV94 requested review from jessegrabowski and Armavica October 10, 2024 10:40

ricardoV94 changed the title ~~Improve overhead of function call and deprecate rarely used utilities~~ Reduce overhead of function call and deprecate rarely used utilities Oct 10, 2024

ricardoV94 mentioned this pull request Oct 10, 2024

Reconsider checking for input alias during function calls #1026

Open

ricardoV94 force-pushed the perf_enh branch from f4d1e98 to 32ddbcf October 10, 2024 12:57

Armavica reviewed Oct 16, 2024

ricardoV94 force-pushed the perf_enh branch 2 times, most recently from 7a2d4aa to 2faae23 October 17, 2024 12:38

ricardoV94 added 5 commits October 17, 2024 14:38

Benchmark minimal random function call

48e2cd4

Speedup random perform

cf3a4f3

Speedup node eval

147c892

Cleanup Function.__call__

d77f26c

Deprecate rarely used Function functionality

4258475

ricardoV94 force-pushed the perf_enh branch 3 times, most recently from 23b4fe9 to 720ecbc October 22, 2024 12:53

ricardoV94 requested a review from Armavica October 22, 2024 20:55

ricardoV94 marked this pull request as draft October 22, 2024 20:58

ricardoV94 force-pushed the perf_enh branch from 720ecbc to 4258475 October 25, 2024 09:18

ricardoV94 marked this pull request as ready for review October 25, 2024 09:19

ricardoV94 mentioned this pull request Oct 25, 2024

Dont check input alias on Function call #1049

Draft

ricardoV94 requested a review from twiecki October 25, 2024 14:18

twiecki approved these changes Oct 28, 2024

ricardoV94 merged commit 7b13a95 into pymc-devs:main Oct 29, 2024
60 of 61 checks passed
