Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce overhead of function call and deprecate rarely used utilities #1024

Merged
merged 5 commits into from
Oct 29, 2024

Conversation

ricardoV94
Copy link
Member

@ricardoV94 ricardoV94 commented Oct 9, 2024

We have way too much function overhead call. This PR tries to remove some of it.

I used this simple one Op function as a test:

import numpy as np
import pytensor
import pytensor.tensor as pt
from pytensor.tensor.random.type import random_generator_type

rng = np.random.default_rng(123)
def numpy_normal(rng):
    # We have an explicit expand dims in PyTensor
    return rng.normal(np.array([0]), np.array([1]), size=(100,)) 

print("Direct numpy call:")
%timeit numpy_normal(rng)

rng_pt = random_generator_type()
x = pt.random.normal(rng=rng_pt, size=(100,))
fn = pytensor.function([pytensor.In(rng_pt, mutable=True)], x)

print("Fn call without trust input:")
%timeit fn(rng)

fn.trust_input = True
print("Fn call with trust input: ")
%timeit fn(rng)

fn.input_storage[0].storage[0] = rng
print("VM call bypassing Fn:")
%timeit fn.vm()

Note the largest speedup comes from removing the extra checks in RandomVariable.perform. But you can see some minor speedup in the fn eval beyond what's observed by calling the vm directly.

Before the PR:

Direct numpy call:
14.1 μs ± 43.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Fn call without trust input:
41 μs ± 377 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Fn call with trust input: 
39.1 μs ± 889 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

VM call bypassing Fn:
30.3 μs ± 885 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

After the PR:

Direct numpy call:
14.5 μs ± 435 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Fn call without trust input:
27.3 μs ± 788 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Fn call with trust input: 
26.4 μs ± 458 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

VM call bypassing Fn:
20.4 μs ± 334 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Some of the changes would be more visible in a function with more inputs (but still relatively fast).

It also deprecates a bunch of obscure options that should allow us to remove a few more conditionals in the future.

Related to #552


📚 Documentation preview 📚: https://pytensor--1024.org.readthedocs.build/en/1024/

@ricardoV94 ricardoV94 force-pushed the perf_enh branch 3 times, most recently from 4e0e6d5 to 25147b8 Compare October 10, 2024 08:17
@ricardoV94 ricardoV94 changed the title Improve overhead of function call Improve overhead of function call and deprecate rarely used utilities Oct 10, 2024
@@ -128,9 +128,6 @@ def fiter_variable(self, other):
" a symbolic placeholder."
)

def may_share_memory(a, b):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this avoids having to check for aliasing in the first place, during the function call

# and can't be aliased.
if not (
isinstance(inp, In)
and inp.borrow
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's mutable it must be borrow (this is asserted in In.__init__) so no need to check for both as before. Also the attribute always exists.

Comment on lines +413 to +417
if any(
inp.variable.type.is_super(other_inp.variable.type)
or other_inp.variable.type.is_super(inp.variable.type)
for other_inp in group
):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more careful than the check before, which assumed a vector(shape=None) and vector(shape=(1,)) could not be aliased as their types were different, but they could. New test condition checks for this.

@ricardoV94 ricardoV94 marked this pull request as ready for review October 10, 2024 10:40
Copy link

codecov bot commented Oct 10, 2024

Codecov Report

Attention: Patch coverage is 91.93548% with 5 lines in your changes missing coverage. Please review.

Project coverage is 81.89%. Comparing base (a377c22) to head (720ecbc).

Files with missing lines Patch % Lines
pytensor/compile/function/types.py 90.90% 3 Missing and 2 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1024      +/-   ##
==========================================
- Coverage   81.90%   81.89%   -0.01%     
==========================================
  Files         182      182              
  Lines       47879    47840      -39     
  Branches     8617     8608       -9     
==========================================
- Hits        39214    39180      -34     
+ Misses       6492     6487       -5     
  Partials     2173     2173              
Files with missing lines Coverage Δ
pytensor/gradient.py 77.57% <ø> (+0.07%) ⬆️
pytensor/graph/null_type.py 64.70% <ø> (+1.54%) ⬆️
pytensor/graph/op.py 88.08% <100.00%> (+0.06%) ⬆️
pytensor/graph/type.py 93.65% <100.00%> (-0.20%) ⬇️
pytensor/scalar/basic.py 80.52% <ø> (+0.01%) ⬆️
pytensor/tensor/random/op.py 93.51% <100.00%> (-0.21%) ⬇️
pytensor/tensor/type_other.py 74.41% <ø> (+0.26%) ⬆️
pytensor/compile/function/types.py 79.47% <90.90%> (-0.47%) ⬇️

@ricardoV94 ricardoV94 changed the title Improve overhead of function call and deprecate rarely used utilities Reduce overhead of function call and deprecate rarely used utilities Oct 10, 2024
tests/compile/function/test_types.py Outdated Show resolved Hide resolved
pytensor/compile/function/types.py Outdated Show resolved Hide resolved
# Group indexes of inputs that are potentially aliased to each other
# Note: Historically, we only worried about aliasing inputs if they belonged to the same type,
# even though there could be two distinct types that use the same kinds of underlying objects.
potential_aliased_input_groups = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this logic could be a utility function? It is reused almost identically below

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is reused almost identically? Also note that I want to remove this whole logic soon: #1026

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making groups from a list of elements based on an equivalence function

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe but the goal of the PR is to reduce the overhead of the function.__call__ which usually runs in a hot loop. Adding nested function calls may defeat the purpose. Perhaps I will just go ahead with #1026 and shed this whole logic.

I agree it's ugly to repeat the code

@ricardoV94 ricardoV94 force-pushed the perf_enh branch 2 times, most recently from 7a2d4aa to 2faae23 Compare October 17, 2024 12:38
@ricardoV94 ricardoV94 force-pushed the perf_enh branch 3 times, most recently from 23b4fe9 to 720ecbc Compare October 22, 2024 12:53
@ricardoV94 ricardoV94 requested a review from Armavica October 22, 2024 20:55
@ricardoV94 ricardoV94 marked this pull request as draft October 22, 2024 20:58
@ricardoV94
Copy link
Member Author

Marking as a draft because I want to add a note on the docstrings of pytensor.In about mutable and alias

@ricardoV94
Copy link
Member Author

Moved the last commit to a separate PR: #1049

Want to simmer a bit on that before

@ricardoV94 ricardoV94 requested a review from twiecki October 25, 2024 14:18
@ricardoV94 ricardoV94 merged commit 7b13a95 into pymc-devs:main Oct 29, 2024
60 of 61 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants