
Raise when in-place operations occur on leaves requiring grad #1458

Open · wants to merge 5 commits into main

Conversation

beverlylytle (Collaborator):

Before submitting
  • Was this discussed/approved via a GitHub issue? (not needed for typo fixes and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes #1284

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

beverlylytle marked this pull request as ready for review on November 21, 2024 at 11:21.
kshitij12345 (Collaborator) left a comment:

The fix looks good. We should add a small test to verify that this error is raised when expected. Thanks @beverlylytle
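
A minimal sketch of what such a test could look like (hypothetical names, for illustration only; the actual test belongs in thunder/tests/test_inplace_functionalization.py and may differ):

import pytest
import torch
import thunder

def test_inplace_on_leaf_requiring_grad_raises():
    def f(x):
        return x.add_(1.0)

    jf = thunder.jit(f)
    x = torch.randn(3, requires_grad=True)  # a leaf tensor that requires grad
    # the jitted in-place op should raise, matching PyTorch eager behavior
    with pytest.raises(RuntimeError, match="leaf Variable that requires grad"):
        jf(x)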

thunder/tests/test_inplace_functionalization.py (outdated)
@@ -476,31 +476,27 @@ def f(xs, ys, z):
     dtypes=NOTHING,
 )
 def test_inplace_to_tensors_with_grad(executor, device, _):
+    @torch.no_grad
Collaborator:

I am not sure what exactly we are testing for here, and also why we got rid of the add_grad case.

beverlylytle (Collaborator, Author):

The test came with the addition of the grad member variable for TensorProxy. The add_grad case is removed because I was having trouble making the test make sense even for PyTorch operations, and I don't see what it adds on top of the add_y case.

Collaborator:

Actually I think this test should still work as is.

PyTorch allows in-place updates on leaf tensors under no_grad:

import torch

@torch.no_grad  # Commenting this decorator out leads to a RuntimeError
def add_y(x, y):
    x.add_(y, alpha=0.1)

x = torch.randn(3, 3, requires_grad=True)
y = torch.randn(3, 3, requires_grad=True)

add_y(x, y)

Also, I think add_grad is important, as it was added in the PR that added support for the grad attribute on TensorProxy (#1070), so it seems to be testing that as well. (We should probably add a comment that this is happening.)
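
A hypothetical sketch of the add_grad case (names assumed, not copied from the test file): the in-place update targets x.grad, which PyTorch permits even with grad mode enabled, because the .grad tensor itself does not require grad.

import torch

def add_grad(x, y):
    # update the .grad attribute in place, not the leaf tensor itself
    if x.grad is None:
        x.grad = torch.zeros_like(x)
    x.grad.add_(y, alpha=0.1)

x = torch.randn(3, 3, requires_grad=True)
y = torch.randn(3, 3)
add_grad(x, y)  # no error, even outside no_grad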

@@ -2190,6 +2182,9 @@ def is_float_type(self, input):


 def _copy__impl(copy_from, copy_to):
+    cd = get_compile_data()
+    if cd is not None and cd.is_grad_enabled and copy_to.is_leaf and copy_to.requires_grad:
+        raise RuntimeError("a leaf Variable that requires grad is being used in an in-place operation.")
Collaborator:

I am wondering if the Symbol copy_ in thunder/torch/__init__.py is a more appropriate location for the check.

@torchsymbol(torch.Tensor.copy_, is_method=True)  # , tags=(prims.OpTags.IN_PLACE,))
def copy_(a, b, /):
    return prims.copy_(b, a)

beverlylytle (Collaborator, Author):

a and b are proxies, and it is not clear to me whether a proxy knows that it is a leaf.

Collaborator:

They do not. It's only a PyTorch concept that's available at runtime inside _copy__impl.
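
For illustration, is_leaf is an attribute of concrete torch.Tensor objects and is only known at runtime:

import torch

x = torch.randn(3, requires_grad=True)
y = x * 2  # produced by an operation, so not a leaf

print(x.is_leaf)  # True
print(y.is_leaf)  # False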

Collaborator:

Right, previously I missed that the fix was in _copy__impl. And since it happens at runtime, I am wondering whether compile_data is actually available.

A quick test (see below) shows that it wouldn't be. So we probably need a way to check whether this copy was called under no_grad in the user's code (since PyTorch allows in-place updates of leaf tensors under no_grad, see the comment above).

Snippet to check whether compile_data is available:

import torch
import thunder
from thunder.extend import OperatorExecutor
from thunder.core.compile_data import get_compile_data
from thunder.core.proxies import TensorProxy

ex = OperatorExecutor("ex")

def clone_impl(x):
    cd = get_compile_data()
    print(cd)  # None
    return x

clone = ex.register_operator("clone", meta=lambda x: TensorProxy(like=x), fn=clone_impl)

def fn(x):
    return clone(x)

x = torch.ones(3)

jfn = thunder.jit(fn)

jfn(x)
exec_trace = thunder.last_traces(jfn)[-1]
# print(exec_trace)

@@ -549,7 +545,8 @@ def single_tensor_adam(
     ref_state_steps = [torch.tensor(1, device=device) for _ in range(2)]
     single_tensor_adam(*ref_tensors, state_steps=ref_state_steps)

-    jitted = executor.make_callable(single_tensor_adam)
+    # torch.compile does not support accessing the ContextVariable compile data used in _copy__impl_
+    jitted = executor.make_callable(single_tensor_adam, torch_compile_fullgraph=False)
Collaborator:

Interesting that torch.compile creates a graph break when calling get on a ContextVar.

import torch
from contextvars import ContextVar

_compile_data = ContextVar("compile_data", default=(None, None))

def fn(x):
    _compile_data.get()
    return x + 1

torch.compile(fn, fullgraph=False)(torch.randn(3, 3))  # Works with GraphBreak at _compile_data.get()
torch.compile(fn, fullgraph=True)(torch.randn(3, 3))  # Fails

Collaborator:

What does Thunder's Interpreter do? It probably fails.

kshitij12345 (Collaborator), Nov 22, 2024:

thunder just burns the value into the computation trace (if used) without having a corresponding check in the prologue. (Will file an issue for the same.)

E.g.:

import torch
import thunder
from contextvars import ContextVar

_compile_data = ContextVar("compile_data", default=1)

def fn(x):
    v = _compile_data.get()
    return x + v

jfn = thunder.jit(fn)
o = jfn(torch.ones(3,))
print(o)  # tensor([2., 2., 2.])

_compile_data.set((2,))  # changing the value does not trigger a re-trace; the burned-in default is reused
o = jfn(torch.ones(3,))
print(o)  # tensor([2., 2., 2.])

print(thunder.last_prologue_traces(jfn)[-1])
# @torch.no_grad()
# @no_autocast
# def prologue(*args, **kwargs):
#   # args: "Any"
#   check_len(args, 1)
#     # prims.check_len(args, 1)
#   # kwargs: "Any"
#   check_len(kwargs, 0)
#     # prims.check_len(kwargs, 0)
#   x: "cpu f32[3]" = args[0]
#   check_tensor_metadata(x, (3,), 'cpu', torch.float32, False)
#     # prims.check_tensor_shape_and_metadata(x, (3,), 'cpu', torch.float32, False)
#   cache_info: "Any" = thunder._get_cache_info()
#   cache_info_default_dtype: "<class 'torch.dtype'>" = cache_info['default_dtype']
#   check_literal_like(cache_info_default_dtype, torch.float32)
#     # prims.check_literal_like(cache_info_default_dtype, torch.float32)
#   cache_info_default_device: "<class 'torch.device'>" = cache_info['default_device']
#   check_literal_like(cache_info_default_device, torch.device("cpu"))
#     # prims.check_literal_like(cache_info_default_device, torch.device("cpu"))
#   cache_info_is_autocast_enabled: "bool False" = cache_info['is_autocast_enabled']
#   check_number_type_and_value(cache_info_is_autocast_enabled, False)
#     # prims.check_number_type_and_value(cache_info_is_autocast_enabled, False)
#   cache_info_no_grad_sync: "bool False" = cache_info['no_grad_sync']
#   check_number_type_and_value(cache_info_no_grad_sync, False)
#     # prims.check_number_type_and_value(cache_info_no_grad_sync, False)
#   cache_info_alias_tensor_indices: "str" = cache_info['alias_tensor_indices']
#   check_string_value(cache_info_alias_tensor_indices, '')
#     # prims.check_string_value(cache_info_alias_tensor_indices, '')
#   cache_info_is_grad_enabled: "bool True" = cache_info['is_grad_enabled']
#   check_number_type_and_value(cache_info_is_grad_enabled, True)
#     # prims.check_number_type_and_value(cache_info_is_grad_enabled, True)
#   return ((x,), ())

print(thunder.last_traces(jfn)[-1])
# @torch.no_grad()
# @no_autocast
# def computation(x):
#   # x: "cpu f32[3]"
#   t0 = torch.add(x, 1, alpha=1)  # t0: "cpu f32[3]"
#     # t0 = ltorch.add(x, 1, alpha=1)  # t0: "cpu f32[3]"
#       # _ = prims.convert_element_type(1, float)
#       # t0 = prims.add(x, 1.0)  # t0: "cpu f32[3]"
#   return t0

Collaborator:

Issue filed at #1464

Development

Successfully merging this pull request may close these issues.

[inplace] Silently incorrect gradient when leaf variable is used in an inplace operation
3 participants