Raise when in place operations occur on leafs requiring grad #1458
base: main
Conversation
The fix looks good. We should add a small test to verify that this error is raised when expected. Thanks @beverlylytle
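A minimal sketch of what such a test could look like (hypothetical; the test name and the plain `thunder.jit` harness here are assumptions, not the suite's actual decorators):

```python
import pytest
import torch
import thunder

def test_inplace_on_leaf_requiring_grad_raises():
    def f(x):
        x.add_(1.0)  # in-place update of a leaf requiring grad
        return x

    jf = thunder.jit(f)
    x = torch.randn(3, 3, requires_grad=True)
    with pytest.raises(RuntimeError, match="leaf Variable that requires grad"):
        jf(x)
```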
```diff
@@ -476,31 +476,27 @@ def f(xs, ys, z):
     dtypes=NOTHING,
 )
 def test_inplace_to_tensors_with_grad(executor, device, _):
+    @torch.no_grad
```
I am not sure what exactly we are testing for here, and also why we got rid of the `add_grad` case.
The test came with the addition of the `grad` member variable for `TensorProxy`. The `add_grad` case is removed because I was having trouble making the test make sense even for PyTorch operations, and I don't see what it adds on top of the `add_y` case.
Actually, I think this test should still work as is. PyTorch allows in-place updates on leaf tensors under `no_grad`:

```python
import torch

@torch.no_grad  # Removing this decorator leads to a RuntimeError
def add_y(x, y):
    x.add_(y, alpha=0.1)

x = torch.randn(3, 3, requires_grad=True)
y = torch.randn(3, 3, requires_grad=True)

add_y(x, y)
```
Also, I think `add_grad` is important, as it was added in the PR that added support for the `grad` attribute on `TensorProxy` (#1070), so it seems to be testing that as well. (We should probably add a comment that this is happening.)
```diff
@@ -2190,6 +2182,9 @@ def is_float_type(self, input):

+def _copy__impl(copy_from, copy_to):
+    cd = get_compile_data()
+    if cd is not None and cd.is_grad_enabled and copy_to.is_leaf and copy_to.requires_grad:
+        raise RuntimeError("a leaf Variable that requires grad is being used in an in-place operation.")
```
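For reference, the raised message mirrors eager PyTorch's behavior for the same situation (a quick plain-PyTorch illustration):

```python
import torch

x = torch.randn(3, requires_grad=True)  # a leaf that requires grad
try:
    x.add_(1.0)  # in-place update outside no_grad
except RuntimeError as e:
    print(e)  # a leaf Variable that requires grad is being used in an in-place operation.
```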
I am wondering if the Symbol `copy_` in `thunder/torch/__init__.py` is a more appropriate location for the check.

lightning-thunder/thunder/torch/__init__.py, lines 1961 to 1963 in 60f3ee1:

```python
@torchsymbol(torch.Tensor.copy_, is_method=True)  # , tags=(prims.OpTags.IN_PLACE,))
def copy_(a, b, /):
    return prims.copy_(b, a)
```
`a` and `b` are proxies, and it is not clear to me whether a proxy knows that it is a leaf.
They do not. It's a PyTorch-only concept that is available at runtime inside `_copy__impl`.
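To illustrate the distinction (plain PyTorch: leaf-ness is a property of concrete runtime tensors, which is exactly what trace-time proxies lack):

```python
import torch

x = torch.randn(3, requires_grad=True)  # created by the user: a leaf
y = x * 2                               # produced by an operation: not a leaf
print(x.is_leaf, y.is_leaf)             # True False
```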
Right, previously I missed that the fix was in `_copy__impl`. And since it is happening at runtime, I am wondering if `compile_data` is actually available. A quick test (see below) shows that it wouldn't be. So we probably need a way to check whether this `copy_` was called under `no_grad` in user code (as PyTorch allows in-place updates of leaf tensors under `no_grad`, see the comment above).
Snippet to check if `compile_data` is available:

```python
import torch
import thunder
from thunder.extend import OperatorExecutor
from thunder.core.compile_data import get_compile_data
from thunder.core.proxies import TensorProxy

ex = OperatorExecutor("ex")

def clone_impl(x):
    cd = get_compile_data()
    print(cd)  # None
    return x

clone = ex.register_operator("clone", meta=lambda x: TensorProxy(like=x), fn=clone_impl)

def fn(x):
    return clone(x)

x = torch.ones(3)
jfn = thunder.jit(fn)
jfn(x)
exec_trace = thunder.last_traces(jfn)[-1]
# print(exec_trace)
```
```diff
@@ -549,7 +545,8 @@ def single_tensor_adam(
     ref_state_steps = [torch.tensor(1, device=device) for _ in range(2)]
     single_tensor_adam(*ref_tensors, state_steps=ref_state_steps)

-    jitted = executor.make_callable(single_tensor_adam)
+    # torch.compile does not support accessing the ContextVariable compile data used in _copy__impl
+    jitted = executor.make_callable(single_tensor_adam, torch_compile_fullgraph=False)
```
Interesting that `torch.compile` creates a graph break when calling `get` on a `ContextVar`.

```python
import torch
from contextvars import ContextVar

_compile_data = ContextVar("compile_data", default=(None, None))

def fn(x):
    _compile_data.get()
    return x + 1

torch.compile(fn, fullgraph=False)(torch.randn(3, 3))  # Works, with a graph break at _compile_data.get()
torch.compile(fn, fullgraph=True)(torch.randn(3, 3))   # Fails
```
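One way to confirm where the break happens (a sketch; assumes a recent PyTorch where `torch._dynamo.explain(fn)(*args)` returns an `ExplainOutput`):

```python
# Inspect graph breaks for the fn defined above.
import torch._dynamo

explanation = torch._dynamo.explain(fn)(torch.randn(3, 3))
print(explanation.graph_break_count)
print(explanation.break_reasons)  # should point at _compile_data.get()
```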
What does Thunder's Interpreter do? It probably fails.
Thunder just burns the value into the computation trace (if used) without having a corresponding check in the prologue. (Will file an issue for the same.)

E.g.:
```python
import torch
import thunder
from contextvars import ContextVar

_compile_data = ContextVar("compile_data", default=1)

def fn(x):
    v = _compile_data.get()
    return x + v

jfn = thunder.jit(fn)
o = jfn(torch.ones(3,))
print(o)  # tensor([2., 2., 2.])

_compile_data.set((2,))
o = jfn(torch.ones(3,))
print(o)  # tensor([2., 2., 2.])

print(thunder.last_prologue_traces(jfn)[-1])
# @torch.no_grad()
# @no_autocast
# def prologue(*args, **kwargs):
#     # args: "Any"
#     check_len(args, 1)
#     # prims.check_len(args, 1)
#     # kwargs: "Any"
#     check_len(kwargs, 0)
#     # prims.check_len(kwargs, 0)
#     x: "cpu f32[3]" = args[0]
#     check_tensor_metadata(x, (3,), 'cpu', torch.float32, False)
#     # prims.check_tensor_shape_and_metadata(x, (3,), 'cpu', torch.float32, False)
#     cache_info: "Any" = thunder._get_cache_info()
#     cache_info_default_dtype: "<class 'torch.dtype'>" = cache_info['default_dtype']
#     check_literal_like(cache_info_default_dtype, torch.float32)
#     # prims.check_literal_like(cache_info_default_dtype, torch.float32)
#     cache_info_default_device: "<class 'torch.device'>" = cache_info['default_device']
#     check_literal_like(cache_info_default_device, torch.device("cpu"))
#     # prims.check_literal_like(cache_info_default_device, torch.device("cpu"))
#     cache_info_is_autocast_enabled: "bool False" = cache_info['is_autocast_enabled']
#     check_number_type_and_value(cache_info_is_autocast_enabled, False)
#     # prims.check_number_type_and_value(cache_info_is_autocast_enabled, False)
#     cache_info_no_grad_sync: "bool False" = cache_info['no_grad_sync']
#     check_number_type_and_value(cache_info_no_grad_sync, False)
#     # prims.check_number_type_and_value(cache_info_no_grad_sync, False)
#     cache_info_alias_tensor_indices: "str" = cache_info['alias_tensor_indices']
#     check_string_value(cache_info_alias_tensor_indices, '')
#     # prims.check_string_value(cache_info_alias_tensor_indices, '')
#     cache_info_is_grad_enabled: "bool True" = cache_info['is_grad_enabled']
#     check_number_type_and_value(cache_info_is_grad_enabled, True)
#     # prims.check_number_type_and_value(cache_info_is_grad_enabled, True)
#     return ((x,), ())

print(thunder.last_traces(jfn)[-1])
# @torch.no_grad()
# @no_autocast
# def computation(x):
#     # x: "cpu f32[3]"
#     t0 = torch.add(x, 1, alpha=1)  # t0: "cpu f32[3]"
#     # t0 = ltorch.add(x, 1, alpha=1)  # t0: "cpu f32[3]"
#     # _ = prims.convert_element_type(1, float)
#     # t0 = prims.add(x, 1.0)  # t0: "cpu f32[3]"
#     return t0
```
Issue filed at #1464
What does this PR do?

Fixes #1284

PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃