
Tensor Derivatives #31

Open
thomasahle opened this issue Oct 4, 2023 · 6 comments


@thomasahle

thomasahle commented Oct 4, 2023

I would like to write an article about tensor derivatives, such as this derivative of the Hessian chain rule:
Screenshot 2023-10-03 at 6 25 10 PM

Is there already an article about this that I should contribute to? Or should I start one from scratch?
Also, I'm not sure whether my notation is standard: I write x -> f for function application, f(x), and A -- x for tensor contraction. If there's a better notation, I can switch to it.
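For a concrete check of the rule the diagram expresses: the Hessian of a composition f(g(x)) is a pair of tensor contractions, H_h = J_g^T H_f J_g + sum_k (df/dy_k) H_{g_k}. A minimal NumPy sketch, where the example functions f and g are made up for illustration and not from this issue:

```python
import numpy as np

# Hypothetical example functions: f maps R^2 -> R, g maps R^2 -> R^2.
def f(y):  return y[0]**2 * y[1]
def g(x):  return np.array([np.sin(x[0]) + x[1], x[0] * x[1]])
def h(x):  return f(g(x))  # the composition being differentiated

# Hand-derived pieces of the chain rule.
def grad_f(y):  return np.array([2*y[0]*y[1], y[0]**2])
def hess_f(y):  return np.array([[2*y[1], 2*y[0]], [2*y[0], 0.0]])
def jac_g(x):   return np.array([[np.cos(x[0]), 1.0], [x[1], x[0]]])
def hess_g(x):  # shape (2, 2, 2): hess_g[k] is the Hessian of component g_k
    return np.array([[[-np.sin(x[0]), 0.0], [0.0, 0.0]],
                     [[0.0, 1.0], [1.0, 0.0]]])

x = np.array([0.3, -0.7])
y, J, Hf, Hg = g(x), jac_g(x), hess_f(g(x)), hess_g(x)

# Hessian chain rule as tensor contractions:
#   H_h[i,j] = sum_{k,l} J[k,i] Hf[k,l] J[l,j]  +  sum_k grad_f[k] Hg[k,i,j]
H_chain = J.T @ Hf @ J + np.einsum('k,kij->ij', grad_f(y), Hg)

# Cross-check against a central finite-difference Hessian of h.
eps = 1e-4
H_num = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        e_i, e_j = np.eye(2)[i] * eps, np.eye(2)[j] * eps
        H_num[i, j] = (h(x+e_i+e_j) - h(x+e_i-e_j)
                       - h(x-e_i+e_j) + h(x-e_i-e_j)) / (4 * eps**2)

assert np.allclose(H_chain, H_num, atol=1e-4)
```

The einsum line is the second contraction in the diagram: the gradient of f closes the "extra" derivative index that differentiating g introduces.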

@emstoudenmire
Contributor

Hi Thomas, thanks for suggesting a contribution. I'm interested, but have a few questions about this topic and the notation. Mainly:

  1. Do you know whether this notation relates to Penrose tensor diagram notation, and how? I'm open to having other diagrammatic notations on the site, but I'm not familiar with this particular one, so I'd like to understand it better.
  2. It's not totally clear to me that this computation is primarily about tensors, i.e. multilinear functions on vector spaces. Derivatives are related to tensors, but the original object being differentiated here looks to be the composition of two general functions. Could you please explain more about which objects here are tensors?

Oh, I just saw what you wrote at the end. Correct, the notation used here is not standard in the tensor network field (meaning the one in quantum physics and applied math), though papers in that field do fairly often introduce non-standard notations as long as they are clearly defined. Happy to discuss more.

@thomasahle
Author

  1. The derivative notation (with circles around tensors) is directly from Penrose: https://en.wikipedia.org/wiki/Penrose_graphical_notation#Covariant_derivative
  2. I think tensor networks are the only good way to write up the Hessian chain rule. It's possible to express it using nothing but tensors, since it's simply a mix of (higher-order) Hessians and Jacobians. See e.g. Yaroslav's work on optimizing the contraction of these tensors: https://community.wolfram.com/groups/-/m/t/2437093

However, to actually derive the tensors using the chain rule, I think you need to show the function applications as well, which is why I added them to my notation.
If you know of, or can think of, any better way to do this, I would consider it a great win!

@thomasahle
Author

I have a bunch more examples of derivations using this notation here:
TensorDerivatives.pdf
Though it is not well documented at this point.

@emstoudenmire
Contributor

I see, interesting. Ok I'm convinced then that this material does fit with the site. Here are some requests about the writeup:

  • Would you please add a little to the page about tensor diagram notation explaining the derivative notation, i.e. just how it works and what the extra index is for? (I get it now from the Wikipedia page. Good to know that notation was also Penrose's; I did know a bit about birdtracks too.)
  • When you write it up, could you please add some brief context on why taking the gradient of these particular functions comes up, and in what fields or applications? For example, it's not clear to me offhand why A(x)_ij x_j is a common type of function pattern that one wishes to study. Does it come from general relativity?

Lastly, you might like this recent article by some people in my field. I'm sure it rediscovers some things in its more introductory part, but by the end the authors pull off some impressive calculations. I think the notation there is related, but with thick lines representing plugging in continuous variables instead of lines with arrows at the end:
https://journals.aps.org/prresearch/abstract/10.1103/PhysRevResearch.5.013156

@emstoudenmire
Contributor

Oh, and no, there's not an article about this topic yet, so you should start one from scratch. Feel free to make a new section of the site, though we could discuss how to organize it.

I'm open to however you want to make the figures. Usually I make mine in the Keynote presentation software, using a border size of 4 pixels for the shapes and lines, and then I just take screenshots to make the images. Primitive, I know, but I thought I'd share that. I'm hoping for some future tensor diagramming software that will also generate high-quality images as output.

@thomasahle
Author

One challenge I'm having is what notation to use for function application.
In the diagram above I used arrows along tensor dimensions (instead of plain edges), but sometimes you want to take a function of a scalar, like the division in softmax, softmax(x) = exp(x)/sum(exp(x)).
This causes me trouble, because a tensor graph that represents a scalar doesn't have any free edges.
I could put an arrow coming out of some arbitrary node, but that seems confusing. I could also put a circle around the whole graph and have an arrow coming out of that. Any other ideas?
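For what it's worth, the softmax derivative motivating this question is easy to state algebraically even where the diagram is awkward: its Jacobian is diag(s) - s s^T. A quick numerical sketch (the input vector is a made-up example):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

x = np.array([0.2, -1.0, 0.5])  # arbitrary example input
s = softmax(x)

# Jacobian of softmax: J[i,j] = s_i * (delta_ij - s_j) = diag(s) - s s^T
J_analytic = np.diag(s) - np.outer(s, s)

# Central finite-difference check: column j is d(softmax)/d(x_j).
eps = 1e-6
J_num = np.array([
    (softmax(x + eps * np.eye(3)[j]) - softmax(x - eps * np.eye(3)[j])) / (2 * eps)
    for j in range(3)
]).T

assert np.allclose(J_analytic, J_num, atol=1e-6)
```

The division by sum(exp(x)) is exactly the scalar-valued subgraph in question: the whole contraction exp(x) -- 1 (a closed graph with no free edges) feeds into the reciprocal as a function argument.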
