Implement inner using BP #147

Merged: 41 commits merged into ITensor:main from the logcontract branch, Apr 12, 2024
Conversation

@JoeyT1994
Contributor

This PR refactors the `inner` interface in ITensorNetworks.jl. Most importantly, it adds algorithmic backends; "bp" and "exact" are currently supported.

For instance, `inner(x::AbstractITensorNetwork, A::AbstractITensorNetwork, y::AbstractITensorNetwork; alg=default_alg)` and `inner(x::AbstractITensorNetwork, y::AbstractITensorNetwork; alg=default_alg)` default to "bp" (with a sensible partitioning) if all the arguments are trees, and to "exact" otherwise.
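A minimal usage sketch of the new interface (not from the PR; it assumes the package's `named_grid`, `siteinds`, and `random_tensornetwork` helpers and that `inner` is importable from ITensorNetworks):

```julia
using ITensorNetworks: inner, random_tensornetwork, siteinds
using NamedGraphs: named_grid

g = named_grid((4, 1))                    # a path graph, so both networks are trees
s = siteinds("S=1/2", g)                  # physical site indices on each vertex
x = random_tensornetwork(s; link_space=2)
y = random_tensornetwork(s; link_space=2)

inner(x, y)                               # trees: defaults to alg = "bp"
inner(x, y; alg="exact")                  # force the exact contraction backend
```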

The assumption in `inner(...)` is that the external indices of the arguments are all aligned, so the networks stack properly under `disjoint_union` and don't need mapping; an optional link-index-map kwarg (defaulting to `sim`) is applied to `x` to avoid index clashes. This is consistent with the current interface of ITensors.jl.

Logarithmic versions of `inner`, i.e. `loginner(...)`, are also provided. In the "bp" case this is neatly defined and helps to avoid overflow/underflow. In the "exact" case it just calls `log(inner(...))`, so it is perhaps a bit redundant.
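Continuing the sketch above, and assuming `loginner` is importable alongside `inner`:

```julia
using ITensorNetworks: loginner

loginner(x, y; alg="bp")         # log of the overlap, avoiding overflow/underflow
log(inner(x, y; alg="exact"))    # what loginner(x, y; alg="exact") reduces to
```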

Testing is added via `test_inner.jl`, and the other standard tests that rely on `inner` now dispatch to this implementation and all pass. Additionally, tests in `test_treetensornetworks` dispatch to this version rather than the one in `src/treetensornetworks/abstracttreetensornetwork.jl`, unless the tensor network is specifically wrapped in the `TTN` type.

A few points of note / things to think about resolving:

  1. If all the elements of a tensor network `x` are 0, so that `norm_sqr(x::AbstractITensorNetwork) == 0`, then the BP method returns `NaN`, since it computes the ratio 0/0 (see the sketches after this list). This edge case is therefore not well handled by the BP code, because the current BP backend computes `inner()` via a global method. A local method can be defined for a tree to solve this (it is how Benedikt's code in `src/treetensornetworks/abstracttreetensornetwork.jl` works), and we could then define `alg = "bp_global"` and `alg = "bp_local_tree"` to let the user choose between the algorithms.
  2. Currently `inner(alg::Algorithm"exact", ...)` passes to the `inner_network(...)` constructor, which I cleaned up a bit: `inner_network` now forwards to `flatten_networks`, but takes the `dag` of the first argument, and its keyword arguments default to no flattening or combining (which are useful kwargs for "exact"). `inner(alg::Algorithm"bp", ...)`, instead, internally builds the relevant `ITensorNetwork` via `disjoint_union` (also sketched below). This is because the `flatten_networks` builder produces confusing vertex names for more than two networks, which makes it difficult to define a good `default_partitioning`.
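Point 1, sketched (names as in the sketch above; the way of zeroing the tensors is hypothetical and the exact API may differ):

```julia
using ITensorNetworks: vertices   # assumed re-export of the graph interface

z = random_tensornetwork(s; link_space=2)
for v in vertices(z)
  z[v] = 0 * z[v]                 # zero out every tensor, so norm_sqr(z) == 0
end
inner(z, z; alg="bp")             # BP forms the ratio 0/0 and returns NaN
```

Point 2, sketched (signatures and index handling assumed, not taken verbatim from the code):

```julia
tn_exact = inner_network(x, y)    # "exact": forwards to flatten_networks, dag applied to x
tn_bp = dag(x) ⊗ y                # "bp": stacks the networks via disjoint_union
```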

@mtfishman @b-kloss

@mtfishman
Member

Can't the BP code detect if the norm is zero at some point? Presumably the message tensors become zero?

@mtfishman
Member

Note that I was advocating for removing `hascommoninds(::typeof(siteinds), A::AbstractITensorNetwork, B::AbstractITensorNetwork)` and `check_hascommoninds(::typeof(siteinds), A::AbstractITensorNetwork, B::AbstractITensorNetwork)` here: #144 (comment), in case that is relevant.

@JoeyT1994
Contributor Author

> Note that I was advocating for removing `hascommoninds(::typeof(siteinds), A::AbstractITensorNetwork, B::AbstractITensorNetwork)` and `check_hascommoninds(::typeof(siteinds), A::AbstractITensorNetwork, B::AbstractITensorNetwork)` here: #144 (comment), in case that is relevant.

Hmm, yeah, I'm not sure what the bug is here because I don't see it locally on my machine. I thought something had changed in ITensors.jl so that it was importing something that no longer exists. Let me try to find the issue.

@JoeyT1994
Contributor Author

> Can't the BP code detect if the norm is zero at some point? Presumably the message tensors become zero?

Yeah, I wonder: if the norm of the message tensors is zero, does that guarantee (assuming you didn't initialize the message tensors to 0) that the norm of the wave function is zero? I assume so? In that case we could add that catch in the `inner(alg::Algorithm"bp", ...)` code.

@mtfishman
Member

> Can't the BP code detect if the norm is zero at some point? Presumably the message tensors become zero?
>
> Yeah, I wonder: if the norm of the message tensors is zero, does that guarantee (assuming you didn't initialize the message tensors to 0) that the norm of the wave function is zero? I assume so? In that case we could add that catch in the `inner(alg::Algorithm"bp", ...)` code.

I suppose a message could accidentally end up being zero without the norm of the network being zero, but that seems unlikely, and it would also be a problem for BP anyway: the message would remain zero and cause other messages to become zero during updates involving it. A fix for that situation could be to randomize the message and see whether it ends up back at zero. But my point is that this seems like something we should try to handle at the level of the BP code, rather than in higher-level code like `inner`, since it could be a potential issue for many functions that use BP, not just `inner`.

@mtfishman
Member

I think probably for running BP on a norm network you can show that the only way a message can end up zero is if the network is zero or if it is initialized to zero, since each update has the form of a CP map.
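For reference, a sketch of why the update is completely positive (notation assumed here, not taken from the thread): on the norm network of a state with site tensors $T_v$, the message update can be written in Kraus form,

$$
m_{v \to w} \;=\; \operatorname{Tr}_{u \neq w}\!\left[\, T_v \Big( \bigotimes_{u \neq w} m_{u \to v} \Big) T_v^{\dagger} \right] \;=\; \sum_k A_k \Big( \bigotimes_{u \neq w} m_{u \to v} \Big) A_k^{\dagger},
$$

where the Kraus operators $A_k$ come from slicing $T_v$ over the traced indices, so positive semidefinite messages stay positive semidefinite under updates.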

@mtfishman
Member

mtfishman commented Mar 27, 2024

> Note that I was advocating for removing `hascommoninds(::typeof(siteinds), A::AbstractITensorNetwork, B::AbstractITensorNetwork)` and `check_hascommoninds(::typeof(siteinds), A::AbstractITensorNetwork, B::AbstractITensorNetwork)` here: #144 (comment), in case that is relevant.
>
> Hmm, yeah, I'm not sure what the bug is here because I don't see it locally on my machine. I thought something had changed in ITensors.jl so that it was importing something that no longer exists. Let me try to find the issue.

It may be related to ITensor/ITensors.jl#1356: we moved all of the MPS/MPO code into an ITensorMPS submodule of ITensors.jl, which changed the namespaces of some internal functions, probably including `check_hascommoninds` (i.e. `check_hascommoninds` probably isn't available from ITensors anymore, and instead must be accessed from `ITensors.ITensorMPS`).
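An illustration of the import change this implies (hedged: it assumes `check_hascommoninds` survived the move as an internal function of the submodule):

```julia
using ITensors

# Before ITensor/ITensors.jl#1356 this worked:
#   using ITensors: check_hascommoninds
# After the MPS/MPO code moved into the ITensorMPS submodule:
using ITensors.ITensorMPS: check_hascommoninds
```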

But if we have to circumvent issues around `check_hascommoninds` anyway, I think we may as well just remove it, since I don't think it is serving a good purpose in ITensorNetworks.

@JoeyT1994
Contributor Author

Okay, I can remove it. Shall I remove the calls to `check_hascommoninds` in the treetensornetwork code then? Presumably it is just being used for some assertions.

@mtfishman
Member

> Okay, I can remove it. Shall I remove the calls to `check_hascommoninds` in the treetensornetwork code then? Presumably it is just being used for some assertions.

Yeah, thanks! Can you also remove `hascommoninds(::typeof(siteinds), A::AbstractITensorNetwork, B::AbstractITensorNetwork)` as well?

@JoeyT1994
Contributor Author

@mtfishman actually this seems to be affecting more than just `check_hascommoninds` due to the ITensors.jl changes; I can try to hunt the issues down.

@mtfishman
Member

Seems like we may as well just remove `src/treetensornetworks/solvers/deprecated` entirely instead of trying to update it.

@mtfishman
Member

Also `src/treetensornetworks/deprecated_opsum_to_ttn.jl`.

@mtfishman
Member

@JoeyT1994 reconsidering this, I'm concerned this PR is going to get messy with all of these other changes. Do you mind if I make a PR updating to the latest ITensors version, removing deprecated code, removing `check_hascommoninds`, etc., and then you can merge that into this PR?

@JoeyT1994
Contributor Author

> I think probably for running BP on a norm network you can show that the only way a message can end up zero is if the network is zero or if it is initialized to zero, since each update has the form of a CP map.

Yes, you are right that if it is a CP map, a message being zero directly implies one of those two things. Perhaps with a non-CP map there is a possibility the outgoing message ends up being zero by virtue of the incoming message being a zero eigenvector of the local tensor, but that seems a weird case and likely implies the tensor contracts to 0 too.

In principle, if a message is 0 then the BP code should rapidly converge to 0 in a few sweeps (as all the other messages end up zero), and that implies something about the tensor network. So it seems to me this isn't actually something the BP code needs to guard against; it is part of the algorithm. The exception is if we are trying to normalize the message tensors, in which case we should flag an error?
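A sketch of the kind of guard that last sentence suggests, at the message-normalization step (the function name and its placement are illustrative, not the package's actual API):

```julia
using ITensors: ITensor
using LinearAlgebra: norm

function normalize_message(m::ITensor)
  n = norm(m)
  # A zero message cannot be normalized; per the discussion above this likely
  # means the network itself contracts to zero, so flag it loudly.
  iszero(n) && error("message tensor has norm zero; the network may contract to zero")
  return m / n
end
```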

(Inline review threads on examples/contract_bp.jl and src/inner.jl, resolved.)
@JoeyT1994
Contributor Author

@mtfishman let me know if you have any further comments or if you are happy to merge.

@mtfishman
Member

I'll take another look through the PR.

@mtfishman
Member

mtfishman commented Apr 12, 2024

@JoeyT1994 I merged #155 into main and then tried merging it into this branch and fixing conflicts, but it seems like it broke some BP tests... I had to fix some issues in the tests in #155 because of changes to the normalization of certain tensor network constructors, but I'm not sure if that is related.

mtfishman changed the title from "Inner Refactor" to "Inner refactor" on Apr 12, 2024.
@JoeyT1994
Contributor Author

Okay, no worries, let me pull that merge to my local machine and try to resolve the test failures.

@mtfishman
Member

Thanks. That change was supposed to be pretty benign but perhaps the behavior of some of the constructors changed accidentally.

@JoeyT1994
Contributor Author

Yeah, the issue is that `scalar(alg::Algorithm"bp", tn::AbstractITensorNetwork, ...)` now seems to not be returning the correct answer. Can't quite figure out why yet.

@JoeyT1994
Contributor Author

Okay, actually I don't think it's a BP issue. The `inner_network` constructor (which passes to `BiLinearFormNetwork` and inserts an identity operator in the middle) is doing something different than it was before, i.e.

```julia
ψ = random_tensornetwork(s; link_space=χ)
contract(inner_network(ψ, ψ)) == contract(ψ ⊗ prime(dag(ψ); sites=[]))
```

now gives `false`.

@JoeyT1994
Contributor Author

JoeyT1994 commented Apr 12, 2024

I see where the issue is: it's the `delta_network(union_all_inds(siteinds(ket), dual_site_index_map(siteinds(ket))))` constructor. It appears to be inserting matrices like

```
1 0
0 0
```

on the vertices instead of identity matrices.
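A minimal illustration of the symptom with plain ITensors (the indices here are hypothetical):

```julia
using ITensors

i = Index(2, "i")
j = Index(2, "j")
delta(i, j)                        # the intended identity matrix, [1 0; 0 1]
ITensor([1.0 0.0; 0.0 0.0], i, j)  # the projector that was effectively being inserted
```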

@mtfishman
Member

I think that may have been an incorrect usage of `delta_network`; see my messages on Slack.

mtfishman merged commit f813653 into ITensor:main on Apr 12, 2024 (6 of 11 checks passed).
mtfishman changed the title from "Inner refactor" to "Implement inner using BP" on Apr 12, 2024.
JoeyT1994 deleted the logcontract branch on June 16, 2024.