Speed up from_graph_tool and clean up a little #403

iosonofabio · 2021-05-28T00:32:43Z

Adding the graph-tool equivalent of #402, which originally addressed #401

I also edited @szhorvat 's code a little to avoid defaultdict. Although it's fine in principle, having an explicit list of keys makes the code easier to read I think.

Same principle applied to the from_graph_tool function. Thanks Szabolcs again for #402.

src/igraph/__init__.py

szhorvat · 2021-05-28T09:02:26Z

When converting from networkx multigraphs, would it make sense to preserve the edge keys in an extra edge attribute, to make full round-tripping less lossy?

This is important for some use cases, namely when we need to know which networkx edge corresponds to which igraph edge. E.g., when computing weighted shortest paths, it will be precisely one (and not the other) edge that is included in the path.

I guess that those who need this can already manually copy keys to an edge attribute on the networkx side. But perhaps it's nicer to do it automatically?

If we choose to do it, it would be nice to test it out through some practical use cases before settling on a solution.

src/igraph/__init__.py

iosonofabio · 2021-05-28T10:36:52Z

Thank you both. I've now gotten rid of the list(...), not sure why it was there in the first place. Yes if we don't need imports we may as well get rid of them.

I had thought about the multi-edge problem a little, but couldn't quite come up with a clean way, so I'd be happy to hear your opinions. I think that in networkx vertices must have a hashable "name", while edges are just pairs (as in 2-tuples) of the same hashables, e.g.

vertex1 -> 'hello'
vertex 2 -> 'ciao'

('hello', 'ciao') in g.edges -> True or False, depending if there's an edge there

For MultiGraphs, each edge gets a semi-secret index starting from 0. If you iterate over g.edges.data(), like we do now, that index is lost (!), while it is kept if you iterate simply over g.edges - funnily enough, it is lost if you iterate over edges(), which why can you call a dict-like structure in the first place, but ok.

So it seems to me that to store that information, we could get away with storing only the "secret index" of each edge iff the networkx structure is a MultiGraph or MultiDiGraph. The source and target vertices would provide the rest of the info.

Would you like me to add this feature into a property called "_nx_multiedge_index", in analogy with the current "_nx_name" for vertices?

ntamas · 2021-05-28T10:39:48Z

Sounds like a good idea -- however, it's not urgent and I'd like to merge this PR now so it does not deviate from the original topic. If you feel like you can put some time aside to work on the multi-edge issue, I very much appreciate the contribution, and please send a separate PR where we deal with it.

iosonofabio · 2021-05-28T10:40:43Z

yep sounds good, let's also hear from Szabolcs and the others in case they have opinions

szhorvat · 2021-05-28T11:42:45Z

I never meant it to be added in this PR, just brought up the topic.

This is not an "index" (i.e. it is not a number) but a "key". "Key" is the term networkx uses. It is also an arbitrary hashable. By default, networkx uses integers, starting from zero for the respective pair of vertices.

Useful doc links I found:

This is fundamental difference in how igraph and networkx handle multi-edges. (Mathematica does the same as networkx, but IMO in a rather flawed way.) igraph identifies edges by their numerical index, which is global to the graph. Thus, igraph has a guaranteed edge ordering. networkx identifies edges by their endpoints (as Fabio pointed out). This is insufficient to distinguish multi-edges (and be able to e.g. give them distinct attributes). To get around this, networkx adds an extra key, so MultiGraph edges are not just (u, v) pairs, but (u, v, key) triplets. I'm not 100% sure, but I think networkx doesn't have a guaranteed edge ordering at all. UPDATE: Indeed it doesn't guarantee ordering; instead it provides ordered versions of types, see https://networkx.org/documentation/stable/reference/classes/ordered.html

Currently, in from_networkx we iterate over g.edges(data=True), which returns (u, v, attrib_dict) triples for any graph type. MultiGraph and MultiDiGraph can do g.edges(data=True, keys=True), which will return (u, v, key, attrib_dict).

I agree with Fabio's proposal, except for the name. I suggest "_nx_edge_key", or if you like, "_nx_multiedge_key". The terminology that networkx itself uses is "edge key". This is also shorter to type, so I prefer it. On the other hand, "multiedge key" better describes what this key really is: it belongs not to the graph, but to the specific vertex pair (and must be unique for the vertex pair, not for the graph).

Additionally, before fully implementing this, we should try to test on real use cases. We should at least try a few small computations and make sure things work smoothly. Things to consider:

I have a weighted nx (multi)graph. I want to compute the the betweenness of edges with igraph, but otherwise I want to keep working in nx. How do I get the result in nx format? Since nx does not have a consistent edge ordering, it's not sufficient to just return a list of values, as igraph does. Instead we need a mapping (dictionary) from edges to values.
Also consider the opposite: I want to work within igraph, but I want to do one specific edge-related computation using networkx. I need a convenient way to convert the mapping that networkx might return to a form interpretable in igraph. How do we achieve this? For the sake of example, suppose that I want to compute the weighted edge betweenness, and I get a result as a mapping from (u,v,key) triplets to values. This is a dict (a mapping) so it is not ordered. How do I determine which value goes with which edge index in igraph? Perhaps the most practical way would be if to_networkx used igraph's own edge index as the networkx edge key. That makes the key unique for the entire graph (not just vertex pairs), which is overkill, but that's alright for as long as it makes the user's job easy. NOTE: nx.edge_betweenness() does not currently seem to distinguish between parallel edges, so this example is hypothetical. Networkx's support for multigraphs seems to be rather weak at this moment, and I wasn't able to find any algorithms which return edge-specific information for multigraphs. That doesn't mean that there aren't any currently and definitely doesn't mean that there won't be any in the future.
Can we do reasonably lossless round-tripping? Should to_networkx detect the presence of _nx_name and/or _nx_edge_key and restore the original vertex names and edges keys automatically? To take this thought further, should we store the original graph type (Graph, MultiGraph, including the Digraph and Ordered variants) in a graph attribute and automatically restore it as well? Yet another step further: networkx graph generators have a create_using keyword argument which controls which class to use to create the graph. We should have this in to_networkx, with some sort of "automatic" default value.

(Side note: I notice that to_networkx uses any(g.is_multiple()). Based on my knowledge of the C core, I think the efficient way is g.has_multiple(). @ntamas ?)

iosonofabio · 2021-05-28T11:52:45Z

All this is very interesting and indeed the topic of the pilot code in my own PR on transparent API. How to return data types for the practical examples you mention etc. Indeed, what you are describing is a transparent call to the igraph API from -> to a networkx object. I'd be happy to implement the additional key but also think a full support for these tires of functions is to be the goal of that PR, with some generic design in place...

szhorvat · 2021-05-28T11:52:49Z

I wrote a lot (= too much) in the above comment, here's the executive summary:

Before arriving to a final decision, we must test our ideas on practical use cases. These must work well:

Work primarily in igraph, and use a networkx function to compute node-specific or edge-specific information on my graph. I need to be able to associate the result with the appropriate igraph node or edge indices.
Work primarily in networkx and use igraph functions to compute edge specific information. I need to be able to associate the result with (u,v,key) networkx multigraph edge specifications. The same goes for node-specific information.

Specific proposals I made (and a few I didn't mention above):

to_networkx should have a create_using argument to choose which nx class it should use. All nx graph generators have this argument.
The default create_using should be "automatic" and behave just as it does right now, except use ordered graphs (see below).
Round tripping either as nx -> igraph -> nx or as igraph -> nx -> igraph should be reasonably lossless. Priority should be on igraph -> nx -> igraph as we first want to support users who work in igraph and reach out to nx.
Achieve nx -> igraph -> nx round-tripping by:
- Preserve and restore nx vertex name in igraph vertex attribute
- Preserve and restore nx multi-edge keys in igraph edge attribute
- Preserve and restore original nx graph class in igraph graph attribute
Achieve igraph -> nx -> igraph round-tripping by:
- Using ordered nx graph classes in to_networkx by default in order to be able to preserve the igraph edge index (edge ordering). We must investigate (1) if this has any downsides such as performance (2) if this fully solves the problem in practice.
The above two bullet points are not fully consistent with each other—any suggestion to reconcile them?

ntamas · 2021-05-28T11:57:39Z

This discussion should probably be moved to #373 instead of this PR; #373 is concerned with a transparent API between igraph and NetworkX (and maybe other tools). Or maybe we should open a discussion thread in our Discourse forum?

ntamas · 2021-05-28T11:58:33Z

Side note: I notice that to_networkx uses any(g.is_multiple()). Based on my knowledge of the C core, I think the efficient way is g.has_multiple().

Yes, that's correct, I'll change this in the main branch.

szhorvat · 2021-05-28T12:00:34Z

This discussion should probably be moved to #373 instead of this PR

I was just going to say the same. It would also be good to discuss it at one of the meetings, or to have a separate longer meeting where someone would help me fill the gaps in my Python knowledge, which must certainly show in my suggestions above :-)

Or maybe we should open a discussion thread in our Discourse forum?

If there's no way to move multiple message within GitHub, then that's better.

ntamas · 2021-05-28T12:03:51Z

I don't think there's a way to move comments in GitHub so let's do that then.

FYI, 0c0ebd7 adds support for g.to_networkx(create_using=...) and also replaces any(g.is_multiple()) with g.has_multiple().

szhorvat · 2021-05-28T12:10:14Z

Let us continue on Discourse:

https://igraph.discourse.group/t/interoperability-between-python-igraph-and-other-python-network-packages/766/

Speed up from_graph_tool and clean up a little

5f453da

szhorvat reviewed May 28, 2021

View reviewed changes

src/igraph/__init__.py Show resolved Hide resolved

ntamas reviewed May 28, 2021

View reviewed changes

src/igraph/__init__.py Outdated Show resolved Hide resolved

Skip list constructor

dd8c03a

ntamas merged commit 6915ff6 into igraph:master May 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed up from_graph_tool and clean up a little #403

Speed up from_graph_tool and clean up a little #403

iosonofabio commented May 28, 2021

szhorvat commented May 28, 2021

iosonofabio commented May 28, 2021 •

edited

Loading

ntamas commented May 28, 2021

iosonofabio commented May 28, 2021 •

edited

Loading

szhorvat commented May 28, 2021

iosonofabio commented May 28, 2021 •

edited

Loading

szhorvat commented May 28, 2021 •

edited

Loading

ntamas commented May 28, 2021

ntamas commented May 28, 2021

szhorvat commented May 28, 2021 •

edited

Loading

ntamas commented May 28, 2021

szhorvat commented May 28, 2021 •

edited

Loading

Speed up from_graph_tool and clean up a little #403

Speed up from_graph_tool and clean up a little #403

Conversation

iosonofabio commented May 28, 2021

szhorvat commented May 28, 2021

iosonofabio commented May 28, 2021 • edited Loading

ntamas commented May 28, 2021

iosonofabio commented May 28, 2021 • edited Loading

szhorvat commented May 28, 2021

iosonofabio commented May 28, 2021 • edited Loading

szhorvat commented May 28, 2021 • edited Loading

ntamas commented May 28, 2021

ntamas commented May 28, 2021

szhorvat commented May 28, 2021 • edited Loading

ntamas commented May 28, 2021

szhorvat commented May 28, 2021 • edited Loading

iosonofabio commented May 28, 2021 •

edited

Loading

iosonofabio commented May 28, 2021 •

edited

Loading

iosonofabio commented May 28, 2021 •

edited

Loading

szhorvat commented May 28, 2021 •

edited

Loading

szhorvat commented May 28, 2021 •

edited

Loading

szhorvat commented May 28, 2021 •

edited

Loading