Skip to content

Speed up from_graph_tool and clean up a little #403

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 28, 2021

Conversation

iosonofabio
Copy link
Member

Adding the graph-tool equivalent of #402, which originally addressed #401

I also edited @szhorvat 's code a little to avoid defaultdict. Although it's fine in principle, having an explicit list of keys makes the code easier to read I think.

Same principle applied to the from_graph_tool function. Thanks Szabolcs again for #402.

@szhorvat
Copy link
Member

When converting from networkx multigraphs, would it make sense to preserve the edge keys in an extra edge attribute, to make full round-tripping less lossy?

This is important for some use cases, namely when we need to know which networkx edge corresponds to which igraph edge. E.g., when computing weighted shortest paths, it will be precisely one (and not the other) edge that is included in the path.

I guess that those who need this can already manually copy keys to an edge attribute on the networkx side. But perhaps it's nicer to do it automatically?

If we choose to do it, it would be nice to test it out through some practical use cases before settling on a solution.

@iosonofabio
Copy link
Member Author

iosonofabio commented May 28, 2021

Thank you both. I've now gotten rid of the list(...), not sure why it was there in the first place. Yes if we don't need imports we may as well get rid of them.

I had thought about the multi-edge problem a little, but couldn't quite come up with a clean way, so I'd be happy to hear your opinions. I think that in networkx vertices must have a hashable "name", while edges are just pairs (as in 2-tuples) of the same hashables, e.g.

vertex1 -> 'hello'
vertex 2 -> 'ciao'

('hello', 'ciao') in g.edges -> True or False, depending if there's an edge there

For MultiGraphs, each edge gets a semi-secret index starting from 0. If you iterate over g.edges.data(), like we do now, that index is lost (!), while it is kept if you iterate simply over g.edges - funnily enough, it is lost if you iterate over edges(), which why can you call a dict-like structure in the first place, but ok.

So it seems to me that to store that information, we could get away with storing only the "secret index" of each edge iff the networkx structure is a MultiGraph or MultiDiGraph. The source and target vertices would provide the rest of the info.

Would you like me to add this feature into a property called "_nx_multiedge_index", in analogy with the current "_nx_name" for vertices?

@ntamas
Copy link
Member

ntamas commented May 28, 2021

Sounds like a good idea -- however, it's not urgent and I'd like to merge this PR now so it does not deviate from the original topic. If you feel like you can put some time aside to work on the multi-edge issue, I very much appreciate the contribution, and please send a separate PR where we deal with it.

@ntamas ntamas merged commit 6915ff6 into igraph:master May 28, 2021
@iosonofabio
Copy link
Member Author

iosonofabio commented May 28, 2021

yep sounds good, let's also hear from Szabolcs and the others in case they have opinions

@szhorvat
Copy link
Member

I never meant it to be added in this PR, just brought up the topic.

This is not an "index" (i.e. it is not a number) but a "key". "Key" is the term networkx uses. It is also an arbitrary hashable. By default, networkx uses integers, starting from zero for the respective pair of vertices.

Useful doc links I found:

This is fundamental difference in how igraph and networkx handle multi-edges. (Mathematica does the same as networkx, but IMO in a rather flawed way.) igraph identifies edges by their numerical index, which is global to the graph. Thus, igraph has a guaranteed edge ordering. networkx identifies edges by their endpoints (as Fabio pointed out). This is insufficient to distinguish multi-edges (and be able to e.g. give them distinct attributes). To get around this, networkx adds an extra key, so MultiGraph edges are not just (u, v) pairs, but (u, v, key) triplets. I'm not 100% sure, but I think networkx doesn't have a guaranteed edge ordering at all. UPDATE: Indeed it doesn't guarantee ordering; instead it provides ordered versions of types, see https://networkx.org/documentation/stable/reference/classes/ordered.html

Currently, in from_networkx we iterate over g.edges(data=True), which returns (u, v, attrib_dict) triples for any graph type. MultiGraph and MultiDiGraph can do g.edges(data=True, keys=True), which will return (u, v, key, attrib_dict).


I agree with Fabio's proposal, except for the name. I suggest "_nx_edge_key", or if you like, "_nx_multiedge_key". The terminology that networkx itself uses is "edge key". This is also shorter to type, so I prefer it. On the other hand, "multiedge key" better describes what this key really is: it belongs not to the graph, but to the specific vertex pair (and must be unique for the vertex pair, not for the graph).

Additionally, before fully implementing this, we should try to test on real use cases. We should at least try a few small computations and make sure things work smoothly. Things to consider:

  • I have a weighted nx (multi)graph. I want to compute the the betweenness of edges with igraph, but otherwise I want to keep working in nx. How do I get the result in nx format? Since nx does not have a consistent edge ordering, it's not sufficient to just return a list of values, as igraph does. Instead we need a mapping (dictionary) from edges to values.

  • Also consider the opposite: I want to work within igraph, but I want to do one specific edge-related computation using networkx. I need a convenient way to convert the mapping that networkx might return to a form interpretable in igraph. How do we achieve this? For the sake of example, suppose that I want to compute the weighted edge betweenness, and I get a result as a mapping from (u,v,key) triplets to values. This is a dict (a mapping) so it is not ordered. How do I determine which value goes with which edge index in igraph? Perhaps the most practical way would be if to_networkx used igraph's own edge index as the networkx edge key. That makes the key unique for the entire graph (not just vertex pairs), which is overkill, but that's alright for as long as it makes the user's job easy. NOTE: nx.edge_betweenness() does not currently seem to distinguish between parallel edges, so this example is hypothetical. Networkx's support for multigraphs seems to be rather weak at this moment, and I wasn't able to find any algorithms which return edge-specific information for multigraphs. That doesn't mean that there aren't any currently and definitely doesn't mean that there won't be any in the future.

  • Can we do reasonably lossless round-tripping? Should to_networkx detect the presence of _nx_name and/or _nx_edge_key and restore the original vertex names and edges keys automatically? To take this thought further, should we store the original graph type (Graph, MultiGraph, including the Digraph and Ordered variants) in a graph attribute and automatically restore it as well? Yet another step further: networkx graph generators have a create_using keyword argument which controls which class to use to create the graph. We should have this in to_networkx, with some sort of "automatic" default value.

(Side note: I notice that to_networkx uses any(g.is_multiple()). Based on my knowledge of the C core, I think the efficient way is g.has_multiple(). @ntamas ?)

@iosonofabio
Copy link
Member Author

iosonofabio commented May 28, 2021

All this is very interesting and indeed the topic of the pilot code in my own PR on transparent API. How to return data types for the practical examples you mention etc. Indeed, what you are describing is a transparent call to the igraph API from -> to a networkx object. I'd be happy to implement the additional key but also think a full support for these tires of functions is to be the goal of that PR, with some generic design in place...

@szhorvat
Copy link
Member

szhorvat commented May 28, 2021

I wrote a lot (= too much) in the above comment, here's the executive summary:

Before arriving to a final decision, we must test our ideas on practical use cases. These must work well:

  • Work primarily in igraph, and use a networkx function to compute node-specific or edge-specific information on my graph. I need to be able to associate the result with the appropriate igraph node or edge indices.
  • Work primarily in networkx and use igraph functions to compute edge specific information. I need to be able to associate the result with (u,v,key) networkx multigraph edge specifications. The same goes for node-specific information.

Specific proposals I made (and a few I didn't mention above):

  • to_networkx should have a create_using argument to choose which nx class it should use. All nx graph generators have this argument.
  • The default create_using should be "automatic" and behave just as it does right now, except use ordered graphs (see below).
  • Round tripping either as nx -> igraph -> nx or as igraph -> nx -> igraph should be reasonably lossless. Priority should be on igraph -> nx -> igraph as we first want to support users who work in igraph and reach out to nx.
  • Achieve nx -> igraph -> nx round-tripping by:
    • Preserve and restore nx vertex name in igraph vertex attribute
    • Preserve and restore nx multi-edge keys in igraph edge attribute
    • Preserve and restore original nx graph class in igraph graph attribute
  • Achieve igraph -> nx -> igraph round-tripping by:
    • Using ordered nx graph classes in to_networkx by default in order to be able to preserve the igraph edge index (edge ordering). We must investigate (1) if this has any downsides such as performance (2) if this fully solves the problem in practice.
  • The above two bullet points are not fully consistent with each other—any suggestion to reconcile them?

@ntamas
Copy link
Member

ntamas commented May 28, 2021

This discussion should probably be moved to #373 instead of this PR; #373 is concerned with a transparent API between igraph and NetworkX (and maybe other tools). Or maybe we should open a discussion thread in our Discourse forum?

@ntamas
Copy link
Member

ntamas commented May 28, 2021

Side note: I notice that to_networkx uses any(g.is_multiple()). Based on my knowledge of the C core, I think the efficient way is g.has_multiple().

Yes, that's correct, I'll change this in the main branch.

@szhorvat
Copy link
Member

szhorvat commented May 28, 2021

This discussion should probably be moved to #373 instead of this PR

I was just going to say the same. It would also be good to discuss it at one of the meetings, or to have a separate longer meeting where someone would help me fill the gaps in my Python knowledge, which must certainly show in my suggestions above :-)

Or maybe we should open a discussion thread in our Discourse forum?

If there's no way to move multiple message within GitHub, then that's better.

@ntamas
Copy link
Member

ntamas commented May 28, 2021

I don't think there's a way to move comments in GitHub so let's do that then.

FYI, 0c0ebd7 adds support for g.to_networkx(create_using=...) and also replaces any(g.is_multiple()) with g.has_multiple().

@szhorvat
Copy link
Member

szhorvat commented May 28, 2021

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants