Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Algorithms should be in terms of typeclasses not data structures #262

Open
infinity0 opened this issue May 17, 2020 · 15 comments
Open

Algorithms should be in terms of typeclasses not data structures #262

infinity0 opened this issue May 17, 2020 · 15 comments

Comments

@infinity0
Copy link

It's a bit of a shame that things like bfs and dfs are duplicated for every single AdjacencyMap representation. For example Algebra.Graph.Labelled.AdjacencyMap does not even have a DFS at the moment. They should be rewritten and unified in terms of ToGraph instead.

Of course not all the instances of ToGraph have efficient implementations of things like postSet, but that could also be improved, by splitting the class up and having Graph and AdjacencyMap implement different subsets of it, with utilities functions called, e.g. postSetSlow that provide a shim to the full set of methods. It seems like nearly all of the typical traditional graph algorithms need an efficient postSet method for traversing the graph, whereas very few of them need an efficient connect method.

@infinity0
Copy link
Author

To clarify, although ToGraph contains class methods dfs etc, the implementations are separate, though some re-use others after converting the graph into a different representation. My point is that with a single typeclass-based implementation of dfs none of this boilerplate would be necessary, including the conversions between representations.

Algorithms are generally defined in terms of the operations you can perform on the graph, not the underlying representation, so it would be a simpler design to put these basic operations in the typeclass (as is already done) but leave the algorithm out of the typeclass (as is not done, resulting in duplicate definitions and boilerplate). If an efficient version of those operations are not available, then the representation should not implement that typeclass, and then the user knows clearly that they must perform an explicit conversion into a different representation in order to run the algorithm.

@snowleopard
Copy link
Owner

@infinity0 I found that trying to squeeze graph algorithms into type classes falls apart pretty quickly, but I'm open for experiments. As an example, I had to ditch type classes from the testsuite because it was just horrible, replacing them with a somewhat less horrible data type:

data API g c where
API :: ( Arbitrary (g Int), Num (g Int), Ord (g Int), Ord (g (Int, Int))
, Ord (g (Int, Char)), Ord (g [Int]), Ord (g [Char])
, Ord (g (Int, (Int, Int))), Ord (g ((Int, Int), Int))
, Show (g Int)) =>
{ empty :: forall a. c a => g a
, vertex :: forall a. c a => a -> g a
, edge :: forall a. c a => a -> a -> g a
, overlay :: forall a. c a => g a -> g a -> g a
, connect :: forall a. c a => g a -> g a -> g a
, vertices :: forall a. c a => [a] -> g a
, edges :: forall a. c a => [(a, a)] -> g a
, overlays :: forall a. c a => [g a] -> g a
, connects :: forall a. c a => [g a] -> g a
, toGraph :: forall a. c a => g a -> G.Graph a
, foldg :: forall a. c a => forall r. r -> (a -> r) -> (r -> r -> r) -> (r -> r -> r) -> g a -> r
, isSubgraphOf :: forall a. c a => g a -> g a -> Bool
, structEq :: forall a. c a => g a -> g a -> Bool
, isEmpty :: forall a. c a => g a -> Bool
, size :: forall a. c a => g a -> Int
, hasVertex :: forall a. c a => a -> g a -> Bool
, hasEdge :: forall a. c a => a -> a -> g a -> Bool
, vertexCount :: forall a. c a => g a -> Int
, edgeCount :: forall a. c a => g a -> Int
, vertexList :: forall a. c a => g a -> [a]
, edgeList :: forall a. c a => g a -> [(a, a)]
, vertexSet :: forall a. c a => g a -> Set a
, vertexIntSet :: g Int -> IntSet
, edgeSet :: forall a. c a => g a -> Set (a, a)
, preSet :: forall a. c a => a -> g a -> Set a
, preIntSet :: Int -> g Int -> IntSet
, postSet :: forall a. c a => a -> g a -> Set a
, postIntSet :: Int -> g Int -> IntSet
, neighbours :: forall a. c a => a -> g a -> Set a
, adjacencyList :: forall a. c a => g a -> [(a, [a])]
, adjacencyMap :: forall a. c a => g a -> Map a (Set a)
, adjacencyIntMap :: g Int -> IntMap IntSet
, adjacencyMapTranspose :: forall a. c a => g a -> Map a (Set a)
, adjacencyIntMapTranspose :: g Int -> IntMap IntSet
, bfsForest :: forall a. c a => [a] -> g a -> Forest a
, bfs :: forall a. c a => [a] -> g a -> [[a]]
, dfsForest :: forall a. c a => g a -> Forest a
, dfsForestFrom :: forall a. c a => [a] -> g a -> Forest a
, dfs :: forall a. c a => [a] -> g a -> [a]
, reachable :: forall a. c a => a -> g a -> [a]
, topSort :: forall a. c a => g a -> Either (NonEmpty a) [a]
, isAcyclic :: forall a. c a => g a -> Bool
, toAdjacencyMap :: forall a. c a => g a -> AM.AdjacencyMap a
, toAdjacencyIntMap :: g Int -> AIM.AdjacencyIntMap
, toAdjacencyMapTranspose :: forall a. c a => g a -> AM.AdjacencyMap a
, toAdjacencyIntMapTranspose :: g Int -> AIM.AdjacencyIntMap
, isDfsForestOf :: forall a. c a => Forest a -> g a -> Bool
, isTopSortOf :: forall a. c a => [a] -> g a -> Bool
, path :: forall a. c a => [a] -> g a
, circuit :: forall a. c a => [a] -> g a
, clique :: forall a. c a => [a] -> g a
, biclique :: forall a. c a => [a] -> [a] -> g a
, star :: forall a. c a => a -> [a] -> g a
, stars :: forall a. c a => [(a, [a])] -> g a
, tree :: forall a. c a => Tree a -> g a
, forest :: forall a. c a => Forest a -> g a
, mesh :: forall a b. (c a, c b) => [a] -> [b] -> g (a, b)
, torus :: forall a b. (c a, c b) => [a] -> [b] -> g (a, b)
, deBruijn :: forall a. c a => Int -> [a] -> g [a]
, removeVertex :: forall a. c a => a -> g a -> g a
, removeEdge :: forall a. c a => a -> a -> g a -> g a
, replaceVertex :: forall a. c a => a -> a -> g a -> g a
, mergeVertices :: forall a. c a => (a -> Bool) -> a -> g a -> g a
, splitVertex :: forall a. c a => a -> [a] -> g a -> g a
, transpose :: forall a. c a => g a -> g a
, gmap :: forall a b. (c a, c b) => (a -> b) -> g a -> g b
, bind :: forall a b. (c a, c b) => g a -> (a -> g b) -> g b
, induce :: forall a. c a => (a -> Bool) -> g a -> g a
, induceJust :: forall a. c a => g (Maybe a) -> g a
, simplify :: forall a. c a => g a -> g a
, compose :: forall a. c a => g a -> g a -> g a
, box :: forall a b. (c a, c b) => g a -> g b -> g (a, b)
, closure :: forall a. c a => g a -> g a
, reflexiveClosure :: forall a. c a => g a -> g a
, symmetricClosure :: forall a. c a => g a -> g a
, transitiveClosure :: forall a. c a => g a -> g a
, consistent :: forall a. c a => g a -> Bool
, fromAdjacencySets :: forall a. c a => [(a, Set a)] -> g a
, fromAdjacencyIntSets :: [(Int, IntSet)] -> g Int } -> API g c

Can you propose a concrete design, porting existing implementations of dfs, bfs, topSort and scc into a type class based implementation? To me, this seems like a pretty big, difficult and not very fun task, with a high risk of failure and unclear benefits. That's why I personally do not invest time into it. But if this is something you are interesting in exploring, I'm happy to advise/give feedback.

@infinity0
Copy link
Author

infinity0 commented May 18, 2020

Thanks. I'll have a go at some point - my main motivation was to implement a max-flow-min-cost algorithm, which I could do in terms of ToGraph but then it looks out-of-place because everything else is implemented as a class-method rather than a top-level method with a class-constraint. For example fgl implements its algorithms in terms of their Graph typeclass including "more fundamental" ones like bfs/dfs.

AFAICT, and correct me if I'm wrong, but there are effectively two kinds of representations here - one based on adjacency maps, and one based on your algebraic graphs concept; however the "traditional" traversal-based algorithms tend to work better on adjacency maps. fgl uses essentially an adjacency-map representation (1, 2) which makes it fairly straightforward to implement their match function, which is part of their inductive graph API, and all their algorithms are implemented in terms of that.

I actually think that (at least conceptually) both these libraries could be merged together, and then we could have functions converting between algebraic vs inductive APIs, with concrete representations implementing only one of the APIs. Certain algorithms inherently work better on one or the other kind, and should only assume one of the constraints. Users that have a representation not implementing a constraint, must then explicitly call a conversion function, to convert it into another representation with the right constraint.

One design aspect mentioned in the paper is that you thought fgl's partial functions were not great, which I agree, but I think that could be improved straightforwardly by making them either check for the precondition, or give a no-op, or be idempotent, or something along those lines - just like how Overlay is a no-op for certain cases rather than failing.

@snowleopard
Copy link
Owner

Thanks!

AFAICT, and correct me if I'm wrong, but there are effectively two kinds of representations here - one based on adjacency maps, and one based on your algebraic graphs concept; however the "traditional" traversal-based algorithms tend to work better on adjacency maps.

Yes. However, by combining the algebraic representation with sparsify and classic algorithms you can process some dense graphs very efficiently, in the time and memory proportional to the size of their algebraic representation, which can be quadratically smaller than the number of edges.

As of now, there are no algorithms for solving even basic problems like DFS/BFS directly on the algebraic representation, but I'm (slowly) working towards that goal.

One suggestion I have is to not build on top of ToGraph because it has some methods related to the algebraic representation: "Graph" in the name stands for the Graph data type from Algebra.Graph. Perhaps, you want something like Data.Graph.Traversal where you will provide an interface with a few primitives that are required for classic graph algorithms. It will have nothing to do with algebraic graphs and will therefore live in the Data.Graph namespace.

When comparing to fgl you should keep in mind that the type of vertices is fixed to Int internally, which simplifies some things. In this library, my goal was to provide a parametric interface, which makes it difficult to reconcile parametric representations from non-parametric like AdjacencyIntMap. I hope you'll manage to come up with an interface that can support efficient algorithms for both.

@dataopt
Copy link

dataopt commented Aug 31, 2022

Has anyone thought more about this? I find this issue quite interesting.

@snowleopard
Copy link
Owner

I'm not aware of any progress in the context of this library.

I remain a little sceptical of the goal of writing graph algorithms against an abstract interface because implementations of this interface differ dramatically in terms of the complexity of individual methods.

Having said that, I'm interested in seeing a proof of concept!

@dataopt
Copy link

dataopt commented Sep 1, 2022

I will be giving this some more thought. I am hoping that there is some sort of Haskell graph algorithms library similar to what LEDA offers. Maybe an abstract interface that comes with implementation hints/constraints (e.g. O(1) lookup for certain things) could be explored.

@snowleopard
Copy link
Owner

@dataopt Are you saying that LEDA implements graph algorithms against an abstract graph interface? If yes, could you point to an example?

@dataopt
Copy link

dataopt commented Sep 2, 2022

I haven't expressed myself clearly. LEDA has its own graph representation. What I meant to say was to be able to have an interface for the graph algorithms that LEDA offers against an abstract graph interface in Haskell. (Not sure if that's even possible.) In the meantime, I came across this old Haskell wiki article that I have not seen before. I'm also trying to decipher this master's thesis which mentions Alga.

@snowleopard
Copy link
Owner

snowleopard commented Sep 3, 2022

Got it. Thanks for the links, they are new to me too!

(Oops, actually, I did come across the master thesis before but never had a chance to read it in more detail. Will try this time!)

@dataopt
Copy link

dataopt commented Sep 3, 2022

I could be missing some important nuances but I am just thinking out loud in the following.

(Aside: I prefer to use the definitions of graphs and directed graphs in Bondy and Murty where a directed graph is a triple (V, E, f) where f is a function that maps each element of E (i.e. an edge) to an ordered pair (u,w) with u and w being elements of V. This definition permits parallel edges which could be helpful in certain situations.)

It seems to me that a reasonably comprehensive interface for graphs must provide at the minimum support for vertex and edge labels and the ability to add/remove vertices and edges as well as query incident edges given a vertex. With these, all sorts of graph algorithms can be implemented. Whether such implementations will be efficient is of course the million-dollar question.

As for efficiency, I am wondering if some sort of state (like a StatefulGraph type class?) can be used for memoizing expensive operations. For example, if the underlying graph representation doesn't use an adjacency list so that querying incident edges is an expensive operation, the stateful graph could "cache" the result of such a query. This approach saves one from converting between representations up front. Clearly, there are a lot of technical challenges to overcome with this approach.

@dataopt
Copy link

dataopt commented Sep 8, 2022

Looks like there is a library of graph algorithms that actually don't need a concrete definition of a graph: https://hackage.haskell.org/package/search-algorithms-0.3.2/docs/Algorithm-Search.html

@snowleopard
Copy link
Owner

@dataopt This all makes sense to me. I agree that it's interesting to do some experiments in this space. (If only I had time myself!)

I do like the idea behind the search-algorithms library but my understanding is that it doesn't do the caching tricks you described, so one essentially needs to use an efficient data structure to implement various functional queries. And I don't think there is any hope to make that library work efficiently with the algebraic graph representation, for instance, because the cost model is just too different.

@dataopt
Copy link

dataopt commented Sep 15, 2022

@snowleopard BTW, is there any connection between your work on algebraic graphs and GraphBLAS?

@snowleopard
Copy link
Owner

I don't think there is any connection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants