Algorithms should be in terms of typeclasses not data structures #262

infinity0 · 2020-05-17T20:47:10Z

It's a bit of a shame that things like bfs and dfs are duplicated for every single AdjacencyMap representation. For example Algebra.Graph.Labelled.AdjacencyMap does not even have a DFS at the moment. They should be rewritten and unified in terms of ToGraph instead.

Of course not all the instances of ToGraph have efficient implementations of things like postSet, but that could also be improved, by splitting the class up and having Graph and AdjacencyMap implement different subsets of it, with utilities functions called, e.g. postSetSlow that provide a shim to the full set of methods. It seems like nearly all of the typical traditional graph algorithms need an efficient postSet method for traversing the graph, whereas very few of them need an efficient connect method.

The text was updated successfully, but these errors were encountered:

infinity0 · 2020-05-17T21:20:50Z

To clarify, although ToGraph contains class methods dfs etc, the implementations are separate, though some re-use others after converting the graph into a different representation. My point is that with a single typeclass-based implementation of dfs none of this boilerplate would be necessary, including the conversions between representations.

Algorithms are generally defined in terms of the operations you can perform on the graph, not the underlying representation, so it would be a simpler design to put these basic operations in the typeclass (as is already done) but leave the algorithm out of the typeclass (as is not done, resulting in duplicate definitions and boilerplate). If an efficient version of those operations are not available, then the representation should not implement that typeclass, and then the user knows clearly that they must perform an explicit conversion into a different representation in order to run the algorithm.

snowleopard · 2020-05-18T00:40:54Z

@infinity0 I found that trying to squeeze graph algorithms into type classes falls apart pretty quickly, but I'm open for experiments. As an example, I had to ditch type classes from the testsuite because it was just horrible, replacing them with a somewhat less horrible data type:

alga/test/Algebra/Graph/Test/API.hs

Lines 61 to 144 in b9714bb

    
           data API g c where 
        
               API :: ( Arbitrary (g Int), Num (g Int), Ord (g Int), Ord (g (Int, Int)) 
        
                      , Ord (g (Int, Char)), Ord (g [Int]), Ord (g [Char]) 
        
                      , Ord (g (Int, (Int, Int))), Ord (g ((Int, Int), Int)) 
        
                      , Show (g Int)) => 
        
                   { empty                      :: forall a. c a => g a 
        
                   , vertex                     :: forall a. c a => a -> g a 
        
                   , edge                       :: forall a. c a => a -> a -> g a 
        
                   , overlay                    :: forall a. c a => g a -> g a -> g a 
        
                   , connect                    :: forall a. c a => g a -> g a -> g a 
        
                   , vertices                   :: forall a. c a => [a] -> g a 
        
                   , edges                      :: forall a. c a => [(a, a)] -> g a 
        
                   , overlays                   :: forall a. c a => [g a] -> g a 
        
                   , connects                   :: forall a. c a => [g a] -> g a 
        
                   , toGraph                    :: forall a. c a => g a -> G.Graph a 
        
                   , foldg                      :: forall a. c a => forall r. r -> (a -> r) -> (r -> r -> r) -> (r -> r -> r) -> g a -> r 
        
                   , isSubgraphOf               :: forall a. c a => g a -> g a -> Bool 
        
                   , structEq                   :: forall a. c a => g a -> g a -> Bool 
        
                   , isEmpty                    :: forall a. c a => g a -> Bool 
        
                   , size                       :: forall a. c a => g a -> Int 
        
                   , hasVertex                  :: forall a. c a => a -> g a -> Bool 
        
                   , hasEdge                    :: forall a. c a => a -> a -> g a -> Bool 
        
                   , vertexCount                :: forall a. c a => g a -> Int 
        
                   , edgeCount                  :: forall a. c a => g a -> Int 
        
                   , vertexList                 :: forall a. c a => g a -> [a] 
        
                   , edgeList                   :: forall a. c a => g a -> [(a, a)] 
        
                   , vertexSet                  :: forall a. c a => g a -> Set a 
        
                   , vertexIntSet               :: g Int -> IntSet 
        
                   , edgeSet                    :: forall a. c a => g a -> Set (a, a) 
        
                   , preSet                     :: forall a. c a => a -> g a -> Set a 
        
                   , preIntSet                  :: Int -> g Int -> IntSet 
        
                   , postSet                    :: forall a. c a => a -> g a -> Set a 
        
                   , postIntSet                 :: Int -> g Int -> IntSet 
        
                   , neighbours                 :: forall a. c a => a -> g a -> Set a 
        
                   , adjacencyList              :: forall a. c a => g a -> [(a, [a])] 
        
                   , adjacencyMap               :: forall a. c a => g a -> Map a (Set a) 
        
                   , adjacencyIntMap            :: g Int -> IntMap IntSet 
        
                   , adjacencyMapTranspose      :: forall a. c a => g a -> Map a (Set a) 
        
                   , adjacencyIntMapTranspose   :: g Int -> IntMap IntSet 
        
                   , bfsForest                  :: forall a. c a => [a] -> g a -> Forest a 
        
                   , bfs                        :: forall a. c a => [a] -> g a -> [[a]] 
        
                   , dfsForest                  :: forall a. c a => g a -> Forest a 
        
                   , dfsForestFrom              :: forall a. c a => [a] -> g a -> Forest a 
        
                   , dfs                        :: forall a. c a => [a] -> g a -> [a] 
        
                   , reachable                  :: forall a. c a => a -> g a -> [a] 
        
                   , topSort                    :: forall a. c a => g a -> Either (NonEmpty a) [a] 
        
                   , isAcyclic                  :: forall a. c a => g a -> Bool 
        
                   , toAdjacencyMap             :: forall a. c a => g a -> AM.AdjacencyMap a 
        
                   , toAdjacencyIntMap          :: g Int -> AIM.AdjacencyIntMap 
        
                   , toAdjacencyMapTranspose    :: forall a. c a => g a -> AM.AdjacencyMap a 
        
                   , toAdjacencyIntMapTranspose :: g Int -> AIM.AdjacencyIntMap 
        
                   , isDfsForestOf              :: forall a. c a => Forest a -> g a -> Bool 
        
                   , isTopSortOf                :: forall a. c a => [a] -> g a -> Bool 
        
                   , path                       :: forall a. c a => [a] -> g a 
        
                   , circuit                    :: forall a. c a => [a] -> g a 
        
                   , clique                     :: forall a. c a => [a] -> g a 
        
                   , biclique                   :: forall a. c a => [a] -> [a] -> g a 
        
                   , star                       :: forall a. c a => a -> [a] -> g a 
        
                   , stars                      :: forall a. c a => [(a, [a])] -> g a 
        
                   , tree                       :: forall a. c a => Tree a -> g a 
        
                   , forest                     :: forall a. c a => Forest a -> g a 
        
                   , mesh                       :: forall a b. (c a, c b) => [a] -> [b] -> g (a, b) 
        
                   , torus                      :: forall a b. (c a, c b) => [a] -> [b] -> g (a, b) 
        
                   , deBruijn                   :: forall a. c a => Int -> [a] -> g [a] 
        
                   , removeVertex               :: forall a. c a => a -> g a -> g a 
        
                   , removeEdge                 :: forall a. c a => a -> a -> g a -> g a 
        
                   , replaceVertex              :: forall a. c a => a -> a -> g a -> g a 
        
                   , mergeVertices              :: forall a. c a => (a -> Bool) -> a -> g a -> g a 
        
                   , splitVertex                :: forall a. c a => a -> [a] -> g a -> g a 
        
                   , transpose                  :: forall a. c a => g a -> g a 
        
                   , gmap                       :: forall a b. (c a, c b) => (a -> b) -> g a -> g b 
        
                   , bind                       :: forall a b. (c a, c b) => g a -> (a -> g b) -> g b 
        
                   , induce                     :: forall a. c a => (a -> Bool) -> g a -> g a 
        
                   , induceJust                 :: forall a. c a => g (Maybe a) -> g a 
        
                   , simplify                   :: forall a. c a => g a -> g a 
        
                   , compose                    :: forall a. c a => g a -> g a -> g a 
        
                   , box                        :: forall a b. (c a, c b) => g a -> g b -> g (a, b) 
        
                   , closure                    :: forall a. c a => g a -> g a 
        
                   , reflexiveClosure           :: forall a. c a => g a -> g a 
        
                   , symmetricClosure           :: forall a. c a => g a -> g a 
        
                   , transitiveClosure          :: forall a. c a => g a -> g a 
        
                   , consistent                 :: forall a. c a => g a -> Bool 
        
                   , fromAdjacencySets          :: forall a. c a => [(a, Set a)] -> g a 
        
                   , fromAdjacencyIntSets       :: [(Int, IntSet)] -> g Int } -> API g c

Can you propose a concrete design, porting existing implementations of dfs, bfs, topSort and scc into a type class based implementation? To me, this seems like a pretty big, difficult and not very fun task, with a high risk of failure and unclear benefits. That's why I personally do not invest time into it. But if this is something you are interesting in exploring, I'm happy to advise/give feedback.

infinity0 · 2020-05-18T12:41:48Z

Thanks. I'll have a go at some point - my main motivation was to implement a max-flow-min-cost algorithm, which I could do in terms of ToGraph but then it looks out-of-place because everything else is implemented as a class-method rather than a top-level method with a class-constraint. For example fgl implements its algorithms in terms of their Graph typeclass including "more fundamental" ones like bfs/dfs.

AFAICT, and correct me if I'm wrong, but there are effectively two kinds of representations here - one based on adjacency maps, and one based on your algebraic graphs concept; however the "traditional" traversal-based algorithms tend to work better on adjacency maps. fgl uses essentially an adjacency-map representation (1, 2) which makes it fairly straightforward to implement their match function, which is part of their inductive graph API, and all their algorithms are implemented in terms of that.

I actually think that (at least conceptually) both these libraries could be merged together, and then we could have functions converting between algebraic vs inductive APIs, with concrete representations implementing only one of the APIs. Certain algorithms inherently work better on one or the other kind, and should only assume one of the constraints. Users that have a representation not implementing a constraint, must then explicitly call a conversion function, to convert it into another representation with the right constraint.

One design aspect mentioned in the paper is that you thought fgl's partial functions were not great, which I agree, but I think that could be improved straightforwardly by making them either check for the precondition, or give a no-op, or be idempotent, or something along those lines - just like how Overlay is a no-op for certain cases rather than failing.

snowleopard · 2020-05-19T08:47:43Z

Thanks!

AFAICT, and correct me if I'm wrong, but there are effectively two kinds of representations here - one based on adjacency maps, and one based on your algebraic graphs concept; however the "traditional" traversal-based algorithms tend to work better on adjacency maps.

Yes. However, by combining the algebraic representation with sparsify and classic algorithms you can process some dense graphs very efficiently, in the time and memory proportional to the size of their algebraic representation, which can be quadratically smaller than the number of edges.

As of now, there are no algorithms for solving even basic problems like DFS/BFS directly on the algebraic representation, but I'm (slowly) working towards that goal.

One suggestion I have is to not build on top of ToGraph because it has some methods related to the algebraic representation: "Graph" in the name stands for the Graph data type from Algebra.Graph. Perhaps, you want something like Data.Graph.Traversal where you will provide an interface with a few primitives that are required for classic graph algorithms. It will have nothing to do with algebraic graphs and will therefore live in the Data.Graph namespace.

When comparing to fgl you should keep in mind that the type of vertices is fixed to Int internally, which simplifies some things. In this library, my goal was to provide a parametric interface, which makes it difficult to reconcile parametric representations from non-parametric like AdjacencyIntMap. I hope you'll manage to come up with an interface that can support efficient algorithms for both.

dataopt · 2022-08-31T20:10:45Z

Has anyone thought more about this? I find this issue quite interesting.

snowleopard · 2022-08-31T22:42:17Z

I'm not aware of any progress in the context of this library.

I remain a little sceptical of the goal of writing graph algorithms against an abstract interface because implementations of this interface differ dramatically in terms of the complexity of individual methods.

Having said that, I'm interested in seeing a proof of concept!

dataopt · 2022-09-01T10:57:30Z

I will be giving this some more thought. I am hoping that there is some sort of Haskell graph algorithms library similar to what LEDA offers. Maybe an abstract interface that comes with implementation hints/constraints (e.g. O(1) lookup for certain things) could be explored.

snowleopard · 2022-09-02T11:49:51Z

@dataopt Are you saying that LEDA implements graph algorithms against an abstract graph interface? If yes, could you point to an example?

dataopt · 2022-09-02T12:24:24Z

I haven't expressed myself clearly. LEDA has its own graph representation. What I meant to say was to be able to have an interface for the graph algorithms that LEDA offers against an abstract graph interface in Haskell. (Not sure if that's even possible.) In the meantime, I came across this old Haskell wiki article that I have not seen before. I'm also trying to decipher this master's thesis which mentions Alga.

snowleopard · 2022-09-03T09:34:06Z

Got it. Thanks for the links, they are new to me too!

(Oops, actually, I did come across the master thesis before but never had a chance to read it in more detail. Will try this time!)

dataopt · 2022-09-03T12:22:40Z

I could be missing some important nuances but I am just thinking out loud in the following.

(Aside: I prefer to use the definitions of graphs and directed graphs in Bondy and Murty where a directed graph is a triple (V, E, f) where f is a function that maps each element of E (i.e. an edge) to an ordered pair (u,w) with u and w being elements of V. This definition permits parallel edges which could be helpful in certain situations.)

It seems to me that a reasonably comprehensive interface for graphs must provide at the minimum support for vertex and edge labels and the ability to add/remove vertices and edges as well as query incident edges given a vertex. With these, all sorts of graph algorithms can be implemented. Whether such implementations will be efficient is of course the million-dollar question.

As for efficiency, I am wondering if some sort of state (like a StatefulGraph type class?) can be used for memoizing expensive operations. For example, if the underlying graph representation doesn't use an adjacency list so that querying incident edges is an expensive operation, the stateful graph could "cache" the result of such a query. This approach saves one from converting between representations up front. Clearly, there are a lot of technical challenges to overcome with this approach.

dataopt · 2022-09-08T21:26:01Z

Looks like there is a library of graph algorithms that actually don't need a concrete definition of a graph: https://hackage.haskell.org/package/search-algorithms-0.3.2/docs/Algorithm-Search.html

snowleopard · 2022-09-08T23:55:09Z

@dataopt This all makes sense to me. I agree that it's interesting to do some experiments in this space. (If only I had time myself!)

I do like the idea behind the search-algorithms library but my understanding is that it doesn't do the caching tricks you described, so one essentially needs to use an efficient data structure to implement various functional queries. And I don't think there is any hope to make that library work efficiently with the algebraic graph representation, for instance, because the cost model is just too different.

dataopt · 2022-09-15T14:13:30Z

@snowleopard BTW, is there any connection between your work on algebraic graphs and GraphBLAS?

snowleopard · 2022-09-15T19:18:51Z

I don't think there is any connection.

snowleopard added algorithm design labels May 18, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Algorithms should be in terms of typeclasses not data structures #262

Algorithms should be in terms of typeclasses not data structures #262

infinity0 commented May 17, 2020

infinity0 commented May 17, 2020

snowleopard commented May 18, 2020

infinity0 commented May 18, 2020 •

edited

Loading

snowleopard commented May 19, 2020

dataopt commented Aug 31, 2022

snowleopard commented Aug 31, 2022

dataopt commented Sep 1, 2022

snowleopard commented Sep 2, 2022

dataopt commented Sep 2, 2022

snowleopard commented Sep 3, 2022 •

edited

Loading

dataopt commented Sep 3, 2022

dataopt commented Sep 8, 2022

snowleopard commented Sep 8, 2022

dataopt commented Sep 15, 2022

snowleopard commented Sep 15, 2022

Algorithms should be in terms of typeclasses not data structures #262

Algorithms should be in terms of typeclasses not data structures #262

Comments

infinity0 commented May 17, 2020

infinity0 commented May 17, 2020

snowleopard commented May 18, 2020

infinity0 commented May 18, 2020 • edited Loading

snowleopard commented May 19, 2020

dataopt commented Aug 31, 2022

snowleopard commented Aug 31, 2022

dataopt commented Sep 1, 2022

snowleopard commented Sep 2, 2022

dataopt commented Sep 2, 2022

snowleopard commented Sep 3, 2022 • edited Loading

dataopt commented Sep 3, 2022

dataopt commented Sep 8, 2022

snowleopard commented Sep 8, 2022

dataopt commented Sep 15, 2022

snowleopard commented Sep 15, 2022

infinity0 commented May 18, 2020 •

edited

Loading

snowleopard commented Sep 3, 2022 •

edited

Loading