Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The communitySigmaTot of the communityRDD is initialized by zero in the second iteration. #91

Open
ken57 opened this issue Jun 29, 2018 · 0 comments

Comments

@ken57
Copy link

ken57 commented Jun 29, 2018

Thanks for making your implementation public.
I compared results of the louvain method of the giraph version with that of the graphx version by using a graph as follows.
https://gist.github.com/ken57/6a090706a17735e8a333b931af451a49

A final q value of a result of giraph version was 0.429, and that of graphx version was 0.402.
Although I tried parameter tuning, a final q value of graphx version was still lower than that of giraph version.

I checked values of communityRDD in the second iteration by adding debug logs as follows.

println("totalEdgeWeight: " + totalGraphWeight.value)

// gather community information from each vertex's local neighborhood
// var communityRDD = louvainGraph.mapReduceTriplets(sendCommunityData, mergeCommunityMessages)
communityRDD.collect().foreach(println) // add to debug communityRDD

var activeMessages = communityRDD.count()

https://github.com/Sotera/distributed-graph-analytics/blob/master/dga-graphx/src/main/scala/com/soteradefense/dga/graphx/louvain/LouvainCore.scala#L852

I obtained debug logs as follows, and It is found that the communitySigmaTot of the communityRDD is initialized by zero in the second iteration.

(4,Map((2,0) -> 9, (23,0) -> 4, (25,0) -> 2))
(25,Map((27,0) -> 1, (2,0) -> 1, (23,0) -> 6, (4,0) -> 2))
(27,Map((23,0) -> 3, (25,0) -> 1))
(23,Map((27,0) -> 3, (2,0) -> 3, (25,0) -> 6, (4,0) -> 4))
(7,Map((2,0) -> 2, (5,0) -> 2))
(5,Map((7,0) -> 2, (2,0) -> 2))
(2,Map((5,0) -> 2, (4,0) -> 9, (25,0) -> 1, (23,0) -> 3, (7,0) -> 2))

So, I added partitionBy(PartitionStrategy.EdgePartition2D).groupEdges(_ + _) to the compressGraph as follows.

    val louvainGraph = compressedGraph.outerJoinVertices(nodeWeights)((vid, data, weightOption) => {
      val weight = weightOption.getOrElse(0L)
      data.communitySigmaTot = weight + data.internalWeight
      data.nodeWeight = weight
      data
    }).partitionBy(PartitionStrategy.EdgePartition2D).groupEdges(_ + _).cache()

https://github.com/Sotera/distributed-graph-analytics/blob/master/dga-graphx/src/main/scala/com/soteradefense/dga/graphx/louvain/LouvainCore.scala#L334

The communityRDD is initialized by correct values and I obtained final q value 0.417 by giraphx version.
I think that modifications as above are necessary.

I have two more things to contact you.
This is a minute thing. Very large logs are outputted by println as follows. I would be grateful if you delete it.
https://github.com/Sotera/distributed-graph-analytics/blob/master/dga-graphx/src/main/scala/com/soteradefense/dga/graphx/louvain/LouvainCore.scala#L102

I tried the PartitionStrategy.CanonicalRandomVertexCut instead of the PartitionStrategy.EdgePartition2D with a large graph (approximately 100,000,000 edges) and I obtained a result by approximately 10 times faster.
It is found that the optimal PartitionStrategy is dependent on an inputted graph structure.
I would be grateful if it be selectable.

thanks.

@ken57 ken57 changed the title The communitySigmaTot of the communityRDD are initialized by zero in the second iteration. The communitySigmaTot of the communityRDD is initialized by zero in the second iteration. Jun 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant