You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A final q value of a result of giraph version was 0.429, and that of graphx version was 0.402.
Although I tried parameter tuning, a final q value of graphx version was still lower than that of giraph version.
I checked values of communityRDD in the second iteration by adding debug logs as follows.
println("totalEdgeWeight: " + totalGraphWeight.value)
// gather community information from each vertex's local neighborhood
// var communityRDD = louvainGraph.mapReduceTriplets(sendCommunityData, mergeCommunityMessages)
communityRDD.collect().foreach(println) // add to debug communityRDD
var activeMessages = communityRDD.count()
The communityRDD is initialized by correct values and I obtained final q value 0.417 by giraphx version.
I think that modifications as above are necessary.
I tried the PartitionStrategy.CanonicalRandomVertexCut instead of the PartitionStrategy.EdgePartition2D with a large graph (approximately 100,000,000 edges) and I obtained a result by approximately 10 times faster.
It is found that the optimal PartitionStrategy is dependent on an inputted graph structure.
I would be grateful if it be selectable.
thanks.
The text was updated successfully, but these errors were encountered:
ken57
changed the title
The communitySigmaTot of the communityRDD are initialized by zero in the second iteration.
The communitySigmaTot of the communityRDD is initialized by zero in the second iteration.
Jun 29, 2018
Thanks for making your implementation public.
I compared results of the louvain method of the giraph version with that of the graphx version by using a graph as follows.
https://gist.github.com/ken57/6a090706a17735e8a333b931af451a49
A final q value of a result of giraph version was 0.429, and that of graphx version was 0.402.
Although I tried parameter tuning, a final q value of graphx version was still lower than that of giraph version.
I checked values of communityRDD in the second iteration by adding debug logs as follows.
https://github.com/Sotera/distributed-graph-analytics/blob/master/dga-graphx/src/main/scala/com/soteradefense/dga/graphx/louvain/LouvainCore.scala#L852
I obtained debug logs as follows, and It is found that the communitySigmaTot of the communityRDD is initialized by zero in the second iteration.
So, I added
partitionBy(PartitionStrategy.EdgePartition2D).groupEdges(_ + _)
to the compressGraph as follows.https://github.com/Sotera/distributed-graph-analytics/blob/master/dga-graphx/src/main/scala/com/soteradefense/dga/graphx/louvain/LouvainCore.scala#L334
The communityRDD is initialized by correct values and I obtained final q value 0.417 by giraphx version.
I think that modifications as above are necessary.
I have two more things to contact you.
This is a minute thing. Very large logs are outputted by println as follows. I would be grateful if you delete it.
https://github.com/Sotera/distributed-graph-analytics/blob/master/dga-graphx/src/main/scala/com/soteradefense/dga/graphx/louvain/LouvainCore.scala#L102
I tried the PartitionStrategy.CanonicalRandomVertexCut instead of the PartitionStrategy.EdgePartition2D with a large graph (approximately 100,000,000 edges) and I obtained a result by approximately 10 times faster.
It is found that the optimal PartitionStrategy is dependent on an inputted graph structure.
I would be grateful if it be selectable.
thanks.
The text was updated successfully, but these errors were encountered: