Scalable way to calculate betweenness centrality for a graph in Spark

Data Science: Asked by abhati on June 22, 2021

I have a use case that requires calculating the betweenness centrality of nodes. I have tried GraphX with spark-betweenness, but it is a very long-running job. Has anyone successfully calculated the betweenness centrality of a large network with around 10 million vertices and 100 million edges?

One Answer

Sorry, I do not think you can compute the exact betweenness centrality of nodes in a graph this size, as its complexity is $O(n \cdot m)$, where $n$ is the number of nodes and $m$ the number of links. For your graph, that is on the order of $10^7 \times 10^8 = 10^{15}$ operations.

The good news is that you can approximate it, in a way that lends itself to parallel computation. Computing betweenness centrality relies on counting the number of shortest paths from every node to every other node. You can randomly select a sample of source nodes, compute the number of shortest paths from each of them to all other nodes, and use the resulting counts to approximate the betweenness, as in the sketch below. The more nodes you sample, the better the approximation, but empirically it is quite good even with a small sample.
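To make the sampling idea concrete, here is a minimal single-machine sketch in plain Python (not Spark; the function name and adjacency-dict input format are illustrative assumptions). It runs Brandes' shortest-path counting from a random set of pivot sources and scales the accumulated dependencies by $n/k$:

```python
import random
from collections import defaultdict, deque

def approx_betweenness(adj, k, seed=0):
    """Estimate betweenness centrality from k sampled pivot sources.

    adj:  dict mapping node -> list of neighbours (unweighted, undirected)
    k:    number of pivot nodes to sample (k <= number of nodes)
    Returns node -> estimated betweenness, scaled by n/k so it estimates
    the exact ordered-pair count (halve it for the undirected convention).
    """
    rng = random.Random(seed)
    nodes = list(adj)
    n = len(nodes)
    bc = defaultdict(float)
    for s in rng.sample(nodes, k):
        # Single-source stage: BFS that counts shortest paths (Brandes).
        sigma = defaultdict(int)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = defaultdict(list)  # predecessors on shortest paths
        order = []                 # nodes in non-decreasing distance from s
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:            # w discovered for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulation stage: back-propagate pair dependencies.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Scale: the k sampled sources stand in for all n possible sources.
    return {v: bc[v] * n / k for v in nodes}

# Tiny usage example on a path graph 0-1-2-3: the middle nodes score highest.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(approx_betweenness(adj, k=4))
```

Since each pivot's computation is independent, the loop over sources is embarrassingly parallel; for instance, each Spark executor could process its own subset of pivots. For reference, networkx exposes the same estimator through the `k` argument of `betweenness_centrality`, which is handy for validating results on a smaller extract of the graph.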

Answered by Matthieu Latapy on June 22, 2021
