Scalable way to calculate betweenness centrality for a graph in Spark

Data Science: Asked by abhati on June 22, 2021

I have a use case that requires calculating the betweenness centrality of nodes. I have tried GraphX with spark-betweenness, but it is a very long-running job. Has anyone successfully calculated the betweenness centrality of a large network with around 10 million vertices and 100 million edges?

One Answer

Sorry, I do not think you can compute the exact betweenness centrality of nodes in a graph this size, as its complexity is $O(n \cdot m)$, where $n$ is the number of nodes and $m$ the number of links. For your graph, that is on the order of $10^7 \times 10^8 = 10^{15}$ operations.

The good news is that you can approximate it, in a way that lends itself to parallel computation. Computing betweenness centrality relies on counting the number of shortest paths from every node to every other node. You can randomly select a sample of source nodes, compute the number of shortest paths from each of them to all other nodes, and use the resulting counts to approximate the betweenness, as in the sketch below. The more nodes you sample, the better the approximation, but empirically it is quite good even with a small sample.
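To make the sampling idea concrete, here is a minimal single-machine sketch in plain Python (not Spark; the function name and adjacency-dict input format are illustrative assumptions). It runs Brandes' shortest-path counting from a random set of pivot sources and scales the accumulated dependencies by $n/k$:

```python
import random
from collections import defaultdict, deque

def approx_betweenness(adj, k, seed=0):
    """Estimate betweenness centrality from k sampled pivot sources.

    adj:  dict mapping node -> list of neighbours (unweighted, undirected)
    k:    number of pivot nodes to sample (k <= number of nodes)
    Returns node -> estimated betweenness, scaled by n/k so it estimates
    the exact ordered-pair count (halve it for the undirected convention).
    """
    rng = random.Random(seed)
    nodes = list(adj)
    n = len(nodes)
    bc = defaultdict(float)
    for s in rng.sample(nodes, k):
        # Single-source stage: BFS that counts shortest paths (Brandes).
        sigma = defaultdict(int)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = defaultdict(list)  # predecessors on shortest paths
        order = []                 # nodes in non-decreasing distance from s
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:            # w discovered for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulation stage: back-propagate pair dependencies.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Scale: the k sampled sources stand in for all n possible sources.
    return {v: bc[v] * n / k for v in nodes}

# Tiny usage example on a path graph 0-1-2-3: the middle nodes score highest.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(approx_betweenness(adj, k=4))
```

Since each pivot's computation is independent, the loop over sources is embarrassingly parallel; for instance, each Spark executor could process its own subset of pivots. For reference, networkx exposes the same estimator through the `k` argument of `betweenness_centrality`, which is handy for validating results on a smaller extract of the graph.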

Answered by Matthieu Latapy on June 22, 2021
