TransWikia.com

Fraud risk propagation in large scale network

Data Science Asked on July 5, 2021

What’s the best approach to graph analytics and risk propagation in a network using Python, where multiple accounts are connected through relationships, a few accounts in the network are marked as bad, and the rest are unknown?

I tried using NetworkX, but it seems to run forever. I have about 8MM edges and 40K nodes.

2 Answers

You could try applying a graph convolutional network (GCN) to do semi-supervised learning. See Kipf and Welling's paper "Semi-Supervised Classification with Graph Convolutional Networks". How well it works probably depends on how unbalanced your dataset is, though. If the dataset is too large, you could take a sample of it and train the GCN on that subset. I'd try to find some exemplar data points and create a training set from them.
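To make the idea concrete, here is a minimal sketch of the neighborhood-averaging step at the core of a GCN layer, using the symmetric normalization from the Kipf and Welling paper (self-loops added, each edge weighted by 1/sqrt(deg_i * deg_j)). It leaves out the learned weight matrix and the nonlinearity, so it is only the propagation part, not a trainable model. The function name `gcn_propagate` and the toy graph are made up for illustration.

```python
import math


def gcn_propagate(edges, features, num_nodes):
    """One round of symmetric-normalized neighborhood averaging.

    Hypothetical helper for illustration: no learned weights, no activation.
    """
    # Undirected adjacency with a self-loop on every node.
    neighbors = [{i} for i in range(num_nodes)]
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    degree = [len(n) for n in neighbors]

    smoothed = []
    for i in range(num_nodes):
        row = [0.0] * len(features[i])
        for j in neighbors[i]:
            weight = 1.0 / math.sqrt(degree[i] * degree[j])
            for k, value in enumerate(features[j]):
                row[k] += weight * value
        smoothed.append(row)
    return smoothed


# Toy chain 0 - 1 - 2 - 3, where node 0 is the only known bad account.
edges = [(0, 1), (1, 2), (2, 3)]
features = [[1.0], [0.0], [0.0], [0.0]]
smoothed = gcn_propagate(edges, features, 4)
```

After one round, the bad signal reaches only node 0 itself and its direct neighbor; stacking more rounds (layers) spreads it further, which is roughly how risk propagates through the model.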

Answered by Victor Ng on July 5, 2021

As Victor proposed, you probably need graph convolutional networks. A graph with 40K nodes is borderline too large to fit in memory, so you could consider GraphSAGE-like approaches, which sample subgraphs around target nodes and then run some sort of GCN or GAT (graph attention network) on them. You could use a library such as DGL or PyTorch Geometric for that.
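The sampling idea can be sketched in plain Python without either library. This is only an illustration of fixed-fanout neighbor sampling in the GraphSAGE style, not the DGL or PyTorch Geometric API; `sample_subgraph`, the adjacency dict and the fanout values are all made up for the example.

```python
import random


def sample_subgraph(adj, seeds, fanouts, rng):
    """Sample at most fanouts[k] neighbors per frontier node at hop k.

    Hypothetical helper: returns the visited nodes and the sampled
    (neighbor, node) message edges.
    """
    visited = set(seeds)
    frontier = set(seeds)
    sampled_edges = []
    for fanout in fanouts:
        next_frontier = set()
        for node in sorted(frontier):
            nbrs = adj.get(node, [])
            # Keep every neighbor when there are few, otherwise subsample.
            picked = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
            for nbr in picked:
                sampled_edges.append((nbr, node))
                if nbr not in visited:
                    next_frontier.add(nbr)
        visited |= next_frontier
        frontier = next_frontier
    return visited, sampled_edges


# Toy hub: account 0 is linked to 100 other accounts.
adj = {0: list(range(1, 101))}
for i in range(1, 101):
    adj[i] = [0]

nodes, message_edges = sample_subgraph(adj, seeds=[0], fanouts=[5, 5],
                                       rng=random.Random(0))
```

Even though the hub has 100 neighbors, the sampled subgraph contains only the seed plus 5 of them, so the memory needed per training batch is bounded by the fanouts rather than by the degree of the node.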

Another notable approach is DeepWalk, which generates node embeddings from each node's neighborhood. On the plus side, it preserves locality in the embedding. On the minus side, in my experience it does not scale so well, but you can give it a try.
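For reference, DeepWalk works by running short random walks from every node and then treating the walks as sentences for a skip-gram model. Below is a rough sketch of the walk-generation half only; `random_walks` and the toy graph are made up for illustration, and the embedding step (for example, a word2vec implementation trained on these walks) is left out.

```python
import random


def random_walks(adj, walk_length, walks_per_node, rng):
    """Generate truncated uniform random walks starting from every node.

    Hypothetical helper: adj maps each node to a list of its neighbors.
    """
    walks = []
    nodes = sorted(adj)
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:  # isolated node: the walk stops early
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks


# Toy graph: a - b - c, plus an isolated account d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
walks = random_walks(adj, walk_length=5, walks_per_node=2,
                     rng=random.Random(0))
```

Nodes that often appear in the same walks end up with similar embeddings, which is where the locality-preserving property comes from. The number of walks grows with nodes times walks per node, which is part of why it can get slow on big graphs.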

Answered by Kirill Fedyanin on July 5, 2021
