TransWikia.com

Algorithm to find Unique users from their transactions

Data Science Asked by Chenna Reddy on July 27, 2021

Scenario

An eCommerce site allows users to purchase items without creating an account. However, it captures the following attributes from each purchase, depending on the type of payment and type of delivery:

  1. Name
  2. Address
  3. Email Address
  4. Card/Account Number

Note: One type of transaction may capture only one subset of attributes (e.g. Name and Address), while another type captures a different subset (e.g. Name and Card Number).

Expectation

To offer customized products or to prevent fraud, the site wants to identify individual users.

E.g. it should identify that the following three transactions were made by a single user:

  1. name: Alice, email: [email protected]
  2. email: [email protected], Card Number: 123456
  3. name: Bob, Card Number: 123456 (Alice made a purchase for Bob).

As you can see, card number/email is a stronger link than name, so different attributes have different weights.

I have looked at a couple of clustering algorithms (k-means, k-modes, ...), but none of them addresses this use case, as I don't want to classify users but to identify individual users.

Is this a standard data science problem, and which algorithm would fit best here? Thank you!

One Answer

You don't need any ML here: in the question you already gave the full algorithm for finding the target output ;) At best, an ML system would arrive at the same method after training on a lot of annotated data.

The method is essentially what you said:

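  • Group the transactions by card number or email (easily done with a map for each); this gives you the cases of "sure" identical users. Transactions that share either value belong together, and the links chain: in your example, transaction 2 connects transaction 1 (same email) to transaction 3 (same card number).
  • Group the transactions by name; this gives you the cases of "possible" identical users. For this part you might want to use string similarity measures, since the same name can be spelled in different ways.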
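The two steps above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: it uses a small union-find (disjoint-set) structure to merge transactions that share an email or card number, which is one way to realise the "map for each" idea while also following chains of links. The function names `group_transactions` and `name_similarity`, the dictionary keys, and the `example.com` addresses are all hypothetical (the addresses in the question are obscured). The name comparison uses `difflib.SequenceMatcher` from the Python standard library as one possible string similarity measure.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def group_transactions(transactions, strong_keys=("email", "card")):
    """Return lists of transaction indices linked, directly or through a
    chain, by a shared value of any strong attribute."""
    parent = list(range(len(transactions)))

    def find(i):
        # Walk up to the root of i's set, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # (attribute, value) -> index of the first transaction that carried it.
    first_seen = {}
    for idx, tx in enumerate(transactions):
        for key in strong_keys:
            value = tx.get(key)
            if value is None:
                continue  # this transaction did not capture the attribute
            if (key, value) in first_seen:
                # Same email or card seen before: merge the two sets.
                parent[find(idx)] = find(first_seen[(key, value)])
            else:
                first_seen[(key, value)] = idx

    groups = defaultdict(list)
    for idx in range(len(transactions)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


def name_similarity(a, b):
    """Similarity ratio in [0, 1], ignoring case (weak evidence only)."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


# Hypothetical data modelled on the example in the question,
# plus one unrelated transaction.
transactions = [
    {"name": "Alice", "email": "alice@example.com"},
    {"email": "alice@example.com", "card": "123456"},
    {"name": "Bob", "card": "123456"},
    {"name": "Carol", "email": "carol@example.com"},
]
print(group_transactions(transactions))  # [[0, 1, 2], [3]]
```

The first three transactions end up in one group even though the first and third share no attribute directly. `name_similarity` could then be applied between the resulting groups to flag "possible" matches for review; the threshold to use is a judgment call that depends on the data.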

Answered by Erwan on July 27, 2021
