Riskscore creation on Numerical Data

Question

I am trying to create a risk score from data with the following variables:
Invested_amount, Profit Amount, Age of Account in days, Total Trading Transactions, Profit per Transaction, and Investment per Transaction. I want a method for calculating a risk score that is higher for customers who profit more (that is, who win most of the time), so that I can classify customers as High Risk (mostly win and make a big profit), Medium Risk, or Low Risk (always make a loss).
In short, my problem is to assign a risk score to each customer: the higher the score, the riskier the customer. Once the score is derived, we will use it to segment the customers into three classes.
Below are the variables I am currently using:
|Variable|Definition|
|---|---|
|username|User ID to identify the records|
|Total_Freq_transaction|Total number of trades done|
|Total_Freq_win|Total number of trades won|
|Account_age_days|Days between account activation date and last trade date|
|Invest_Amount|Total amount invested in trades|
|total_profit|Profit made on trades (payout amount - investment amount)|
|Trade_per_day|Total number of trades done / account age|
|Win_prob%|Winning probability over the total number of trades|

Any help would be greatly appreciated.

Santoshi M · Answer

There are traditional ways of creating a risk scorecard using linear regression techniques. It's a whole topic in itself. A good book for beginners to read in-depth about it would be this.

You can certainly also tackle it as a three-class classification problem if you have labelled data for that. Alternatively, treat it as a binary classification problem and create three buckets from the class probability output by the classifier.
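A minimal sketch of the binary-classifier idea, under stated assumptions: the data is synthetic, the `is_risky` label is hypothetical (the question has no labels, so a real label would have to come from some prior review), and the 0.33 / 0.66 cut-offs are arbitrary placeholders rather than recommended values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer table (all values are made up).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "Win_prob%": rng.uniform(0, 100, n),
    "total_profit": rng.normal(0, 1000, n),
    "Trade_per_day": rng.uniform(0.1, 20, n),
})

# Hypothetical binary label: 1 = customer previously flagged as risky.
df["is_risky"] = ((df["Win_prob%"] > 60) & (df["total_profit"] > 0)).astype(int)

features = ["Win_prob%", "total_profit", "Trade_per_day"]
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(df[features], df["is_risky"])

# The predicted probability of the positive class acts as a risk score in [0, 1].
df["risk_score"] = model.predict_proba(df[features])[:, 1]

# Bucket the score into three classes; the thresholds are arbitrary assumptions.
df["risk_class"] = pd.cut(
    df["risk_score"],
    bins=[0.0, 0.33, 0.66, 1.0],
    labels=["Low", "Medium", "High"],
    include_lowest=True,
)

print(df["risk_class"].value_counts())
```

In practice the thresholds would be chosen from the score distribution or from business cost, and the model would be evaluated on held-out data rather than the rows it was fitted on.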

rnso · Answer

Since your winning_probability (Win_prob%) is a continuous numeric outcome variable, this is a supervised regression problem. There are many methods for this, both linear and non-linear (see scikit-learn in particular: http://scikit-learn.org/stable/supervised_learning.html ). Which algorithm is best for you depends on your data. You will likely need to try several algorithms and see which one gives the best results. Cross-validation ( http://scikit-learn.org/stable/modules/cross_validation.html ) is likely to be the best way to determine this.

You may also try neural networks using other libraries, e.g. https://www.tensorflow.org/tutorials/keras/basic_regression or https://www.kaggle.com/xgdbigdata/keras-regression-tutorial# .

Once you have predicted winning_probability, it can be segmented into low, medium and high risk using any chosen criteria.
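As a rough illustration of this workflow, the sketch below compares two scikit-learn regressors by cross-validated R², then splits the predicted winning probability into three equal-sized groups. The data is synthetic, the two candidate models are arbitrary picks, and tercile splitting with `pd.qcut` is only one possible segmentation criterion.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the customer table (all values are made up).
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "Account_age_days": rng.integers(1, 1000, n),
    "Invest_Amount": rng.uniform(100, 10000, n),
    "Trade_per_day": rng.uniform(0.1, 20, n),
})
# Made-up relationship so the target has some learnable signal.
df["Win_prob%"] = np.clip(
    30 + 0.002 * df["Invest_Amount"] + df["Trade_per_day"] + rng.normal(0, 5, n),
    0, 100,
)

X = df[["Account_age_days", "Invest_Amount", "Trade_per_day"]]
y = df["Win_prob%"]

# Candidate models: one linear, one non-linear.
candidates = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated R^2 for each candidate.
cv_r2 = {
    name: cross_val_score(est, X, y, cv=5, scoring="r2").mean()
    for name, est in candidates.items()
}
best_name = max(cv_r2, key=cv_r2.get)
best_model = candidates[best_name].fit(X, y)

# Segment the predicted winning probability into terciles.
df["predicted_win_prob"] = best_model.predict(X)
df["risk_class"] = pd.qcut(
    df["predicted_win_prob"], q=3, labels=["Low", "Medium", "High"]
)

print(cv_r2, best_name)
```

Features derived directly from the target, such as Total_Freq_win, are left out here on purpose, since including them would make the prediction trivial.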

Hope this helps.

tehem · Answer

Your winning probability is not the final deciding factor in whether a user is high or low risk: a user could have a very high win probability but low profits, which would actually make them a medium- to low-risk user. Since you also don't have any labelled class data or a final risk metric, this could be formulated as an unsupervised learning problem.
Simply cluster the users into three groups and analyze the output to see whether it is intuitively correct. Clean it up a bit, label it, and use the labelled data to build future supervised classification models.
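The clustering step could be sketched as follows. This assumes k-means on standardized features (the answer does not name a clustering algorithm), uses synthetic data, and assigns provisional labels by ranking clusters on mean profit, which is only one heuristic for the "analyze and label" step.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer table (all values are made up).
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "Win_prob%": rng.uniform(0, 100, n),
    "total_profit": rng.normal(0, 1000, n),
    "Invest_Amount": rng.uniform(100, 10000, n),
})

features = ["Win_prob%", "total_profit", "Invest_Amount"]

# Standardize so that no single feature dominates the distance metric.
X = StandardScaler().fit_transform(df[features])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
df["cluster"] = kmeans.fit_predict(X)

# Per-cluster feature means, for checking whether the groups make intuitive sense.
summary = df.groupby("cluster")[features].mean()
print(summary)

# Provisional labels: rank clusters by mean profit (a heuristic assumption).
order = summary["total_profit"].sort_values().index
label_map = dict(zip(order, ["Low", "Medium", "High"]))
df["risk_class"] = df["cluster"].map(label_map)
```

After the clusters have been reviewed and the labels corrected by hand, `risk_class` could serve as the target for a supervised classifier.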
