Data Science Asked on December 11, 2021
I’ve been given a dataset of transactions and asked to find insights for the business. I’m extremely new to ML / data science and have only been experimenting with KMeans. The dataset has the following features
Ignoring NULL data, what type of analysis can I do on this data? So far I have used KMeans to look at whether a customer will spend an excessive amount (more than the median amount).
When the business team says it needs insights, that doesn't always imply machine learning.
You could also do a lot of exploratory analysis and visualize spending trends: seasonality in spending across demographics, the times of day when customers are most active, the towns with the highest income growth rate, the age group that forms your largest customer base, and your most profitable merchants in terms of both volume and income. These are the kind of insights the business team can use to shape its strategy.
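As a rough sketch of that kind of exploratory analysis with pandas: the feature list isn't visible here, so the column names (`timestamp`, `merchant_id`, `age`, `amount`) and the toy rows are assumptions to be swapped for the real ones.

```python
import pandas as pd

# Hypothetical toy transactions; column names are assumptions, not the real schema.
tx = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-03-01 09:15", "2021-03-01 12:40", "2021-03-01 12:55",
        "2021-03-02 18:05", "2021-03-02 12:10", "2021-03-03 18:30",
    ]),
    "merchant_id": ["m1", "m2", "m2", "m1", "m3", "m2"],
    "age": [23, 35, 41, 67, 35, 29],
    "amount": [20.0, 55.0, 15.0, 80.0, 10.0, 40.0],
})

# When are customers most active? Count transactions per hour of day.
by_hour = tx.groupby(tx["timestamp"].dt.hour)["amount"].count()
busiest_hour = by_hour.idxmax()

# Which merchants matter most, by number of transactions and by total amount?
by_merchant = (
    tx.groupby("merchant_id")["amount"]
      .agg(n_transactions="count", total_amount="sum")
      .sort_values("total_amount", ascending=False)
)

# How much does each age group spend? Bin ages (left-closed bins), then sum.
age_group = pd.cut(tx["age"], bins=[0, 30, 50, 120], right=False,
                   labels=["<30", "30-49", "50+"])
by_age = tx.groupby(age_group, observed=True)["amount"].sum()

print(busiest_hour)
print(by_merchant)
print(by_age)
```

Each of these small tables can be turned into a chart with `.plot.bar()` if matplotlib is installed.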
You could also cluster customers based on their spending patterns, age group, location, etc. to identify the most profitable customer groups.
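Since you already know KMeans, here is a minimal sketch of clustering at the customer level rather than the transaction level. The per-customer features (`total_spent`, `n_transactions`, `age`) and their values are made up for illustration, and two clusters is an arbitrary choice.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-customer features; in practice these come from aggregating transactions.
customers = pd.DataFrame({
    "client_id": ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"],
    "total_spent": [100, 120, 90, 110, 2000, 2200, 1900, 2100],
    "n_transactions": [5, 6, 4, 5, 40, 45, 38, 42],
    "age": [22, 25, 24, 23, 45, 50, 48, 47],
}).set_index("client_id")

# KMeans is distance based, so put all features on a comparable scale first.
X = StandardScaler().fit_transform(customers)

# n_clusters=2 is an assumption for this toy data; try several values on real data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
customers["cluster"] = kmeans.fit_predict(X)

# Profile each cluster by its average feature values.
profile = customers.groupby("cluster").mean()
most_profitable = profile["total_spent"].idxmax()

print(profile)
print("Most profitable cluster:", most_profitable)
```

The cluster profile is usually more useful to a business team than the labels themselves, because it describes each group in terms they recognize.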
Lastly, this is essentially time series data, so you could forecast the company's earnings with classical statistical models like SARIMA or with deep learning models like LSTMs or GRUs.
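Before fitting SARIMA or an LSTM, it can help to have a simple baseline to beat. The sketch below is not SARIMA; it is a seasonal-naive forecast (repeat the last observed week) compared against a flat mean forecast, on synthetic daily revenue with an assumed weekly pattern.

```python
import numpy as np
import pandas as pd

# Synthetic daily revenue: 12 weeks with a weekend peak plus noise (all values invented).
rng = np.random.default_rng(0)
days = pd.date_range("2021-01-04", periods=84, freq="D")
weekly_pattern = np.tile([100, 100, 100, 110, 130, 180, 160], 12)
revenue = pd.Series(weekly_pattern + rng.normal(0, 5, 84), index=days)

# Hold out the last two weeks for evaluation.
train, test = revenue.iloc[:-14], revenue.iloc[-14:]

# Seasonal-naive forecast: repeat the last training week over the test period.
last_week = train.iloc[-7:].to_numpy()
seasonal_naive = pd.Series(np.tile(last_week, 2), index=test.index)

# Flat forecast: the training mean for every day.
mean_forecast = pd.Series(train.mean(), index=test.index)

# Mean absolute error of each baseline on the held-out days.
mae_seasonal = (test - seasonal_naive).abs().mean()
mae_mean = (test - mean_forecast).abs().mean()

print(f"seasonal-naive MAE: {mae_seasonal:.1f}")
print(f"mean-forecast MAE:  {mae_mean:.1f}")
```

A more elaborate model is only worth the effort if it clearly improves on the seasonal-naive error over the same held-out period.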
Answered by tehem on December 11, 2021
You can aggregate the data by fields such as client ID or merchant ID so that you can analyze the clients and the merchants separately.
For example, you can aggregate the data on client ID to get the sum or mean of the amount spent by each client. You can then plot boxplots and distribution plots of the aggregated values to find various insights.
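A minimal sketch of that aggregation in pandas, assuming hypothetical column names `client_id`, `merchant_id` and `amount` (the real names may differ):

```python
import pandas as pd

# Toy transactions with assumed column names.
tx = pd.DataFrame({
    "client_id": ["c1", "c1", "c2", "c2", "c2", "c3"],
    "merchant_id": ["m1", "m2", "m1", "m1", "m3", "m2"],
    "amount": [10.0, 30.0, 5.0, 15.0, 10.0, 100.0],
})

# One row per client: total spent, average transaction, number of transactions.
per_client = tx.groupby("client_id")["amount"].agg(
    total_amount="sum", mean_amount="mean", n_transactions="count"
)

# The same aggregation from the merchant's point of view.
per_merchant = tx.groupby("merchant_id")["amount"].agg(
    total_amount="sum", mean_amount="mean", n_transactions="count"
)

# The quartiles a boxplot of client totals would display.
quartiles = per_client["total_amount"].quantile([0.25, 0.5, 0.75])

print(per_client)
print(per_merchant)
print(quartiles)
```

With matplotlib installed, `per_client["total_amount"].plot.box()` or `.plot.hist()` draws the corresponding boxplot or distribution plot.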
I highly recommend aggregating transaction data like this.
You may also want to perform time series analysis using the transaction date to find hidden trends and seasonality. For this you may want to take a look at Prophet, formerly distributed as fbprophet (https://facebook.github.io/prophet/docs/quick_start.html).
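Prophet automates this kind of decomposition. As a first look that needs only pandas, the sketch below (not Prophet) resamples transactions to daily totals, smooths them with a 7-day rolling mean to expose the trend, and averages by weekday to expose weekly seasonality. The column name `transaction_date` and the synthetic data are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic transactions over 8 weeks: 5 per weekday, 15 per weekend day, 10.0 each.
days = pd.date_range("2021-01-04", periods=56, freq="D")
counts = np.where(days.dayofweek >= 5, 15, 5)
tx = pd.DataFrame({
    "transaction_date": days.repeat(counts),
    "amount": 10.0,
})

# Total amount per calendar day.
daily = tx.set_index("transaction_date")["amount"].resample("D").sum()

# A centered 7-day rolling mean averages out the weekly cycle and leaves the trend.
trend = daily.rolling(window=7, center=True).mean()

# Average daily total per weekday (0 = Monday ... 6 = Sunday) shows weekly seasonality.
weekday_profile = daily.groupby(daily.index.dayofweek).mean()

print(trend.dropna().head())
print(weekday_profile)
```

In this toy data the trend is flat and the weekend days stand out in the weekday profile; on real data both would be worth plotting.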
Answered by Siddhi Kiran Bajracharya on December 11, 2021
On this data, you can perform a lot of supervised learning. Supervised learning is when the model learns from data that has labels. It has two main subtypes: regression and classification. Classification predicts something discrete, such as male or female, or survived or not survived. Regression predicts continuous quantities, such as the price of a house or the GDP of a country.
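To make the distinction concrete, the same transaction amounts can give either kind of target. This small sketch uses invented amounts; the "more than the median" label mirrors the rule mentioned in the question.

```python
import pandas as pd

# Invented transaction amounts.
amounts = pd.Series([12.0, 250.0, 40.0, 8.0, 95.0, 60.0], name="amount")

# Regression target: the continuous amount itself.
y_regression = amounts

# Classification target: 1 if the amount is above the median, else 0.
y_classification = (amounts > amounts.median()).astype(int)

print(y_regression.tolist())
print(y_classification.tolist())
```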
Based on your dataset, I think you can do a lot of EDA (exploratory data analysis) alongside classification. For example, you could predict which gender buys more. There are many things you can do with the dataset, but here are some algorithms you can use.
If you have a small dataset, Logistic Regression and Naive Bayes are good choices. For beginners, the k-NN (k-nearest neighbors) algorithm is an easy one to start with. When the relationships in the data get more complex, a Decision Tree is a reasonable next step.
Beyond these, there is a more complex algorithm, Random Forest, which is basically an ensemble of many decision trees. It tends to work well when you have a large dataset with many features.
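Rather than picking one of these up front, you can compare them with cross-validation. This is a sketch on synthetic data from scikit-learn's `make_classification`, standing in for real features and a real label such as "spends more than the median"; the hyperparameters are arbitrary defaults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real feature matrix X and binary label y.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean accuracy over 5 cross-validation folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>20}: {score:.3f}")
```

On real data, k-NN and Logistic Regression usually benefit from scaling the features first, and the ranking can differ a lot from one dataset to another.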
Hope this helps!
Answered by Sriswaroop Koundinya on December 11, 2021