Data Science Asked by chris tan on June 6, 2021
Over the last few months, I have been exposed to Bayesian inference in an ML course.
On further investigation, I came across MCMC, a technique for simulating the posterior distribution.
It seems interesting, but I am not sure whether it is really useful in industry.
Does anyone have experience with Bayesian inference in practice?
Take customer_lifetime_value as X, for example.
Basically, my main question is how Bayesian inference can be more useful than simply plotting the frequency and cumulative frequency of historical X.
With the frequency, I can estimate the mean of X.
With the cumulative frequency, I can estimate prob[X>x].
What is the advantage of trying to do Bayesian inference?
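To make the baseline in the question concrete, here is a minimal sketch of the two frequency-based estimates. The data values and the function names are made up for illustration and are not taken from any real dataset.

```python
from bisect import bisect_right

# Hypothetical historical customer_lifetime_value observations (made-up numbers).
historical_x = [120.0, 80.0, 310.0, 95.0, 150.0, 60.0, 220.0, 175.0]


def empirical_mean(values):
    """Estimate the mean of X as the plain sample average."""
    return sum(values) / len(values)


def empirical_exceedance(values, x):
    """Estimate prob[X > x] as the share of observations strictly above x."""
    ordered = sorted(values)
    # bisect_right counts the observations that are <= x.
    return 1.0 - bisect_right(ordered, x) / len(ordered)


mean_x = empirical_mean(historical_x)
p_above_200 = empirical_exceedance(historical_x, 200.0)
print(mean_x, p_above_200)
```

Both numbers are point estimates computed directly from the sample.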
This may be an unpopular opinion to some, but in my experience Bayesian statistics is not particularly useful in data science in industry, for a couple of reasons:
A Bayesian approach is very useful when our questions are about statistical inference. However, in data science, more often than not, we are dealing with prediction. There may be some situations where a Bayesian approach works better than a frequentist approach, but I can't think of any off hand, apart from where a conjugate prior is available, in which case we are probably dealing with a very simple model.
Bayesian statistics usually requires sampling, such as Markov chain Monte Carlo (MCMC) or Hamiltonian Monte Carlo (HMC), and this can be extremely computationally intensive, even for relatively small datasets. In industry we are often dealing with "big data", and a Bayesian model that requires MCMC or HMC just isn't practical.
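As a rough illustration of the two points above, the sketch below contrasts a conjugate update with a sampler on the same toy model. Everything in it is an assumption made for the example: a Beta(1, 1) prior on a success rate, 30 successes in 100 trials, and a basic random-walk Metropolis sampler (a simple form of MCMC, not HMC).

```python
import math
import random


def conjugate_posterior(a, b, successes, trials):
    """Beta(a, b) prior + binomial data -> Beta posterior, in closed form."""
    return a + successes, b + trials - successes


def log_posterior(p, a, b, successes, trials):
    """Unnormalised log posterior of the same model, as a sampler sees it."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return ((a + successes - 1) * math.log(p)
            + (b + trials - successes - 1) * math.log(1.0 - p))


def metropolis(a, b, successes, trials,
               n_samples=20000, burn_in=2000, step=0.1, seed=0):
    """Random-walk Metropolis draws from the posterior of p."""
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p, a, b, successes, trials)
    draws = []
    for i in range(n_samples):
        proposal = p + rng.gauss(0.0, step)  # symmetric proposal
        candidate = log_posterior(proposal, a, b, successes, trials)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        if i >= burn_in:
            draws.append(p)
    return draws


# Hypothetical data: 30 successes in 100 trials, flat Beta(1, 1) prior.
a_post, b_post = conjugate_posterior(1, 1, 30, 100)
exact_mean = a_post / (a_post + b_post)

draws = metropolis(1, 1, 30, 100)
mcmc_mean = sum(draws) / len(draws)
print(exact_mean, mcmc_mean)
```

The conjugate route is two additions, while the sampler needs 20,000 posterior evaluations to approximate the same posterior mean. In a real model each of those evaluations touches the whole dataset, which is where the computational cost comes from.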
Edit: To address a comment to this answer:
I have a question about this: how is statistical inference different from prediction? As I understand it, prediction is usually about getting P(Y=y | X), where X is our data, which is similar to statistical inference.
Prediction and inference are completely different. With prediction, all we really care about is the accuracy of the predictions, and that is relatively easy to determine, because the test/validation procedure minimises some measure of prediction error. With inference, on the other hand, we care about the coefficient estimates and often their standard errors. Often a researcher forms a hypothesis based on a theory of causes (e.g., does drinking coffee cause cancer?) and uses a model to determine whether, and to what extent, the data support that theory. This is much, much more difficult than prediction.
For one thing, in causal inference the basic requirement is that the coefficient estimates for $\mathbf{X}$ are not biased, whereas in prediction we don't care if they are biased, provided that we get "good" predictions for $\mathbf{y}$. There are many, many sources of bias in a regression model: bias can and does arise from confounding, mediation, differential selection and colliders.
Usually, the crux of the issue is deciding which variables to include in the model in order to eliminate or reduce these biases. With prediction we can use an automated variable selection procedure to choose which variables to include (feature selection). With inference this is pretty much impossible, because automated procedures cannot generally handle the above-mentioned biases.
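The confounding point can be illustrated with a small simulation. This is only a sketch under assumed numbers: the data-generating process (a confounder z that drives both x and y, with a true effect of x on y equal to 1.0) and all function names are invented for the example.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Ordinary least squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def residualise(target, predictor):
    """Residuals of target after regressing it on predictor (with intercept)."""
    b = slope(predictor, target)
    mp, mt = mean(predictor), mean(target)
    return [(t - mt) - b * (p - mp) for t, p in zip(target, predictor)]


rng = random.Random(0)
n = 20000
true_effect = 1.0  # assumed causal effect of x on y in this simulation

z = [rng.gauss(0, 1) for _ in range(n)]   # confounder
x = [zi + rng.gauss(0, 1) for zi in z]    # x depends on z
y = [true_effect * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

# Model 1: regress y on x only, omitting the confounder.
naive = slope(x, y)

# Model 2: adjust for z. Partialling z out of both x and y gives the same
# coefficient on x as a regression of y on x and z together.
adjusted = slope(residualise(x, z), residualise(y, z))

print(naive, adjusted)
```

In this setup the unadjusted coefficient settles near 2.0 rather than the true 1.0, while the adjusted one recovers roughly 1.0. The unadjusted slope is still the best linear predictor of y from x alone, so it is acceptable for prediction but misleading as an estimate of the effect of x.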
There is a detailed discussion of these issues in my answer to this question:
How Do DAGs Help To Reduce Bias In Causal Inference?
Correct answer by Robert Long on June 6, 2021
I think these real-world industrial applications of Bayesian analysis might be helpful to you:
Also, Uber, in particular, has done a ton with Bayesian optimization, Bayesian forecasting methods (Orbit), and Bayesian hierarchical/multilevel models (probabilistic programming language Pyro/NumPyro). For example, see the following results returned by searching the Uber Engineering website:
Interested in more? In general, I track developments in Bayesian Analysis and its application in industry here:
I hope this has been a useful answer.
Answered by Michael T. on June 6, 2021