Data Science Asked by ignatius on March 14, 2021
I’m working with a dataset $X$ (of length $N$) of count data, which looks like:
I developed a statistical model that I think can be improved, so I’m asking for suggestions: for instance, different likelihoods or priors, a different approach, anything…
My model
I’m trying to estimate the parameters of the likelihood of the data, so that I can get a posterior predictive density, credible intervals and so on. That is, I want to model the generative process of the data given some parameters, $f(X|\theta)$.
This data shows large overdispersion ($\bar X \ll \operatorname{var}(X)$), so a Poisson likelihood, $f(X|\lambda) \sim \mathrm{Poisson}(\lambda)$, is not a good choice.
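A quick way to quantify the overdispersion is the ratio of sample variance to sample mean, which is close to 1 for Poisson data. The sketch below is only illustrative: the `counts` list is made up and is not the dataset from this question, and `dispersion_index` is a hypothetical helper name.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean (roughly 1 for Poisson data)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero, dispersion index is undefined")
    return variance(counts) / m


# Made-up counts for illustration only (NOT the data from the question).
counts = [0, 1, 1, 2, 3, 5, 8, 13, 40, 2]

# A value far above 1 suggests that a plain Poisson likelihood is too narrow.
print(dispersion_index(counts))
```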
After reading the literature on overdispersed count data, I decided to model the likelihood as a Negative Binomial distribution, $f(X|r, p) \sim \mathcal{NB}(r, p)$.
Parameter estimation
To avoid ending up with a very complex set-up, I’ve performed Bayesian estimation of the parameter $p$ only, letting $r$ be computed from the data: in a Negative Binomial distribution, $r$ is related to the first and second moments of the distribution by:
$
r = \frac{\mu^2}{\sigma^2 - \mu}, \text{ then}
$
$
\hat r = \frac{\bar X^2}{\operatorname{var}(X) - \bar X}
$
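As a minimal sketch of this moment-based estimate, the function below plugs the sample mean and sample variance into the formula for $\hat r$. The function name `nb_r_moment_estimate` and the example counts are hypothetical; the guard reflects the fact that the estimate is only positive and finite when the sample variance exceeds the sample mean.

```python
from statistics import mean, variance


def nb_r_moment_estimate(counts):
    """Method-of-moments estimate r_hat = mean^2 / (var - mean)."""
    m = mean(counts)
    v = variance(counts)
    if v <= m:
        # No overdispersion in the sample: the formula breaks down.
        raise ValueError("sample variance must exceed sample mean")
    return m * m / (v - m)


# Made-up counts for illustration only (NOT the data from the question).
counts = [0, 1, 1, 2, 3, 5, 8, 13, 40, 2]
r_hat = nb_r_moment_estimate(counts)
print(r_hat)
```

Note that fixing $r$ at this value ignores its sampling uncertainty, so credible intervals derived afterwards may come out somewhat too narrow.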
The whole set-up is:
which returned the following posterior predictive distribution:
The first and second moments of the posterior predictive distribution are very close to those of the data (I’ve let the data have a large impact on the posterior, since I’ve chosen a non-informative prior). Also, the point-estimate posterior predictive (using $\mu_p$) does not differ from the posterior predictive averaged over all possible values of $p$.
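The comparison between the plug-in and the averaged predictive can be checked in closed form under some assumptions that are not stated above: a $\mathrm{Beta}(a, b)$ prior on $p$ (uniform when $a = b = 1$), $p$ read as the success probability with $X$ counting failures (so $E[X] = r(1-p)/p$), and $\mu_p$ taken to be the posterior mean of $p$. With $r$ fixed, the Beta prior is conjugate and the posterior is $\mathrm{Beta}(a + Nr,\ b + \sum_i X_i)$. The helper names and the example values are hypothetical.

```python
def beta_posterior_for_p(counts, r, a=1.0, b=1.0):
    """Conjugate update for p with r fixed: Beta(a + N*r, b + sum(counts))."""
    return a + len(counts) * r, b + sum(counts)


def predictive_means(counts, r, a=1.0, b=1.0):
    """Return (plug-in mean, posterior-averaged mean) of a new count."""
    a_post, b_post = beta_posterior_for_p(counts, r, a, b)
    if a_post <= 1:
        raise ValueError("averaged mean is only finite when a_post > 1")
    mu_p = a_post / (a_post + b_post)      # posterior mean of p
    plug_in = r * (1 - mu_p) / mu_p        # predictive mean at p = mu_p
    averaged = r * b_post / (a_post - 1)   # r * E[(1 - p) / p | data]
    return plug_in, averaged


# Made-up counts and r for illustration only (NOT the data from the question).
counts = [0, 1, 1, 2, 3, 5, 8, 13, 40, 2]
plug_in, averaged = predictive_means(counts, r=2.0)
print(plug_in, averaged)
```

Under these assumptions the two means differ only by the factor $a'/(a'-1)$ with $a' = a + Nr$, so they become practically identical once $Nr$ is large, which is consistent with the observation above.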
Once again, any suggestions for improvement?
EDIT
What about a zero-truncated negative binomial distribution?
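For reference, a zero-truncated negative binomial simply renormalizes the ordinary pmf over $k \ge 1$: $P(X = k \mid X > 0) = \mathrm{NB}(k; r, p) / (1 - p^r)$. The sketch below assumes the same parameterization as before ($p$ is the success probability, $X$ counts failures, so $P(X = 0) = p^r$); the function names are hypothetical. Whether truncation is appropriate depends on whether zeros are impossible by construction in the data, which is not stated here.

```python
import math


def nb_logpmf(k, r, p):
    """Log pmf of NB(r, p) for k failures, allowing real-valued r > 0."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def ztnb_logpmf(k, r, p):
    """Log pmf of the zero-truncated NB: NB(k) / (1 - NB(0)) for k >= 1."""
    if k < 1:
        return -math.inf
    log_p0 = r * math.log(p)  # log P(X = 0) under the untruncated NB
    return nb_logpmf(k, r, p) - math.log1p(-math.exp(log_p0))


# Sanity check: the truncated pmf should sum to (numerically) one.
total = sum(math.exp(ztnb_logpmf(k, 2.0, 0.3)) for k in range(1, 2000))
print(total)
```

Note that with truncation the moment relation used for $\hat r$ no longer holds exactly, so $r$ would have to be estimated differently (for example jointly with $p$).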