Data Science Asked by Rutger Mauritz on March 10, 2021
For a project, I need to create synthetic categorical data containing specific dependencies between the attributes. This can be done by sampling from a pre-defined Bayesian Network. After some exploration on the internet, I found that Pomegranate
is a good package for Bayesian Networks, however – as far as I’m concerned – it seems unpossible to sample from such a pre-defined Bayesian Network. As an example, model.sample()
raises a NotImplementedError
(despite this solution says so).
Does anyone know if there exists a library which provides a good interface for the construction and sampling of/from a Bayesian network?
Please use the function from_samples()
to build a Bayesian n/w from the data.
Answered by Sherin Varghese on March 10, 2021
There is an open issue in pomegranate
for this in github. In the issue, they mention an ongoing pull request that implements rejections sampling and Gibbs sampling; the last comment in the PR discussion is from 7 days ago (2020, May 17th), so it is not abandoned but actively developed. You could use the version of pomegranate
from that PR to sample from your Bayesian Network.
Answered by noe on March 10, 2021
Just to elucidate the above answers with a concrete example, so that it will be helpful for someone, let's start with the following simple dataset (with 4 variables and 5 data points):
import pandas as pd
df = pd.DataFrame({'A':[0,0,0,1,0], 'B':[0,0,1,0,0], 'C':[1,1,0,0,1], 'D':[0,1,0,1,1]})
df.head()
# A B C D
#0 0 0 1 0
#1 0 0 1 1
#2 0 1 0 0
#3 1 0 0 1
#4 0 0 1 1
Now, let's learn the Bayesian Network structure from the above data using the 'exact' algorithm with pomegranate (uses DP/A* to learn the optimal BN structure), using the following code snippet:
import numpy as np
from pomegranate import *
model = BayesianNetwork.from_samples(df.to_numpy(), state_names=df.columns.values, algorithm='exact')
# model.plot()
The BN structure that is learn is shown in the next figure along with the corresponding CPTs:
As can be seen from the above figure, it explains the data exactly. We can compute the log-likelihood of the data with the model as follows:
np.sum(model.log_probability(df.to_numpy()))
# -7.253364813857112
Once the BN structure is learnt, we can sample from the BN as follows:
model.sample()
# array([[0, 1, 0, 0]], dtype=int64)
As a side note, if we use algorithm='chow-liu'
instead (which finds a tree-like structure with fast approximation), we shall obtain the following BN:
The log-likelihood of the data this time is
np.sum(model.log_probability(df.to_numpy()))
# -8.386987635761297
which indicates the algorithm exact
finds better estimate.
Answered by Sandipan Dey on March 10, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP