TransWikia.com

Bayesian network in Python: both construction and sampling

Data Science Asked by Rutger Mauritz on March 10, 2021

For a project, I need to create synthetic categorical data containing specific dependencies between the attributes. This can be done by sampling from a pre-defined Bayesian Network. After some exploration on the internet, I found that Pomegranate is a good package for Bayesian Networks, however – as far as I’m concerned – it seems unpossible to sample from such a pre-defined Bayesian Network. As an example, model.sample() raises a NotImplementedError (despite this solution says so).

Does anyone know if there exists a library which provides a good interface for the construction and sampling of/from a Bayesian network?

3 Answers

Please use the function from_samples() to build a Bayesian n/w from the data.

Answered by Sherin Varghese on March 10, 2021

There is an open issue in pomegranate for this in github. In the issue, they mention an ongoing pull request that implements rejections sampling and Gibbs sampling; the last comment in the PR discussion is from 7 days ago (2020, May 17th), so it is not abandoned but actively developed. You could use the version of pomegranate from that PR to sample from your Bayesian Network.

Answered by noe on March 10, 2021

Just to elucidate the above answers with a concrete example, so that it will be helpful for someone, let's start with the following simple dataset (with 4 variables and 5 data points):

import pandas as pd
df = pd.DataFrame({'A':[0,0,0,1,0], 'B':[0,0,1,0,0], 'C':[1,1,0,0,1], 'D':[0,1,0,1,1]})
df.head()

#   A   B   C   D
#0  0   0   1   0
#1  0   0   1   1
#2  0   1   0   0
#3  1   0   0   1
#4  0   0   1   1 

Now, let's learn the Bayesian Network structure from the above data using the 'exact' algorithm with pomegranate (uses DP/A* to learn the optimal BN structure), using the following code snippet:

import numpy as np
from pomegranate import *
model = BayesianNetwork.from_samples(df.to_numpy(), state_names=df.columns.values, algorithm='exact')
# model.plot()

The BN structure that is learn is shown in the next figure along with the corresponding CPTs:

enter image description here

As can be seen from the above figure, it explains the data exactly. We can compute the log-likelihood of the data with the model as follows:

np.sum(model.log_probability(df.to_numpy()))
# -7.253364813857112

Once the BN structure is learnt, we can sample from the BN as follows:

model.sample()  
# array([[0, 1, 0, 0]], dtype=int64)

As a side note, if we use algorithm='chow-liu' instead (which finds a tree-like structure with fast approximation), we shall obtain the following BN:

enter image description here

The log-likelihood of the data this time is

np.sum(model.log_probability(df.to_numpy()))
# -8.386987635761297

which indicates the algorithm exact finds better estimate.

Answered by Sandipan Dey on March 10, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP