Visualize frequency of 5 Boolean variables together

Question

I have a data set with 5 variables,
a b c d e
1 0 0 1 0
0 1 0 1 1
0 1 1 0 0
0 0 0 1 0
1 1 1 0 0
0 1 1 0 1
1 0 1 0 0
1 0 0 1 1
0 1 0 1 1
0 0 1 1 0
I am only interested in the percentages of occurrence:
| a | b | c | d | e |
| .4 | .5 | .5 | .6 | .4 |
However, I would like to visualize the data in such a way that I can see the overlap, or lack of it, among all the different groups.
Any ideas?

sai · Answer

Since the combinations are known, we can treat each row as a binary number and use that to come up with a frequency plot.
Basically, convert each binary string to an integer and build the frequency plot from the integer values:
import numpy as np
import pandas as pd
from itertools import product
import matplotlib.pyplot as plt

# test data: one row for each of the 32 combinations
# (in Python 3, map() returns an iterator, so build the list explicitly)
combs = np.array(list(product([0, 1], repeat=5)))
# store in dataframe
df = pd.DataFrame(data={'a': combs[:, 0], 'b': combs[:, 1], 'c': combs[:, 2], 'd': combs[:, 3], 'e': combs[:, 4]})
# concatenate the binary sequences to strings
df['concatenate'] = df[list('abcde')].astype(str).apply(''.join, axis=1)

# to convert binary strings to integers
def int2(x):
    return int(x, 2)

# every combination has a unique value
df['unique_values'] = df['concatenate'].apply(int2)

# prepare labels for the frequency plot
variables = list('abcde')
labels = []
for combination in df.concatenate:
    tmp = ''.join([variables[i] for i, x in enumerate(combination) if x != '0'])
    labels.append(tmp)

fig, ax = plt.subplots()
counts, bins, patches = ax.hist(df.unique_values, bins=32, rwidth=0.8)

# turn off the top ticks and the default x tick labels; custom labels are added below
plt.tick_params(
    axis='x',          # changes apply to the x-axis
    which='both',      # both major and minor ticks are affected
    top=False,         # ticks along the top edge are off
    labelbottom=False)

# calculate the bin centers
bin_centers = 0.5 * np.diff(bins) + bins[:-1]
ax.set_xticks(bin_centers)
for label, x in zip(labels, bin_centers):
    # replace integer mapping with the labels
    ax.annotate(str(label), xy=(x, 0), xycoords=('data', 'axes fraction'),
        xytext=(0, -5), textcoords='offset points', va='top', ha='center', rotation=30)

plt.show()
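
As a minimal sketch of the same binary-encoding idea applied to the ten rows from the question instead of the 32 test rows: it uses only the standard library and prints the share of each combination that actually occurs. The helper names `encode` and `label` are hypothetical and not part of the answer above.

```python
from collections import Counter

# The ten rows from the question, columns a..e.
rows = [
    (1, 0, 0, 1, 0), (0, 1, 0, 1, 1), (0, 1, 1, 0, 0), (0, 0, 0, 1, 0),
    (1, 1, 1, 0, 0), (0, 1, 1, 0, 1), (1, 0, 1, 0, 0), (1, 0, 0, 1, 1),
    (0, 1, 0, 1, 1), (0, 0, 1, 1, 0),
]
variables = "abcde"


def encode(row):
    """Read the row as a binary number, with column a as the most significant bit."""
    return int("".join(str(bit) for bit in row), 2)


def label(code):
    """Turn an integer code back into the names of the variables that are set."""
    bits = format(code, "05b")
    return "".join(v for v, bit in zip(variables, bits) if bit == "1") or "none"


# Count how often each encoded combination occurs.
counts = Counter(encode(row) for row in rows)

# Print code, label and relative frequency for every combination that occurs.
for code, n in sorted(counts.items()):
    print(f"{code:2d}  {label(code):5s}  {n / len(rows):.1f}")
```

The resulting counts could be passed to a bar chart in place of the histogram, so that only the combinations present in the data take up space on the x-axis.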

Timothy Chan · Answer

If you have richer data (i.e. more than 10 rows), you will want an UpSet plot. UpSet plots show set overlaps in an intuitive way, like a Venn diagram, but are more useful for 4+ categories.
Some references that may give you some ideas, with implementations in R:

https://cran.r-project.org/web/packages/UpSetR/vignettes/basic.usage.html (attached image from r-project.org).
https://www.littlemissdata.com/blog/set-analysis
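
To give a rough feel for what such a plot summarizes, here is a small text-only sketch in Python using the question's ten rows. It is not an UpSet plot implementation and does not use any plotting package; it only computes the two ingredients such a plot displays: the size of each exact combination (shown as a membership matrix, largest first) and the total for each variable. The names `combos` and `totals` are made up for this illustration.

```python
from collections import Counter

# The ten rows from the question, columns a..e.
rows = [
    (1, 0, 0, 1, 0), (0, 1, 0, 1, 1), (0, 1, 1, 0, 0), (0, 0, 0, 1, 0),
    (1, 1, 1, 0, 0), (0, 1, 1, 0, 1), (1, 0, 1, 0, 0), (1, 0, 0, 1, 1),
    (0, 1, 0, 1, 1), (0, 0, 1, 1, 0),
]
variables = "abcde"

# Size of each exact combination, largest first.
combos = Counter(rows).most_common()

# Membership matrix: an "x" marks a variable that is part of the combination.
print(" ".join(variables), " count")
for combo, n in combos:
    marks = " ".join("x" if bit else "." for bit in combo)
    print(marks, f" {n}")

# Per-variable totals, i.e. how many rows have each variable set.
totals = {v: sum(row[i] for row in rows) for i, v in enumerate(variables)}
print(totals)
```

With only ten rows almost every combination occurs once, which is why this kind of view becomes more informative on larger data.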

Edmund · Answer

With Wolfram Language you may use AbsoluteCorrelation.
With
t = {
     {1, 0, 0, 1, 0}, {0, 1, 0, 1, 1}, 
     {0, 1, 1, 0, 0}, {0, 0, 0, 1, 0}, 
     {1, 1, 1, 0, 0}, {0, 1, 1, 0, 1}, 
     {1, 0, 1, 0, 0}, {1, 0, 0, 1, 1}, 
     {0, 1, 0, 1, 1}, {0, 0, 1, 1, 0}
    }

Then
MatrixForm[ac = AbsoluteCorrelation[t]]

Here the diagonal entries are the marginal column frequencies and the off-diagonal entries are the joint frequencies. For example, ac[[1,1]] shows that variable a occurs with frequency 0.4, and ac[[1,2]] (row 1, column 2) shows that variables a and b occur together with frequency 0.1.
This can be visualised with MatrixPlot or ArrayPlot.
MatrixPlot[
 ac 
 , FrameTicks -> {Transpose@{Range@5, CharacterRange["a", "e"]}}
 , PlotLegends -> Automatic]

Hope this helps.
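
For readers without Wolfram Language, the matrix described above (marginal frequencies on the diagonal, joint frequencies off the diagonal) can be sketched in plain Python. This is an illustration that matches that description for 0/1 data, not a reimplementation of AbsoluteCorrelation; the name `joint` is made up for this example.

```python
# The ten rows from the question, columns a..e.
rows = [
    (1, 0, 0, 1, 0), (0, 1, 0, 1, 1), (0, 1, 1, 0, 0), (0, 0, 0, 1, 0),
    (1, 1, 1, 0, 0), (0, 1, 1, 0, 1), (1, 0, 1, 0, 0), (1, 0, 0, 1, 1),
    (0, 1, 0, 1, 1), (0, 0, 1, 1, 0),
]
variables = "abcde"
n = len(rows)
k = len(variables)

# joint[i][j] is the share of rows in which variables i and j are both 1.
# For i == j this is simply the marginal frequency of variable i.
# (With NumPy and a 0/1 array X, the same matrix is X.T @ X / n.)
joint = [
    [sum(row[i] * row[j] for row in rows) / n for j in range(k)]
    for i in range(k)
]

# Print the matrix with row and column labels.
print("  " + "   ".join(variables))
for name, line in zip(variables, joint):
    print(name, " ".join(f"{value:.1f}" for value in line))
```

The nested list can then be handed to any heatmap routine to get a picture similar to the MatrixPlot output.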
