Association Rules with Python (coded dataset)

Question

I have a dataset that I need to mine with association rule techniques. It has about 90 variables, many of them ordinal. The catch is that the data is already coded as numbers instead of strings (e.g. bread = 4 instead of "bread"), and some numeric variables have been rescaled into bins (e.g. 1 = "1%-10%").
What I have so far:
from apyori import apriori

# Convert the DataFrame to a list of transactions: one list of strings per row,
# covering every row and every column of `data`.
val_list = [[str(value) for value in row] for row in data.values]

apr = apriori(val_list, min_support=0.1, min_confidence=0.2, min_lift=2)
result = list(apr)

However, this way the frequent "baskets" don't contain the feature names, so they are not much use: I get baskets like [33, 1, 8, 8, 1, 1] with no idea what the numbers refer to. What can I do, and how should I prepare the data for association rule mining?

Amar Srivastava · Answer

Create a dictionary with the codes as keys and the item names as values.
For example:
dicty = {4: "bread", 7: "milk", 9: "toothpaste"}

Building such a dictionary in Python is easy if the code book is already in a table or an Excel spreadsheet:
dicty = dict(zip(coded_list, normal_list))

where coded_list holds the numeric codes and normal_list holds the corresponding category names, in the same order.
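If the code book lives in a spreadsheet, one option is to export it as CSV and read it with the standard library. A minimal sketch, assuming a two-column sheet; the headers `code` and `label`, the sample rows, and the `load_codebook` helper are all hypothetical:

```python
import csv
import io

# Hypothetical code book exported from a spreadsheet: one row per coded value.
CODEBOOK_CSV = """code,label
4,bread
7,milk
9,toothpaste
"""

def load_codebook(handle):
    """Return a dict mapping each integer code to its label."""
    reader = csv.DictReader(handle)
    # CSV fields are read as text, so convert codes to int to match the coded data.
    return {int(row["code"]): row["label"] for row in reader}

# io.StringIO stands in for a real file, e.g. open("codebook.csv", newline="").
dicty = load_codebook(io.StringIO(CODEBOOK_CSV))
print(dicty[9])  # toothpaste
```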
Once you have the dictionary, you can decode any value like this:
name = dicty[9]

which sets name to "toothpaste".
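With around 90 coded columns, a single flat dictionary may not be enough: the same number (e.g. 1 or 8) can mean different things in different columns, and a basket like [33, 1, 8, 8, 1, 1] no longer says which column each value came from. A minimal sketch, assuming one code book per column (the column names, labels, and the `row_to_items` helper below are hypothetical), is to turn every cell into a "column=label" string before mining, so each item carries its feature name:

```python
# Hypothetical per-column code books: {column name: {code: label}}.
CODEBOOKS = {
    "item": {4: "bread", 7: "milk", 9: "toothpaste"},
    "discount": {1: "1%-10%", 2: "11%-20%"},
}

def row_to_items(row, columns, codebooks):
    """Encode one row as 'column=label' strings so each item names its feature."""
    items = []
    for column, code in zip(columns, row):
        # Fall back to the raw code when a column or value has no code-book entry.
        label = codebooks.get(column, {}).get(code, code)
        items.append(f"{column}={label}")
    return items

columns = ["item", "discount", "store"]
rows = [[4, 1, 33], [9, 2, 33]]
transactions = [row_to_items(row, columns, CODEBOOKS) for row in rows]
print(transactions[0])  # ['item=bread', 'discount=1%-10%', 'store=33']
```

If `data` is the pandas DataFrame from the question, `columns` would correspond to `list(data.columns)` and `rows` to `data.values`; the resulting `transactions` list could then replace `val_list` in the `apriori` call, so the items in the frequent itemsets keep their column names.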
