TransWikia.com

Association Rules with Python (coded dataset)

Data Science Asked on December 20, 2021

I have a dataset that I need to run association rule mining on. It has about 90 variables, many of which are ordinal. The data is already coded with numbers instead of strings (e.g. bread = 4 instead of "bread"), and some numeric variables have been re-scaled into bins, such as 1 = "1%-10%".

What I have so far:

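from apyori import apriori

# Convert the DataFrame `data` to a list of transactions (one list of strings per row).
# Iterating over the actual shape avoids hard-coded sizes and skipping row 0.
values = data.values
val_list = []
for row in range(values.shape[0]):
    val_list.append([str(values[row, column]) for column in range(values.shape[1])])

apr = apriori(val_list, min_support=0.1, min_confidence=0.2, min_lift=2)
result = list(apr)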

Still, this way I don't get the feature names in the frequent "baskets", so the result is not much use: I get baskets like [33, 1, 8, 8, 1, 1] with no idea what the numbers refer to. What can I do, and how should I prepare the data for association rule mining?

One Answer

Create a dictionary that contains the coded variables as keys and the item names as values.

So it would look like:

dicty = {4: "bread", 7: "milk", 9: "toothpaste"}

Constructing such a dictionary in Python is easy if the codes and their names are already in a table or Excel spreadsheet:

dicty = dict(zip(coded_list, normal_list))

where coded_list holds the numeric codes and normal_list holds the corresponding category names, in the same order.
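If the codes live in a spreadsheet, one way to get there is to export it as CSV and read it with the standard library. This is only a sketch: the column names code and label and the inline CSV text are assumptions made for illustration, not something from the question.

```python
import csv
import io

# Hypothetical codebook exported from a spreadsheet: one row per (code, label) pair.
# In practice this would be an open file instead of an in-memory string.
codebook_csv = io.StringIO("code,label\n4,bread\n7,milk\n9,toothpaste\n")

# Build the lookup; codes are converted to int so they match the coded dataset.
dicty = {int(row["code"]): row["label"] for row in csv.DictReader(codebook_csv)}

print(dicty[9])
```

Converting the code to int matters here, because csv always reads values as strings and a lookup with the integer 9 would otherwise fail.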

Once you have the dictionary, you can look up the name for any code like this:

name = dicty[9]

and name should then be "toothpaste".
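With 90 coded variables, the same number probably means different things in different columns, so a single dictionary may not be enough. A rough sketch of one way to handle that is to keep one dictionary per column and turn every cell into a "column=label" item before mining. The column names, the labels other than those mentioned above, and the sample rows are all hypothetical.

```python
# Hypothetical per-column codebooks: the same code can mean different things
# in different columns, so each column gets its own lookup.
codebooks = {
    "product": {4: "bread", 7: "milk", 9: "toothpaste"},
    "discount": {1: "1%-10%", 2: "11%-20%"},
}

# Hypothetical coded rows, shaped like the output of data.to_dict("records")
# for a pandas DataFrame.
rows = [
    {"product": 4, "discount": 1},
    {"product": 9, "discount": 2},
    {"product": 7, "discount": 3},  # code 3 has no label in the codebook
]

def decode_row(row, codebooks):
    """Turn one coded row into a list of 'column=label' items."""
    items = []
    for column, code in row.items():
        # Fall back to the raw code when no label is known for it.
        label = codebooks.get(column, {}).get(code, code)
        items.append(f"{column}={label}")
    return items

transactions = [decode_row(row, codebooks) for row in rows]
print(transactions[0])
```

The transactions list can then be passed to apriori in place of the plain list of codes, so a basket would contain items such as product=bread rather than a bare 4.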

Answered by Amar Srivastava on December 20, 2021
