Random selection of a row from a pandas DataFrame with weights

Question

I'm trying to randomly select a row from a pandas DataFrame based on provided weights. I tried to use the .sample() method with these parameters, but I can't get the syntax to work:
import pandas as pd

df = pd.DataFrame({
    'label': [1,0,1,-1],
    'ind': [2,3,6,8],
})

df.sample(n=1, weights=[0.5, 0.4, 0.1], axis=0)

The labels are 1, 0 and -1, and I want to assign a different weight to each label for the random selection.

Quang Hoang · Answer

df.sample() applies weights per row, so you should scale each label's weight by that label's relative frequency so that the result matches the expected distribution:
weights = {-1:0.1, 0:0.4, 1:0.5}

scaled_weights = (pd.Series(weights) / df.label.value_counts(normalize=True))

df.sample(n=1, weights=df.label.map(scaled_weights) )

Test the distribution with 10000 samples:
(df.sample(n=10000, replace=True, random_state=1,
           weights=df.label.map(scaled_weights))
   .label.value_counts(normalize=True)
)

Output:
 1    0.5060
 0    0.3979
-1    0.0961
Name: label, dtype: float64
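
The same result can be checked without sampling by computing the label probabilities that the scaled weights imply. This is only a minimal sketch: the names row_weights, row_prob and label_prob are illustrative and not part of the answer above, and it relies on the documented behaviour that df.sample() normalizes weights that do not sum to 1.

```python
import pandas as pd

df = pd.DataFrame({
    'label': [1, 0, 1, -1],
    'ind': [2, 3, 6, 8],
})

# Desired probability of drawing each label
weights = {-1: 0.1, 0: 0.4, 1: 0.5}

# Divide each target by the label's share of rows, then map onto the rows
scaled_weights = pd.Series(weights) / df['label'].value_counts(normalize=True)
row_weights = df['label'].map(scaled_weights)

# Mimic the normalization that df.sample() applies to the weights
row_prob = row_weights / row_weights.sum()

# Summing the row probabilities per label recovers the targets
label_prob = row_prob.groupby(df['label']).sum()
print(label_prob)
```

Label 1 occupies two of the four rows, so each of its rows carries half of the 0.5 target.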

Chris Schmitz · Answer

For each row, divide the desired weight by the frequency of that label in the df:
weights = (df['label'].replace({1: 0.5, 0: 0.4, -1: 0.1})
           / df.groupby('label')['label'].transform('count'))

df.sample(n=1, weights=weights, axis=0)
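
If this is needed in more than one place, the division by the label count could be wrapped in a small function. The sketch below is an assumption about how one might package it: sample_by_label is a hypothetical helper, not a pandas function, and it uses map() instead of replace() to look up the weights.

```python
import pandas as pd

def sample_by_label(df, column, label_weights, n=1, random_state=None):
    """Hypothetical helper: draw n rows so that labels follow label_weights."""
    # Number of rows that share each row's label
    counts = df.groupby(column)[column].transform('count')
    # Spread each label's target weight evenly across its rows
    row_weights = df[column].map(label_weights) / counts
    # Sample with replacement so n may exceed the number of rows
    return df.sample(n=n, replace=True, weights=row_weights,
                     random_state=random_state)

df = pd.DataFrame({
    'label': [1, 0, 1, -1],
    'ind': [2, 3, 6, 8],
})

draws = sample_by_label(df, 'label', {1: 0.5, 0: 0.4, -1: 0.1},
                        n=10000, random_state=0)
freq = draws['label'].value_counts(normalize=True)
print(freq)
```

With 10000 draws the printed frequencies should land close to 0.5, 0.4 and 0.1, though the exact figures depend on the seed.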

RunTheGauntlet · Answer

You can try the following code. It assigns the desired weights from a dictionary to the rows of df (assuming you gave them in this order). If you want the weights to depend on how many rows share each label, you can replace the lambda with a more complex function.
w = df['label'].apply( lambda x: {-1:0.5, 0:0.4, 1:0.1}[x] )
df.sample(n=1, weights=w, axis=0)
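
Because these weights are applied per row, a label that occurs in several rows is drawn more often than its dictionary entry suggests. The following sketch, which uses map() in place of the lambda and an illustrative label_prob variable, computes the label probabilities implied by the weights above.

```python
import pandas as pd

df = pd.DataFrame({
    'label': [1, 0, 1, -1],
    'ind': [2, 3, 6, 8],
})

# Per-row weights, in the label order assumed by the code above
w = df['label'].map({-1: 0.5, 0: 0.4, 1: 0.1})

# df.sample() rescales weights that do not sum to 1
row_prob = w / w.sum()

# Probability of drawing each label, summed over its rows
label_prob = row_prob.groupby(df['label']).sum()
print(label_prob.round(3))
```

Here label 1 ends up with a probability of about 0.18 rather than 0.1, since it appears in two rows and the weights sum to 1.1 before rescaling.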
