Stack Overflow Asked by Mehdi Zare on November 7, 2021
I’m trying to randomly select a row from a pandas DataFrame based on provided weights. I tried the .sample() method with the parameters below, but I can’t get the syntax to work:
import pandas as pd
df = pd.DataFrame({
    'label': [1, 0, 1, -1],
    'ind': [2, 3, 6, 8],
})
df.sample(n=1, weights=[0.5, 0.4, 0.1], axis=0)
The labels are 1, 0 and -1, and I want to assign a different weight to each label for the random selection.
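For context, a minimal sketch of why the call above is rejected: as far as I understand `DataFrame.sample`, `weights` needs one entry per row on the sampled axis, so three weights for four rows cannot be matched up. The four-element weight list in the second call is an arbitrary placeholder used only to show the accepted shape.

```python
import pandas as pd

df = pd.DataFrame({"label": [1, 0, 1, -1], "ind": [2, 3, 6, 8]})

# Three weights for four rows: pandas cannot pair them up row by row.
try:
    df.sample(n=1, weights=[0.5, 0.4, 0.1], axis=0)
    error = None
except ValueError as exc:
    error = exc
print(error)

# One weight per row is accepted (these values are placeholders).
row = df.sample(n=1, weights=[0.25, 0.4, 0.25, 0.1], axis=0, random_state=0)
print(row)
```

So the per-label weights have to be expanded into a per-row vector before they are passed to `sample`.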
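Weights passed to sample() apply per row, not per label, so a label that appears on more rows is drawn more often. Scale each label's weight by the label's relative frequency so the result matches the expected distribution:
# desired probability of drawing each label
weights = {-1: 0.1, 0: 0.4, 1: 0.5}
# divide by each label's share of rows, so repeated labels are not over-weighted
scaled_weights = pd.Series(weights) / df.label.value_counts(normalize=True)
df.sample(n=1, weights=df.label.map(scaled_weights))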
Testing the distribution with 10,000 samples (drawn with replacement):
(df.sample(n=10000, replace=True, random_state=1,
           weights=df.label.map(scaled_weights))
   .label.value_counts(normalize=True)
)
Output:
 1    0.5060
 0    0.3979
-1    0.0961
Name: label, dtype: float64
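The empirical check can also be done exactly, without sampling. The sketch below is an illustration rather than part of the original answer: it assumes that `sample` normalizes the weights to sum to 1, repeats that normalization by hand, and sums the per-row probabilities for each label. The names `target`, `share`, `row_prob` and `label_prob` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"label": [1, 0, 1, -1], "ind": [2, 3, 6, 8]})
target = {-1: 0.1, 0: 0.4, 1: 0.5}  # desired probability per label

# Per-row weight: target probability divided by the label's share of rows.
share = df["label"].value_counts(normalize=True)
row_weights = df["label"].map(pd.Series(target) / share)

# Normalize by hand so the weights can be read as probabilities.
row_prob = row_weights / row_weights.sum()

# Summing the row probabilities per label should reproduce the targets.
label_prob = row_prob.groupby(df["label"]).sum()
print(label_prob)
```

Label 1 sits on two rows, so each of those rows gets probability 0.25, and together they add up to the intended 0.5.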
Answered by Quang Hoang on November 7, 2021
For each row, divide the desired weight by the frequency of that label in the df:
weights = (df['label'].replace({1: 0.5, 0: 0.4, -1: 0.1})
           / df.groupby('label')['label'].transform('count'))
df.sample(n=1, weights=weights, axis=0)
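Dividing by the raw row count instead of the relative frequency changes every weight by the same constant factor, which should not matter once the weights are normalized. A small sketch of that comparison, assuming the same example frame; `by_count`, `by_freq`, `p_count` and `p_freq` are illustrative names only.

```python
import pandas as pd

df = pd.DataFrame({"label": [1, 0, 1, -1], "ind": [2, 3, 6, 8]})
target = {1: 0.5, 0: 0.4, -1: 0.1}

desired = df["label"].map(target)

# Divide by the raw row count per label...
by_count = desired / df.groupby("label")["label"].transform("count")
# ...or by the label's relative frequency.
by_freq = desired / df["label"].map(df["label"].value_counts(normalize=True))

# After normalization both give the same per-row probabilities.
p_count = by_count / by_count.sum()
p_freq = by_freq / by_freq.sum()
print(pd.DataFrame({"by_count": p_count, "by_freq": p_freq}))
```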
Answered by Chris Schmitz on November 7, 2021
You can try the following code. It assigns the desired weight from a dictionary to each row in df (assuming you listed the weights in that order). If you want the weights to depend on the number of rows per label, you can replace the lambda with a more complex function.
w = df['label'].apply(lambda x: {-1: 0.5, 0: 0.4, 1: 0.1}[x])
df.sample(n=1, weights=w, axis=0)
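To see what unscaled per-row weights mean for the labels, the sketch below computes the resulting label probabilities for the dictionary used above. It is an illustration under the assumption that `sample` normalizes the weights to sum to 1; `row_prob` and `label_prob` are names introduced here.

```python
import pandas as pd

df = pd.DataFrame({"label": [1, 0, 1, -1], "ind": [2, 3, 6, 8]})

# Per-row weights taken straight from the dictionary, with no scaling.
w = df["label"].map({-1: 0.5, 0: 0.4, 1: 0.1})

# Normalize by hand so the weights can be read as probabilities.
row_prob = w / w.sum()

# Label 1 sits on two rows, so its total is twice its per-row share.
label_prob = row_prob.groupby(df["label"]).sum()
print(label_prob.round(4))
```

With these numbers the four row weights sum to 1.1, so label 1 ends up with 0.2 / 1.1 rather than 0.1. Whether that is acceptable depends on whether the weights are meant per row or per label.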
Answered by RunTheGauntlet on November 7, 2021