TransWikia.com

Adding and multiplying higher values based on different columns of a dataframe

Data Science Asked on December 16, 2021

I am trying to calculate scores based on different values in a dataframe. Because the scores depend on different conditions, I am stuck on the final calculation, which should go in a new column: when two columns (ID and VID) hold the same values across rows but a third column (QID) is unique, I need to choose the higher number in column E.

ID     VID     QID     A     B     C     D     E
121    212     123     1     2     1     1     1
121    212     435     1     2     1     1     5
223    244     567     2     3     5     1     2
313    232     709     5     1     2     1     3
313    232     887     5     1     2     1     2
454    969     457     1     3     2     2     4
454    969     457     1     2     1     2     4

The last two rows show that ID, VID, QID and E can all be the same, but since A, B, C and D differ, the scores differ. The product of columns A, B, C, D and E (the higher value) should go in the Score column. The expected result is:

ID     VID     QID     A     B     C     D     E     Score
121    212     123     1     2     1     1     1     2
121    212     435     1     2     1     1     5     10
223    244     567     2     3     5     1     2     60
313    232     709     5     1     2     1     3     30
313    232     887     5     1     2     1     2     20
454    969     457     1     3     2     2     4     48
454    969     457     1     2     1     2     4     16

The calculation is A * B * C * D * E, with the Score computed for rows that share the same ID and VID but have a unique QID.
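As a quick check of the formula, here is a minimal sketch that rebuilds the sample table and takes the row-wise product of A through E. The frame name `df` and the use of `DataFrame.prod` are choices made for this illustration, not something stated in the question.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "ID":  [121, 121, 223, 313, 313, 454, 454],
    "VID": [212, 212, 244, 232, 232, 969, 969],
    "QID": [123, 435, 567, 709, 887, 457, 457],
    "A": [1, 1, 2, 5, 5, 1, 1],
    "B": [2, 2, 3, 1, 1, 3, 2],
    "C": [1, 1, 5, 2, 2, 2, 1],
    "D": [1, 1, 1, 1, 1, 2, 2],
    "E": [1, 5, 2, 3, 2, 4, 4],
})

# Row-wise product of the five value columns
df["Score"] = df[["A", "B", "C", "D", "E"]].prod(axis=1)
print(df)
```

For this sample, the plain row-wise product already matches the Score column in the expected table, since every expected value uses the E from its own row.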

The higher value in column E can come first or last. If this result can be produced with groupby followed by a merge, that would also serve the purpose.
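The groupby-then-merge idea could look roughly like the sketch below, which finds the highest E within each (ID, VID) pair and attaches it to every row. The column name `E_max` and the variable names are made up for this illustration, and grouping on ID and VID is only one reading of the requirement.

```python
import pandas as pd

# Only the columns needed for the grouping step, copied from the question
df = pd.DataFrame({
    "ID":  [121, 121, 223, 313, 313, 454, 454],
    "VID": [212, 212, 244, 232, 232, 969, 969],
    "QID": [123, 435, 567, 709, 887, 457, 457],
    "E":   [1, 5, 2, 3, 2, 4, 4],
})

# Highest E within each (ID, VID) pair
e_max = (
    df.groupby(["ID", "VID"], as_index=False)["E"]
      .max()
      .rename(columns={"E": "E_max"})
)

# Attach the group maximum to every row; a left merge keeps the row order
merged = df.merge(e_max, on=["ID", "VID"], how="left")
print(merged)
```

Whether `E_max` or the row's own E belongs in the product depends on the intended rule. The expected table in the question uses each row's own E, so `E_max` is shown here only as the intermediate value a groupby approach would produce.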

I have tried .sort to put column E in descending or ascending order before calculating, but I couldn't work out the logic for the calculation. I'm a beginner and have been working on this problem for a few days now.

Thanks in advance!

One Answer

So, from what I understand of the problem, you want to create a Score column, where typically:

$Score = A \times B \times C \times D \times E$

If ID == VID and the value in QID for that entry is unique in the whole data frame, then $E = \max(E)$.

For this, I would create additional columns that check these conditions before making the Score column. I would recommend this:

```python
import numpy as np
import pandas as pd

# df is assumed to be the dataframe from the question

# count how often each QID occurs
QID_counts = df.groupby("QID").size().reset_index()
QID_counts.columns = ["QID", "QID_count"]

df = pd.merge(left=df, right=QID_counts, on="QID")

# boolean flags: ID equals VID, and QID occurs exactly once
df["ID_VID"] = df["ID"] == df["VID"]
df["Unique_QID"] = df["QID_count"] == 1

# checking if both conditions are met
df["Max E"] = (df["ID_VID"] & df["Unique_QID"]).astype(int)
# obtaining indices of the rows that meet both conditions
max_E_idxs = df[df["Max E"] == 1].index

# updating E to suit conditions: those rows use the overall maximum of E
df["Score_E"] = df["E"]
df.loc[max_E_idxs, "Score_E"] = np.max(df["E"])

# creating score
df["Score"] = df["A"] * df["B"] * df["C"] * df["D"] * df["Score_E"]
```
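The same two conditions can also be written more compactly without the helper columns. This is a sketch only: the function name `score_with_max_e` and the toy data are made up for illustration, and it follows this answer's reading of the rule (ID equal to VID, QID unique in the whole frame).

```python
import numpy as np
import pandas as pd

def score_with_max_e(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the answer's rule and return a scored copy."""
    out = df.copy()
    # True where the QID value occurs exactly once in the frame
    unique_qid = ~out["QID"].duplicated(keep=False)
    # Rows that meet both conditions use the overall maximum of E
    use_max = (out["ID"] == out["VID"]) & unique_qid
    out["Score_E"] = np.where(use_max, out["E"].max(), out["E"])
    out["Score"] = out[["A", "B", "C", "D", "Score_E"]].prod(axis=1)
    return out

# Toy data, invented for this example (not from the question):
# row 0 meets both conditions, row 1 has ID != VID, row 2 shares its QID
toy = pd.DataFrame({
    "ID": [1, 2, 3], "VID": [1, 9, 3], "QID": [10, 20, 20],
    "A": [1, 2, 1], "B": [2, 1, 1], "C": [1, 1, 3], "D": [1, 1, 1],
    "E": [2, 5, 4],
})
result = score_with_max_e(toy)
print(result[["Score_E", "Score"]])
```

In the question's sample data no row has ID equal to VID, so under this rule every row would keep its own E.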

Answered by shepan6 on December 16, 2021
