Data Science Asked by ashnaa1610 on August 29, 2021
I am trying to calculate scores based on different values in a dataframe. Because the scores depend on several conditions, I am stuck on the final calculation, which should go into a new column: when two columns (`ID` and `VID`) hold the same values across rows and one column (`QID`) is unique, I need to choose the higher number in column `E`.
ID VID QID A B C D E
121 212 123 1 2 1 1 1
121 212 435 1 2 1 1 5
223 244 567 2 3 5 1 2
313 232 709 5 1 2 1 3
313 232 887 5 1 2 1 2
454 969 457 1 3 2 2 4
454 969 457 1 2 1 2 4
The last two rows show that `ID`, `VID`, `QID` and `E` can be the same, but since columns `A`, `B`, `C` and `D` differ, the resulting scores differ. The product of columns `A`, `B`, `C`, `D` and `E` (the higher value) should go in the `Score` column. The expected result is:
ID VID QID A B C D E Score
121 212 123 1 2 1 1 1 2
121 212 435 1 2 1 1 5 10
223 244 567 2 3 5 1 2 60
313 232 709 5 1 2 1 3 30
313 232 887 5 1 2 1 2 20
454 969 457 1 3 2 2 4 48
454 969 457 1 2 1 2 4 16
The calculation is `A * B * C * D * E`, with the `Score` computed for rows that share the same `ID` and `VID` but have a unique `QID`.
The higher value in column `E` can come first or last. If this result can be obtained with a groupby followed by a merge, that would also solve the problem.
I have tried `.sort` to put column `E` in descending or ascending order before calculating, but I couldn't work out the logic for the calculation. I am a beginner and have been working on this problem for a few days now.
Thanks in advance!
So, from what I understand of the problem, you want to create a `Score` column where, typically:
$Score = A \times B \times C \times D \times E$
If `ID == VID` and the value in `QID` for that row is unique in the whole data frame, then $E = \max(E)$, the largest value in column `E`.
For this, I would create additional columns which check for these conditions before making the score column. Therefore, I would recommend this:
```python
import numpy as np
import pandas as pd

# `df` is assumed to already hold the table from the question
# count how often each QID occurs
QID_counts = df.groupby("QID").size().reset_index()
QID_counts.columns = ["QID", "QID_count"]
df = pd.merge(left=df, right=QID_counts, on="QID")
# boolean flags: ID equals VID, and QID occurs exactly once
df["ID_VID"] = df["ID"] == df["VID"]
df["Unique_QID"] = df["QID_count"] == 1
# checking if both conditions are met
df["Max E"] = (df["ID_VID"] & df["Unique_QID"]).astype(int)
# obtaining indices
max_E_idxs = df[df["Max E"] == 1].index
# updating E to suit conditions
df["Score_E"] = df["E"]
df.loc[max_E_idxs, "Score_E"] = np.max(df["E"])
# creating score
df["Score"] = df["A"] * df["B"] * df["C"] * df["D"] * df["Score_E"]
```
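For comparison, here is a minimal sketch under a different reading of the question, not the approach in this answer. On the sample data, the expected `Score` is simply the row-wise product of `A` to `E`, and the "higher `E`" among rows sharing `ID` and `VID` can be looked up with a groupby. The column name `E_max` is an assumption introduced only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID":  [121, 121, 223, 313, 313, 454, 454],
    "VID": [212, 212, 244, 232, 232, 969, 969],
    "QID": [123, 435, 567, 709, 887, 457, 457],
    "A":   [1, 1, 2, 5, 5, 1, 1],
    "B":   [2, 2, 3, 1, 1, 3, 2],
    "C":   [1, 1, 5, 2, 2, 2, 1],
    "D":   [1, 1, 1, 1, 1, 2, 2],
    "E":   [1, 5, 2, 3, 2, 4, 4],
})

# Row-wise product of A..E; on this data it matches the expected Score column.
df["Score"] = df[["A", "B", "C", "D", "E"]].prod(axis=1)

# Hypothetical helper column: highest E within each (ID, VID) group.
# transform("max") returns one value per original row, so no merge is needed.
df["E_max"] = df.groupby(["ID", "VID"])["E"].transform("max")

print(df)
```

If the score should use the group maximum rather than each row's own `E`, multiplying `A`, `B`, `C` and `D` by `E_max` would do that, although it would no longer reproduce the expected table for rows such as `QID` 123.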
Correct answer by shepan6 on August 29, 2021