Data Science Asked by To Rrent on January 18, 2021
I am stuck on how to identify which variables depend most strongly on an attribute, based on the group means.
I have this dataset
and when I run:
df.groupby('left').mean()
It gives the output:
One of my friends said that, from that output, the dependent variables for the attribute left will be:
1. Satisfaction Level
2. Average Monthly Hours
3. Promotion Last 5 Years
How could someone guess that?
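When you look at satisfaction_level, you see that the mean of group 0 is 50% higher than the mean of group 1. So, on average, people with left=0 have a higher satisfaction level than people with left=1. If a new person has a very high satisfaction level, it is more likely that they belong to group 0. The same reasoning applies to the other columns: the larger the gap between the two group means relative to the column's scale, the more strongly that variable appears to be associated with left.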
You have to check if the difference in means is statistically significant. Otherwise, this difference could be a coincidence.
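One way to run that check is a permutation test, sketched below using only the Python standard library. The function name `permutation_test`, the sample values and the number of shuffles are assumptions made for illustration; none of them come from the dataset in the question.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, p_value).
    """
    a, b = list(group_a), list(group_b)
    observed = mean(a) - mean(b)
    pooled = a + b
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        # Small tolerance guards against floating-point rounding on ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up satisfaction scores, for illustration only.
stayed_group = [0.72, 0.65, 0.80, 0.58, 0.91, 0.66, 0.74, 0.69]
left_group = [0.41, 0.38, 0.45, 0.11, 0.52, 0.37, 0.40, 0.43]

diff, p = permutation_test(stayed_group, left_group)
print(f"difference in means: {diff:.3f}, p-value: {p:.4f}")
```

With the DataFrame from the question, the two inputs would be something like `df.loc[df['left'] == 0, 'satisfaction_level']` and `df.loc[df['left'] == 1, 'satisfaction_level']`, assuming that is the column name. A small p-value suggests the gap is unlikely to be due to chance alone. A t-test such as `scipy.stats.ttest_ind` is a common alternative.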
As @Seymour stated, you cannot draw conclusions about causality but only about patterns of co-occurrence.
Answered by Rusoiba on January 18, 2021
In statistics, the independent variables are inputs over which you have control. The dependent variables are the outcomes observed when you alter the values of the independent variables. So the answer is "it depends" on what you are studying.
If you are studying how changes in left values influence the values of satisfaction level, then left is your independent variable and satisfaction level is your dependent variable.
If instead your friend was concluding that variations in left values cause changes in Satisfaction Level, Average Monthly Hours and Promotion Last 5 Years, then that is a very biased conclusion based on no significant evidence. It would be more appropriate to talk about correlations between variables: you only know that some behaviors appear together (are correlated), without having identified any causal effect.
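To put a number on that co-occurrence, one option is the Pearson correlation between left and each column. The sketch below is a plain-Python illustration; the helper name `pearson_r` and the sample rows are hypothetical and not taken from the question's dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    xs, ys = list(xs), list(ys)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up rows, for illustration only: 1 means the person left.
left_flag = [0, 0, 0, 0, 1, 1, 1, 1]
satisfaction = [0.72, 0.65, 0.80, 0.58, 0.41, 0.38, 0.45, 0.11]

r = pearson_r(left_flag, satisfaction)
print(f"correlation between left and satisfaction_level: {r:.2f}")
```

A coefficient near -1 or +1 indicates a strong linear association and a value near 0 a weak one; in neither case does it say which variable causes the other. In pandas, `DataFrame.corr` computes the same coefficient for every pair of numeric columns.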
Answered by Seymour on January 18, 2021