How to find the dependent variables from a dataset?

Question

I am stuck on how to find the most dependent variables based on the mean.

I have this dataset and when I try to:

df.groupby('left').mean()

It gives the output:

One of my friends said that, from that output, the dependent variables for the attribute left are:

1. Satisfaction Level
2. Average Monthly Hours
3. Promotion Last 5 Years

How could someone guess that?

Rusoiba · Answer

When you look at satisfaction_level, you see that the mean of group 0 is 50% higher than the mean of group 1. So, on average, people with left=0 are more satisfied than people with left=1. If a new person has a very high satisfaction level, they are more likely to belong to group 0.

You have to check if the difference in means is statistically significant. Otherwise, this difference could be a coincidence.
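One way to run that check is a permutation test on the difference in means. The sketch below uses only the Python standard library and made-up satisfaction scores, since the real values are not shown in the question. `permutation_p_value` is a hypothetical helper name, not a pandas or SciPy function.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up satisfaction scores, for illustration only.
stayed = [0.72, 0.65, 0.80, 0.58, 0.69, 0.75, 0.61, 0.77]  # left == 0
left = [0.41, 0.38, 0.52, 0.45, 0.36, 0.49, 0.43, 0.40]    # left == 1

p_value = permutation_p_value(stayed, left)
print(f"p-value: {p_value:.4f}")
```

A small p-value means that a gap this large rarely appears when the group labels are shuffled at random, so it is unlikely to be a coincidence. It still says nothing about what causes what.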

As @Seymour stated, you cannot draw conclusions about causality but only about patterns of co-occurrence.

Seymour · Answer

In statistics, the independent variables are inputs over which you have control. The dependent variables are the outcomes observed when you alter the values of the independent variables. Therefore, the answer is "it depends".

If you are studying how changes in left influence satisfaction level, then left is your independent variable and satisfaction level is your dependent variable.

If instead your friend was concluding that variations in left cause changes in Satisfaction Level, Average Monthly Hours and Promotion Last 5 Years, then that is a very biased conclusion based on no significant evidence. It would be more appropriate to talk about correlations between variables: you only know that some behaviors appear together (are correlated), without identifying any causal effect.
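To make the correlation point concrete, here is a minimal sketch that computes the Pearson correlation between left and a satisfaction score. The data is made up for illustration and `pearson_r` is a hypothetical helper written out by hand. With the real DataFrame, something like `df.corr(numeric_only=True)['left']` in recent pandas versions should give the same kind of figure for each numeric column.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values, for illustration only.
left = [0, 0, 0, 0, 1, 1, 1, 1]
satisfaction = [0.72, 0.65, 0.80, 0.58, 0.41, 0.38, 0.52, 0.45]

r = pearson_r(left, satisfaction)
print(f"correlation between left and satisfaction: {r:.2f}")
```

The coefficient is symmetric: swapping the two arguments gives the same value, which is exactly why it cannot tell you which variable depends on the other.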
