Data Science Asked on March 31, 2021
I am following a MOOC, and in this lecture about visualisation in exploratory data analysis the lecturer claims that, when plotting the row indexes against feature values, if we have lines on the feature value axis it means that the data have been properly shuffled. I can’t see why.
On the contrary, in the following lecture, the lecturer claims that from the absence of vertical lines, the data hasn’t been properly shuffled:
I think I get it, since if it were, I would have seen clear lines. But how can I be sure there aren’t more classes hidden in these subs?
- Shouldn't an index have only one value in the feature axis?
Yes, that's correct. On the graph given as an example this is not visible because there are too many row indexes (50000). As a consequence it's impossible to distinguish a particular index from its neighbors, but if the X axis were stretched far enough one would see a single feature value for every index.
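To make this concrete, here is a minimal sketch using synthetic data. The plot width of 500 pixel columns and the value range 0 to 20 are assumptions chosen for illustration, not taken from the lecture. It shows that each row index holds exactly one value, yet every pixel column ends up displaying many different values because it has to cover many indexes.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

n_rows = 50_000    # number of row indexes, as in the plot discussed above
n_columns = 500    # hypothetical plot width in pixel columns (an assumption)

# Synthetic feature (an assumption): exactly one value per row index.
feature = [random.randint(0, 20) for _ in range(n_rows)]

# Each pixel column has to display this many consecutive row indexes.
rows_per_column = n_rows // n_columns

# Distinct values that end up drawn on top of each other in each pixel column.
values_per_column = [
    set(feature[start:start + rows_per_column])
    for start in range(0, n_rows, rows_per_column)
]

print("row indexes per pixel column:", rows_per_column)
print("fewest distinct values in a column:",
      min(len(v) for v in values_per_column))
```

Stretching the X axis corresponds to increasing `n_columns`; once it reaches `n_rows`, every column contains a single index and therefore a single value.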
- One horizontal line should mean that the feature values for all indexes have been uniformized, not randomized?
I think there could be two different confusions here:
One may also note on this graph that there is some kind of underlying discrete distribution of the values: very clearly for values 0 and 1, but also from all the white horizontal lines which show that some values seldom exist in the data.
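One way to check this kind of discreteness without relying on the plot is to count how often each value occurs. The sketch below uses a synthetic feature where 0 and 1 dominate and a few other values are rare. The values, weights and the 5% threshold are assumptions for illustration, not the actual data from the lecture.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is reproducible

# Synthetic feature (an assumption): 0 and 1 dominate, other values are rarer.
values = random.choices(
    population=[0, 1, 2, 3, 4, 5],
    weights=[40, 40, 8, 8, 3, 1],
    k=50_000,
)

counts = Counter(values)

# A small number of distinct values points to a discrete distribution.
n_distinct = len(counts)

# The dominant values would appear as dense horizontal lines in the plot.
most_common = counts.most_common(2)

# Values below an arbitrary 5% share would leave sparse, nearly white rows.
rare = sorted(v for v, c in counts.items() if c / len(values) < 0.05)

print("distinct values:", n_distinct)
print("two most frequent values:", most_common)
print("rare values:", rare)
```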
Correct answer by Erwan on March 31, 2021