Data Science Asked by PascalVKooten on August 17, 2021
I was wondering whether anyone has considered a sampling technique that aims to keep as much of the variance as possible (e.g., as many unique values as possible, or widely distributed continuous variables).
The benefit might be that it allows developing code around the sample while really exercising the edge cases in the data. You could always take a representative sample later.
So, I am wondering whether people have tried sampling for maximum variance before, and whether there is a clever way to sample with as high a variance as possible (an approximation is fine, of course).
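To make the idea concrete, here is a rough sketch of the kind of heuristic I have in mind: a greedy farthest-point selection over rows, not an exact maximum-variance sampler. The function name and parameters are just made up for illustration.

```python
import numpy as np

def greedy_diverse_sample(X, n_samples, random_state=0):
    """Greedily pick rows that are far from the rows already chosen
    (farthest-point sampling). This tends to keep extreme, spread-out
    points rather than the dense middle of the data."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    # start from a random row
    chosen = [int(rng.integers(len(X)))]
    # distance of every row to its nearest chosen row
    dist = np.linalg.norm(X - X[chosen[0]], axis=1)
    for _ in range(n_samples - 1):
        nxt = int(np.argmax(dist))  # farthest remaining point
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)

# toy usage: sample 10 rows out of 1000
X = np.random.default_rng(42).normal(size=(1000, 5))
idx = greedy_diverse_sample(X, 10)
print(X[idx].var(axis=0))  # variance of the sampled subset
```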
It depends on what you mean by sampling. Is it sampling between or within features?
For sampling between features, scikit-learn has a built-in VarianceThreshold selector, which removes features whose variance does not meet a given threshold.
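Something along these lines should work; the threshold value here is arbitrary and would need tuning to your data:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# toy data: the second column is constant, so it carries no variance
X = np.array([[1.0, 0.0, 3.0],
              [2.0, 0.0, 1.0],
              [3.0, 0.0, 2.0]])

# drop every feature whose variance is <= 0.1 (arbitrary threshold)
selector = VarianceThreshold(threshold=0.1)
X_reduced = selector.fit_transform(X)

print(X_reduced.shape)         # (3, 2) -- the constant column is removed
print(selector.get_support())  # [ True False  True]
```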
Answered by Brian Spiering on August 17, 2021