TransWikia.com

What would be a good randomization environment for data science?

Data Science Asked on August 9, 2021

I would like to know whether there are any best practices for setting up random number generation in a project. Currently I use this simple structure in my config:

import numpy as np
from numpy.random import Generator, PCG64

rng = Generator(PCG64(42))  # new-style generator for general use
np.random.seed(42)  # legacy global seed

I use the rng generator for all general purposes (drawing from a given distribution, permuting indices, generating synthetic data points, etc.) and the legacy np.random.seed to set the random state that scipy uses in the rvs method of scipy.stats distributions.
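For illustration, here is a minimal sketch of how the scipy part could work without the global seed. It assumes a SciPy version whose rvs method accepts a numpy Generator through its random_state argument. The helper name draw_normal is hypothetical and not part of my config.

```python
import numpy as np
from numpy.random import Generator, PCG64
from scipy import stats


def draw_normal(seed, size=5):
    """Draw standard normal samples from an explicit Generator (hypothetical helper)."""
    rng = Generator(PCG64(seed))
    # random_state takes the Generator directly, so the global seed is not involved
    return stats.norm.rvs(loc=0.0, scale=1.0, size=size, random_state=rng)


a = draw_normal(42)

# Changing the legacy global seed should not affect the explicit stream
np.random.seed(0)
b = draw_normal(42)

same_draws = np.array_equal(a, b)
```

If this holds, the np.random.seed call would only matter for code that does not pass random_state at all.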

I read somewhere in the sklearn docs (in a warning section) that the sklearn.model_selection module uses the same global seed. That would be the global seed set with np.random.seed, wouldn't it?
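A small check of that behaviour, as a sketch rather than a definitive statement about every estimator: it assumes train_test_split from sklearn.model_selection, where random_state=None falls back to numpy's global legacy state and an integer random_state does not. The toy array X is made up for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # toy data, purely illustrative

# Case 1: no random_state, so the split follows the global seed
np.random.seed(1)
_, test_global_1 = train_test_split(X, test_size=0.3)
np.random.seed(1)
_, test_global_2 = train_test_split(X, test_size=0.3)
global_seed_controls_split = np.array_equal(test_global_1, test_global_2)

# Case 2: explicit integer random_state, independent of the global seed
np.random.seed(1)
_, test_explicit_1 = train_test_split(X, test_size=0.3, random_state=42)
np.random.seed(2)
_, test_explicit_2 = train_test_split(X, test_size=0.3, random_state=42)
explicit_seed_is_independent = np.array_equal(test_explicit_1, test_explicit_2)
```

To my knowledge sklearn's random_state accepts an integer or a legacy RandomState instance rather than a Generator, which is why the example passes an int.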

If you have a better understanding of how scipy and sklearn refer to the global seed, and of what a good default randomization setup would be, it would be very useful. Thanks.
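To make the question concrete, this is one possible default setup I could imagine, sketched under assumptions: a single root seed is split with numpy's SeedSequence.spawn into independent streams, plus a plain integer for APIs that expect an integer random_state. The names ROOT_SEED and make_streams are hypothetical.

```python
from numpy.random import Generator, PCG64, SeedSequence

ROOT_SEED = 42  # hypothetical project-wide seed


def make_streams(root_seed=ROOT_SEED):
    """Derive independent random sources from a single root seed."""
    general_ss, scipy_ss, sklearn_ss = SeedSequence(root_seed).spawn(3)
    return {
        # Generator for general-purpose draws and permutations
        "general": Generator(PCG64(general_ss)),
        # separate Generator that could be passed as random_state to scipy.stats
        "scipy": Generator(PCG64(scipy_ss)),
        # 32-bit integer for APIs that take an int random_state
        "sklearn_seed": int(sklearn_ss.generate_state(1)[0]),
    }


streams = make_streams()
again = make_streams()
```

With this layout nothing relies on np.random.seed, and rerunning with the same root seed should reproduce every stream.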
