What would be a good randomization environment for data science?

Data Science Asked on August 9, 2021

I would like to know if there are any best practices for setting up a reproducible random environment. Currently I use this simple structure in my config:

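import numpy as np
from numpy.random import Generator, PCG64
rng = Generator(PCG64(42))  # new-style generator for general-purpose draws
np.random.seed(42)  # legacy global seed, relied on by scipy.stats rvs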

I use the rng generator for all general purposes (drawing from a given distribution, permuting indices, generating synthetic data points, etc.) and use the legacy np.random.seed to set the random state that scipy uses in the rvs method of scipy.stats distributions.
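To make the setup above concrete, here is a minimal sketch of the alternative I am wondering about: passing the generator to scipy explicitly instead of relying on the global seed. It assumes a SciPy version whose rvs method accepts a numpy Generator through its random_state argument; the variable names are only illustrative.

```python
import numpy as np
from numpy.random import Generator, PCG64
from scipy import stats

# One generator object shared by everything that needs randomness.
rng = Generator(PCG64(42))

# General-purpose draws go through the generator directly.
perm = rng.permutation(10)
normal_draws = rng.normal(loc=0.0, scale=1.0, size=5)

# scipy.stats: hand the generator to rvs instead of calling np.random.seed.
# (Assumes random_state accepts a numpy Generator in the installed SciPy.)
scipy_draws = stats.norm.rvs(loc=0.0, scale=1.0, size=5, random_state=rng)

# Two generators built from the same seed produce the same scipy draws.
rng_a = Generator(PCG64(42))
rng_b = Generator(PCG64(42))
draws_a = stats.norm.rvs(size=5, random_state=rng_a)
draws_b = stats.norm.rvs(size=5, random_state=rng_b)
```

With this pattern the scipy draws would depend only on the state of rng, not on whatever else has touched the global numpy seed.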

I read somewhere in the sklearn documentation (warning section here) that the sklearn.model_selection module uses the same global seed. Would that be the global seed set with np.random.seed?
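As a sketch of what I understand so far, assuming that random_state=None in scikit-learn falls back to numpy's global RandomState (which is my reading of the documentation, not something I have confirmed in the source), the two ways of seeding a split would look like this. The toy arrays X and y are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Toy data, purely illustrative.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Option 1: pass an explicit integer seed to each splitter.
_, _, _, y_test_1 = train_test_split(X, y, test_size=0.3, random_state=42)
_, _, _, y_test_2 = train_test_split(X, y, test_size=0.3, random_state=42)

# Option 2: leave random_state unset and seed numpy's global state.
# (Assumes random_state=None uses the global numpy RandomState.)
np.random.seed(42)
_, _, _, y_test_a = train_test_split(X, y, test_size=0.3)
np.random.seed(42)
_, _, _, y_test_b = train_test_split(X, y, test_size=0.3)

# Cross-validation splitters take the same random_state argument.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
folds = [test_idx for _, test_idx in kf.split(X)]
```

Option 1 keeps each split independent of the global state, while Option 2 is only reproducible if nothing else consumes the global stream between the seed call and the split.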

If you have a better understanding of how scipy and sklearn refer to the global seed, and of what a good default randomization setup would be, it would be very useful. Thanks.

numpy python scikit-learn scipy
