Cross Validated Asked by MikkeyWilks on December 15, 2021
Let say I pick k number of particles from a population n, with replacement, and the population n is derived from picking i number of particles from a pool of 1.0995116e+12 unique particles, with replacement, how do I determine the optimal n for a desired k where all of k is unique?
The experiment is as follows, I start with an initial population of 1.0995116e+12 unique particles. There is a bottle neck where I will sample roughly 35,000 particles from the population into their own unique population, n. This population n will have 35,000 unique particles at near infinite copies, which is why it is with replacement. I want to sample k number of particles from n, where I will have less than 1% of the population being non-unique(I would like them all to be unique, but technical limits may require me to allow some non-unique). Finding a probability distribution to model this has been hard given my lack of statistical knowledge, I am trying to find the optimum number n for a given k, where the population k will have all unique particles.
My approach to this has been to model it computationally and manually pick values of n and k that give me a low non-unique rate. I then run the simulation a large number of times, and plot the distribution of non-unique over unique. I show an example below where n=35000, k=300, and the simulation is run 1,000,000 times. I then plotted a histogram of the proportions of non-unique to k for each simulation.
However, if I want to choose a larger number k I would have to manually run the simulations again and intuitively scale the n to the k after observing the data. I feel that this approach is computationally wasteful, is there a more elegant method?
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP