Data Science Asked on March 2, 2021
I am using scikit-learn MiniBatchKMeans to do text clustering.
In the constructor, there is a parameter reassignment_ratio, which is described in the documentation (link above) as follows:
reassignment_ratio : float, default=0.01
Control the fraction of the
maximum number of counts for a center to be reassigned. A higher value
means that low count centers are more easily reassigned, which means
that the model will take longer to converge, but should converge in a
better clustering.
I cannot wrap my head around that.
If I raise the reassignment ratio, I raise the "maximum number of counts for a center to be reassigned", so a center will be reassigned only if the number of samples around it is above this threshold (does "counts" stand for "samples" here?).
Shouldn’t it be the other way around, so that a center is reassigned if the number of samples around it is below the reassignment ratio?
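To make the "below the threshold" reading concrete, here is a minimal sketch in plain NumPy. It assumes (this is my reading of the quoted documentation, not something the quote itself confirms) that the threshold is reassignment_ratio times the largest per-center count, and that a center is a candidate for reassignment when its count falls below that threshold. The helper name low_count_centers and the example counts are hypothetical and only for illustration; they are not part of scikit-learn.

```python
import numpy as np


def low_count_centers(counts, reassignment_ratio):
    """Flag centers whose count is below a fraction of the largest count.

    Hypothetical helper: assumes the threshold is
    reassignment_ratio * max(counts), which is one reading of the docs.
    """
    counts = np.asarray(counts, dtype=float)
    threshold = reassignment_ratio * counts.max()
    return counts < threshold


# Made-up per-center sample counts for four centers.
counts = [500, 120, 4, 1]

# Default ratio: threshold is 0.01 * 500 = 5, so only the two tiny centers qualify.
mask_default = low_count_centers(counts, 0.01)

# Higher ratio: threshold is 0.3 * 500 = 150, so the center with 120 samples qualifies too.
mask_higher = low_count_centers(counts, 0.3)

print(mask_default)
print(mask_higher)
```

Under this reading, raising the ratio raises the threshold, so more low-count centers fall below it, which would match the documentation's statement that "low count centers are more easily reassigned".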