How to estimate the leafsize of the kd-tree?

Cross Validated Asked by curiosus on November 12, 2021

The kd-tree implementation provided by the SciPy Python library takes a leafsize parameter, that is, the maximum number of points a leaf node can hold. It is set to 10 by default.

Are there methods to estimate a value for leafsize that distributes the data better and avoids leaf nodes holding a single point?

https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.html

scipy.spatial.KDTree(data, leafsize=10)
# leafsize: the number of points at which the algorithm switches over to brute-force. Has to be positive.

One Answer

With this setting of 10, you should never have a leaf with a single point, unless your data set consists of exactly one point.

Because the splits are balanced in size, a node is only split when it holds more than 10 points, so each of its two children receives at least 5. With the maximum set to 10, the minimum leaf size is therefore 5 (unless there are fewer than 5 data points in total).
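One way to check this on your own data is to walk the tree and collect the number of points in each leaf. The following is a minimal sketch, not part of the original answer. It assumes a recent SciPy in which cKDTree accepts balanced_tree=True (median splits) and exposes its root node as tree.tree, with node attributes split_dim, children, lesser and greater. The random data and the helper name leaf_sizes are made up for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree


def leaf_sizes(node):
    """Return a list with the number of points held by each leaf below `node`."""
    # cKDTreeNode marks a leaf with split_dim == -1;
    # `children` is the number of data points under the node.
    if node.split_dim == -1:
        return [node.children]
    return leaf_sizes(node.lesser) + leaf_sizes(node.greater)


# Illustrative data: 1000 random points in 3 dimensions (an assumption,
# substitute your own array of shape (n_points, n_dims)).
rng = np.random.default_rng(0)
data = rng.random((1000, 3))

# balanced_tree=True asks for median splits, which is the case the
# answer above reasons about.
tree = cKDTree(data, leafsize=10, balanced_tree=True)

sizes = leaf_sizes(tree.tree)
print("leaves:", len(sizes), "min size:", min(sizes), "max size:", max(sizes))
```

If the splits are not balanced (for example balanced_tree=False, which uses a sliding midpoint rule), the lower bound of 5 need not hold, so it is worth running the same check with the settings you actually use.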

Answered by Has QUIT--Anony-Mousse on November 12, 2021
