Data Science Asked on December 4, 2020
I stumbled recently upon the Self-Organizing Map (SOM), an ANN architecture used to cluster high-dimensional data while simultaneously imposing a neighbourhood structure on it. It is trained through a competitive learning approach in which neurons compete to respond to a given input. The strongest-responding neuron, the best matching unit (BMU), is rewarded by being moved closer to the given input in the data space, and so are its neighbours. However, within the literature and implementations, I find some deviations in how this training is implemented. Specifically, the influence of the BMU's update on its neighbours is mitigated by a neighbourhood function, typically a Gaussian of the form

h(d, t) = exp(−d² / (2σ(t)²)),
where d is the distance on the map grid between the BMU and the neuron being updated, and σ(t) is a radius which is decreased during training. Effectively, this results in the influence of the BMU's readjustment on its neighbourhood shrinking during training.
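To make the role of the neighbourhood function concrete, here is a minimal sketch of a single update step with a Gaussian neighbourhood over a 2-D grid (the function name, array shapes, and the learning rate lr are illustrative assumptions, not taken from any particular library):

```python
import numpy as np

def som_update_step(weights, grid_coords, x, sigma, lr):
    """One competitive-learning step: find the BMU for input x and pull it,
    together with its grid neighbours (attenuated by the Gaussian
    neighbourhood function), towards x."""
    # BMU = neuron whose weight vector is closest to the input
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
    # d = distance on the map grid between each neuron and the BMU
    d = np.linalg.norm(grid_coords - grid_coords[bmu], axis=1)
    # Gaussian neighbourhood function h(d, t) = exp(-d^2 / (2 sigma(t)^2))
    h = np.exp(-d ** 2 / (2 * sigma ** 2))
    # move every neuron towards x, weighted by its neighbourhood value
    weights += lr * h[:, None] * (x - weights)
    return weights

# e.g. for a 10x10 map of 3-D data:
# grid_coords = np.array([(i, j) for i in range(10) for j in range(10)], dtype=float)
# weights = np.random.rand(100, 3)
# weights = som_update_step(weights, grid_coords, np.random.rand(3), sigma=5.0, lr=0.1)
```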
The difference I find concerns the implementation of the shrinking of σ(t). Most explanations and blog posts describe an exponential decrease

σ(t) = σ₀ · exp(−t / λ),

where λ is a decay constant which can be tuned arbitrarily and σ₀ is the initial radius. Alternatively, I find that some implementations do not use this exponential decay, but instead a linear interpolation of the form

σ(t) = r · (1 − t / n),
where n is the number of training epochs and r is a radius which is set differently depending on the training phase. These implementations further distinguish explicitly between a 'rough' training phase, in which r is chosen relative to the map dimensions (e.g. a fraction of max(SOM.dims), with SOM.dims=(100,100) for a 100×100 SOM), and a 'fine-tuning' training phase in which r is considerably smaller.
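For illustration, a minimal sketch contrasting the two schedules; the specific radii, phase lengths, and the floor of one grid unit are my own illustrative choices rather than values taken from any specific implementation:

```python
import numpy as np

def sigma_exponential(t, sigma0=50.0, lam=1000.0):
    # single-phase schedule: sigma(t) = sigma0 * exp(-t / lambda)
    return sigma0 * np.exp(-t / lam)

def sigma_two_phase(t, dims=(100, 100), n_rough=1000, n_fine=10000):
    # two-phase schedule using linear interpolation sigma(t) = r * (1 - t / n),
    # with r set per phase: a large, map-sized radius during 'rough' training
    # and a much smaller one during 'fine-tuning'; floored at 1 grid unit
    r_rough = max(dims) / 2    # illustrative choice
    r_fine = max(dims) / 20    # illustrative choice
    if t < n_rough:
        return max(r_rough * (1 - t / n_rough), 1.0)
    return max(r_fine * (1 - (t - n_rough) / n_fine), 1.0)
```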
My problem is that I do not quite understand why there is this disagreement and what the 'canonical' way of training a SOM is. It certainly makes sense to divide training into a 'rough' and a 'fine-tuning' phase, but it baffles me a bit that most newer descriptions neglect this without further discussion and only consider a single training phase with exponential decay.
An answer from Kohonen, the inventor of the self-organizing map himself:
"The true mathematical form of σ(t) is not crucial, as long as its value is fairly large in the beginning of the process. Say, on the order of half of the diameter of the grid, whereafter it is gradually reduced to a fraction of it in about 1000 steps."
From: Kohonen, T., 2013. Essentials of the self-organizing map. Neural networks, 37, pp.52-65.
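Taken literally, that advice could be sketched as a simple schedule like the following (the final fraction, the linear shrink, and the function name are assumptions on my part; as the quote stresses, the exact form is not crucial):

```python
def sigma_kohonen(t, dims=(100, 100), n_ordering=1000, final_fraction=0.05):
    """Radius schedule following the quoted advice: start at about half the
    grid diameter and shrink to a small fraction of it within ~1000 steps,
    then keep that final value for the remaining (fine-tuning) steps."""
    diameter = (dims[0] ** 2 + dims[1] ** 2) ** 0.5
    sigma_start = diameter / 2
    sigma_end = final_fraction * sigma_start
    if t >= n_ordering:
        return sigma_end
    # linear shrink during the ordering phase; the exact form is not crucial
    return sigma_start + (t / n_ordering) * (sigma_end - sigma_start)
```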
Answered by Steve on December 4, 2020