TransWikia.com

How to compare the output of Self-organizing maps?

Data Science Asked by user66305 on December 1, 2020

I am trying to simultaneously cluster and visualize text documents using self-organizing maps. Since text documents can be represented in various ways (vector space model, GloVe, etc.), I am trying to figure out which representation generates the best map. Measures such as quantization error determine the goodness of the map for a given dataset. However, they are not useful for quantitatively telling which representation gives a better output.

Is there a quantitative measure to compare the maps generated using different representations (for example, Tf-idf and GloVe) and tell for which representation the output is better?

One Answer

From Wikipedia:

A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map, and is therefore a method to do dimensionality reduction.

So you only have the original data itself; there is no additional data (such as labels in a supervised setting). If you also require the result to have two dimensions, you are basically looking at functions

$$f: X \rightarrow \mathbb{R}^2$$

where $X \subsetneq \mathbb{R}^n$ in most cases. You already mentioned quantization error.
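To make the limitation of quantization error concrete, here is a minimal NumPy sketch. It assumes quantization error is defined as the mean Euclidean distance from each sample to its best-matching unit, and it uses a random subset of the data as a stand-in for a trained SOM codebook. The function name `quantization_error` and the toy data are illustrative assumptions, not part of any particular SOM library.

```python
import numpy as np

def quantization_error(data, codebook):
    """Mean Euclidean distance from each sample to its best-matching unit.

    data:     array of shape (n_samples, n_features)
    codebook: array of shape (n_units, n_features), the flattened SOM weights
    """
    # Pairwise distances between every sample and every map unit.
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    # Distance to the closest unit (the BMU) for each sample, then averaged.
    return dists.min(axis=1).mean()

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5))  # toy data, purely illustrative
# Stand-in for a trained SOM: 16 units picked from the data (an assumption).
codebook = data[rng.choice(len(data), size=16, replace=False)]

qe = quantization_error(data, codebook)
# Rescaling the input space rescales the error by the same factor.
qe_scaled = quantization_error(10 * data, 10 * codebook)
print(qe, qe_scaled)
```

Because the value is measured in the units of the input space, it changes with the scale and dimensionality of the representation. That is one reason a lower quantization error for, say, GloVe vectors than for Tf-idf vectors does not by itself indicate a better map.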

To my knowledge, there is no better measure that does not require gaining more knowledge about the data itself, either through human inspection or by using other datasets.

With human inspection you can, of course, tell whether one mapping seems to make more sense than another, but only for a given dataset and a given person.

You might also consider other dimensionality reduction techniques:

Answered by Martin Thoma on December 1, 2020
