Determining the Method option that FindClusters uses with AbsoluteOptions

Mathematica Asked on August 13, 2021

I am trying to determine which method Mathematica chooses when FindClusters is used with Method -> Automatic. The documentation says that it chooses the best method for the data. I tried AbsoluteOptions, which is supposed to return the actual option settings of an expression, but the call below does not return the chosen method.

GaussianRandomData[n_Integer, p_, sigma_] := 
  Table[p + 
    {Re[#], Im[#]}&[RandomReal[NormalDistribution[0, sigma]] E^(I RandomReal[{0, 2 π}])], {n}];
datapairs = BlockRandom[SeedRandom[2134];
Join[
  GaussianRandomData[100, {2, 1}, .3], 
  GaussianRandomData[100, {1, 1.8}, .2], 
  GaussianRandomData[100, {1, 1.1}, .4], 
  GaussianRandomData[100, {1.75, 1.75}, 0.1]]];

AbsoluteOptions[FindClusters[datapairs, Method -> Automatic], Method]

Any help would be appreciated.

2 Answers

Using Trace with the option TraceInternal -> True, and keeping only the "Method" -> _ rules that appear during the internal evaluation, gives:

DeleteDuplicates[Flatten@Trace[FindClusters[datapairs, Method -> Automatic], 
   HoldPattern[Rule["Method", _]], TraceInternal -> True]]

{"Method"->"GaussianMixture"}

If you specify the number of clusters:

DeleteDuplicates[Flatten@Trace[FindClusters[datapairs, 3, Method -> Automatic], 
   HoldPattern[Rule["Method", _]], TraceInternal -> True]]

{"Method"->"KMeans"}

With PerformanceGoal -> "Quality":

DeleteDuplicates[Flatten@Trace[FindClusters[datapairs, 3, Method -> Automatic, 
    PerformanceGoal -> "Quality"], HoldPattern[Rule["Method", _]], 
   TraceInternal -> True]]

{"Method"->"KMedoids"}
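The three traces above repeat the same pattern, so it can be wrapped in a small helper. This is only a sketch: findClustersMethod is a hypothetical name, the data are a stand-in for the question's datapairs (Gaussian blobs generated with MultinormalDistribution), and because it relies on internal evaluation details it may return an empty list in versions where the trace no longer exposes the method (the other answer reports that this approach stops working in v12.3).

```mathematica
(* Hypothetical helper: trace the internal evaluation of FindClusters
   and collect every "Method" -> _ rule that shows up (as HoldForm expressions) *)
findClustersMethod[args___] := DeleteDuplicates[
  Flatten @ Trace[FindClusters[args], HoldPattern[Rule["Method", _]],
    TraceInternal -> True]]

(* Stand-in 2D data: four Gaussian blobs of 100 points each *)
SeedRandom[2134];
data = Join[
   RandomVariate[MultinormalDistribution[{2, 1}, 0.09 IdentityMatrix[2]], 100],
   RandomVariate[MultinormalDistribution[{1, 1.8}, 0.04 IdentityMatrix[2]], 100],
   RandomVariate[MultinormalDistribution[{1, 1.1}, 0.16 IdentityMatrix[2]], 100],
   RandomVariate[MultinormalDistribution[{1.75, 1.75}, 0.01 IdentityMatrix[2]], 100]];

(* Same three cases as the traces above *)
findClustersMethod[data]
findClustersMethod[data, 3]
findClustersMethod[data, 3, PerformanceGoal -> "Quality"]
```

Because the helper takes a plain argument sequence, any other FindClusters option can be passed through in the same way.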

For other kinds of data, such as a list of colors:

l = {RGBColor[1., 0.5544801460824762, 0.12056345655596812`], RGBColor[
   1., 0.2818404077149421, 0.1073945311994069], RGBColor[
   1., 0.12423838985259317`, 0.19023691956664956`], RGBColor[
   0.8, 0.4542154246540884, 0.31688034954543], RGBColor[
   0.8, 0.5483770742736782, 0.16977938137471082`], RGBColor[
   0.8, 0.03163746197875539, 0.5781619271042624], RGBColor[
   0.8, 0.1612089376881538, 0.15737556414394493`], RGBColor[
   0.5, 0.8592283961197744, 0.04768022523989446], RGBColor[
   0.1544029090531034, 0.5400111921283921, 0.1332688011328087], 
   RGBColor[0.5550268260924609, 0.6650311925481958, 0.24096295360192643`], 
   RGBColor[0.8424867588418756, 0.9610747917029776, 0.38159472421539053`], 
   RGBColor[0.5, 0.6654316628707297, 0.9850955091132039], RGBColor[
   0.1726013976586489, 0.7948159289195966, 0.9375970360424373], 
   RGBColor[0.07338116039584297, 0.6615692536088942, 0.9035903703739081], 
   RGBColor[0.0396922307314016, 0.06815211658088716, 0.9401879243429714], 
   RGBColor[0.26561262398696184`, 0.1750699399994622, 0.47868645290098866`]};

DeleteDuplicates[Flatten@Trace[FindClusters[l], HoldPattern[Rule["Method", _]], 
   TraceInternal -> True]]

{"Method" -> "DBSCAN"}

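The function MachineLearning`file40Decisions`PackagePrivate`automaticClusterNumberMethods appears to return a list of candidate methods based on the input type, the data dimensions and the setting of the option PerformanceGoal. The final choice is then made among these candidates (for datapairs, GaussianMixture was used even though DBSCAN is listed first):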

automaticClusterNumberMethods[type_, performanceGoal_, dims_] := If[
  MachineLearning`file40Decisions`PackagePrivate`vectorSpaceQ[type],
  Switch[performanceGoal,
    Automatic | "Memory",
      If[Greater[Last @ dims, 7],
        {"DBSCAN", "NeighborhoodContraction", "Agglomerate"},
        {"DBSCAN", "NeighborhoodContraction", "GaussianMixture", "Agglomerate"}
      ],
    "Speed",
      {"DBSCAN", "GaussianMixture", "NeighborhoodContraction"},
    "Quality",
      {"Agglomerate", "DBSCAN", "JarvisPatrick", "MeanShift", "Spectral",
       "SpanningTree", "NeighborhoodContraction", "GaussianMixture"},
    "TrainingSpeed",
      {"DBSCAN", "NeighborhoodContraction"}
  ],
  {"DBSCAN", "JarvisPatrick"}
];
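Since this function only returns candidates, one way to see how they differ on a given data set is to force each one explicitly with the documented Method values of FindClusters. A minimal sketch with stand-in data (not the question's datapairs); it only reports how many clusters each candidate finds and says nothing about the criterion FindClusters uses internally to pick one.

```mathematica
(* Stand-in 2D data: four Gaussian blobs of 100 points each *)
SeedRandom[2134];
data = Join[
   RandomVariate[MultinormalDistribution[{2, 1}, 0.09 IdentityMatrix[2]], 100],
   RandomVariate[MultinormalDistribution[{1, 1.8}, 0.04 IdentityMatrix[2]], 100],
   RandomVariate[MultinormalDistribution[{1, 1.1}, 0.16 IdentityMatrix[2]], 100],
   RandomVariate[MultinormalDistribution[{1.75, 1.75}, 0.01 IdentityMatrix[2]], 100]];

(* Candidates listed above for vector data of dimension <= 7
   with PerformanceGoal -> Automatic *)
candidates = {"DBSCAN", "NeighborhoodContraction", "GaussianMixture", "Agglomerate"};

(* Number of clusters found by each candidate when it is forced explicitly *)
counts = AssociationMap[Length[FindClusters[data, Method -> #]] &, candidates]
```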

If the number of clusters is given, the function MachineLearning`file40Decisions`PackagePrivate`givenClusterNumberMethods is called instead to determine the candidate methods:

givenClusterNumberMethods[type_, performanceGoal_] := If[
    vectorSpaceQ[type],
    Switch[
        performanceGoal, Automatic | "Memory" | "Speed",
            {"KMeans", "Agglomerate"},
        "Quality",
            {"KMeans", "Agglomerate", "Spectral", "KMedoids"},
        "TrainingSpeed",
            {"KMeans"}
    ],
    If[MatchQ[type, {"Location"}],
        {"KMedoids"},
        {"KMedoids", "Agglomerate"}
    ]
];
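The same kind of comparison works when the number of clusters is fixed. A sketch, again with stand-in data, that forces each "Quality" candidate from the list above and returns the sorted cluster sizes, which makes it easy to see where KMeans and KMedoids partition the points differently.

```mathematica
(* Stand-in 2D data: four Gaussian blobs of 100 points each *)
SeedRandom[2134];
data = Join[
   RandomVariate[MultinormalDistribution[{2, 1}, 0.09 IdentityMatrix[2]], 100],
   RandomVariate[MultinormalDistribution[{1, 1.8}, 0.04 IdentityMatrix[2]], 100],
   RandomVariate[MultinormalDistribution[{1, 1.1}, 0.16 IdentityMatrix[2]], 100],
   RandomVariate[MultinormalDistribution[{1.75, 1.75}, 0.01 IdentityMatrix[2]], 100]];

(* Candidates listed above for vector data with PerformanceGoal -> "Quality" *)
qualityCandidates = {"KMeans", "Agglomerate", "Spectral", "KMedoids"};

(* Sorted cluster sizes for each candidate when 3 clusters are requested *)
sizes = AssociationMap[
  Sort[Length /@ FindClusters[data, 3, Method -> #]] &, qualityCandidates]
```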

Correct answer by kglr on August 13, 2021

Since the approach in @kglr's answer doesn't work in v12.3, I am expanding my related comments into an answer in case anyone is still interested.

I came to this workaround by realising two things: FindClusters and ClusterClassify essentially perform the same task (partitioning data into clusters), and Information was recently overhauled to make it easier to retrieve details of symbols and objects, including many machine-learning objects such as ClassifierFunction.

So, instead of using Trace, one can now simply apply Information to the trained ClassifierFunction to see the Method used under the hood:

funCC = ClusterClassify[datapairs]
Information[funCC]

In this case, the Method reported by Information is GaussianMixture.
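If only the method is needed, Information also accepts a property name as a second argument, which avoids reading it off the full panel. A sketch, assuming v12.3 or later and that "Method" is accepted as a property of ClassifierFunction (it is the entry the full Information output reports); the data are a stand-in for datapairs.

```mathematica
(* Stand-in 2D data: four Gaussian blobs of 100 points each *)
SeedRandom[2134];
data = Join[
   RandomVariate[MultinormalDistribution[{2, 1}, 0.09 IdentityMatrix[2]], 100],
   RandomVariate[MultinormalDistribution[{1, 1.8}, 0.04 IdentityMatrix[2]], 100],
   RandomVariate[MultinormalDistribution[{1, 1.1}, 0.16 IdentityMatrix[2]], 100],
   RandomVariate[MultinormalDistribution[{1.75, 1.75}, 0.01 IdentityMatrix[2]], 100]];

(* Train cluster classifiers without and with a given number of clusters *)
cf = ClusterClassify[data];
cf3 = ClusterClassify[data, 3];

(* Query only the "Method" property (assumed property name) *)
Information[cf, "Method"]
Information[cf3, "Method"]
```

Comparing the two results is a quick way to check whether fixing the number of clusters changes the automatic choice, as the Trace results in the other answer suggest it does for FindClusters.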

Answered by sunt05 on August 13, 2021
