Mathematica Asked on August 13, 2021
I am trying to determine which method Mathematica chooses when using FindClusters. The documentation says that it chooses the best one for the data. I have tried AbsoluteOptions, which is supposed to return the options for a command, but it does not seem to work.
GaussianRandomData[n_Integer, p_, sigma_] :=
Table[p +
{Re[#], Im[#]}&[RandomReal[NormalDistribution[0, sigma]] E^(I RandomReal[{0, 2 π}])], {n}];
datapairs = BlockRandom[SeedRandom[2134];
Join[
GaussianRandomData[100, {2, 1}, .3],
GaussianRandomData[100, {1, 1.8}, .2],
GaussianRandomData[100, {1, 1.1}, .4],
GaussianRandomData[100, {1.75, 1.75}, 0.1]]];
AbsoluteOptions[FindClusters[datapairs, Method -> Automatic], Method]
Any help would be appreciated.
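A likely reason the AbsoluteOptions call does not help: FindClusters[...] evaluates to a plain list of clusters, and that list carries no record of the options used, so there is nothing left for AbsoluteOptions to report. A minimal check of this, using a small made-up data set (toyData is an illustrative name, not part of the question):

```mathematica
(* Small illustrative data set, made up for this sketch. *)
toyData = {1.0, 1.1, 0.9, 5.0, 5.2, 4.9};

(* FindClusters returns the clusters themselves, not an object with options. *)
clusters = FindClusters[toyData, Method -> Automatic];

(* The result is an ordinary nested list ... *)
Head[clusters]

(* ... and the symbol Method does not appear anywhere inside it. *)
FreeQ[clusters, Method]
```

Both checks only show that the returned value holds no option information; they say nothing about which method was used internally.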
Using Trace with the option TraceInternal -> True gives:
DeleteDuplicates[Flatten@Trace[FindClusters[datapairs, Method -> Automatic],
HoldPattern[Rule["Method", _]], TraceInternal -> True]]
{"Method"->"GaussianMixture"}
If you specify the number of clusters:
DeleteDuplicates[Flatten@Trace[FindClusters[datapairs, 3, Method -> Automatic],
HoldPattern[Rule["Method", _]], TraceInternal -> True]]
{"Method"->"KMeans"}
With PerformanceGoal -> "Quality":
DeleteDuplicates[Flatten@Trace[FindClusters[datapairs, 3, Method -> Automatic,
PerformanceGoal -> "Quality"], HoldPattern[Rule["Method", _]],
TraceInternal -> True]]
{"Method"->"KMedoids"}
With non-numeric input, such as a list of colors:
l = {RGBColor[1., 0.5544801460824762, 0.12056345655596812`], RGBColor[
1., 0.2818404077149421, 0.1073945311994069], RGBColor[
1., 0.12423838985259317`, 0.19023691956664956`], RGBColor[
0.8, 0.4542154246540884, 0.31688034954543], RGBColor[
0.8, 0.5483770742736782, 0.16977938137471082`], RGBColor[
0.8, 0.03163746197875539, 0.5781619271042624], RGBColor[
0.8, 0.1612089376881538, 0.15737556414394493`], RGBColor[
0.5, 0.8592283961197744, 0.04768022523989446], RGBColor[
0.1544029090531034, 0.5400111921283921, 0.1332688011328087],
RGBColor[0.5550268260924609, 0.6650311925481958, 0.24096295360192643`],
RGBColor[0.8424867588418756, 0.9610747917029776, 0.38159472421539053`],
RGBColor[0.5, 0.6654316628707297, 0.9850955091132039], RGBColor[
0.1726013976586489, 0.7948159289195966, 0.9375970360424373],
RGBColor[0.07338116039584297, 0.6615692536088942, 0.9035903703739081],
RGBColor[0.0396922307314016, 0.06815211658088716, 0.9401879243429714],
RGBColor[0.26561262398696184`, 0.1750699399994622, 0.47868645290098866`]};
DeleteDuplicates[Flatten@Trace[FindClusters[l], HoldPattern[Rule["Method", _]],
TraceInternal -> True]]
{"Method"->"DBSCAN"}
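The Trace pattern repeated in the examples above can be wrapped in a small helper. This is only a sketch: tracedMethod is a hypothetical name, and the approach relies on undocumented internals, so what it returns (if anything) may differ between versions.

```mathematica
(* tracedMethod is a hypothetical helper name, not a built-in function. *)
ClearAll[tracedMethod];

(* HoldFirst keeps the FindClusters call unevaluated until Trace receives it. *)
SetAttributes[tracedMethod, HoldFirst];

tracedMethod[expr_] :=
  DeleteDuplicates[
    Flatten[
      Trace[expr, HoldPattern[Rule["Method", _]], TraceInternal -> True]]];

(* Usage on a small made-up data set; the result depends on the version. *)
tracedMethod[FindClusters[{1.0, 1.1, 0.9, 5.0, 5.2, 4.9}]]
```

An empty list means that no "Method" rule was seen during evaluation, which is the expected outcome for expressions that never set one.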
The function MachineLearning`file40Decisions`PackagePrivate`automaticClusterNumberMethods seems to determine the method to be used based on the input type, the data dimensions, and the setting of the option PerformanceGoal:
automaticClusterNumberMethods[type_, performanceGoal_, dims_]:= If[
MachineLearning`file40Decisions`PackagePrivate`vectorSpaceQ[type],
Switch[
performanceGoal, Automatic | "Memory",
If[Greater[Last @ dims, 7],
{"DBSCAN", "NeighborhoodContraction", "Agglomerate"},
{"DBSCAN", "NeighborhoodContraction", "GaussianMixture",
"Agglomerate"}
],
"Speed",
{"DBSCAN", "GaussianMixture", "NeighborhoodContraction"},
"Quality",
{
"Agglomerate", "DBSCAN", "JarvisPatrick", "MeanShift",
"Spectral", "SpanningTree",
"NeighborhoodContraction", "GaussianMixture"
},
"TrainingSpeed",
{"DBSCAN", "NeighborhoodContraction"}
],
{"DBSCAN", "JarvisPatrick"}
];
If the number of clusters is given, the function MachineLearning`file40Decisions`PackagePrivate`givenClusterNumberMethods is called to determine the method to be used:
givenClusterNumberMethods[type_, performanceGoal_] := If[
vectorSpaceQ[type],
Switch[
performanceGoal, Automatic | "Memory" | "Speed",
{"KMeans", "Agglomerate"},
"Quality",
{"KMeans", "Agglomerate", "Spectral", "KMedoids"},
"TrainingSpeed",
{"KMeans"}
],
If[MatchQ[type, {"Location"}],
{"KMedoids"},
{"KMedoids", "Agglomerate"}
]
];
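The candidate lists above can also be explored empirically by forcing each documented method in turn and comparing the resulting cluster sizes with those of the automatic choice. The following is a sketch on synthetic data; the data, the seed, and the variable names are made up for illustration, and matching sizes are only suggestive, not proof of which method was selected.

```mathematica
(* Synthetic two-blob data, made up for this sketch. *)
SeedRandom[1];
data = Join[
   RandomVariate[NormalDistribution[0, 0.3], {100, 2}],
   RandomVariate[NormalDistribution[3, 0.3], {100, 2}]];

(* Candidates listed for numeric vectors with PerformanceGoal -> "Quality". *)
candidates = {"KMeans", "Agglomerate", "Spectral", "KMedoids"};

(* Cluster sizes obtained when each method is requested explicitly. *)
sizes = AssociationMap[
   Function[m, Length /@ FindClusters[data, 3, Method -> m]],
   candidates];

(* Cluster sizes for the automatic choice under the same settings. *)
autoSizes = Length /@ FindClusters[data, 3, PerformanceGoal -> "Quality"];

sizes
autoSizes
```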
Correct answer by kglr on August 13, 2021
As the approach in @kglr's answer doesn't work in v12.3, I am expanding my related comments into an answer in case anyone is still interested.
I arrived at this workaround by realising that FindClusters and ClusterClassify essentially perform the same task, classification, and that Information recently received a major overhaul that makes it easier to retrieve details about symbols and a range of machine-learning objects.
So, instead of using Trace, one can now simply apply Information to the trained ClassifierFunction to see the Method used under the hood:
(* data is the data set to cluster, e.g. datapairs from the question *)
funCC = ClusterClassify[data]
Information[funCC]
One can see that in this case the Method is GaussianMixture.
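To read the method programmatically rather than from the summary panel, one can ask Information for a single property. The sketch below uses made-up data and assumes that "Method" is among the properties reported for the ClassifierFunction; listing the properties first confirms whether that holds in a given version.

```mathematica
(* Synthetic two-blob data, made up for this sketch. *)
SeedRandom[1];
data = Join[
   RandomVariate[NormalDistribution[0, 0.3], {100, 2}],
   RandomVariate[NormalDistribution[3, 0.3], {100, 2}]];

(* Train a cluster classifier with all settings left automatic. *)
funCC = ClusterClassify[data];

(* List the property names that this object supports. *)
Information[funCC, "Properties"]

(* Assumption: "Method" is one of the listed properties. *)
Information[funCC, "Method"]
```

Note that this reports the method chosen by ClusterClassify; that FindClusters makes the same choice on the same data is the working assumption of this answer.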
Answered by sunt05 on August 13, 2021