Code Golf Asked by Magic Octopus Urn on October 27, 2021
The task here is rather simple, perform a single iteration of a k-means clustering algorithm on a binary matrix. This is essentially the setup task for the main k-means algorithm, I felt the setup might be easier and entice golfing languages to give it a shot as well. The matrix passed to you will have the following format:
0 0 0 0 0 0 0 0 1
0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1
0 0 1 0 0 0 0 0 0
0 0 0 0 0 1 0 0 1
0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0
0 1 0 0 0 1 0 0 0
0 1 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0
1 represents a point, while 0 represents a lack of points. Your job will be to randomly generate k-1
centroids and perform the initial clustering of the data around the centroids you’ve generated. k
is defined as min(grid#Width, grid#Height)-1
. The labeling of each centroid should go from 2
until k
. For instance, in this scenarios you could generate the following centroids:
Centroid 2 was generated at: (1.0, 4.0)
Centroid 3 was generated at: (1.0, 5.0)
Centroid 4 was generated at: (5.0, 1.0)
Centroid 5 was generated at: (3.0, 3.0)
Centroid 6 was generated at: (0.0, 2.0)
Centroid 7 was generated at: (6.0, 6.0)
Centroid 8 was generated at: (2.0, 6.0)
After generating the centroids, you must loop through each and every point that is marked with a 1
, as we can treat the ones marked with 0
as empty space. For each centroid, you must decide which centroid is the closest to the point in question. Here’s the distances for the example:
(0,8) distance from centroid 2 is 4.123105625617661
(0,8) distance from centroid 3 is 3.1622776601683795
(0,8) distance from centroid 4 is 8.602325267042627
(0,8) distance from centroid 5 is 5.830951894845301
(0,8) distance from centroid 6 is 6.0
(0,8) distance from centroid 7 is 6.324555320336759
(0,8) distance from centroid 8 is 2.8284271247461903
(1,1) distance from centroid 2 is 3.0
(1,1) distance from centroid 3 is 4.0
(1,1) distance from centroid 4 is 4.0
(1,1) distance from centroid 5 is 2.8284271247461903
(1,1) distance from centroid 6 is 1.4142135623730951
(1,1) distance from centroid 7 is 7.0710678118654755
(1,1) distance from centroid 8 is 5.0990195135927845
(2,8) distance from centroid 2 is 4.123105625617661
(2,8) distance from centroid 3 is 3.1622776601683795
(2,8) distance from centroid 4 is 7.615773105863909
(2,8) distance from centroid 5 is 5.0990195135927845
(2,8) distance from centroid 6 is 6.324555320336759
(2,8) distance from centroid 7 is 4.47213595499958
(2,8) distance from centroid 8 is 2.0
(3,2) distance from centroid 2 is 2.8284271247461903
(3,2) distance from centroid 3 is 3.605551275463989
(3,2) distance from centroid 4 is 2.23606797749979
(3,2) distance from centroid 5 is 1.0
(3,2) distance from centroid 6 is 3.0
(3,2) distance from centroid 7 is 5.0
(3,2) distance from centroid 8 is 4.123105625617661
(4,5) distance from centroid 2 is 3.1622776601683795
(4,5) distance from centroid 3 is 3.0
(4,5) distance from centroid 4 is 4.123105625617661
(4,5) distance from centroid 5 is 2.23606797749979
(4,5) distance from centroid 6 is 5.0
(4,5) distance from centroid 7 is 2.23606797749979
(4,5) distance from centroid 8 is 2.23606797749979
(4,8) distance from centroid 2 is 5.0
(4,8) distance from centroid 3 is 4.242640687119285
(4,8) distance from centroid 4 is 7.0710678118654755
(4,8) distance from centroid 5 is 5.0990195135927845
(4,8) distance from centroid 6 is 7.211102550927978
(4,8) distance from centroid 7 is 2.8284271247461903
(4,8) distance from centroid 8 is 2.8284271247461903
(6,1) distance from centroid 2 is 5.830951894845301
(6,1) distance from centroid 3 is 6.4031242374328485
(6,1) distance from centroid 4 is 1.0
(6,1) distance from centroid 5 is 3.605551275463989
(6,1) distance from centroid 6 is 6.082762530298219
(6,1) distance from centroid 7 is 5.0
(6,1) distance from centroid 8 is 6.4031242374328485
(7,1) distance from centroid 2 is 6.708203932499369
(7,1) distance from centroid 3 is 7.211102550927978
(7,1) distance from centroid 4 is 2.0
(7,1) distance from centroid 5 is 4.47213595499958
(7,1) distance from centroid 6 is 7.0710678118654755
(7,1) distance from centroid 7 is 5.0990195135927845
(7,1) distance from centroid 8 is 7.0710678118654755
(7,5) distance from centroid 2 is 6.082762530298219
(7,5) distance from centroid 3 is 6.0
(7,5) distance from centroid 4 is 4.47213595499958
(7,5) distance from centroid 5 is 4.47213595499958
(7,5) distance from centroid 6 is 7.615773105863909
(7,5) distance from centroid 7 is 1.4142135623730951
(7,5) distance from centroid 8 is 5.0990195135927845
(8,1) distance from centroid 2 is 7.615773105863909
(8,1) distance from centroid 3 is 8.06225774829855
(8,1) distance from centroid 4 is 3.0
(8,1) distance from centroid 5 is 5.385164807134504
(8,1) distance from centroid 6 is 8.06225774829855
(8,1) distance from centroid 7 is 5.385164807134504
(8,1) distance from centroid 8 is 7.810249675906654
(8,8) distance from centroid 2 is 8.06225774829855
(8,8) distance from centroid 3 is 7.615773105863909
(8,8) distance from centroid 4 is 7.615773105863909
(8,8) distance from centroid 5 is 7.0710678118654755
(8,8) distance from centroid 6 is 10.0
(8,8) distance from centroid 7 is 2.8284271247461903
(8,8) distance from centroid 8 is 6.324555320336759
The final result of the clustering algorithm is that there should be no 1s remaining in the matrix, only centroid numbers. This is why it was important to label the centroids from 2-k+1
, allowing us to replace them, as follows:
0 0 0 0 0 0 0 0 8
0 6 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 8
0 0 5 0 0 0 0 0 0
0 0 0 0 0 5 0 0 7
0 0 0 0 0 0 0 0 0
0 4 0 0 0 0 0 0 0
0 4 0 0 0 7 0 0 0
0 4 0 0 0 0 0 0 7
0 0 0 0 0 0 0 0 0
This is the initial clustering layout for 7 centroids on the provided grid, given randomly generated centroids. Your job is to output the clustered version of the binary input grid.
k-1
centroids must be randomly generated and should be anywhere from (0,0)
to (grid#Width, grid#Height)
.
k
is min(grid#Width, grid#Height)-1
.2
to k
.n
as row delimiters or a 2D array.⍴⍴,{⍺1+{⊃⍸⍵=⌊/⍵}¨↓⍵}⍸∘.(+.×⍨-)⌊/∘⍴{⍵[1↓⍺?⍴⍵]}∘⍸=⍨
-11 bytes thanks to Bubbler
⍝ Implicitly take binary matrix G as input
⍸=⍨ ⍝ Yield all (1-indexed) coordinates from (1,1) to (w,h)
⌊/∘⍴ ⍝ k+1=min(w,h)
{...} ⍝ Choose k centroids:
? ⍝ Randomly choose
⍺ ⍝ k+1 distinct indices
⍴⍵ ⍝ from the array of w×h possible indices
1↓ ⍝ Drop the first so there are only k indices
⍵[...] ⍝ index into the array of w×h coordinates
∘. ⍝ Outer product of
⍝ the k centroids with
⍳ ⍝ the coordinates of all of the "1"s in G
(+.×⍨-) ⍝ with operator: distance squared
{...}¨↓ ⍝ For each row (point)
⊃⍸⍵=⌊/⍵ ⍝ Find the index of the centroid with smallest distance
⍝ (breaks ties by choosing lower numbers → small bias toward lower centroid numbers)
1+ ⍝ Add 1 to shift centroid numbers to [2,k]
⍝ Now we have a list of the centroid numbers for the "1"s, in reading order
⍺ ⍝ Expand to prototype array:
, ⍝ Prototype array is flattened G (Expand throws RANK ERROR for non-vector left argument)
⍴ ⍝ Reshape back to
⍴ the shape of G
Answered by fireflame241 on October 27, 2021
If I understand the question correctly, something like this ought to work. "KMeans" is one of the built in methods to FindClusters and ClusteringComponents.
SparseArray[Thread[(p=Position[#,1])->1+ClusteringComponents[p,Min[d=Dimensions@#]-1,1,Method->"KMeans"]],d]&
Usage:
%@in//MatrixForm
to get visualization, in
is an integer array.
Answered by Kelly Lowder on October 27, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP