Three different errors using external information. Which one makes sense? (Or how to interpret each?)

Data Science Asked by Nima S on June 11, 2021

My goal is to compare clustering results, across different clustering methods and different numbers of clusters, using external information.

Could anyone please give an opinion, or recommend a book or paper, on the following problem?

Let’s say I have N data points. I feed each of them to a system as an input, get a corresponding output, and take the sum of these outputs as the total output.

Now I cluster the N data points into K clusters and find their centroids. I assign the number of data points in each cluster as the weight of its centroid. I then feed the centroids to the system one by one and get K outputs.
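To make the setup concrete, here is a minimal sketch of it in plain Python. Everything in it is an assumption for illustration: `system` is a hypothetical stand-in for the real system (here just squaring a scalar), the data are one-dimensional toy values, and the cluster labels are given by hand rather than produced by an actual clustering algorithm.

```python
from collections import defaultdict

def system(x):
    # Hypothetical stand-in for the real system; any scalar function would do.
    return x * x

def weighted_centroids(points, labels):
    """Return {label: (centroid, weight)}, where weight is the cluster size."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return {lab: (sum(g) / len(g), len(g)) for lab, g in groups.items()}

# Toy data: N = 5 points, K = 2 clusters (labels assumed, not computed).
points = [1.0, 2.0, 3.0, 10.0, 12.0]
labels = [0, 0, 0, 1, 1]

centroids = weighted_centroids(points, labels)

# Total output from feeding all N points to the system.
true_total = sum(system(p) for p in points)

# Approximate total from feeding only the K centroids, each weighted by its cluster size.
approx_total = sum(w * system(c) for c, w in centroids.values())
```

With a nonlinear system the two totals generally differ, and that gap is what the error measures below try to quantify.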

How should I calculate the error?

Should I calculate the (average?) error for each cluster first and then take the average of the errors calculated for the clusters?

Or should I divide the errors in each cluster by the mean of the ground-truth outputs in that cluster and then sum the resulting per-cluster errors? Or take their average instead of their sum?

Or just compute the weighted sum of the outputs obtained from the centroids and subtract the actual total output from it?
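The three candidates can be sketched side by side. This is only one possible reading of each option, not a definitive formulation: the "error" of a point is assumed to be the absolute difference between its own output and the output of its cluster's centroid, `system` is again a hypothetical stand-in, and the labels are assumed to be given.

```python
from collections import defaultdict

def system(x):
    # Hypothetical stand-in for the real system.
    return x * x

def three_errors(points, labels):
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)

    cluster_mae = []       # option 1: mean absolute error within each cluster
    cluster_relative = []  # option 2: that error divided by the cluster's mean true output
    approx_total = 0.0     # option 3: weighted sum of centroid outputs

    for g in groups.values():
        centroid = sum(g) / len(g)
        y_centroid = system(centroid)
        y_true = [system(p) for p in g]
        mae = sum(abs(y - y_centroid) for y in y_true) / len(g)
        mean_true = sum(y_true) / len(g)  # assumed non-zero in this sketch
        cluster_mae.append(mae)
        cluster_relative.append(mae / mean_true)
        approx_total += len(g) * y_centroid

    true_total = sum(system(p) for p in points)
    k = len(groups)
    return {
        "mean_of_cluster_errors": sum(cluster_mae) / k,
        "sum_of_relative_errors": sum(cluster_relative),
        "mean_of_relative_errors": sum(cluster_relative) / k,
        "total_error": approx_total - true_total,
    }

errors = three_errors([1.0, 2.0, 3.0, 10.0, 12.0], [0, 0, 0, 1, 1])
```

One difference worth noting: in the total-output error, over- and underestimates can cancel across points and clusters, whereas the per-cluster absolute errors cannot cancel. Also, the sum of relative errors grows with K, while the averaged version does not, which matters when comparing clusterings with different numbers of clusters.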

clustering machine learning statistics validation
