cdalitz on December 3, 2020

If k raters are asked to rate the same set of objects on a continuous or Likert scale, there is the ICC3 for measuring the inter-rater agreement.

Is there also an agreement measure for the case where each rater instead has to order the rated objects by preference?

A naive approach would be to compute the Spearman correlation for all pairs of raters and then take the average, but as this is almost certainly a standard problem, I wonder whether there is a standard solution for it.
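A minimal sketch of this naive approach in R, assuming the rankings are stored with one row per object and one column per rater and contain no ties. The function name `mean_pairwise_spearman` and the example rankings are made up for illustration only:

```r
# Hypothetical helper: mean Spearman correlation over all pairs of raters.
# `ranks` has one row per object and one column per rater.
mean_pairwise_spearman <- function(ranks) {
  rho <- cor(ranks, method = "spearman")  # k x k matrix of pairwise correlations
  mean(rho[upper.tri(rho)])               # average over the distinct rater pairs
}

# Made-up example: three raters (A, B, C) each ranking four objects
ranks <- data.frame(A = c(1, 2, 3, 4),
                    B = c(1, 3, 2, 4),
                    C = c(2, 1, 3, 4))
mean_pairwise_spearman(ranks)
```

Only the upper triangle of the correlation matrix is averaged, so each pair of raters counts once and the diagonal of ones is excluded.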

Following @chl's suggestion, I have tried Kendall's W, but the result is somewhat surprising. Although four of the five raters (80%) agree perfectly, Kendall's W is only 0.36:

> library(irr)
> x <- data.frame(R1=c(1,2,3), R2=c(1,2,3), R3=c(1,2,3), R4=c(3,2,1), R5=c(1,2,3))
> x
  R1 R2 R3 R4 R5
1  1  1  1  3  1
2  2  2  2  2  2
3  3  3  3  1  3
> kendall(x)$value
[1] 0.36
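
For reference, the value can be checked by hand. This is only a sketch using the standard tie-free formula for Kendall's W, with m = 5 raters, n = 3 objects, and R_i the sum of the ranks given to object i (the symbols are my notation, not something `irr` prints):

```latex
W = \frac{12\,S}{m^2\,(n^3 - n)}, \qquad S = \sum_{i=1}^{n} \left(R_i - \bar{R}\right)^2

R = (7,\ 10,\ 13), \qquad \bar{R} = 10, \qquad S = 9 + 0 + 9 = 18

W = \frac{12 \cdot 18}{5^2 \cdot (3^3 - 3)} = \frac{216}{600} = 0.36

\bar{r}_s = \frac{m\,W - 1}{m - 1} = \frac{1.8 - 1}{4} = 0.2
```

The last line uses the known relation between W and the mean pairwise Spearman correlation in the absence of ties. It matches a direct count: of the 10 rater pairs, 6 have correlation +1 and 4 have correlation -1, which averages to 0.2. So 0.36 appears to be arithmetically correct here; the single reversed ranking pulls the rank sums strongly toward the mean when there are only three objects.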

Does someone know of a different index that yields a more reasonable result in this case?


