Intra-Class Correlation and Inter-rater Reliability

A few notes on agreement between raters.

Cohen's κ

Cohen's κ can be used for agreement between two raters on categorical data. The basic calculation is

$$κ = \frac{p_{a} - p_{e}}{1 - p_{e}},$$

where $p_{a}$ is the observed proportion of agreement and $p_{e}$ is the proportion of agreement expected by chance. κ is therefore the fraction of the possible agreement beyond chance, $1 - p_{e}$, that is actually observed.
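The formula can be sketched in base R as follows. The ratings are made up for illustration, and `cohen_kappa` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical ratings from two raters on the same 10 items (made-up data).
rater1 <- c("a", "a", "b", "b", "c", "a", "b", "c", "c", "a")
rater2 <- c("a", "b", "b", "b", "c", "a", "a", "c", "b", "a")

cohen_kappa <- function(r1, r2) {
  lev <- sort(unique(c(r1, r2)))
  # Cross-tabulate with a shared set of levels so the table is square.
  tab <- table(factor(r1, levels = lev), factor(r2, levels = lev))
  p <- tab / sum(tab)
  p_a <- sum(diag(p))                  # observed proportion of agreement
  p_e <- sum(rowSums(p) * colSums(p))  # agreement expected by chance
  (p_a - p_e) / (1 - p_e)
}

kappa_hat <- cohen_kappa(rater1, rater2)
kappa_hat
```

Here the raters agree on 7 of 10 items ($p_{a} = 0.7$) and the marginal proportions give $p_{e} = 0.34$, so κ = 0.36 / 0.66 ≈ 0.545.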

Fleiss' κ is an extension to more than two raters and has a similar form.
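To show the similar form, here is a minimal base R sketch of Fleiss' κ. It assumes every item is rated by the same number of raters and takes an items-by-categories matrix of counts. The data are made up, and `fleiss_kappa` is a hypothetical helper name.

```r
# Rows are items, columns are categories; each entry counts how many of the
# 3 raters chose that category for that item (made-up data).
counts <- rbind(c(3, 0),
                c(0, 3),
                c(2, 1),
                c(1, 2))

fleiss_kappa <- function(counts) {
  n <- sum(counts[1, ])                   # raters per item, assumed constant
  stopifnot(all(rowSums(counts) == n))
  p_j <- colSums(counts) / sum(counts)    # overall share of each category
  # Per-item agreement: proportion of rater pairs that agree on the item.
  P_i <- (rowSums(counts^2) - n) / (n * (n - 1))
  p_a <- mean(P_i)                        # mean observed agreement
  p_e <- sum(p_j^2)                       # agreement expected by chance
  (p_a - p_e) / (1 - p_e)
}

fk <- fleiss_kappa(counts)
fk
```

For these counts the mean observed agreement is 2/3 and the chance agreement is 1/2, so κ = 1/3.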

A major flaw in both κ statistics is that, for ordinal data, all disagreements are treated equally. E.g. on a Likert scale, ratings of 4 and 5 count as just as much of a disagreement as ratings of 1 and 5. Weighted κ addresses this by including a weight matrix that assigns each pair of categories its own degree of disagreement.
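A minimal base R sketch of weighted κ follows. It assumes disagreement weights of the form $(|i - j| / (k - 1))^{power}$, so `power = 1` gives linear and `power = 2` quadratic weights. The ratings are made up, and `weighted_kappa` and its `power` argument are hypothetical names.

```r
weighted_kappa <- function(r1, r2, levels, power = 1) {
  k <- length(levels)
  tab <- table(factor(r1, levels = levels), factor(r2, levels = levels))
  p_obs <- tab / sum(tab)
  p_exp <- outer(rowSums(p_obs), colSums(p_obs))  # chance-expected cells
  # Disagreement weights: 0 on the diagonal, growing with category distance.
  w <- (abs(outer(seq_len(k), seq_len(k), "-")) / (k - 1))^power
  1 - sum(w * p_obs) / sum(w * p_exp)
}

# Made-up 5-point Likert ratings with a single disagreement in each comparison.
r1   <- c(1, 2, 3, 4, 5, 5)
near <- c(1, 2, 3, 4, 5, 4)  # last item: 5 vs 4
far  <- c(1, 2, 3, 4, 5, 1)  # last item: 5 vs 1

wk_near <- weighted_kappa(r1, near, levels = 1:5)
wk_far  <- weighted_kappa(r1, far,  levels = 1:5)
c(near = wk_near, far = wk_far)
```

Unweighted κ is the same (23/29) for both comparisons in this example, whereas the linearly weighted version gives about 0.897 for the near miss and 0.625 for the far miss.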

Intra-class correlation

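ICC is used for continuous measurements. It can also be used in place of weighted κ with ordinal variables. The basic calculation is

$$ICC = \frac{σ_{w}^{2}}{σ_{w}^{2} + σ_{b}^{2}},$$

where $σ_{w}^{2}$ is the variability of ratings within a rater, that is, across the subjects being rated, and $σ_{b}^{2}$ is the variability between raters scoring the same subject. Since the denominator is the total variance of all ratings regardless of rater, this fraction is the proportion of total variation attributable to real differences among subjects rather than to disagreement. Note that in mixed-model terms, with subjects as the grouping factor, $σ_{w}^{2}$ is the random-intercept (between-subject) variance and $σ_{b}^{2}$ is the residual (within-subject) variance.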

The modern way to estimate the ICC is to fit a mixed model with a random intercept for the grouping factor and extract the variance components that are needed.
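As a rough illustration of the mixed-model approach, the sketch below simulates data with known variance components and checks that the estimated ICC lands near the true value. The simulation settings and all variable names are assumptions made for this example.

```r
library(nlme)

set.seed(1)
n_groups <- 200
n_per    <- 5
sd_group <- 2  # SD of the group-level random intercept
sd_resid <- 2  # residual SD; true ICC = 4 / (4 + 4) = 0.5

# Simulated data: each group gets its own intercept shift plus residual noise.
d <- data.frame(g = factor(rep(seq_len(n_groups), each = n_per)))
u <- rnorm(n_groups, sd = sd_group)
d$y <- 10 + u[as.integer(d$g)] + rnorm(nrow(d), sd = sd_resid)

# Random-intercept model, then the two variance components.
fit <- lme(y ~ 1, random = ~ 1 | g, data = d)
var_group <- getVarCov(fit)[[1]]
var_resid <- fit$sigma^2
icc_hat <- var_group / (var_group + var_resid)
icc_hat
```

With 200 groups the estimate should fall close to 0.5, though the exact value depends on the seed.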

ICC in R

We use the "Orthodont" data from nlme as our example, looking at the distance measurements and their correlation within Subject.

library("nlme")
library("lme4")
data(Orthodont)

With `nlme`

Using the nlme package, we fit the model:

fm1 <- lme(distance ~ 1, random = ~ 1 | Subject, data = Orthodont)
summary(fm1)

 Linear mixed-effects model fit by REML
  Data: Orthodont
       AIC      BIC    logLik
  521.3618 529.3803 -257.6809

Random effects:
 Formula: ~1 | Subject
        (Intercept) Residual
StdDev:    1.937002 2.220312

Fixed effects:  distance ~ 1
               Value Std.Error DF  t-value p-value
(Intercept) 24.02315 0.4296606 81 55.91192       0

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max
-3.2400448 -0.5277439 -0.1072888  0.4731815  2.7687301

Number of Observations: 108
Number of Groups: 27

In the notation above, $σ_{b}$ is reported as the Residual StdDev and $σ_{w}$ as the (Intercept) StdDev. To obtain the ICC, we extract each variance:

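# Random-intercept (Subject) variance, called sigma_w^2 above
s2w <- getVarCov(fm1)[[1]]
# Residual variance, called sigma_b^2 above
s2b <- fm1$sigma^2
c(sigma2_w = s2w, sigma2_b = s2b, icc = s2w/(s2w + s2b))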

 sigma2_w  sigma2_b       icc
3.7519762 4.9297832 0.4321677
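As a quick check, the same ICC can be recovered by hand from the StdDev row of the summary output, squaring each standard deviation to get a variance. The variable names here are chosen only for this illustration.

```r
# Standard deviations copied from the "Random effects" block of summary(fm1).
sd_intercept <- 1.937002
sd_residual  <- 2.220312

# Square to get variances, then form the ratio.
icc_by_hand <- sd_intercept^2 / (sd_intercept^2 + sd_residual^2)
round(icc_by_hand, 4)
```

This agrees with the value computed from the fitted object, up to rounding of the printed standard deviations.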

With `lme4`

Using the lme4 package, we fit the model:

fm2 <- lmer(distance ~ (1 | Subject), data = Orthodont)
summary(fm2)

 Linear mixed model fit by REML ['lmerMod']
Formula: distance ~ (1 | Subject)
   Data: Orthodont

REML criterion at convergence: 515.4

Scaled residuals:
    Min      1Q  Median      3Q     Max
-3.2400 -0.5277 -0.1073  0.4732  2.7687

Random effects:
 Groups   Name        Variance Std.Dev.
 Subject  (Intercept) 3.752    1.937
 Residual             4.930    2.220
Number of obs: 108, groups:  Subject, 27

Fixed effects:
            Estimate Std. Error t value
(Intercept)  24.0231     0.4297   55.91

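The Variance column of the Random effects table gives the Subject (random-intercept) variance, used as $σ_{w}^{2}$, and the Residual variance, used as $σ_{b}^{2}$. In mixed-model terms these are the between-subject and within-subject variances respectively.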

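# Random-intercept (Subject) variance, called sigma_w^2 above
s2w <- summary(fm2)$varcor$Subject[1]
# Residual variance, called sigma_b^2 above
s2b <- summary(fm2)$sigma^2
c(sigma2_w = s2w, sigma2_b = s2b, icc = s2w/(s2w + s2b))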

 sigma2_w  sigma2_b       icc
3.7519736 4.9297839 0.4321675
