Intra-Class Correlation and Inter-rater Reliability
A few notes on agreement between raters.
Cohen's κ
Cohen's κ can be used for agreement between two raters on categorical data. The basic calculation is
$$\kappa = \frac{p_o - p_e}{1 - p_e}$$
where $p_o$ is the proportion of observed agreement and $p_e$ is the proportion of agreement expected by chance. Therefore κ is the fraction of the possible agreement beyond chance that is actually observed.
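As a minimal sketch of this calculation in base R, the function below builds the two-rater agreement table and computes observed agreement, chance agreement, and κ. The ratings and the name `cohen_kappa` are made up for illustration and are not from any package.

```r
# Hypothetical ratings from two raters on the same 10 items (made-up data)
rater1 <- c("a", "a", "b", "b", "a", "b", "a", "a", "b", "b")
rater2 <- c("a", "a", "b", "a", "a", "b", "b", "a", "b", "b")

# Illustrative helper, not a library function
cohen_kappa <- function(r1, r2) {
  lev <- sort(unique(c(r1, r2)))
  # Cross-tabulate on a shared set of levels so the table is square
  tab <- table(factor(r1, levels = lev), factor(r2, levels = lev))
  p <- tab / sum(tab)
  p_o <- sum(diag(p))                  # observed agreement
  p_e <- sum(rowSums(p) * colSums(p))  # agreement expected by chance
  (p_o - p_e) / (1 - p_e)
}

cohen_kappa(rater1, rater2)
```

Here the raters agree on 8 of 10 items and each uses both categories half the time, so observed agreement is 0.8, chance agreement is 0.5, and κ comes out at 0.6. The sketch does not guard against the degenerate case where chance agreement equals 1.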
Fleiss' κ is an extension to more than two raters and has a similar form.
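To make the "similar form" concrete, here is a hedged base-R sketch of Fleiss' κ. It assumes the input is a subjects-by-categories matrix of counts, with every subject rated by the same number of raters. The function name `fleiss_kappa` and the data are hypothetical.

```r
# Illustrative helper, not a library function.
# counts[i, j] = number of raters who put subject i in category j
fleiss_kappa <- function(counts) {
  counts <- as.matrix(counts)
  N <- nrow(counts)        # number of subjects
  n <- sum(counts[1, ])    # raters per subject (assumed constant)
  stopifnot(all(rowSums(counts) == n), n > 1)
  # Per-subject agreement: share of rater pairs that agree
  P_i <- (rowSums(counts^2) - n) / (n * (n - 1))
  P_bar <- mean(P_i)                    # mean observed agreement
  p_j <- colSums(counts) / (N * n)      # overall category proportions
  P_e <- sum(p_j^2)                     # agreement expected by chance
  (P_bar - P_e) / (1 - P_e)
}

# Made-up example: 4 subjects, 3 raters, 2 categories
ratings <- rbind(c(3, 0), c(2, 1), c(0, 3), c(1, 2))
fleiss_kappa(ratings)
```

The last line has the same shape as Cohen's κ: observed agreement minus chance agreement, divided by one minus chance agreement.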
A major flaw in either κ is that for ordinal data, all disagreements are treated equally. E.g. on a Likert scale, ratings of 4 and 5 count as just as much of a disagreement as ratings of 1 and 5. Weighted κ addresses this by including a weight matrix that assigns a degree of disagreement to each pair of ratings.
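The effect of the weight matrix can be sketched in base R. The example assumes linear disagreement weights, |i - j| / (k - 1), which is one common choice among several. The function name `weighted_kappa` and the ratings are hypothetical.

```r
# Illustrative helper, not a library function.
# r1, r2: integer ratings on a 1..k ordinal scale
weighted_kappa <- function(r1, r2, k) {
  tab <- table(factor(r1, levels = 1:k), factor(r2, levels = 1:k))
  p <- tab / sum(tab)                        # observed cell proportions
  e <- outer(rowSums(p), colSums(p))         # proportions expected by chance
  w <- abs(outer(1:k, 1:k, "-")) / (k - 1)   # linear disagreement weights (assumed)
  1 - sum(w * p) / sum(w * e)
}

# Made-up Likert ratings: both comparisons have exactly one disagreement,
# but it is a near miss in one case and a far miss in the other
a    <- c(1, 2, 3, 4, 5, 5)
near <- c(1, 2, 3, 4, 5, 4)   # last item: 5 vs 4
far  <- c(1, 2, 3, 4, 5, 1)   # last item: 5 vs 1

c(near = weighted_kappa(a, near, 5), far = weighted_kappa(a, far, 5))
```

Both pairs agree on five of six items, yet the weighted κ is higher for the near miss than for the far miss, which is the distinction unweighted κ cannot make.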
Intra-class correlation
ICC is used for continuous measurements. It can also be used in place of weighted κ with ordinal variables. The basic calculation is
$$\mathrm{ICC} = \frac{\sigma_w^2}{\sigma_w^2 + \sigma_b^2}$$
where $\sigma_w^2$ and $\sigma_b^2$ represent within- and between-rater variability respectively. Since the denominator is the total variance of all ratings regardless of rater, this fraction represents the proportion of total variation accounted for by within-variation.
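As a quick arithmetic check of this ratio, the sketch below plugs in the two variance estimates that the Orthodont fits later in these notes report (3.7519762 and 4.9297832). The helper name `icc_from_variances` is made up for illustration.

```r
# Illustrative helper: ICC as one variance component over the total variance.
# s2w and s2b follow the naming used in these notes.
icc_from_variances <- function(s2w, s2b) {
  stopifnot(s2w >= 0, s2b >= 0, s2w + s2b > 0)
  s2w / (s2w + s2b)
}

# Variance estimates taken from the Orthodont output in these notes
icc_from_variances(3.7519762, 4.9297832)  # about 0.432
```

A value near 0.43 means that roughly 43% of the total variance in the measurements is attributed to the first component.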
The modern way to estimate the ICC is to fit a mixed model and extract the variance components (the σ²'s) that are needed.
ICC in R
Use the "Orthodont" data from nlme as our example. Look at the distance measurements and at their correlation by Subject.
library("nlme")   # provides lme() and the Orthodont data
library("lme4")   # provides lmer()
data(Orthodont)
With nlme
Using the nlme package, we fit the model:
# Intercept-only model with a random intercept for each Subject
fm1 <- lme(distance ~ 1, random = ~ 1 | Subject, data = Orthodont)
summary(fm1)
Linear mixed-effects model fit by REML
 Data: Orthodont
       AIC      BIC    logLik
  521.3618 529.3803 -257.6809

Random effects:
 Formula: ~1 | Subject
        (Intercept) Residual
StdDev:    1.937002 2.220312

Fixed effects: distance ~ 1
               Value Std.Error DF  t-value p-value
(Intercept) 24.02315 0.4296606 81 55.91192       0

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max
-3.2400448 -0.5277439 -0.1072888  0.4731815  2.7687301

Number of Observations: 108
Number of Groups: 27
The between-effect standard deviation is reported as the Residual StdDev. To obtain the ICC, we compute each σ² (the square of the corresponding StdDev):
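# Variance of the Subject random intercept
s2w <- getVarCov(fm1)[[1]]
# Residual variance (spell out sigma rather than relying on partial matching of $s)
s2b <- fm1$sigma^2
c(sigma2_w = s2w, sigma2_b = s2b, icc = s2w/(s2w + s2b))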
 sigma2_w  sigma2_b       icc
3.7519762 4.9297832 0.4321677
With lme4
Using the lme4 package, we fit the model:
# Same intercept-only model, with the random intercept written in lme4 syntax
fm2 <- lmer(distance ~ (1 | Subject), data = Orthodont)
summary(fm2)
Linear mixed model fit by REML ['lmerMod']
Formula: distance ~ (1 | Subject)
   Data: Orthodont

REML criterion at convergence: 515.4

Scaled residuals:
    Min      1Q  Median      3Q     Max
-3.2400 -0.5277 -0.1073  0.4732  2.7687

Random effects:
 Groups   Name        Variance Std.Dev.
 Subject  (Intercept) 3.752    1.937
 Residual             4.930    2.220
Number of obs: 108, groups:  Subject, 27

Fixed effects:
            Estimate Std. Error t value
(Intercept)  24.0231     0.4297   55.91
The Variance column of the Random Effects table gives the within-subject (Subject) and between-subject (Residual) variances.
# Variance of the Subject random intercept
s2w <- summary(fm2)$varcor$Subject[1]
# Residual variance
s2b <- summary(fm2)$sigma^2
c(sigma2_w = s2w, sigma2_b = s2b, icc = s2w/(s2w + s2b))
 sigma2_w  sigma2_b       icc
3.7519736 4.9297839 0.4321675