A few notes on agreement between raters.
Cohen’s \(\kappa\) can be used to measure agreement between two raters on categorical data. The basic calculation is
\[ \kappa = \frac{p_a - p_e}{1 - p_e}, \]
where \(p_a\) is the observed proportion of agreement and \(p_e\) is the proportion of agreement expected by chance. Thus \(\kappa\) is the fraction of the achievable beyond-chance agreement that is actually observed.
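As a quick illustration, \(\kappa\) can be computed directly from the cross-tabulation of the two raters’ calls. A minimal sketch in R, using made-up ratings r1 and r2:
r1 <- c("a", "b", "b", "c", "a", "c", "b", "a")       # rater 1 (hypothetical data)
r2 <- c("a", "b", "c", "c", "a", "b", "b", "a")       # rater 2 (hypothetical data)
tab <- table(r1, r2)                                  # cross-tabulation of ratings
p_a <- sum(diag(tab)) / sum(tab)                      # observed agreement
p_e <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2  # agreement expected by chance
(p_a - p_e) / (1 - p_e)                               # Cohen's kappa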
Fleiss’ \(\kappa\) is an extension to more than two raters and has a similar form.
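To make that similar form concrete, here is a by-hand sketch for \(m\) raters, using a made-up \(n \times m\) matrix of ratings (rows are subjects, columns are raters); the irr package’s kappam.fleiss() is a packaged alternative:
ratings <- matrix(c("a", "a", "b",
                    "b", "b", "b",
                    "c", "b", "c",
                    "a", "a", "a"), ncol = 3, byrow = TRUE)  # hypothetical data
cats <- sort(unique(c(ratings)))
n <- nrow(ratings); m <- ncol(ratings)
# n x k matrix: how many of the m raters put subject i in category j
counts <- t(apply(ratings, 1, function(r) table(factor(r, cats))))
p_j <- colSums(counts) / (n * m)                # overall category proportions
P_i <- (rowSums(counts^2) - m) / (m * (m - 1))  # per-subject agreement
(mean(P_i) - sum(p_j^2)) / (1 - sum(p_j^2))     # Fleiss' kappa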
A major flaw of both forms of \(\kappa\) is that, for ordinal data, every disagreement is treated equally. E.g., on a Likert scale, ratings of 4 and 5 count as just as severe a disagreement as ratings of 1 and 5. Weighted \(\kappa\) addresses this with a weight matrix that assigns different levels of disagreement to each pair of categories.
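For example, with squared-distance disagreement weights (one common choice), weighted \(\kappa\) can be sketched by hand on made-up Likert ratings:
r1 <- c(1, 2, 4, 5, 3, 4, 2, 5)                   # rater 1 (hypothetical data)
r2 <- c(1, 3, 5, 5, 3, 3, 2, 4)                   # rater 2 (hypothetical data)
lev <- 1:5
tab <- table(factor(r1, lev), factor(r2, lev))
o <- tab / sum(tab)                               # observed cell proportions
e <- outer(rowSums(o), colSums(o))                # proportions expected by chance
w <- outer(lev, lev, function(i, j) (i - j)^2)    # squared disagreement weights
1 - sum(w * o) / sum(w * e)                       # weighted kappa
With these squared weights, weighted \(\kappa\) is closely related to the ICC discussed next.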
The intraclass correlation coefficient (ICC) is used for continuous measurements, and it can also stand in for weighted \(\kappa\) with ordinal variables. The basic calculation is
\[ ICC = \frac{\sigma^2_b}{\sigma^2_b + \sigma^2_w}, \]
where \(\sigma^2_b\) is the between-subject variance and \(\sigma^2_w\) is the within-subject variance (in an agreement study, the variability among different raters’ ratings of the same subject). Since the denominator is the total variance of all ratings regardless of rater, this fraction represents the proportion of total variation attributable to differences between subjects; the closer it is to 1, the better the agreement.
The modern way to estimate the ICC is by a mixed model, extracting the \(\sigma\)’s that are needed.
We use the Orthodont data from nlme as our example, looking at the distance measurements and the correlation by Subject.
library(nlme)   # for lme() and the Orthodont data
library(lme4)   # for lmer()
data(Orthodont)
nlme
Using the nlme package, we fit the model:
fm1 <- lme(distance ~ 1, random = ~ 1 | Subject, data = Orthodont)
summary(fm1)
## Linear mixed-effects model fit by REML
## Data: Orthodont
## AIC BIC logLik
## 521.3618 529.3803 -257.6809
##
## Random effects:
## Formula: ~1 | Subject
## (Intercept) Residual
## StdDev: 1.937002 2.220312
##
## Fixed effects: distance ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 24.02315 0.4296606 81 55.91192 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.2400448 -0.5277439 -0.1072888 0.4731815 2.7687301
##
## Number of Observations: 108
## Number of Groups: 27
In the Random effects block, the (Intercept) StdDev is the between-subject standard deviation and the Residual StdDev is the within-subject standard deviation. To obtain the ICC, we extract and combine the corresponding variances:
s2b <- getVarCov(fm1)[[1]]  # between-subject variance (Subject intercept)
s2w <- fm1$sigma^2          # within-subject (residual) variance
c(sigma2_b = s2b, sigma2_w = s2w, icc = s2b/(s2b + s2w))
## sigma2_b sigma2_w icc
## 3.7519762 4.9297832 0.4321677
lme4
Using the lme4 package, we fit the model:
fm2 <- lmer(distance ~ (1 | Subject), data = Orthodont)
summary(fm2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: distance ~ (1 | Subject)
## Data: Orthodont
##
## REML criterion at convergence: 515.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2400 -0.5277 -0.1073 0.4732 2.7687
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 3.752 1.937
## Residual 4.930 2.220
## Number of obs: 108, groups: Subject, 27
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 24.0231 0.4297 55.91
The Variance column of the Random effects table gives the between-subject (Subject) and within-subject (Residual) variances.
s2b <- summary(fm2)$varcor$Subject[1]  # between-subject variance (Subject intercept)
s2w <- summary(fm2)$sigma^2            # within-subject (residual) variance
c(sigma2_b = s2b, sigma2_w = s2w, icc = s2b/(s2b + s2w))
## sigma2_b sigma2_w icc
## 3.7519771 4.9297829 0.4321678
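As a sanity check, the performance package (assumed to be installed) provides an icc() function that estimates the ICC directly from the fitted lme4 model:
library(performance)
icc(fm2)  # should report an ICC of roughly 0.43, matching the by-hand calculation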