A few notes on agreement between raters.

## Cohen’s $$\kappa$$

Cohen’s $$\kappa$$ can be used for agreement between two raters on categorical data. The basic calculation is

$$\kappa = \frac{p_a - p_e}{1 - p_e},$$

where $$p_a$$ is the observed proportion of agreement and $$p_e$$ is the proportion of agreement expected by chance. Therefore $$\kappa$$ is the fraction of the maximum possible agreement beyond chance ($$1 - p_e$$) that is actually observed ($$p_a - p_e$$).
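As a rough illustration of the calculation, here is a minimal base R sketch. The ratings are made up, and `cohen_kappa` is a hypothetical helper written for this note, not a function from any package.

```r
# Hypothetical ratings from two raters on the same 10 items (made-up data).
categories <- c("no", "yes")
rater1 <- factor(c("yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no"),
                 levels = categories)
rater2 <- factor(c("yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"),
                 levels = categories)

# Hypothetical helper: Cohen's kappa from two factors sharing the same levels.
cohen_kappa <- function(r1, r2) {
  tab <- table(r1, r2)                  # square table of paired ratings
  p   <- tab / sum(tab)                 # joint proportions
  p_a <- sum(diag(p))                   # observed agreement
  p_e <- sum(rowSums(p) * colSums(p))   # agreement expected by chance
  (p_a - p_e) / (1 - p_e)
}

kappa_hat <- cohen_kappa(rater1, rater2)
kappa_hat
```

Here the raters agree on 8 of 10 items ($$p_a = 0.8$$) and each uses both categories half the time ($$p_e = 0.5$$), so $$\kappa = 0.3 / 0.5 = 0.6$$.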

Fleiss’ $$\kappa$$ is an extension to more than two raters and has a similar form.
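To make the "similar form" concrete, the following is a sketch of the usual Fleiss calculation in base R. It assumes every item is rated by the same number of raters; the count matrix is made up and `fleiss_kappa` is a hypothetical helper, not a package function.

```r
# Hypothetical count matrix: one row per item, one column per category.
# Each entry is the number of raters who chose that category (made-up data).
counts <- rbind(c(3, 0, 0),
                c(0, 3, 0),
                c(2, 1, 0),
                c(0, 1, 2),
                c(0, 0, 3))

# Hypothetical helper: Fleiss' kappa from an items-by-categories count matrix.
fleiss_kappa <- function(counts) {
  n <- sum(counts[1, ])                           # raters per item
  stopifnot(all(rowSums(counts) == n), n > 1)     # same number of raters for every item
  p_i <- (rowSums(counts^2) - n) / (n * (n - 1))  # agreement among rater pairs, per item
  p_a <- mean(p_i)                                # mean observed agreement
  p_j <- colSums(counts) / sum(counts)            # overall category proportions
  p_e <- sum(p_j^2)                               # agreement expected by chance
  (p_a - p_e) / (1 - p_e)
}

fleiss_hat <- fleiss_kappa(counts)
fleiss_hat
```

The last line of the helper is the same ratio as in Cohen's $$\kappa$$; only the way $$p_a$$ and $$p_e$$ are computed changes.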

A major flaw in either $$\kappa$$ is that, for ordinal data, all disagreements are treated equally. E.g. on a Likert scale, ratings of 4 and 5 count as just as much of a disagreement as ratings of 1 and 5. Weighted $$\kappa$$ addresses this by including a weight matrix that assigns a degree of disagreement to each pair of categories.
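The effect of the weight matrix can be sketched in base R. The example below assumes the common linear and quadratic disagreement weights, $$|i - j| / (k - 1)$$ and its square; the ratings are made up and `weighted_kappa` is a hypothetical helper, not a package function.

```r
# Hypothetical 5-point Likert ratings (made-up data).
rater1     <- c(1, 2, 3, 4, 5, 5, 4, 3, 2, 1)
rater_near <- c(1, 2, 3, 5, 4, 5, 4, 3, 1, 1)  # three disagreements, each off by one
rater_far  <- c(1, 2, 3, 1, 1, 5, 4, 3, 5, 1)  # three disagreements, each far off

# Hypothetical helper: kappa with disagreement weights on a k-point scale.
weighted_kappa <- function(r1, r2, k,
                           weights = c("linear", "quadratic", "unweighted")) {
  weights <- match.arg(weights)
  lev <- seq_len(k)
  p <- table(factor(r1, levels = lev), factor(r2, levels = lev))
  p <- p / sum(p)                        # observed joint proportions
  e <- outer(rowSums(p), colSums(p))     # proportions expected by chance
  d <- abs(outer(lev, lev, "-"))         # distance between categories
  w <- switch(weights,
              unweighted = (d > 0) * 1,  # every disagreement counts the same
              linear     = d / (k - 1),
              quadratic  = (d / (k - 1))^2)
  1 - sum(w * p) / sum(w * e)            # 1 - observed / expected weighted disagreement
}

res <- sapply(c("unweighted", "linear", "quadratic"), function(w)
  c(near = weighted_kappa(rater1, rater_near, k = 5, weights = w),
    far  = weighted_kappa(rater1, rater_far,  k = 5, weights = w)))
res
```

With these made-up ratings the unweighted values coincide for the two pairs, since both have the same number of disagreements, while the weighted values are higher for the near-miss pair than for the far-off pair.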

## Intra-class correlation

ICC is used for continuous measurements. It can also be used in place of weighted $$\kappa$$ for ordinal variables. The basic calculation is

$$\mathrm{ICC} = \frac{\sigma^2_w}{\sigma^2_w + \sigma^2_b},$$

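where $$\sigma^2_w$$ is the variance of the group-level effect shared by all measurements within the same group (usually called the between-group variance) and $$\sigma^2_b$$ is the residual variance of individual measurements around their group's level. Since the denominator is the total variance of all ratings, this fraction is the proportion of total variation attributable to the grouping, which is also the correlation between two measurements from the same group.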
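One way to build intuition for this ratio is a small simulation. The sketch below assumes a random-intercept setup in which the numerator variance belongs to an effect shared by all measurements in a group; the variable names and variance values are made up for illustration.

```r
set.seed(1)
n_groups     <- 5000  # hypothetical number of groups, two measurements each
sigma2_group <- 4     # variance of the shared group-level effect (numerator)
sigma2_resid <- 6     # residual variance of individual measurements

# Two measurements per group share the same group effect but have independent noise.
group_effect <- rnorm(n_groups, sd = sqrt(sigma2_group))
y1 <- group_effect + rnorm(n_groups, sd = sqrt(sigma2_resid))
y2 <- group_effect + rnorm(n_groups, sd = sqrt(sigma2_resid))

icc_true <- sigma2_group / (sigma2_group + sigma2_resid)  # 4 / 10 = 0.4
icc_sim  <- cor(y1, y2)  # correlation between measurements from the same group
c(icc_true = icc_true, icc_sim = icc_sim)
```

The simulated correlation should land close to the ratio of variances, which is why the quantity is called a correlation.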

The modern way to estimate the ICC is to fit a mixed model and extract the variance components (the $$\sigma^2$$’s) that are needed.

### ICC in R

We use the Orthodont data from nlme as our example, modelling the distance measurements and estimating the correlation among measurements within each Subject.

library(nlme)
library(lme4)
data(Orthodont)

#### With nlme

Using the nlme package, we fit the model:

fm1 <- lme(distance ~ 1, random = ~ 1 | Subject, data = Orthodont)
summary(fm1)
## Linear mixed-effects model fit by REML
##  Data: Orthodont
##        AIC      BIC    logLik
##   521.3618 529.3803 -257.6809
##
## Random effects:
##  Formula: ~1 | Subject
##         (Intercept) Residual
## StdDev:    1.937002 2.220312
##
## Fixed effects: distance ~ 1
##                Value Std.Error DF  t-value p-value
## (Intercept) 24.02315 0.4296606 81 55.91192       0
##
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max
## -3.2400448 -0.5277439 -0.1072888  0.4731815  2.7687301
##
## Number of Observations: 108
## Number of Groups: 27

In the notation above, $$\sigma_w$$ is reported as the (Intercept) StdDev and $$\sigma_b$$ as the Residual StdDev. To obtain the ICC, we extract each $$\sigma^2$$:

s2w <- getVarCov(fm1)[[1]]  # random-intercept (Subject) variance
s2b <- fm1$sigma^2          # residual variance
c(sigma2_w = s2w, sigma2_b = s2b, icc = s2w/(s2w + s2b))
##  sigma2_w  sigma2_b       icc
## 3.7519762 4.9297832 0.4321677

#### With lme4

Using the lme4 package, we fit the model:

fm2 <- lmer(distance ~ (1 | Subject), data = Orthodont)
summary(fm2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: distance ~ (1 | Subject)
##    Data: Orthodont
##
## REML criterion at convergence: 515.4
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -3.2400 -0.5277 -0.1073  0.4732  2.7687
##
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Subject  (Intercept) 3.752    1.937
##  Residual             4.930    2.220
## Number of obs: 108, groups:  Subject, 27
##
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  24.0231     0.4297   55.91

The Variance column of the Random effects table gives $$\sigma^2_w$$ (the Subject row) and $$\sigma^2_b$$ (the Residual row):

s2w <- summary(fm2)$varcor$Subject[1]  # random-intercept (Subject) variance
s2b <- summary(fm2)$sigma^2            # residual variance
c(sigma2_w = s2w, sigma2_b = s2b, icc = s2w/(s2w + s2b))
##  sigma2_w  sigma2_b       icc
## 3.7519771 4.9297829 0.4321678