# Intra-Class Correlation and Inter-rater Reliability

A few notes on agreement between raters.

## Cohen's \(\kappa\)

Cohen's \(\kappa\) can be used for agreement between two raters on categorical data. The basic calculation is

\[ \kappa = \frac{p_a - p_e}{1 - p_e}, \] where \(p_a\) is the observed proportion of agreement and \(p_e\) is the proportion of agreement expected by chance. In other words, \(\kappa\) is the proportion of the achievable beyond-chance agreement that was actually observed.
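As a small worked example with hypothetical data, suppose two raters each classify 50 items as yes/no, with their joint counts cross-tabulated:

```r
# Hypothetical 2x2 agreement table: rows are rater 1, columns are rater 2
tab <- matrix(c(20,  5,
                10, 15), nrow = 2, byrow = TRUE)
n   <- sum(tab)
p_a <- sum(diag(tab)) / n                      # observed agreement: 35/50 = 0.7
p_e <- sum(rowSums(tab) * colSums(tab)) / n^2  # chance agreement from the margins:
                                               #   0.5*0.6 + 0.5*0.4 = 0.5
kappa <- (p_a - p_e) / (1 - p_e)
kappa  # (0.7 - 0.5) / (1 - 0.5) = 0.4
```

So although the raters agree on 70% of items, half of that agreement is expected by chance, and \(\kappa\) comes out at 0.4.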

Fleiss' \(\kappa\) is an extension to more than two raters and has a similar form.

A major flaw in either \(\kappa\) is that for ordinal data, every disagreement is treated equally. On a Likert scale, for example, ratings of 4 and 5 count as just as much disagreement as ratings of 1 and 5. Weighted \(\kappa\) addresses this by introducing a weight matrix that assigns different penalties to different levels of disagreement.
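As a sketch of such a weight matrix, a common (but not the only) choice is quadratic weights, \(w_{ij} = ((i - j)/(k - 1))^2\) for a \(k\)-point scale:

```r
# Quadratic disagreement weights for a 5-point ordinal scale:
# w_ij = ((i - j)/(k - 1))^2, so adjacent categories are penalized
# far less than opposite ends of the scale
k <- 5
w <- outer(1:k, 1:k, function(i, j) ((i - j) / (k - 1))^2)
w[1, 5]  # 1.0000 -- maximal disagreement (1 vs 5)
w[4, 5]  # 0.0625 -- mild disagreement (4 vs 5)
# Weighted kappa then replaces raw agreement with weighted sums:
#   kappa_w = 1 - sum(w * p_obs) / sum(w * p_chance)
```

With these weights, a 4-vs-5 disagreement costs only 1/16 as much as a 1-vs-5 disagreement, rather than the same amount.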


## Intra-class correlation

ICC is used for continuous measurements, though it can also be used in place of weighted \(\kappa\) for ordinal variables. The basic calculation is

\[ \mathrm{ICC} = \frac{\sigma^2_b}{\sigma^2_b + \sigma^2_w}, \] where \(\sigma_b^2\) and \(\sigma_w^2\) represent between-subject and within-subject (residual) variability, respectively. Since the denominator is the total variance of all ratings regardless of subject, this fraction is the proportion of total variation accounted for by subject-to-subject differences; equivalently, it is the expected correlation between two measurements taken on the same subject.
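As a quick sanity check on this interpretation, we can simulate data with known variance components (the sample size and variances below are arbitrary) and verify that the correlation between two measurements on the same subject is roughly the share of variance due to subject-to-subject differences:

```r
# Simulate paired measurements with known variance components
set.seed(1)
n_subj         <- 500
sigma2_between <- 4  # variance of the true subject means
sigma2_within  <- 1  # measurement (residual) variance
subj_mean <- rnorm(n_subj, mean = 0, sd = sqrt(sigma2_between))
y1 <- subj_mean + rnorm(n_subj, sd = sqrt(sigma2_within))  # first measurement
y2 <- subj_mean + rnorm(n_subj, sd = sqrt(sigma2_within))  # second measurement
cor(y1, y2)  # close to 4 / (4 + 1) = 0.8
```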

The modern way to estimate the ICC is to fit a mixed model and extract the needed variance components.

### ICC in R

We use the `Orthodont` data from `nlme` as our example, looking at the `distance` measurements and their correlation within `Subject`.

```r
library("nlme")
library("lme4")
data(Orthodont)
```

#### With `nlme`

Using the `nlme` package, we fit the model:

```r
fm1 <- lme(distance ~ 1, random = ~ 1 | Subject, data = Orthodont)
summary(fm1)
```

```
Linear mixed-effects model fit by REML
  Data: Orthodont
       AIC      BIC    logLik
  521.3618 529.3803 -257.6809

Random effects:
 Formula: ~1 | Subject
        (Intercept) Residual
StdDev:    1.937002 2.220312

Fixed effects:  distance ~ 1
               Value Std.Error DF  t-value p-value
(Intercept) 24.02315 0.4296606 81 55.91192       0

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max
-3.2400448 -0.5277439 -0.1072888  0.4731815  2.7687301

Number of Observations: 108
Number of Groups: 27
```

The between-subject standard deviation is reported as the `(Intercept)` StdDev, and the within-subject standard deviation as the `Residual` StdDev. To obtain the ICC, we compute each \(\sigma^2\):

```r
s2b <- getVarCov(fm1)[[1]]  # between-subject (Intercept) variance
s2w <- fm1$sigma^2          # within-subject (residual) variance
c(sigma2_b = s2b, sigma2_w = s2w, icc = s2b/(s2b + s2w))
```

```
 sigma2_b  sigma2_w       icc
3.7519762 4.9297832 0.4321677
```

#### With `lme4`

Using the `lme4` package, we fit the model:

```r
fm2 <- lmer(distance ~ (1 | Subject), data = Orthodont)
summary(fm2)
```

```
Linear mixed model fit by REML ['lmerMod']
Formula: distance ~ (1 | Subject)
   Data: Orthodont

REML criterion at convergence: 515.4

Scaled residuals:
    Min      1Q  Median      3Q     Max
-3.2400 -0.5277 -0.1073  0.4732  2.7687

Random effects:
 Groups   Name        Variance Std.Dev.
 Subject  (Intercept) 3.752    1.937
 Residual             4.930    2.220
Number of obs: 108, groups:  Subject, 27

Fixed effects:
            Estimate Std. Error t value
(Intercept)  24.0231     0.4297   55.91
```

The Variance column of the Random effects table gives the between-subject (`Subject`) and within-subject (`Residual`) variances.

```r
s2b <- summary(fm2)$varcor$Subject[1]  # between-subject variance
s2w <- summary(fm2)$sigma^2            # within-subject (residual) variance
c(sigma2_b = s2b, sigma2_w = s2w, icc = s2b/(s2b + s2w))
```

```
 sigma2_b  sigma2_w       icc
3.7519736 4.9297839 0.4321675
```