 SciStat

# Inter-rater agreement (Kappa)

## Description

Creates a classification table for two observers from raw data in the data table, and calculates an inter-rater agreement statistic (Kappa) to evaluate the agreement between the two classifications on ordinal or nominal scales (Cohen, 1960). If your data are already organised in a table, you can use the Inter-rater agreement command in the Calculators menu instead.

Agreement is quantified by the Kappa (K) or Weighted Kappa (Kw) statistic (Fleiss et al., 2003):

• K is 1 when there is perfect agreement between the classification systems;
• K is 0 when there is no agreement better than chance;
• K is negative when agreement is worse than chance.
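
As an illustration of these three cases, the following minimal sketch computes unweighted Kappa from two lists of ratings, using the standard definition K = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e the proportion expected by chance from the marginal totals. The function name `cohen_kappa` and the list-based input are assumptions made for this example; they are not part of SciStat.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels.

    Hypothetical helper for illustration; not SciStat's implementation.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: share of cases where both raters chose the same category.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the two raters' marginal proportions,
    # summed over categories.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one single category")
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    print(cohen_kappa(["a", "b", "a", "b"], ["a", "b", "a", "b"]))  # perfect agreement
    print(cohen_kappa([0, 0, 1, 1], [0, 1, 0, 1]))                  # chance-level agreement
    print(cohen_kappa([0, 1], [1, 0]))                              # worse than chance
```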

## Required input

• Select the variables containing the classification data for the two observers.

• Optionally select a filter to include a subset of cases.

## Options

• Weighted Kappa: select 'Weighted Kappa' if applicable. SciStat.com offers linear and quadratic weights. See below.

## Weighted kappa

Kappa does not take the degree of disagreement between observers into account: every disagreement is treated as total disagreement. When the categories are ordered, it is therefore preferable to use Weighted Kappa, which assigns a weight $w_i$ to subjects for whom the raters differ by $i$ categories, so that different levels of agreement contribute to the value of Kappa.

SciStat.com offers two sets of weights, called linear and quadratic. If there are $k$ categories, the weights in the linear set are calculated as follows:

$$ w_i = 1 - \frac{i}{k-1} $$

and in the quadratic set:

$$ w_i = 1 - \frac{i^2}{(k-1)^2} $$

When there are 5 categories, the weights in the linear set are 1, 0.75, 0.50, 0.25 and 0 for a difference of 0 (= total agreement), 1, 2, 3 and 4 categories respectively. In the quadratic set the weights are 1, 0.9375, 0.75, 0.4375 and 0.
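
The five-category values can be checked with a short sketch. It assumes the usual definitions of the two schemes, 1 - i/(k-1) for linear and 1 - i²/(k-1)² for quadratic weights, which reproduce the numbers quoted above up to rounding. The function name `agreement_weights` is hypothetical.

```python
def agreement_weights(k, scheme="linear"):
    """Return the weights for category differences i = 0 .. k-1.

    Assumes linear weights 1 - i/(k-1) and quadratic weights 1 - (i/(k-1))**2.
    Hypothetical helper for illustration; not SciStat's implementation.
    """
    if k < 2:
        raise ValueError("at least two categories are required")
    if scheme not in ("linear", "quadratic"):
        raise ValueError("scheme must be 'linear' or 'quadratic'")

    weights = []
    for i in range(k):
        relative_distance = i / (k - 1)  # 0 for agreement, 1 for maximal disagreement
        if scheme == "linear":
            weights.append(1 - relative_distance)
        else:
            weights.append(1 - relative_distance ** 2)
    return weights


if __name__ == "__main__":
    print(agreement_weights(5, "linear"))
    print(agreement_weights(5, "quadratic"))
```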

Use linear weights when the difference between the first and second category has the same importance as a difference between the second and third category, etc. If the difference between the first and second category is less important than a difference between the second and third category, etc., use quadratic weights.
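
To show how the choice of weights affects the statistic, here is a minimal sketch of Weighted Kappa computed from a k x k classification table. It assumes the standard form Kw = (p_o(w) - p_e(w)) / (1 - p_e(w)), with weighted observed and chance-expected agreement, and the linear and quadratic weight definitions 1 - d and 1 - d², where d = |row - column| / (k - 1). The function name `weighted_kappa`, the nested-list table layout, and the example counts are assumptions for illustration only.

```python
def weighted_kappa(table, scheme="linear"):
    """Weighted kappa from a square table of counts (rows: observer 1, columns: observer 2).

    Categories are assumed to be in their natural order.
    Hypothetical helper for illustration; not SciStat's implementation.
    """
    k = len(table)
    if k < 2 or any(len(row) != k for row in table):
        raise ValueError("table must be square with at least two categories")
    if scheme not in ("linear", "quadratic"):
        raise ValueError("scheme must be 'linear' or 'quadratic'")

    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[r][c] for r in range(k)) for c in range(k)]

    def weight(r, c):
        d = abs(r - c) / (k - 1)  # relative distance between the two ratings
        return 1 - d if scheme == "linear" else 1 - d * d

    cells = [(r, c) for r in range(k) for c in range(k)]
    # Weighted observed agreement.
    p_observed = sum(weight(r, c) * table[r][c] for r, c in cells) / n
    # Weighted agreement expected by chance from the marginal totals.
    p_expected = sum(weight(r, c) * row_totals[r] * col_totals[c] for r, c in cells) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Made-up counts: all disagreements are between adjacent categories.
    example = [[10, 5, 0],
               [5, 10, 5],
               [0, 5, 10]]
    print(round(weighted_kappa(example, "linear"), 4))
    print(round(weighted_kappa(example, "quadratic"), 4))
```

In this made-up table every disagreement is a near miss, so both weighted values come out higher than the unweighted Kappa for the same counts.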

## Results

The results window displays a k x k table showing the classification by the two observers.

An inter-rater agreement statistic (K, Kappa) is calculated with its 95% confidence interval (Fleiss et al., 2003).

The standard errors reported by SciStat.com are the appropriate standard errors for testing the hypothesis that the underlying value of weighted kappa is equal to a prespecified value other than zero (Fleiss et al., 2003).

• K is 1 when there is perfect agreement between the classification systems;
• K is 0 when there is no agreement better than chance;
• K is negative when agreement is worse than chance.

The K value can be interpreted as follows (Altman, 1991):

| Value of K | Strength of agreement |
|---|---|
| < 0.20 | Poor |
| 0.21 - 0.40 | Fair |
| 0.41 - 0.60 | Moderate |
| 0.61 - 0.80 | Good |
| 0.81 - 1.00 | Very good |
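
The interpretation scale can be applied programmatically, as in the sketch below. The scale as printed leaves small gaps between bands (for example between 0.20 and 0.21), so the code assumes that each band extends up to and including its upper limit. That boundary handling and the function name `altman_strength` are assumptions of this example, not part of Altman's table or of SciStat.

```python
def altman_strength(kappa):
    """Map a kappa value to a strength-of-agreement label (Altman, 1991).

    Assumption: each band includes its upper limit, so values falling in the
    gaps of the printed scale (e.g. 0.205) are assigned to the higher band.
    """
    if kappa > 1:
        raise ValueError("kappa cannot exceed 1")
    if kappa <= 0.20:
        return "Poor"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Good"
    return "Very good"


if __name__ == "__main__":
    for value in (-0.10, 0.35, 0.50, 0.61, 0.85):
        print(value, altman_strength(value))
```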

## Literature

• Altman DG (1991) Practical statistics for medical research. London: Chapman and Hall.
• Cohen J (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20:37-46.
• Fleiss JL, Levin B, Paik MC (2003) Statistical methods for rates and proportions, 3rd ed. Hoboken: John Wiley & Sons.