Inter-rater agreement (Kappa)
Use Inter-rater agreement to evaluate the agreement between two classifications (nominal or ordinal scales).
This test is not performed on data in the spreadsheet, but on tabulated data that you enter in the dialog box. If your data are in the spreadsheet, use Inter-rater agreement in the Statistics menu instead.
- First select the number of categories in the classification system - the maximum number of categories is 12.
- Next enter the number of observations in each cell of the data table.
- Weighted Kappa: select Weighted Kappa if the data come from an ordered scale. If the data come from a nominal scale, do not select Weighted Kappa.
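SciStat.com offers two sets of weights, called linear and quadratic. In the linear set, if there are k categories, the weight for the cell in row i and column j is calculated as follows:
$$w_{ij} = 1 - \frac{|i - j|}{k - 1}$$
and in the quadratic set:
$$w_{ij} = 1 - \frac{(i - j)^2}{(k - 1)^2}$$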
When there are 5 categories, the weights in the linear set are 1, 0.75, 0.50, 0.25 and 0 for a difference of 0 (total agreement), 1, 2, 3 and 4 categories respectively. In the quadratic set the corresponding weights are 1, 0.9375, 0.75, 0.4375 and 0.
Use linear weights when the difference between the first and second category has the same importance as a difference between the second and third category, etc. If the difference between the first and second category is less important than a difference between the second and third category, etc., use quadratic weights.
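The two weighting schemes can be sketched in a few lines of Python. This is only an illustration, not SciStat.com's own code: the function name `agreement_weights` is hypothetical, and the formulas assumed are the usual ones, 1 - |i - j|/(k - 1) for linear and 1 - (i - j)^2/(k - 1)^2 for quadratic weights, which match the 5-category example above up to rounding.

```python
def agreement_weights(k, scheme="linear"):
    """Return a k x k matrix of agreement weights (1 = total agreement).

    Assumed formulas (not taken from SciStat.com's source code):
      linear:    1 - |i - j| / (k - 1)
      quadratic: 1 - (i - j)**2 / (k - 1)**2
    """
    if k < 2:
        raise ValueError("at least two categories are required")
    if scheme not in ("linear", "quadratic"):
        raise ValueError("scheme must be 'linear' or 'quadratic'")
    weights = []
    for i in range(k):
        row = []
        for j in range(k):
            d = abs(i - j) / (k - 1)  # category distance scaled to 0..1
            row.append(1 - d if scheme == "linear" else 1 - d ** 2)
        weights.append(row)
    return weights


# First row = weights for a difference of 0, 1, 2, 3 and 4 categories.
linear = agreement_weights(5, "linear")
quadratic = agreement_weights(5, "quadratic")
print(linear[0])
print(quadratic[0])
```

The first row of each matrix lists the weights for increasing category differences, so it can be compared directly with the values quoted for 5 categories.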
Agreement is quantified by the Kappa (K) statistic (Cohen, 1960; Fleiss et al., 2003):
- K is 1 when there is perfect agreement between the classification systems;
- K is 0 when there is no agreement better than chance;
- K is negative when agreement is worse than chance.
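As a rough illustration of these three cases, the sketch below computes Kappa from a square table of counts using Cohen's definition K = (Po - Pe) / (1 - Pe), where Po is the observed and Pe the chance-expected proportion of (weighted) agreement. The function name `cohen_kappa` and the example tables are made up for this illustration and do not come from SciStat.com.

```python
def cohen_kappa(table, weights=None):
    """Cohen's kappa for a square table of counts.

    table[i][j] = number of cases placed in category i by the first
    classification and in category j by the second.
    weights is an optional k x k matrix of agreement weights; when it is
    omitted, identity weights are used (unweighted kappa).
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table contains no observations")
    if weights is None:
        weights = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    cells = [(i, j) for i in range(k) for j in range(k)]
    # Observed and chance-expected proportions of (weighted) agreement.
    p_o = sum(weights[i][j] * table[i][j] for i, j in cells) / n
    p_e = sum(weights[i][j] * row_tot[i] * col_tot[j] for i, j in cells) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)


perfect = cohen_kappa([[10, 0], [0, 10]])   # all cases on the diagonal
chance = cohen_kappa([[5, 5], [5, 5]])      # agreement equal to chance
worse = cohen_kappa([[0, 10], [10, 0]])     # systematic disagreement
print(perfect, chance, worse)
```

Passing a weight matrix as the second argument gives weighted Kappa; leaving it out corresponds to not selecting the Weighted Kappa option.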
After you have clicked the Test button, the program will display the value for Kappa with its Standard Error and 95% confidence interval (CI) (Fleiss et al., 2003).
The standard errors reported by SciStat.com are the appropriate standard errors for testing the hypothesis that the underlying value of weighted kappa is equal to a prespecified value other than zero (Fleiss et al., 2003).
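For orientation, a 95% confidence interval can be derived from Kappa and its standard error with the normal approximation, K ± 1.96 × SE. The sketch below assumes this approximation; the text does not state that SciStat.com computes its interval in exactly this way, and both the function name and the input values are hypothetical.

```python
def kappa_confidence_interval(kappa, se, z=1.96):
    """Normal-approximation confidence interval for kappa.

    Assumes the interval is kappa +/- z * SE; z = 1.96 gives roughly
    95% coverage. This is an illustrative assumption, not a documented
    SciStat.com formula.
    """
    if se < 0:
        raise ValueError("standard error cannot be negative")
    return kappa - z * se, kappa + z * se


# Hypothetical values, chosen only to show the call.
lower, upper = kappa_confidence_interval(0.60, 0.05)
print(round(lower, 3), round(upper, 3))
```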
The K value can be interpreted as follows (Altman, 1991):
| Value of K | Strength of agreement |
|---|---|
| < 0.20 | Poor |
| 0.21 - 0.40 | Fair |
| 0.41 - 0.60 | Moderate |
| 0.61 - 0.80 | Good |
| 0.81 - 1.00 | Very good |
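The interpretation bands can be expressed as a small lookup, sketched below. The function name `strength_of_agreement` is hypothetical. Values at or below 0.20 are labeled "Poor", following Altman's (1991) scale, and each band's upper limit is treated as inclusive, which is an assumption about how values falling between the printed limits (for example 0.405) should be handled.

```python
def strength_of_agreement(kappa):
    """Map a kappa value to a strength-of-agreement label (Altman, 1991).

    Upper limits are treated as inclusive; this boundary handling is an
    assumption made for the sketch.
    """
    if kappa > 1:
        raise ValueError("kappa cannot exceed 1")
    bands = [
        (0.20, "Poor"),
        (0.40, "Fair"),
        (0.60, "Moderate"),
        (0.80, "Good"),
        (1.00, "Very good"),
    ]
    for upper_limit, label in bands:
        if kappa <= upper_limit:
            return label
    return "Very good"  # not reached for kappa <= 1; kept as a safe default


print(strength_of_agreement(0.35))
print(strength_of_agreement(0.72))
```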
- Altman DG (1991) Practical statistics for medical research. London: Chapman and Hall.
- Cohen J (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20:37-46.
- Fleiss JL, Levin B, Paik MC (2003) Statistical methods for rates and proportions, 3rd ed. Hoboken: John Wiley & Sons.