# Precision-recall curve

## Description

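A precision-recall curve is a plot of the precision (positive predictive value, y-axis) against the recall (sensitivity, x-axis) for different threshold (criterion) values. It is an alternative to the ROC curve that is particularly informative when the positive and negative groups differ greatly in size (Saito & Rehmsmeier, 2015).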

SciStat generates the precision-recall curve from the raw data (not from a sensitivity-PPV table).
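For reference, write $TP(c)$, $FP(c)$ and $FN(c)$ for the numbers of true positives, false positives and false negatives at criterion value $c$ (notation introduced here for illustration). Each point on the curve then has the standard coordinates:

$$
\text{Precision}(c) = \frac{TP(c)}{TP(c) + FP(c)}, \qquad \text{Recall}(c) = \frac{TP(c)}{TP(c) + FN(c)}
$$

Neither coordinate uses the true negatives, unlike the specificity on an ROC curve.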

## How to enter data for a precision-recall curve

To create a precision-recall curve, you need a measurement of interest (the parameter you want to study) and an independent diagnosis that classifies the study subjects into two distinct groups: a diseased and a non-diseased group. This diagnosis must be independent of the measurement of interest.

In the spreadsheet, create a column for the classification (e.g. Classification) and a column for the variable of interest (e.g. Param). For every study subject, enter a classification code: 1 for diseased cases and 0 for non-diseased (normal) cases. In the Param column, enter the measurement of interest (measurements, grades, etc.; if the data are categorical, code them with numerical values).
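To illustrate how the two columns are used, the sketch below computes one precision-recall point per distinct Param value from raw data. It is a minimal sketch, not SciStat's code: the function name `pr_points` and the data are hypothetical, and it assumes that higher values indicate disease (a case is called positive when Param ≥ criterion).

```python
# Minimal sketch, not SciStat's implementation.
# Assumption: a case is called positive when Param >= criterion
# (higher values indicate disease).


def pr_points(classification, param):
    """Return (criterion, recall, precision) for every distinct Param value."""
    n_pos = sum(1 for c in classification if c == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive case (Classification = 1)")
    points = []
    for crit in sorted(set(param), reverse=True):
        tp = sum(1 for c, x in zip(classification, param) if c == 1 and x >= crit)
        fp = sum(1 for c, x in zip(classification, param) if c == 0 and x >= crit)
        recall = tp / n_pos
        # tp + fp >= 1 because crit is itself an observed Param value
        precision = tp / (tp + fp)
        points.append((crit, recall, precision))
    return points


# Hypothetical example data: one row per study subject.
classification = [1, 1, 0, 1, 0, 0, 1, 0]
param = [9.1, 7.4, 6.8, 6.0, 5.5, 4.2, 3.9, 2.0]

for crit, rec, prec in pr_points(classification, param):
    print(f">= {crit}: recall={rec:.2f} precision={prec:.2f}")
```

With these data the strictest criterion (9.1) gives recall 0.25 and precision 1.00, and the most lenient one (2.0) gives recall 1.00 and precision 0.50.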

## Required input

**Variable**: select the variable of interest.

**Classification variable**: select a dichotomous variable indicating the diagnosis (0 = negative, 1 = positive). It is important to correctly identify the positive cases.

**Filter**: (optional) a filter to include only a selected subgroup of cases (e.g. AGE>21, SEX="Male").

**Options:**

- Bootstrap Confidence Intervals: select this option to calculate a confidence interval for the Area Under the Curve (AUC) using the bootstrap technique.
- Advanced: click this button to specify the bootstrap parameters: the number of replications and the random number seed.

**Graph:**

- Option to mark points corresponding to criterion values.
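The bootstrap option can be illustrated with a minimal sketch of a BCa interval (Efron, 1987). It is not SciStat's implementation: the stratified resampling (positives and negatives resampled separately), the clamping of the bias-correction share and the index rounding are assumptions. To keep the sketch short, the statistic in the demo is a simple step-wise area under the PR curve (average precision), not the interpolated AUC that SciStat reports. The names `bca_interval` and `average_precision` are hypothetical; `replications` and `seed` play the role of the Advanced parameters. The data need at least two positive cases so that every leave-one-out sample still contains one.

```python
import random
from statistics import NormalDist


def average_precision(labels, values):
    """Stand-in statistic: sum of recall increments x precision.
    Assumption: a case is called positive when value >= criterion."""
    n_pos = sum(labels)
    area, prev_recall = 0.0, 0.0
    for crit in sorted(set(values), reverse=True):
        tp = sum(1 for y, x in zip(labels, values) if y == 1 and x >= crit)
        fp = sum(1 for y, x in zip(labels, values) if y == 0 and x >= crit)
        recall = tp / n_pos
        area += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return area


def bca_interval(labels, values, statistic, replications=1000, seed=1, level=0.95):
    """BCa bootstrap interval for statistic(labels, values).
    Assumption: stratified resampling, so every replicate keeps both groups."""
    rng = random.Random(seed)
    n = len(labels)
    pos = [x for y, x in zip(labels, values) if y == 1]
    neg = [x for y, x in zip(labels, values) if y == 0]
    theta_hat = statistic(labels, values)

    # Bootstrap replicates, resampling within each group.
    reps = []
    for _ in range(replications):
        p = rng.choices(pos, k=len(pos))
        q = rng.choices(neg, k=len(neg))
        reps.append(statistic([1] * len(p) + [0] * len(q), p + q))
    reps.sort()

    nd = NormalDist()
    # Bias correction z0 from the share of replicates below the estimate,
    # clamped away from 0 and 1 (assumption) so inv_cdf stays finite.
    share = sum(r < theta_hat for r in reps) / replications
    share = min(max(share, 0.5 / replications), 1 - 0.5 / replications)
    z0 = nd.inv_cdf(share)

    # Acceleration a from leave-one-out (jackknife) estimates.
    jack = [statistic(labels[:i] + labels[i + 1:], values[:i] + values[i + 1:])
            for i in range(n)]
    mean_jack = sum(jack) / n
    num = sum((mean_jack - j) ** 3 for j in jack)
    den = 6 * sum((mean_jack - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted(alpha):
        # BCa-adjusted percentile for nominal tail probability alpha.
        z = nd.inv_cdf(alpha)
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    tail = (1 - level) / 2
    lo_idx = int(adjusted(tail) * (replications - 1))
    hi_idx = int(adjusted(1 - tail) * (replications - 1))
    return theta_hat, reps[lo_idx], reps[hi_idx]


# Hypothetical example data.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
values = [9.1, 7.4, 6.8, 6.0, 5.5, 4.2, 3.9, 2.0]
est, lo, hi = bca_interval(labels, values, average_precision, replications=2000, seed=42)
print(f"estimate = {est:.3f}, 95% BCa CI {lo:.3f} to {hi:.3f}")
```

Fixing the seed makes the interval reproducible; a different seed or number of replications gives a slightly different interval.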

## Results

SciStat reports:

- The sample sizes in the positive and negative groups.
- The area under the precision-recall curve (AUC), calculated using non-linear interpolation (Davis & Goadrich, 2006).
- $F1_{max}$: the F1 score is a measure of a test's accuracy; it is the harmonic mean of precision and recall:

  $$F1 = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}$$

  The F1 score is calculated at each measurement level, and $F1_{max}$ is the maximum F1 score over all measurement levels.
- Associated criterion: the criterion (measurement level) at which $F1_{max}$ was reached.
- If the corresponding option was selected, the program also gives the 95% BC$_a$ bootstrap confidence interval (Efron, 1987; Efron & Tibshirani, 1993) for the AUC.
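The non-linear interpolation used for the AUC can be sketched as follows. Following Davis & Goadrich (2006), between two consecutive points A and B the number of true positives increases one case at a time while the false positives increase proportionally, $FP = FP_A + \frac{FP_B - FP_A}{TP_B - TP_A}\,x$, so precision varies non-linearly with recall. The trapezoidal integration over the interpolated points, the flat extension of the first precision back to recall 0, the Param ≥ criterion convention and the function names are assumptions of this sketch, so its result may differ from the AUC SciStat reports.

```python
def interpolated_pr_curve(labels, values):
    """(recall, precision) points with Davis & Goadrich interpolation.
    Assumption: a case is called positive when value >= criterion."""
    n_pos = sum(labels)
    curve = []
    prev_tp, prev_fp = 0, 0  # start before any case is called positive
    for crit in sorted(set(values), reverse=True):
        tp = sum(1 for y, x in zip(labels, values) if y == 1 and x >= crit)
        fp = sum(1 for y, x in zip(labels, values) if y == 0 and x >= crit)
        if tp > prev_tp:
            step = (fp - prev_fp) / (tp - prev_tp)  # extra FP per extra TP
            for k in range(1, tp - prev_tp + 1):
                t, f = prev_tp + k, prev_fp + step * k
                curve.append((t / n_pos, t / (t + f)))
        elif tp > 0:
            # Same recall, more false positives: a vertical drop in precision.
            curve.append((tp / n_pos, tp / (tp + fp)))
        prev_tp, prev_fp = tp, fp
    return curve


def pr_auc(labels, values):
    """Trapezoidal area under the interpolated curve (assumed integration rule)."""
    curve = interpolated_pr_curve(labels, values)
    # Assumption: precision of the first point is held flat back to recall 0.
    pts = [(0.0, curve[0][1])] + curve
    return sum((r2 - r1) * (p1 + p2) / 2 for (r1, p1), (r2, p2) in zip(pts, pts[1:]))


# Hypothetical example with tied values, where interpolation matters.
labels = [1, 0, 1, 1, 0]
values = [3, 3, 2, 2, 2]
for recall, precision in interpolated_pr_curve(labels, values):
    print(f"recall={recall:.3f} precision={precision:.3f}")
print(f"AUC = {pr_auc(labels, values):.4f}")
```

In this example, straight-line interpolation between precision 0.50 (recall 1/3) and 0.60 (recall 1) would give 0.55 at recall 2/3, whereas the interpolated point there has precision 4/7 ≈ 0.571.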

See also a note on Criterion values.
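The search for $F1_{max}$ and its associated criterion amounts to a loop over all measurement levels. This is a minimal sketch, not SciStat's code: it assumes a case is called positive when Param ≥ criterion, takes F1 as 0 when there are no true positives, and keeps the first (highest) criterion in case of ties; the name `f1_max` is hypothetical.

```python
def f1_max(labels, values):
    """Return (F1max, associated criterion) over all distinct values.
    Assumption: a case is called positive when value >= criterion."""
    n_pos = sum(labels)
    best_f1, best_crit = 0.0, None
    for crit in sorted(set(values), reverse=True):
        tp = sum(1 for y, x in zip(labels, values) if y == 1 and x >= crit)
        fp = sum(1 for y, x in zip(labels, values) if y == 0 and x >= crit)
        if tp == 0:
            continue  # precision and recall are both 0; F1 taken as 0
        precision, recall = tp / (tp + fp), tp / n_pos
        f1 = 2 * recall * precision / (recall + precision)
        if f1 > best_f1:  # strict '>' keeps the first criterion on ties
            best_f1, best_crit = f1, crit
    return best_f1, best_crit


# Hypothetical example data.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
values = [9.1, 7.4, 6.8, 6.0, 5.5, 4.2, 3.9, 2.0]
f1, criterion = f1_max(labels, values)
print(f"F1max = {f1:.3f} at criterion >= {criterion}")
```

For these data F1 peaks at 0.75 at criterion 6.0, where 3 of the 4 positive cases are detected with 1 false positive.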

## Literature

- Davis J, Goadrich M (2006) The relationship between precision-recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006.
- Efron B (1987) Better Bootstrap Confidence Intervals. Journal of the American Statistical Association 82:171-185.
- Efron B, Tibshirani RJ (1993) An introduction to the Bootstrap. Chapman & Hall/CRC.
- Saito T, Rehmsmeier M (2015) The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10:e0118432.

## See also

- Comparison of precision-recall curves
- ROC curve analysis: theory summary
- Graph options
- More help on variables
- More help on filters
