Comparison of precision-recall curves


A precision-recall curve is a plot of the precision (positive predictive value, y-axis) against the recall (sensitivity, x-axis) for different thresholds. It is an alternative to the ROC curve that is particularly informative when the positive and negative groups differ greatly in size (Saito & Rehmsmeier, 2015).
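As a minimal sketch of how the points of such a curve arise from raw data, the Python function below computes recall and precision at every distinct value of one variable. It assumes that higher values indicate disease and that a case is classified positive when its value is greater than or equal to the criterion. The function name `pr_points` and the example data are hypothetical; this is not MedCalc's implementation.

```python
from typing import List, Tuple


def pr_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float, float]]:
    """Return (criterion, recall, precision) for every distinct criterion value.

    Assumption: higher values indicate disease, and a case is called
    positive when its value is >= the criterion.
    """
    n_pos = sum(1 for y in labels if y == 1)
    points = []
    for c in sorted(set(scores), reverse=True):
        # Count true and false positives among cases at or above the criterion.
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 0)
        recall = tp / n_pos            # sensitivity (x-axis)
        precision = tp / (tp + fp)     # positive predictive value (y-axis)
        points.append((c, recall, precision))
    return points


# Example with six hypothetical cases (1 = diseased, 0 = non-diseased).
for c, r, p in pr_points([0.9, 0.8, 0.7, 0.6, 0.55, 0.4], [1, 1, 0, 1, 0, 0]):
    print(f"criterion >= {c}: recall = {r:.3f}, precision = {p:.3f}")
```

Because every criterion is one of the observed values, at least one case is always called positive, so the precision denominator is never zero.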

In MedCalc's comparison of precision-recall curves, the curves of two dependent variables are constructed. "Dependent variables" means that the data of the two variables are derived from the same cases and are therefore paired.

MedCalc generates the precision-recall curves from the raw data (not from a sensitivity-PPV table), and calculates the difference between the areas under the two curves, together with the 95% BCa bootstrap confidence interval for this difference.

To create the precision-recall curves, you need the two measurements of interest (= the parameters you want to study) and an independent diagnosis that classifies your study subjects into two distinct groups: a diseased and a non-diseased group. This diagnosis should be independent of the measurements of interest.

In the spreadsheet, create a column Classification and two columns for the variables of interest, e.g. Param1 and Param2. For every study subject, enter a classification code: 1 for the diseased cases and 0 for the non-diseased or normal cases. In the Param1 and Param2 columns, enter the measurements of interest for each case on the same row (these can be measurements, grades, etc.; if the data are categorical, code them with numerical values).
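The spreadsheet layout described above can be mirrored in a short Python sketch that turns rows into paired columns ready for analysis. The column names follow the text; the `AGE` column, the `keep` predicate (standing in for an optional subgroup filter) and the function name `paired_columns` are hypothetical. Dropping rows with a missing measurement, so that the two variables stay paired, is an assumption about how incomplete cases should be handled, not a statement of MedCalc's behaviour.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def paired_columns(
    rows: List[Dict[str, Any]],
    keep: Optional[Callable[[Dict[str, Any]], bool]] = None,
) -> Tuple[List[int], List[float], List[float]]:
    """Split spreadsheet-like rows into (classification, param1, param2)."""
    y, x1, x2 = [], [], []
    for i, row in enumerate(rows):
        if keep is not None and not keep(row):
            continue  # excluded by the (hypothetical) subgroup filter
        if row["Classification"] not in (0, 1):
            raise ValueError(f"row {i + 1}: Classification must be 0 or 1")
        if row["Param1"] is None or row["Param2"] is None:
            continue  # assumption: incomplete pairs are dropped
        y.append(int(row["Classification"]))
        x1.append(float(row["Param1"]))
        x2.append(float(row["Param2"]))
    return y, x1, x2


rows = [
    {"Classification": 1, "Param1": 7.2, "Param2": 3.1, "AGE": 45},
    {"Classification": 0, "Param1": 4.0, "Param2": 2.2, "AGE": 19},
    {"Classification": 0, "Param1": 5.1, "Param2": None, "AGE": 33},
    {"Classification": 1, "Param1": 6.5, "Param2": 2.9, "AGE": 60},
]
y, x1, x2 = paired_columns(rows, keep=lambda r: r["AGE"] > 21)
print(y, x1, x2)
```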

Required input

  • Variables: select the two variables of interest.
  • Classification variable: select a dichotomous variable indicating diagnosis (0=negative, 1=positive).

    It is important to correctly identify the positive cases.

  • Filter: (optionally) a filter in order to include only a selected subgroup of cases (e.g. AGE>21, SEX="Male").
  • Options:
    • Bootstrap Confidence Intervals: select this option to calculate a confidence interval for the Area Under the Curve (AUC), and for the difference between AUCs using the bootstrap technique.
    • Advanced: click this button to specify the bootstrap parameters: number of replications and random number seed.
  • Graph:
    • Option to mark points corresponding to criterion values.


First, MedCalc reports the following statistics for each variable:

  • The sample sizes in the positive and negative groups.
  • The area under the precision-recall curve (AUC), calculated using non-linear interpolation (Davis & Goadrich, 2006).
  • F1max: the F1 score is a measure of a test's accuracy, defined as the harmonic mean of precision and recall. It is calculated at each measurement level, and F1max is the maximum F1 score over all measurement levels.

    F1 score = 2 x (Recall x Precision) / (Recall + Precision)

  • Associated criterion: the criterion (measurement level) at which F1max was reached.
  • If the corresponding option was selected, the program also gives the 95% BCa bootstrap confidence interval (Efron, 1987; Efron & Tibshirani, 1993) for AUC.
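The per-variable statistics listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not MedCalc's code: higher values are taken to indicate disease (value >= criterion means positive); the Davis & Goadrich interpolation is applied at every integer true-positive count between adjacent curve points, and the area is then taken with the trapezoidal rule over the interpolated points; and the curve is extended horizontally from its first point to recall 0. MedCalc's exact numerical integration and end-point handling may differ. The function name `pr_statistics` is hypothetical.

```python
from typing import Dict, List, Tuple


def pr_statistics(scores: List[float], labels: List[int]) -> Dict[str, object]:
    """Sample sizes, interpolated area under the PR curve, F1max and criterion.

    Assumption: higher values indicate disease (value >= criterion is positive).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos

    # Cumulative (criterion, TP, FP) from the strictest criterion downwards.
    counts: List[Tuple[float, int, int]] = []
    for c in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 0)
        counts.append((c, tp, fp))

    # F1max: largest harmonic mean of precision and recall over all criteria.
    f1max, criterion = 0.0, None
    for c, tp, fp in counts:
        if tp == 0:
            continue  # F1 is 0 without true positives
        precision, recall = tp / (tp + fp), tp / n_pos
        f1 = 2 * (recall * precision) / (recall + precision)
        if f1 > f1max:
            f1max, criterion = f1, c

    # Davis & Goadrich interpolation: between adjacent points A and B, add one
    # true positive at a time and let FP grow by (FP_B - FP_A) / (TP_B - TP_A).
    curve: List[Tuple[float, float]] = []  # (recall, precision)
    prev_tp, prev_fp = 0, 0
    for _, tp, fp in counts:
        if tp > prev_tp:
            skew = (fp - prev_fp) / (tp - prev_tp)
            for x in range(1, tp - prev_tp + 1):
                t, f = prev_tp + x, prev_fp + skew * x
                curve.append((t / n_pos, t / (t + f)))
        elif tp > 0:
            # Only false positives added: precision drops at the same recall.
            curve.append((tp / n_pos, tp / (tp + fp)))
        prev_tp, prev_fp = tp, fp

    # Assumption: extend the curve horizontally from its first point to recall 0.
    curve.insert(0, (0.0, curve[0][1]))

    # Trapezoidal rule over the interpolated points.
    auc = sum((r2 - r1) * (p1 + p2) / 2
              for (r1, p1), (r2, p2) in zip(curve, curve[1:]))

    return {"n_pos": n_pos, "n_neg": n_neg, "auc": auc,
            "f1max": f1max, "criterion": criterion}


# Example with six hypothetical cases (1 = diseased, 0 = non-diseased).
print(pr_statistics([0.9, 0.8, 0.7, 0.6, 0.55, 0.4], [1, 1, 0, 1, 0, 0]))
```

For these six cases F1max is reached at the criterion 0.6 (recall 1, precision 0.75). Linear interpolation in precision-recall space would overstate the area, which is why the interpolation runs through true-positive and false-positive counts instead.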

See also a note on Criterion values.

Next, MedCalc reports the difference between the two areas under the precision-recall curve (AUCs) and, if the corresponding option was selected, also the 95% BCa bootstrap confidence interval for this difference.
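A minimal sketch of a paired bootstrap with a BCa interval for the difference between two areas is shown below. Every design choice is an assumption rather than MedCalc's documented procedure: resampling is paired (the same cases are drawn for both variables) and stratified (positives and negatives are resampled separately); the acceleration constant is estimated with the jackknife; `n_boot` and `seed` play the role of the number of replications and the random number seed from the Advanced options, with hypothetical defaults; and, to keep the example short, the area is computed as average precision (a step-wise area under the precision-recall curve) instead of with the non-linear interpolation MedCalc uses. Any function with the same signature can be passed as `auc_fn`. The sketch needs at least two positive cases so that every jackknife sample contains a positive.

```python
import random
from statistics import NormalDist
from typing import Callable, List, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise area: sum of (recall increase) x precision over criteria."""
    n_pos = sum(1 for y in labels if y == 1)
    ap, prev_recall = 0.0, 0.0
    for c in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 0)
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


def bca_auc_difference(scores1: List[float], scores2: List[float], labels: List[int],
                       auc_fn: Callable[..., float] = average_precision,
                       n_boot: int = 1000, seed: int = 1,
                       level: float = 0.95) -> Tuple[float, float, float]:
    """Return (difference, lower, upper) for auc_fn(scores1) - auc_fn(scores2)."""
    n = len(labels)

    def diff(idx: Sequence[int]) -> float:
        y = [labels[i] for i in idx]
        return (auc_fn([scores1[i] for i in idx], y)
                - auc_fn([scores2[i] for i in idx], y))

    theta = diff(range(n))

    # Paired, stratified resampling of case indices.
    rng = random.Random(seed)
    pos = [i for i in range(n) if labels[i] == 1]
    neg = [i for i in range(n) if labels[i] == 0]
    boot = sorted(diff(rng.choices(pos, k=len(pos)) + rng.choices(neg, k=len(neg)))
                  for _ in range(n_boot))

    # Bias correction z0: share of bootstrap values below the estimate
    # (ties count half), clamped so that inv_cdf stays finite.
    below = sum(b < theta for b in boot) + 0.5 * sum(b == theta for b in boot)
    share = min(max(below / n_boot, 0.5 / n_boot), 1 - 0.5 / n_boot)
    nd = NormalDist()
    z0 = nd.inv_cdf(share)

    # Acceleration a from the jackknife (leave one case out at a time).
    jack = [diff([j for j in range(n) if j != i]) for i in range(n)]
    mean_jack = sum(jack) / n
    num = sum((mean_jack - t) ** 3 for t in jack)
    den = 6 * sum((mean_jack - t) ** 2 for t in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def bca_quantile(q: float) -> float:
        # Map the nominal quantile q to the BCa-adjusted bootstrap quantile.
        z = nd.inv_cdf(q)
        adj = nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
        return boot[min(n_boot - 1, max(0, round(adj * (n_boot - 1))))]

    alpha = 1 - level
    return theta, bca_quantile(alpha / 2), bca_quantile(1 - alpha / 2)


# Hypothetical paired data: 5 diseased and 7 non-diseased cases.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
s1 = [0.9, 0.8, 0.75, 0.6, 0.3, 0.7, 0.5, 0.4, 0.35, 0.2, 0.1, 0.05]
s2 = [0.6, 0.85, 0.3, 0.55, 0.4, 0.9, 0.7, 0.5, 0.2, 0.35, 0.1, 0.45]
print(bca_auc_difference(s1, s2, labels, n_boot=500, seed=7))
```

Fixing the seed makes the interval reproducible, which is the purpose of a user-specified random number seed.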


  • Davis J, Goadrich M (2006) The relationship between precision-recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006.
  • Efron B (1987) Better Bootstrap Confidence Intervals. Journal of the American Statistical Association 82:171-185.
  • Efron B, Tibshirani RJ (1993) An Introduction to the Bootstrap. Chapman & Hall/CRC.
  • Saito T, Rehmsmeier M (2015) The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10:e0118432.
