# Literature

- Abu-Arafeh A, Jordan H, Drummond G (2016) Reporting of method comparison studies: a review of advice, an assessment of current practice, and specific suggestions for future reports. British Journal of Anaesthesia 117:569-575.
- Altman DG (1980) Statistics and ethics in medical research. VI - Presentation of results. British Medical Journal 281:1542-1544.
- Altman DG (1991) Practical statistics for medical research. London: Chapman and Hall.
- Altman DG (1993) Construction of age-related reference centiles using absolute residuals. Statistics in Medicine 12:917-924.
- Altman DG (1998) Confidence intervals for the number needed to treat. British Medical Journal 317:1309-1312.
- Altman DG, Chitty LS (1993) Design and analysis of studies to derive charts of fetal size. Ultrasound in Obstetrics and Gynecology 3:378-384.
- Altman DG, Chitty LS (1994) Charts of fetal size: 1. Methodology. British Journal of Obstetrics and Gynaecology 101:29-34.
- Altman DG, Gardner MJ (1988) Calculating confidence intervals for regression and correlation. British Medical Journal 296:1238-1242.
- Altman DG, Gore SM, Gardner MJ, Pocock SJ (1983) Statistical guidelines for contributors to medical journals. British Medical Journal 286:1489-1493.
- Armitage P (1955) Tests for linear trends in proportions and frequencies. Biometrics 11:375-386.
- Armitage P, Berry G, Matthews JNS (2002) Statistical methods in medical research. 4th ed. Blackwell Science.
- Barnhart HX, Barboriak DP (2009) Applications of the repeatability of quantitative imaging biomarkers: a review of statistical analysis of repeat data sets. Translational Oncology 2:231-235.
- Begg CB, Mazumdar M (1994) Operating characteristics of a rank correlation test for publication bias. Biometrics 50:1088-1101.
- Bellera CA, Hanley JA (2007) A method is presented to plan the required sample size when estimating regression-based reference limits. Journal of Clinical Epidemiology 60:610-615.
- Bewick V, Cheek L, Ball J (2004) Statistics review 10: further nonparametric methods. Critical Care 8:196-199.
- Bland M (2000) An introduction to medical statistics, 3rd ed. Oxford: Oxford University Press.
- Bland M (2005) What is the origin of the formula for repeatability? https://www-users.york.ac.uk/~mb55/meas/repeat.htm
- Bland M (2006) How should I calculate a within-subject coefficient of variation? https://www-users.york.ac.uk/~mb55/meas/cv.htm
- Bland JM, Altman DG (1986) Statistical method for assessing agreement between two methods of clinical measurement. The Lancet i:307-310.
- Bland JM, Altman DG (1995) Comparing methods of measurement: why plotting difference against standard method is misleading. The Lancet 346:1085-1087.
- Bland M, Altman DG (1996) Statistics Notes: Measurement error proportional to the mean. British Medical Journal 313:106.
- Bland JM, Altman DG (1997) Statistics notes: Cronbach's alpha. British Medical Journal 314:572.
- Bland JM, Altman DG (1999) Measuring agreement in method comparison studies. Statistical Methods in Medical Research 8:135-160.
- Bland JM, Altman DG (2007) Agreement between methods of measurement with multiple observations per individual. Journal of Biopharmaceutical Statistics 17:571-582.
- Borenstein M, Hedges LV, Higgins JPT, Rothstein HR (2009) Introduction to meta-analysis. Chichester, UK: Wiley.
- Bulpitt CJ (1987) Confidence intervals. The Lancet i:494-497.
- Campbell I (2007) Chi-squared and Fisher-Irwin tests of two-by-two tables with small sample recommendations. Statistics in Medicine 26:3661-3675.
- Campbell MJ, Gardner MJ (1988) Calculating confidence intervals for some non-parametric analyses. British Medical Journal 296:1454-1456.
- Chitty LS, Altman DG, Henderson A, Campbell S (1994) Charts of fetal size: 2. Head Measurements. British Journal of Obstetrics and Gynaecology 101:35-43.
- Clopper CJ, Pearson ES (1934) The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26:404-413.
- CLSI (2003) Estimation of Total Analytical Error for Clinical Laboratory Methods; Approved Guideline. CLSI Document EP21-A. Wayne, PA: Clinical and Laboratory Standards Institute.
- CLSI (2008) Defining, establishing, and verifying reference intervals in the clinical laboratory; Approved guideline - 3rd edition. CLSI Document C28-A3. Wayne, PA: Clinical and Laboratory Standards Institute.
- CLSI (2012) Evaluation of detection capability for clinical laboratory measurement procedures; Approved guideline - 2nd edition. CLSI Document EP17-A2. Wayne, PA: Clinical and Laboratory Standards Institute.
- CLSI (2018) Measurement procedure comparison and bias estimation using patient samples. 3rd ed. CLSI guideline EP09c. Wayne, PA: Clinical and Laboratory Standards Institute.
- Cochran WG (1954) Some methods for strengthening the common chi-squared tests. Biometrics 10:417-451.
- Cohen J (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20:37-46.
- Cohen J (1968) Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin 70:213-220.
- Conover WJ (1999) Practical nonparametric statistics, 3rd edition. New York: John Wiley & Sons.
- Cornbleet PJ, Gochman N (1979) Incorrect least-squares regression coefficients in method-comparison analysis. Clinical Chemistry 25:432-438.
- Cronbach LJ (1951) Coefficient alpha and the internal structure of tests. Psychometrika 16:297-334.
- Daly LE (1998) Confidence limits made easy: interval estimation using a substitution method. American Journal of Epidemiology 147:783-790.
- Davis J, Goadrich M (2006) The relationship between precision-recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006.
- DeLong ER, DeLong DM, Clarke-Pearson DL (1988) Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44:837-845.
- DerSimonian R, Laird N (1986) Meta-analysis in clinical trials. Controlled Clinical Trials 7:177-188.
- Dunn OJ (1964) Multiple comparisons using rank sums. Technometrics 6:241-252.
- Efron B (1987) Better Bootstrap Confidence Intervals. Journal of the American Statistical Association 82:171-185.
- Efron B, Tibshirani RJ (1993) An introduction to the Bootstrap. Chapman & Hall/CRC.
- Egger M, Smith GD, Schneider M, Minder C (1997) Bias in meta-analysis detected by a simple, graphical test. BMJ 315:629-634.
- Feldt LS (1965) The approximate sampling distribution of Kuder-Richardson reliability coefficient twenty. Psychometrika 30:357-371.
- Finney DJ (1947) Probit Analysis. A statistical treatment of the sigmoid response curve. Cambridge: Cambridge University Press.
- Fleiss JL (1981) Statistical methods for rates and proportions, 2nd ed. New York: John Wiley & Sons.
- Fleiss JL, Levin B, Paik MC (2003) Statistical methods for rates and proportions, 3rd ed. Hoboken: John Wiley & Sons.
- Forkman J (2009) Estimator and tests for common coefficients of variation in normal distributions. Communications in Statistics - Theory and Methods 38:233-251.
- Gardner MJ, Altman DG (1986) Confidence intervals rather than P values: estimation rather than hypothesis testing. British Medical Journal 292:746-750.
- Girden ER (1992) ANOVA: repeated measures. Sage University Papers Series on Quantitative Applications in the Social Sciences, 84. Thousand Oaks, CA: Sage.
- Glantz SA, Slinker BK (2001) Primer of applied regression & analysis of variance. 2nd ed. McGraw-Hill.
- Greenhouse SW, Geisser S (1959) On methods in the analysis of profile data. Psychometrika 24:95-112.
- Greiner M, Pfeiffer D, Smith RD (2000) Principles and practical application of the receiver-operating characteristic analysis for diagnostic tests. Preventive Veterinary Medicine 45:23-41.
- Griner PF, Mayewski RJ, Mushlin AI, Greenland P (1981) Selection and interpretation of diagnostic tests and procedures. Annals of Internal Medicine 94:555-600.
- Grubbs FE (1969) Procedures for detecting outlying observations in samples. Technometrics 11:1-21.
- Hanley H (1986) Analysis of Crude Data. In: Modern Epidemiology, ed Rothman KJ. Boston: Little, Brown & Co.
- Hanley JA, Hajian-Tilaki KO (1997) Sampling variability of nonparametric estimates of the areas under receiver operating characteristic curves: an update. Academic Radiology 4:49-58.
- Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29-36.
- Hanley JA, McNeil BJ (1983) A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148:839-843.
- Hedges LV, Olkin I (1985) Statistical methods for meta-analysis. London: Academic Press.
- Higgins JPT, Green S (editors) (2011) Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 [updated March 2011]. The Cochrane Collaboration, 2011.
- Higgins JP, Thompson SG, Deeks JJ, Altman DG (2003) Measuring inconsistency in meta-analyses. BMJ 327:557-560.
- Hilgers RA (1991) Distribution-free confidence bounds for ROC curves. Methods of Information in Medicine 30:96-101.
- Hinkle DE, Wiersma W, Jurs SG (1988) Applied statistics for the behavioral sciences. 2nd ed. Boston: Houghton Mifflin Company.
- Hosmer DW Jr, Lemeshow S, Sturdivant RX (2013) Applied Logistic Regression. 3rd ed. New Jersey: John Wiley & Sons.
- Huitema BE (1980) The analysis of covariance and alternatives. Wiley-Interscience.
- Husted JA, Cook RJ, Farewell VT, Gladman DD (2000) Methods for assessing responsiveness: a critical review and recommendations. Journal of Clinical Epidemiology 53:459-468.
- Huynh H, Feldt LS (1976) Estimation of the Box correction for degrees of freedom from sample data in randomised block and split-plot designs. Journal of Educational Statistics 1:69-82.
- Hyslop NP, White WH (2009) Estimating precision using duplicate measurements. Journal of the Air & Waste Management Association 59:1032-1039.
- Jones R, Payne B (1997) Clinical investigation and statistics in laboratory medicine. London: ACB Venture Publications.
- Krouwer JS (2008) Why Bland-Altman plots should use *X*, not (*Y*+*X*)/2 when *X* is a reference method. Statistics in Medicine 27:778-780.
- Krouwer JS, Monti KL (1995) A simple, graphical method to evaluate laboratory assays. Eur J Clin Chem Clin Biochem 33:525-527.
- Lecoutre B (1991) A correction for the ε̃ approximate test in repeated measures designs with two or more independent groups. Journal of Educational Statistics 16:371-372.
- Lentner C (Ed) (1982) Geigy Scientific Tables, 8th edition, Volume 2. Basle: Ciba-Geigy Limited.
- Lin LI-K (1989) A concordance correlation coefficient to evaluate reproducibility. Biometrics 45:255-268.
- Lin LI-K (2000) A note on the concordance correlation coefficient. Biometrics 56:324-325.
- Linnet K, Boyd JC (2012) Selection and analytical evaluation of methods - with statistical techniques. In Burtis CA, Ashwood ER, Bruns DE (eds). Tietz Textbook of Clinical Chemistry and Molecular Diagnostics (5th ed). Elsevier Saunders, St Louis, MO, pp. 201-228.
- Long JS (1997) Regression Models for categorical and limited dependent variables. Thousand Oaks, CA: Sage Publications.
- Lu MJ, Zhong WH, Liu YX, Miao HZ, Li YC, Ji MH (2016) Sample size for assessing agreement between two methods of measurement by Bland-Altman method. The International Journal of Biostatistics 12: issue 2 (8 pp).
- Machin D, Campbell MJ, Tan SB, Tan SH (2009) Sample size tables for clinical studies. 3rd ed. Chichester: Wiley-Blackwell.
- Matsumoto M, Nishimura T (1998) Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation 8:3-30.
- Matthews JN, Altman DG, Campbell MJ, Royston P (1990) Analysis of serial measurements in medical research. British Medical Journal 300:230-235.
- McBride GB (2005) A proposal for strength-of-agreement criteria for Lin's Concordance Correlation Coefficient. NIWA Client Report: HAM2005-062.
- McGill R, Tukey JW, Larsen WA (1978) Variations of box plots. The American Statistician 32:12-16.
- McGraw KO, Wong SP (1996) Forming inferences about some intraclass correlation coefficients. Psychological Methods 1:30-46. (Correction: 1:390).
- Mercaldo ND, Lau KF, Zhou XH (2007) Confidence intervals for predictive values with an emphasis to case-control studies. Statistics in Medicine 26:2170-2183.
- Metz CE (1978) Basic principles of ROC analysis. Seminars in Nuclear Medicine 8:283-298.
- NCCLS (2000) How to define and determine reference intervals in the clinical laboratory: approved guideline - second edition. NCCLS document C28-A2. Wayne, PA: NCCLS.
- Neter J, Wasserman W, Whitmore GA (1988) Applied statistics. 3rd ed. Boston: Allyn and Bacon, Inc.
- Neter J, Kutner MH, Nachtsheim CJ, Wasserman W (1996) Applied linear statistical models. 4th ed. McGraw-Hill.
- Norman GR, Wyrwich KW, Patrick DL (2007) The mathematical relationship among different forms of responsiveness coefficients. Quality of Life Research 16:815-822.
- Pampel FC (2000) Logistic regression: A primer. Sage University Papers Series on Quantitative Applications in the Social Sciences, 07-132. Thousand Oaks, CA: Sage.
- Passing H, Bablok W (1983) A new biometrical procedure for testing the equality of measurements from two different analytical methods. Application of linear regression procedures for method comparison studies in Clinical Chemistry, Part I. J Clin Chem Clin Biochem 21:709-720.
- Petrie A, Bulman JS, Osborn JF (2003) Further statistics in dentistry. Part 8: systematic reviews and meta-analyses. British Dental Journal 194:73-78.
- Pocock SJ (1984) Clinical trials. A practical approach. Chichester: John Wiley & Sons.
- Reed AH, Henry RJ, Mason WB (1971) Influence of statistical method used on the resulting estimate of normal range. Clinical Chemistry 17:275-284.
- Richardson JTE (2011) The analysis of 2 x 2 contingency tables - Yet again. Statistics in Medicine 30:890.
- Rosner B (1983) Percentage points for a generalized ESD many-outlier procedure. Technometrics 25:165-172.
- Rosner B (2006) Fundamentals of Biostatistics. 6th ed. Pacific Grove: Duxbury.
- Saito T, Rehmsmeier M (2015) The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One 10:e0118432.
- Schoonjans F, De Bacquer D, Schmid P (2011) Estimation of population percentiles. Epidemiology 22:750-751.
- Schuirmann DJ (1987) A comparison of the Two One-Sided Tests Procedure and the Power Approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics 15:657-680.
- Schwartz D, Mayaux MJ (1980) Mode of evaluation of results in artificial insemination. In: Human Artificial Insemination and Semen Preservation (eds David G and Price WS). New York: Plenum Press, pp 197-210.
- Shapiro SS, Wilk MB (1965) An analysis of variance test for normality (complete samples). Biometrika 52:591-611.
- Sheskin DJ (2004) Handbook of parametric and nonparametric statistical procedures. 3rd ed. Boca Raton: Chapman & Hall/CRC.
- Sheskin DJ (2011) Handbook of parametric and nonparametric statistical procedures. 5th ed. Boca Raton: Chapman & Hall/CRC.
- Shrout PE, Fleiss JL (1979) Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin 86:420-428.
- Snedecor GW, Cochran WG (1989) Statistical methods, 8th edition. Ames, Iowa: Iowa State University Press.
- Spiegel MR (1961) Theory and problems of statistics. New York: McGraw-Hill Book Company.
- Sterne JAC, Egger M (2001) Funnel plots for detecting bias in meta-analysis: guidelines on choice of axis. Journal of Clinical Epidemiology 54:1046-1055.
- Sterne JAC, Sutton AJ, Ioannidis JPA et al. (2011) Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ 343:d4002.
- Stöckl D, Rodríguez Cabaleiro D, Van Uytfanghe K, Thienpont LM (2004) Interpreting method comparison studies by use of the Bland-Altman plot: reflecting the importance of sample size by incorporating confidence limits and predefined error limits in the graphic. Clinical Chemistry 50:2216-2218.
- Synek V (2008) Evaluation of the standard deviation from duplicate results. Accreditation and Quality Assurance 13:335-337.
- Tukey JW (1977) Exploratory data analysis. Reading, Mass: Addison-Wesley Publishing Company.
- Westfall PH (2014) Kurtosis as Peakedness, 1905 - 2014. R.I.P. The American Statistician 68:191-195.
- Westgard JO, Barry PL, Hunt MR, Groth T (1981) A multi-rule Shewhart chart for Quality Control in Clinical Chemistry. Clinical Chemistry 27:493-501.
- Westgard JO (2008) Basic method validation. 3rd ed. Madison: Westgard QC, Inc.
- Wildt AR, Ahtola OT (1978) Analysis of covariance. Sage Publications.
- Wright EM, Royston P (1997) Simplified estimation of age-specific reference intervals for skewed data. Statistics in Medicine 16:2785-2803.
- Wright EM, Royston P (1997) A comparison of statistical methods for age-related reference intervals. Journal of the Royal Statistical Society, A 160:47-69.
- Youden WJ (1950) An index for rating diagnostic tests. Cancer 3:32-35.
- Youden WJ (1959) Graphical diagnosis of interlaboratory test results. Industrial Quality Control 15:24-28.
- Zhou XH, Obuchowski NA, McClish DK (2002) Statistical methods in diagnostic medicine. New York: Wiley.
- Zou GY (2013) Confidence interval estimation for the Bland-Altman limits of agreement with multiple observations per individual. Statistical Methods in Medical Research 22:630-642.
- Zweig MH, Campbell G (1993) Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clinical Chemistry 39:561-577.