
Outlier detection

Description

Outlier detection is used to detect anomalous observations in sample data.

Required input

  • Select the variable of interest.

  • Optionally select a filter to include a subset of cases.

Methods of outlier detection

  • Grubbs - left-sided: check only the smallest value(*) (Grubbs, 1969).
  • Grubbs - right-sided: check only the largest value(*) (Grubbs, 1969).
  • Grubbs - double-sided: check the most extreme value at either side(*) (Grubbs, 1969).
  • Generalized ESD test: the Generalized Extreme Studentized Deviate (ESD) procedure can detect multiple outliers in one step (Rosner, 1983).
    • Test for maximum number of outliers: enter the maximum number of outliers to detect.
  • Tukey: check for multiple outliers at either side, categorized as 'outside' or 'far out' values (Tukey, 1977).
    • An outside value is defined as a value that is smaller than the lower quartile minus 1.5 times the interquartile range, or larger than the upper quartile plus 1.5 times the interquartile range (the 'inner fences').
    • A far out value is defined as a value that is smaller than the lower quartile minus 3 times the interquartile range, or larger than the upper quartile plus 3 times the interquartile range (the 'outer fences').
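The fence definitions above can be sketched in a few lines of Python. This is a minimal illustration, not SciStat's implementation: the function name `tukey_fences`, the sample data, and the quartile interpolation method (`method="inclusive"`) are all assumptions, and packages differ in how they compute quartiles. For clarity, the sketch reports far out values separately from the outside values that lie between the inner and outer fences.

```python
import statistics

def tukey_fences(values):
    """Classify values as 'outside' or 'far out' using Tukey's fences."""
    # Assumption: quartiles by linear interpolation ("inclusive" method).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # inner fences
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)   # outer fences
    far_out = [v for v in values if v < outer[0] or v > outer[1]]
    outside = [v for v in values
               if (v < inner[0] or v > inner[1]) and v not in far_out]
    return {"inner": inner, "outer": outer,
            "outside": outside, "far_out": far_out}

# Hypothetical sample data
data = [10, 12, 12, 13, 13, 14, 14, 15, 16, 22, 40]
result = tukey_fences(data)
print("inner fences:", result["inner"])
print("outer fences:", result["outer"])
print("outside values:", result["outside"])
print("far out values:", result["far_out"])
```

With this data and quartile method, 22 falls between the inner and outer fences and 40 falls beyond the outer fence.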

(*) At the same alpha level, the single-sided Grubbs' tests are more sensitive than the double-sided test, because they test in one direction only.
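The difference between the single-sided and double-sided Grubbs' tests can be illustrated with a short Python sketch. It uses the commonly published form of the test: the statistic is the distance of the suspect value from the mean in units of the sample standard deviation, and the critical value is derived from a t distribution with n - 2 degrees of freedom, using a tail probability of alpha/n for the single-sided tests and alpha/(2n) for the double-sided test. The function names, the sample data, and the simple numerical t quantile helper are assumptions made to keep the example dependency-free; a statistics library would normally supply the quantile.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """CDF for x >= 0, integrating the density with Simpson's rule."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_upper_quantile(tail, df):
    """Value t with P(T > t) = tail (tail < 0.5), found by bisection."""
    target = 1 - tail
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def grubbs_test(values, side="double", alpha=0.05):
    """Return (suspect value, statistic G, critical value, flagged?)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if side == "left":        # smallest value only
        suspect = min(values)
        tail = alpha / n
    elif side == "right":     # largest value only
        suspect = max(values)
        tail = alpha / n
    else:                     # most extreme value at either side
        suspect = max(values, key=lambda v: abs(v - mean))
        tail = alpha / (2 * n)
    g = abs(suspect - mean) / sd
    t = t_upper_quantile(tail, n - 2)
    g_crit = (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))
    return suspect, g, g_crit, g > g_crit

# Hypothetical sample data with one suspiciously high value
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 12.5]
for side in ("left", "right", "double"):
    suspect, g, g_crit, flagged = grubbs_test(data, side)
    print(f"{side}: suspect={suspect} G={g:.3f} "
          f"critical={g_crit:.3f} outlier={flagged}")
```

For the same alpha level, the single-sided critical value is lower than the double-sided one, which is why the single-sided tests flag a value sooner.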

Options

  • Alpha level for Grubbs' and ESD test: select the alpha level (ranging from 0.10 to 0.001). This option applies only to Grubbs' test and the Generalized ESD test. With a larger alpha level the test is more sensitive and flags outliers more readily; however, this also increases the risk of false-positive results.
  • Logarithmic transformation: the outlier detection methods assume that the data follow an approximately normal distribution (see next option). Sometimes data should be logarithmically transformed before analysis. See Logarithmic transformation.
  • Test for Normal distribution: see Tests for Normal distribution.
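To make the roles of the alpha level and the maximum number of outliers concrete, the Generalized ESD procedure can be sketched as follows. The sketch follows the commonly published form of Rosner's procedure: at each step the most extreme value is removed and its studentized deviation is compared with a critical value that depends on alpha, and the number of outliers is the largest step at which the statistic exceeds its critical value. The function names, the sample data, and the numerical t quantile helper are assumptions for illustration only, not SciStat's implementation.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """CDF for x >= 0, integrating the density with Simpson's rule."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_upper_quantile(tail, df):
    """Value t with P(T > t) = tail (tail < 0.5), found by bisection."""
    target = 1 - tail
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def generalized_esd(values, max_outliers, alpha=0.05):
    """Return the values flagged as outliers, in order of removal."""
    remaining = list(values)
    n = len(remaining)
    removed = []
    n_outliers = 0
    for i in range(1, max_outliers + 1):
        mean = statistics.mean(remaining)
        sd = statistics.stdev(remaining)
        suspect = max(remaining, key=lambda v: abs(v - mean))
        r_i = abs(suspect - mean) / sd
        # Critical value for step i
        t = t_upper_quantile(alpha / (2 * (n - i + 1)), n - i - 1)
        lam = (n - i) * t / math.sqrt((n - i - 1 + t * t) * (n - i + 1))
        removed.append(suspect)
        remaining.remove(suspect)
        if r_i > lam:
            n_outliers = i   # keep the largest step that is significant
    return removed[:n_outliers]

# Hypothetical sample data with two high values
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9, 12.5, 13.0]
print("max 1 outlier:", generalized_esd(data, max_outliers=1))
print("max 3 outliers:", generalized_esd(data, max_outliers=3))
```

In this example the two high values mask each other: with a maximum of one outlier nothing is flagged, whereas with a maximum of three both high values are reported. This is the situation the multiple-outlier procedure is designed for.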

Results

Summary statistics

  • Summary statistics for the selected data are displayed. See Summary statistics results.
  • If the test for Normal distribution reports 'reject Normality', the outlier detection methods may be invalid, since they assume that the data follow an approximately Normal distribution. Consider whether the data should be logarithmically transformed before analysis.
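The effect of a logarithmic transformation on outlier detection can be illustrated with a small Python sketch. It applies Tukey's inner fences to right-skewed data, first on the original scale and then after taking logarithms. The function name `outside_values`, the sample data, the quartile method, and the use of the natural logarithm are assumptions; the choice of logarithm base does not change which values are flagged, because a change of base only rescales the transformed data.

```python
import math
import statistics

def outside_values(values):
    """Values beyond Tukey's inner fences (1.5 x interquartile range)."""
    # Assumption: quartiles by linear interpolation ("inclusive" method).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical right-skewed sample data
data = [1.2, 1.5, 1.8, 2.0, 2.3, 2.7, 3.1, 3.8, 4.9, 6.5, 12.0]

raw_flags = outside_values(data)

# Repeat on the log scale and report flagged values on the original scale
log_values = [math.log(v) for v in data]
log_flags = [math.exp(v) for v in outside_values(log_values)]

print("flagged on original scale:", raw_flags)
print("flagged after log transformation:", log_flags)
```

Here the largest value is flagged on the original scale but not after transformation, which suggests that it reflects the skewness of the distribution rather than an anomalous observation.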

Suspected outliers

The program lists the outliers identified by the different methods (see above). When you click on one of the values, SciStat.com will locate the value in the data table editor.

What to do when you have identified an outlier

Do not remove outliers automatically.

  • Remove outliers (see Exclude & Include) only when a cause can be found for the spurious result, such as a pre-analytical, analytical, or post-analytical error.

    When you conclude that a pre-analytical, analytical, or post-analytical error is the cause of the spurious result, be aware that the same error may also affect other data values.
  • Check the distribution of the data. Logarithmically transformed sample data may more closely follow a Normal distribution. Graph the data with and without logarithmic transformation, for example using a Box-and-Whisker plot.
  • Consider replacing the outlier value with the next highest or lowest (non-outlier) value.
  • Keep the outlier but use robust or nonparametric statistical methods that do not assume that data are Normally distributed.
  • Do the statistical analysis and report conclusions both with and without the suspected outlier.

In all cases, report the outliers and how you have dealt with them.
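Several of the options above can be compared side by side in a short Python sketch: summary statistics with the suspected outlier, without it, and with the outlier replaced by the next highest non-outlier value, together with the median as an example of a robust statistic. The sample data and the assumption that 12.5 has already been flagged as a high outlier are hypothetical.

```python
import statistics

# Hypothetical sample data; 12.5 is assumed to have been flagged
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 12.5]
suspect = 12.5

# Analysis without the suspected outlier
without = [v for v in data if v != suspect]

# Replace the high outlier with the next highest non-outlier value
replacement = max(without)
replaced = [replacement if v == suspect else v for v in data]

for label, values in (("with outlier", data),
                      ("without outlier", without),
                      ("outlier replaced", replaced)):
    print(f"{label}: n={len(values)} "
          f"mean={statistics.mean(values):.3f} "
          f"SD={statistics.stdev(values):.3f}")

# A robust alternative that keeps the outlier in the data
median_all = statistics.median(data)
print(f"median (all data): {median_all:.3f}")
```

Reporting the results of each variant makes it clear how strongly the conclusions depend on the suspected value.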

Literature

  • Grubbs FE (1969) Procedures for detecting outlying observations in samples. Technometrics 11:1-21.
  • Rosner B (1983) Percentage points for a generalized ESD many-outlier procedure. Technometrics 25:165-172.
  • Tukey JW (1977) Exploratory data analysis. Reading, Mass: Addison-Wesley Publishing Company.
