# Summary statistics

## Description

Use Summary statistics to obtain the following statistics: sample size, range, mean, 95% confidence interval for the mean, median, 95% confidence interval for the median, variance, standard deviation (SD), relative standard deviation (= coefficient of variation), standard error of the mean (SEM), and the 2.5th, 5th, 10th, 25th, 75th, 90th, 95th and 97.5th percentiles, etc.

## Required input

- Select the variable of interest.
- Optionally select a filter to include a subset of cases.
- If the variable needs Logarithmic transformation, select the corresponding option.
- Test for Normal distribution: see Tests for Normal distribution.

## Results

The results window for Summary statistics displays:

**Sample size**: the number of cases N is the number of numerical entries for the variable that fulfill the filter.

**Range**: the lowest and highest value of all observations.

**Arithmetic mean**: the arithmetic mean is the sum of all observations divided by the number of observations.

**95% CI for the mean**: a 95% confidence interval for the arithmetic mean, i.e. a range of values constructed so that, over repeated sampling, 95% of such intervals contain the true population mean.

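**Median**: when you have n observations, and these are sorted from smaller to larger, then the median is equal to the middle value. When n is even there is no single middle value, and the median is usually taken as the mean of the two middle values. If the distribution of the data is Normal, then the median is approximately equal to the arithmetic mean.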
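The definition of the median can be sketched in a few lines of Python. This is an illustration only, not the program's own code: the function name `median_sorted` is hypothetical, and averaging the two middle values for an even number of observations is the usual convention, assumed here.

```python
import statistics

def median_sorted(values):
    """Return the middle value of the sorted data.

    Assumption: for an even number of observations, the mean of the
    two middle values is used.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = [7, 1, 5, 3, 9]
even = [7, 1, 5, 3]
print(median_sorted(odd))   # 5
print(median_sorted(even))  # 4.0, the mean of 3 and 5
```

The standard library function `statistics.median` follows the same convention and can be used as a cross-check.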

**95% CI for the median**: a 95% confidence interval for the median, i.e. a range of values constructed so that, over repeated sampling, 95% of such intervals contain the true population median.
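This page does not state how the confidence interval for the median is computed. As an illustration only, the sketch below uses one common distribution-free approach based on the binomial distribution of order statistics. The function name `median_ci_ranks` is hypothetical, and the result may differ from what the program reports.

```python
from math import comb

def median_ci_ranks(n, confidence=0.95):
    """Return 1-based ranks (lower, upper) of the sorted observations that
    bound a distribution-free confidence interval for the median, or None
    when n is too small for the requested confidence level.

    Assumption: the interval is chosen so that its coverage is at least
    `confidence`, using X ~ Binomial(n, 0.5).
    """
    alpha = 1 - confidence
    cdf = 0.0
    lower = 0
    for k in range(n + 1):
        cdf += comb(n, k) / 2 ** n  # cumulative P(X <= k)
        if cdf > alpha / 2:
            break
        lower = k + 1               # rank k + 1 still keeps each tail <= alpha / 2
    if lower == 0:
        return None
    return lower, n - lower + 1

data = sorted([4.1, 5.0, 5.6, 4.8, 5.3, 4.4, 5.9, 5.1, 4.7, 5.1])
ranks = median_ci_ranks(len(data))
if ranks is not None:
    lo_rank, hi_rank = ranks
    print(ranks, (data[lo_rank - 1], data[hi_rank - 1]))
```

Because the interval endpoints must be actual observations, the achieved coverage is usually somewhat higher than 95%, and for very small samples no such interval exists.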

**Variance**: the variance is the mean of the squared differences between each value and the arithmetic mean.

**Standard Deviation**: the standard deviation is the square root of the variance. When the distribution of the observations is Normal, then it can be assumed that 68% and 95% of all observations are located in the intervals Mean ± 1SD and Mean ± 2SD respectively.

**Relative Standard Deviation (RSD)**: this is the standard deviation divided by the mean. If appropriate, this number can be expressed as a percentage by multiplying it by 100 (coefficient of variation).

**Standard Error of the Mean (SEM)**: the SEM is the standard deviation divided by the square root of the sample size. It is used to calculate confidence intervals for the mean (see t-table).
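The statistics defined above (range, mean, variance, SD, RSD, SEM and the 95% CI for the mean) can be reproduced with the Python standard library. This is a minimal sketch, not the program's implementation. It assumes the sample variance uses the divisor n − 1, and it takes the critical t value as a parameter looked up in a t-table. The function name `summarize` and the example data are hypothetical.

```python
import math
import statistics

def summarize(values, t_crit):
    """Compute basic summary statistics for a list of numbers.

    t_crit is the two-sided critical value of Student's t for
    len(values) - 1 degrees of freedom, taken from a t-table.
    """
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.variance(values)  # assumption: divisor n - 1
    sd = math.sqrt(variance)
    sem = sd / math.sqrt(n)
    return {
        "n": n,
        "range": (min(values), max(values)),
        "mean": mean,
        "variance": variance,
        "sd": sd,
        "rsd": sd / mean,  # multiply by 100 for a percentage
        "sem": sem,
        "ci95_mean": (mean - t_crit * sem, mean + t_crit * sem),
    }

data = [4.1, 5.0, 5.6, 4.8, 5.3, 4.4, 5.9, 5.1, 4.7, 5.1]
T_CRIT_DF9 = 2.262  # 95% two-sided critical t value for 9 degrees of freedom
summary = summarize(data, T_CRIT_DF9)
for name, value in summary.items():
    print(name, value)
```

Passing the critical value explicitly keeps the sketch free of third-party dependencies. With a statistics library available, the value could be computed from the t distribution instead.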

**Skewness**: the coefficient of Skewness is a measure of the degree of symmetry in the variable distribution. If the corresponding P-value is low (P<0.05) then the variable symmetry is significantly different from that of a Normal distribution, which has a coefficient of Skewness equal to 0 (Sheskin, 2011) (see Skewness and Kurtosis). Minimum required sample size for Skewness: 2, for its P-value: 8.

**Kurtosis**: the coefficient of Kurtosis is a measure of the degree of peakedness/flatness in the variable distribution. If the corresponding P-value is low (P<0.05) then the variable peakedness is significantly different from that of a Normal distribution, which has a coefficient of Kurtosis equal to 0 (Sheskin, 2011) (see Skewness and Kurtosis). Minimum required sample size for Kurtosis and its P-value: 3.
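To make the two coefficients concrete, the sketch below computes simple moment-based versions of skewness and excess kurtosis, both of which are 0 for a Normal distribution. This is an assumption for illustration: the exact estimators (Sheskin, 2011) may include small-sample corrections, so values can differ from the program's output, and the P-values are not computed here. All function names are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment with divisor n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5."""
    m2 = central_moment(values, 2)
    return central_moment(values, 3) / m2 ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so that a Normal distribution gives 0."""
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric))         # 0.0 for perfectly symmetric data
print(skewness(right_skewed))      # positive: long tail to the right
print(excess_kurtosis(symmetric))  # negative: flatter than a Normal distribution
```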

**Test for Normal Distribution**: The result of this test is expressed as '*accept Normality*' or '*reject Normality*', together with the P-value.

- If P is higher than 0.05, it may be assumed that the data have a Normal distribution and the conclusion '*accept Normality*' is displayed.
- If P is less than 0.05, then the hypothesis that the distribution of the observations in the sample is Normal should be rejected, and the conclusion '*reject Normality*' is displayed. In this case, the sample cannot accurately be described by arithmetic mean and standard deviation, and such samples should not be submitted to any parametric statistical test or procedure, such as a t-test. To test for a possible difference between samples that are not Normally distributed, the *Wilcoxon test* should be used, and correlation can be estimated by means of *rank correlation*.
- When the sample size is small, it may not be possible to perform the selected test and an appropriate message will appear. In this case you can visually evaluate the symmetry and peakedness of the distribution using the Histogram or Cumulative frequency distribution.
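The decision rule described above can be written as a small function. This is a sketch under stated assumptions: the function name `normality_conclusion` is hypothetical, the P-value is supplied by whichever Normality test was selected, and a P-value exactly equal to 0.05, which the text does not cover, is treated here as a rejection.

```python
def normality_conclusion(p_value, alpha=0.05):
    """Map the P-value of a Normality test to the displayed conclusion."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("P-value must lie between 0 and 1")
    # Assumption: P exactly equal to alpha counts as a rejection.
    return "accept Normality" if p_value > alpha else "reject Normality"

print(normality_conclusion(0.31))   # accept Normality
print(normality_conclusion(0.004))  # reject Normality
```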

**Percentiles**: when n observations are sorted from smaller to larger, then the p-th percentile is the value with rank number (Lentner, 1982; Schoonjans et al., 2011):

## Logarithmic transformation

If the option Logarithmic transformation was selected, the program will display the back-transformed results. The back-transformed mean is named the Geometric mean. Variance, Standard deviation and Standard error of the mean cannot be back-transformed meaningfully and are not reported.
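The relation between the back-transformed mean and the Geometric mean can be checked with a short sketch: take logarithms, average them, and exponentiate the result. The function name `geometric_mean_via_logs` is hypothetical, and natural logarithms are an arbitrary choice, since any base gives the same back-transformed value. All observations must be positive.

```python
import math
import statistics

def geometric_mean_via_logs(values):
    """Back-transform the arithmetic mean of the log-transformed values."""
    logs = [math.log(x) for x in values]  # requires every x > 0
    return math.exp(statistics.fmean(logs))

data = [1.0, 10.0, 100.0]
print(geometric_mean_via_logs(data))     # approximately 10
print(statistics.geometric_mean(data))   # same quantity from the standard library
print(statistics.fmean(data))            # arithmetic mean, for comparison: 37.0
```

For skewed data such as this example, the Geometric mean is pulled far less towards the largest value than the arithmetic mean.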
