# Case-control matching

## Description

The case-control matching procedure is used to randomly match cases and controls based on specific criteria. SciStat.com can match on up to 4 different variables.

## Required input

- **Classification variable**: select or enter a dichotomous variable indicating group membership (0=control, 1=case).
- **Variable with case identification**: select a variable that contains a unique identification code for each subject in the spreadsheet. If you do not select a variable here (not recommended), MedCalc will use row numbers as case identification.
- **Filter**: optionally, enter a filter to include only a selected subgroup of cases (e.g. SEX="Male").
- **Match on**: select up to 4 variables and, for each variable, the maximum allowable difference (caliper). Smaller calipers give closer matches and less bias, but may also reduce the number of matches. Select the option "Exact match" to match on a variable that is not numerical (for example, a variable "Gender" that is coded 'Male' and 'Female').
- **Advanced**: click the Advanced button to enter the number of iterations and the random number seed.
  - **Number of iterations**: enter the number of iterations (replications). Higher numbers increase the probability of finding more matched cases with better matching characteristics or smaller average differences between matched cases.
  - **Random-number seed**: this is the seed for the random number generator. Enter 0 for a random seed; this can result in a different set of matches when the procedure is repeated. Any other value gives a repeatable "random" sequence, and therefore repeatable sets of matches. MedCalc uses the Mersenne twister as a random number generator (implementation MT19937) (Matsumoto & Nishimura, 1998).
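To make the roles of the caliper, the number of iterations and the seed concrete, the following Python sketch shows one possible random caliper-matching scheme. It is not MedCalc's actual algorithm, which is not documented here. The function names, the field names (`id`, `group`, `age`, `sex`), the convention that a caliper of `None` means "Exact match", and the rule for choosing the best iteration are all assumptions made for illustration. Python's `random` module is also based on the Mersenne twister (MT19937).

```python
import random

def within_calipers(case, control, calipers):
    """True if every matching variable differs by at most its caliper.
    A caliper of None stands for 'Exact match' (assumed convention)."""
    for var, caliper in calipers.items():
        if caliper is None:
            if case[var] != control[var]:
                return False
        elif abs(case[var] - control[var]) > caliper:
            return False
    return True

def match_once(cases, controls, calipers, rng):
    """One random pass: visit cases in random order and give each one a
    randomly chosen, still unused control that lies within the calipers."""
    order = cases[:]
    rng.shuffle(order)
    free = controls[:]
    pairs = []
    for case in order:
        eligible = [c for c in free if within_calipers(case, c, calipers)]
        if eligible:
            chosen = rng.choice(eligible)
            free.remove(chosen)
            pairs.append((case, chosen))
    return pairs

def total_difference(pairs, calipers):
    """Sum of absolute differences over all numerical matching variables."""
    return sum(abs(a[v] - b[v])
               for a, b in pairs
               for v, cal in calipers.items() if cal is not None)

def match(subjects, calipers, iterations=100, seed=0):
    """Repeat the random pass and keep the pass with the most pairs,
    breaking ties by the smallest total difference (assumed rule)."""
    # Seed 0 means "random seed", mirroring the dialog box option.
    rng = random.Random(seed if seed != 0 else None)
    cases = [s for s in subjects if s["group"] == 1]
    controls = [s for s in subjects if s["group"] == 0]
    best = []
    for _ in range(iterations):
        pairs = match_once(cases, controls, calipers, rng)
        score = (len(pairs), -total_difference(pairs, calipers))
        if score > (len(best), -total_difference(best, calipers)):
            best = pairs
    return best

subjects = [
    {"id": 1, "group": 1, "age": 50, "sex": "Male"},
    {"id": 2, "group": 1, "age": 61, "sex": "Female"},
    {"id": 3, "group": 1, "age": 30, "sex": "Male"},
    {"id": 4, "group": 0, "age": 52, "sex": "Male"},
    {"id": 5, "group": 0, "age": 60, "sex": "Female"},
    {"id": 6, "group": 0, "age": 49, "sex": "Female"},
    {"id": 7, "group": 0, "age": 75, "sex": "Male"},
]
calipers = {"age": 3, "sex": None}  # age within 3 years, exact match on sex

pairs = match(subjects, calipers, iterations=20, seed=42)
matched_ids = sorted((case["id"], control["id"]) for case, control in pairs)
print(matched_ids)  # → [(1, 4), (2, 5)]
```

In this small data set each case has at most one eligible control, so every iteration finds the same two pairs and case 3 stays unmatched. With larger data sets the random order matters, which is why more iterations can find more or closer matches.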

## Results

The program reports the total number of subjects, number of cases, number of controls and the number of matched cases, i.e. the number of cases for which a matching control has been found.

Next, for each matching variable, the differences between the matched subjects are summarized: mean difference, SD, 95% CI of the difference and associated P-value (paired samples t-test). The 95% confidence intervals should be narrow and the differences negligible. P-values should be non-significant. If for one or more variables the confidence interval is wide or the P-value is significant, the "maximum allowable difference" entered in the input dialog box (see above) was probably too large.
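As an illustration of these summary statistics, the sketch below computes the mean difference, SD, 95% CI and two-sided P-value of a paired samples t-test for one matching variable. It uses only the Python standard library, so the t distribution is evaluated by simple numerical integration and bisection; in practice a statistics library would normally be used instead. The function names and the example ages are hypothetical.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def central_prob(t, df, steps=2000):
    """P(-t <= T <= t), by Simpson's rule on [0, t] (steps must be even)."""
    if t == 0:
        return 0.0
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3

def t_critical(df, level=0.95):
    """Value t* with P(-t* <= T <= t*) = level, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if central_prob(mid, df) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def paired_summary(case_values, control_values):
    """Mean difference, SD, 95% CI and two-sided P-value (paired t-test).
    Assumes at least two pairs and differences that are not all equal."""
    diffs = [a - b for a, b in zip(case_values, control_values)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    se = sd / math.sqrt(n)
    df = n - 1
    t = mean_diff / se
    half_width = t_critical(df) * se
    return {
        "mean_diff": mean_diff,
        "sd": sd,
        "ci": (mean_diff - half_width, mean_diff + half_width),
        "p_value": 1 - central_prob(abs(t), df),
    }

# Hypothetical ages of five matched pairs.
case_age = [50, 61, 47, 55, 68]
control_age = [51, 60, 47, 56, 66]
summary = paired_summary(case_age, control_age)
print(summary)
```

For these example values the confidence interval contains 0 and the P-value is far from significant, which is the pattern expected for a well-matched variable.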

### Save match IDs in spreadsheet column

Click the "Save match IDs..." button to create a new column in the spreadsheet that contains, for each case, the identification of the matched control (and vice versa).

In subsequent statistical analyses this new column can be used in a filter in order to include only cases and controls for which a match was found.

E.g. if the new column has MATCH_ID as a heading, the filter could be MATCH_ID>0 or MATCH_ID<>"" (<> means Not Equal To).
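Outside the program, the same selection can be expressed in a few lines of Python. This is only a sketch: the column names `ID`, `GROUP` and `MATCH_ID`, and the use of `None` or an empty string for subjects without a match, are assumptions about how the data might look after export.

```python
# Hypothetical exported rows; unmatched subjects have an empty MATCH_ID.
rows = [
    {"ID": 1, "GROUP": 1, "MATCH_ID": 4},
    {"ID": 3, "GROUP": 1, "MATCH_ID": None},
    {"ID": 4, "GROUP": 0, "MATCH_ID": 1},
    {"ID": 7, "GROUP": 0, "MATCH_ID": ""},
]

def has_match(row, column="MATCH_ID"):
    """Equivalent of the filter MATCH_ID<>"": keep rows with a match ID."""
    value = row.get(column)
    return value is not None and value != ""

matched_rows = [row for row in rows if has_match(row)]
print([row["ID"] for row in matched_rows])  # → [1, 4]
```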

### Save as new file with paired data

Click the "Save new file..." button to create a new MedCalc data file in which the data are rearranged as follows:

- The file includes the data of cases with matching controls only.
- A first set of columns contains the data of the cases. The heading of these columns is the original heading with "_T" appended. A second set of columns contains the data of the controls. The heading of these columns is the original heading with "_C" appended.
- On each row, the data of a case and its matching control is given.

This new data file allows you to perform statistical tests on paired data.
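The rearrangement described above can be sketched in Python as follows. The column names (`ID`, `GROUP`, `AGE`, `MATCH_ID`) and the function name `paired_rows` are hypothetical, and whether the saved file also contains the match ID columns is an assumption of this sketch.

```python
def paired_rows(subjects, id_col="ID", group_col="GROUP", match_col="MATCH_ID"):
    """Build one row per matched case: case data with '_T' appended to
    each heading, followed by the matched control's data with '_C'."""
    by_id = {s[id_col]: s for s in subjects}
    rows = []
    for s in subjects:
        is_case = s[group_col] == 1
        if is_case and s.get(match_col) not in (None, ""):
            control = by_id[s[match_col]]
            row = {f"{name}_T": value for name, value in s.items()}
            row.update({f"{name}_C": value for name, value in control.items()})
            rows.append(row)
    return rows

# Hypothetical spreadsheet contents after saving the match IDs.
subjects = [
    {"ID": 1, "GROUP": 1, "AGE": 50, "MATCH_ID": 4},
    {"ID": 3, "GROUP": 1, "AGE": 30, "MATCH_ID": ""},
    {"ID": 4, "GROUP": 0, "AGE": 52, "MATCH_ID": 1},
    {"ID": 7, "GROUP": 0, "AGE": 75, "MATCH_ID": ""},
]

paired = paired_rows(subjects)
for row in paired:
    print(row["ID_T"], row["AGE_T"], row["ID_C"], row["AGE_C"])
```

Each resulting row holds a case and its control side by side, so a column pair such as `AGE_T` and `AGE_C` can be passed directly to a paired test.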

## Literature

- Matsumoto M, Nishimura T (1998) Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation 8:3-30.

## See also

- Logistic regression for calculation of propensity scores
