The case-control matching procedure is used to randomly match cases and controls based on specific criteria. SciStat.com can match on up to 4 different variables.
- Classification variable: select or enter a dichotomous variable indicating group membership (0=control, 1=case).
- Variable with case identification: select a variable that contains a unique identification code for each subject in the spreadsheet. If you do not select a variable here (not recommended), MedCalc will use row numbers as case identification.
- Filter: (optionally) enter a filter to include only a selected subgroup of cases (e.g. SEX="Male").
- Match on: select up to 4 variables and for each variable the maximum allowable difference (caliper). Smaller calipers will result in reduced bias and closer matches, but may also result in a smaller number of matches. Select the option "Exact match" to match on a variable that is not numerical (for example, a variable "Gender" that is coded 'Male' and 'Female').
- Advanced: click the Advanced button to enter the number of iterations and the random number seed.
- Number of iterations: enter the number of iterations. Higher numbers increase the probability of finding more matched cases with better matching characteristics or smaller average differences between matched cases.
- Random-number seed: this is the seed for the random number generator. Enter 0 for a random seed; this can result in a different set of matches when the procedure is repeated. Any other value will give a repeatable "random" sequence, which will result in repeatable sets of matches. MedCalc uses the Mersenne twister as a random number generator (implementation MT19937) (Matsumoto & Nishimura, 1998).
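The interplay of calipers, iterations and seed described above can be illustrated with a minimal Python sketch. It is not MedCalc's actual algorithm, which is not documented here; it assumes a simple random greedy pass that is repeated and keeps the best result. The function names (`is_match`, `match_once`, `match_cases`) and the use of `None` to stand for "Exact match" are hypothetical. Python's `random` module also uses the Mersenne Twister (MT19937), so a fixed non-zero seed gives repeatable matches.

```python
import random


def is_match(case, control, calipers):
    """True if the control lies within every caliper of the case."""
    for var, caliper in calipers.items():
        if caliper is None:  # None stands in for the "Exact match" option
            if case[var] != control[var]:
                return False
        elif abs(case[var] - control[var]) > caliper:
            return False
    return True


def match_once(cases, controls, calipers, id_var, rng):
    """One random pass: each case takes the first unused control that fits."""
    cases, controls = list(cases), list(controls)
    rng.shuffle(cases)
    rng.shuffle(controls)
    used, pairs = set(), []
    for case in cases:
        for control in controls:
            if control[id_var] not in used and is_match(case, control, calipers):
                used.add(control[id_var])
                pairs.append((case[id_var], control[id_var]))
                break
    return pairs


def match_cases(subjects, group_var, id_var, calipers, iterations=100, seed=0):
    """Keep the iteration with the most pairs, then the smallest total difference."""
    # Seed 0 means "random seed" in the dialog; any other value is repeatable.
    rng = random.Random(None if seed == 0 else seed)
    by_id = {s[id_var]: s for s in subjects}
    cases = [s for s in subjects if s[group_var] == 1]
    controls = [s for s in subjects if s[group_var] == 0]
    numeric = [v for v, c in calipers.items() if c is not None]
    best, best_key = [], None
    for _ in range(iterations):
        pairs = match_once(cases, controls, calipers, id_var, rng)
        total = sum(abs(by_id[a][v] - by_id[b][v]) for a, b in pairs for v in numeric)
        key = (-len(pairs), total)
        if best_key is None or key < best_key:
            best, best_key = pairs, key
    return best


# Made-up example data: GROUP 1 = case, GROUP 0 = control.
subjects = [
    {"ID": 1, "GROUP": 1, "AGE": 50, "SEX": "Male"},
    {"ID": 2, "GROUP": 1, "AGE": 61, "SEX": "Female"},
    {"ID": 3, "GROUP": 0, "AGE": 52, "SEX": "Male"},
    {"ID": 4, "GROUP": 0, "AGE": 60, "SEX": "Female"},
    {"ID": 5, "GROUP": 0, "AGE": 75, "SEX": "Male"},
]
calipers = {"AGE": 3, "SEX": None}
pairs = match_cases(subjects, "GROUP", "ID", calipers, iterations=50, seed=12345)
print(sorted(pairs))
```

In this sketch a smaller caliper for AGE would leave fewer eligible controls per case, which mirrors the trade-off between closer matches and fewer matches noted above.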
The program reports the total number of subjects, number of cases, number of controls and the number of matched cases, i.e. the number of cases for which a matching control has been found.
Next, the differences between the matched subjects are summarized for each matching variable, with mean difference, SD, 95% CI of the difference and associated P-value (paired samples t-test). The 95% confidence intervals should be narrow and the differences negligible. P-values should be non-significant. If for one or more variables the confidence interval is wide or the P-value is significant, the "maximum allowable difference" entered in the input dialog box (see above) was probably too large.
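The quantities behind this summary can be sketched in a few lines of Python. The example below uses made-up ages for four matched pairs and a hypothetical helper `paired_summary`; it computes the mean difference, SD, standard error and the paired t statistic with its degrees of freedom. The P-value and the 95% CI additionally need the t-distribution, which the standard library does not provide (a statistics package such as SciPy, e.g. `scipy.stats.ttest_rel`, could be used for that step).

```python
import math
import statistics


def paired_summary(case_values, control_values):
    """Summary statistics of the case-minus-control differences."""
    diffs = [c - k for c, k in zip(case_values, control_values)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample SD (n - 1 in the denominator)
    se = sd / math.sqrt(n)                # standard error of the mean difference
    t = mean_diff / se if se > 0 else float("nan")
    return {"n": n, "mean": mean_diff, "sd": sd, "se": se, "t": t, "df": n - 1}


# Made-up ages of four cases and their matched controls.
case_age = [50, 61, 47, 58]
control_age = [52, 60, 47, 59]
summary = paired_summary(case_age, control_age)
print(summary)
```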
Save match IDs in spreadsheet column
Click the "Save match IDs..." button to create a new column in the spreadsheet that contains, for each case, the identification of the matched control (and vice versa).
In subsequent statistical analyses this new column can be used in a filter in order to include only cases and controls for which a match was found.
E.g. if the new column has MATCH_ID as a heading, the filter could be MATCH_ID>0 or MATCH_ID<>"" (<> means Not Equal To).
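As an illustration of what such a column contains and how the filter acts on it, here is a small Python sketch. The data, the pairs and the helper `add_match_ids` are hypothetical; unmatched subjects are assumed to get an empty string, which corresponds to the MATCH_ID<>"" form of the filter.

```python
def add_match_ids(subjects, pairs, id_var="ID", column="MATCH_ID"):
    """Return copies of the rows with the partner's ID in a new column."""
    partner = {}
    for case_id, control_id in pairs:
        partner[case_id] = control_id      # case points to its control
        partner[control_id] = case_id      # and vice versa
    return [{**s, column: partner.get(s[id_var], "")} for s in subjects]


# Made-up subjects and matched (case ID, control ID) pairs.
subjects = [
    {"ID": 1, "GROUP": 1},
    {"ID": 2, "GROUP": 1},
    {"ID": 3, "GROUP": 0},
    {"ID": 4, "GROUP": 0},
    {"ID": 5, "GROUP": 0},
]
pairs = [(1, 3), (2, 4)]

rows = add_match_ids(subjects, pairs)
# Equivalent of the filter MATCH_ID<>"": keep only subjects with a match.
matched = [r for r in rows if r["MATCH_ID"] != ""]
print([r["ID"] for r in matched])
```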
Save as new file with paired data
Click the "Save new file..." button to create a new MedCalc data file in which the data are rearranged as follows:
- The file includes the data of cases with matching controls only.
- A first set of columns contains the data of the cases. The heading of these columns is the original heading with "_T" appended. A second set of columns contains the data of the controls. The heading of these columns is the original heading with "_C" appended.
- On each row, the data of a case and its matching control is given.
This new data file allows you to perform statistical tests on paired data.
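The rearrangement described above can be sketched as follows in Python. The example data and the helper `paired_rows` are hypothetical; the sketch only shows the layout (one row per matched pair, case columns with "_T" appended, control columns with "_C" appended), not the MedCalc file format itself.

```python
def paired_rows(subjects, pairs, id_var="ID"):
    """Build one row per matched pair with _T (case) and _C (control) columns."""
    by_id = {s[id_var]: s for s in subjects}
    rows = []
    for case_id, control_id in pairs:
        row = {f"{name}_T": value for name, value in by_id[case_id].items()}
        row.update({f"{name}_C": value for name, value in by_id[control_id].items()})
        rows.append(row)
    return rows


# Made-up subjects; subject 5 has no match and is left out of the result.
subjects = [
    {"ID": 1, "AGE": 50},
    {"ID": 2, "AGE": 61},
    {"ID": 3, "AGE": 52},
    {"ID": 4, "AGE": 60},
    {"ID": 5, "AGE": 75},
]
pairs = [(1, 3), (2, 4)]

paired = paired_rows(subjects, pairs)
for row in paired:
    print(row)
```

With the data in this layout, a paired test on AGE would compare the AGE_T and AGE_C columns row by row.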
- Matsumoto M, Nishimura T (1998) Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation 8:3-30.
- Logistic regression for calculation of propensity scores