For this example I used SPSS to generate five variables as random samples (of 20 cases each) from a normally distributed population. The samples are independent of one another, so the population correlation between any two of them is 0.00. The results, without the data, are shown below. Notice that each cell of the intercorrelation matrix shows the correlation, below that the sample size, and below that the two-tailed significance level. (Thus, for example, when the true correlation between X1 and X2 in the population is 0.00, a sample correlation at least as extreme as ±.1127 would occur 63.6 percent of the time.)
- - Correlation Coefficients - -

         X1         X2         X3         X4         X5

X1     1.0000     -.1127      .2541     -.3364      .1563
      (   20)    (   20)    (   20)    (   20)    (   20)
      P= .       P= .636    P= .280    P= .147    P= .511

X2     -.1127     1.0000     -.1044      .1905      .0451
      (   20)    (   20)    (   20)    (   20)    (   20)
      P= .636    P= .       P= .661    P= .421    P= .850

X3      .2541     -.1044     1.0000     -.1739      .3960
      (   20)    (   20)    (   20)    (   20)    (   20)
      P= .280    P= .661    P= .       P= .464    P= .084

X4     -.3364      .1905     -.1739     1.0000     -.1503
      (   20)    (   20)    (   20)    (   20)    (   20)
      P= .147    P= .421    P= .464    P= .       P= .527

X5      .1563      .0451      .3960     -.1503     1.0000
      (   20)    (   20)    (   20)    (   20)    (   20)
      P= .511    P= .850    P= .084    P= .527    P= .

(Coefficient / (Cases) / 2-tailed Significance)
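If you would like to check the 63.6 percent figure for X1 and X2 yourself, one rough way is to simulate it. The sketch below is in Python rather than SPSS, and the function names, the number of replications, and the seed are my own illustrative choices, not anything taken from the SPSS run. It draws many pairs of independent normal samples of 20 cases and counts how often the sample correlation is at least as far from zero as .1127.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def share_at_least_as_extreme(r_obs, n, reps=10000, seed=1):
    """Proportion of simulated correlations with |r| >= |r_obs| when the
    two variables are independent normal samples of n cases each.
    reps and seed are arbitrary illustrative values."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pearson_r(x, y)) >= abs(r_obs):
            hits += 1
    return hits / reps


# Should land near the P = .636 that SPSS printed for X1 with X2.
print(share_at_least_as_extreme(-0.1127, 20))
```

Because this is a simulation, the proportion it prints will only approximate the tabled significance level, and it will shift slightly with a different seed or number of replications.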
A scatterplot of these data follows:
To give you a sense of the relationship between sample size and the variability of correlation coefficients, I have repeated the previous example, but this time I generated 200 cases for each variable. Because the correlations are based on much more data, they should hover more closely around the true population correlation of 0.00. Can you see this in the following set of data?
- - Correlation Coefficients - -

         X1         X2         X3         X4         X5

X1     1.0000     -.0002      .0500      .0236      .0072
      (  200)    (  200)    (  200)    (  200)    (  200)
      P= .       P= .998    P= .482    P= .741    P= .919

X2     -.0002     1.0000     -.0378      .1233      .0306
      (  200)    (  200)    (  200)    (  200)    (  200)
      P= .998    P= .       P= .595    P= .082    P= .667

X3      .0500     -.0378     1.0000      .1810     -.0225
      (  200)    (  200)    (  200)    (  200)    (  200)
      P= .482    P= .595    P= .       P= .010    P= .751

X4      .0236      .1233      .1810     1.0000     -.0168
      (  200)    (  200)    (  200)    (  200)    (  200)
      P= .741    P= .082    P= .010    P= .       P= .814

X5      .0072      .0306     -.0225     -.0168     1.0000
      (  200)    (  200)    (  200)    (  200)    (  200)
      P= .919    P= .667    P= .751    P= .814    P= .

(Coefficient / (Cases) / 2-tailed Significance)

" . " is printed if a coefficient cannot be computed
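The effect of sample size on the spread of the correlations can also be seen by simulation. The following is only a sketch in Python (not the SPSS procedure I used), and the function names, replication count, and seed are assumptions of mine. It repeats the "two independent normal samples" experiment many times at 20 cases and at 200 cases, then compares the standard deviation of the resulting correlations with 1/sqrt(n - 1), which is the standard deviation of r when the population correlation is zero and the variables are normal.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulated_rs(n, reps=2000, seed=2):
    """Correlations from reps pairs of independent normal samples of size n.
    reps and seed are arbitrary illustrative values."""
    rng = random.Random(seed)
    rs = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        rs.append(pearson_r(x, y))
    return rs


for n in (20, 200):
    spread = statistics.stdev(simulated_rs(n))
    # Columns: sample size, simulated SD of r, theoretical 1/sqrt(n - 1)
    print(n, round(spread, 3), round(1 / math.sqrt(n - 1), 3))
```

With 200 cases the correlations should cluster roughly three times more tightly around 0.00 than with 20 cases, which is the pattern to look for when comparing the two matrices.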
Notice that there is one Type I error in this second matrix. (Remember that a Type I error consists of rejecting the null hypothesis when it is in fact true. Since I drew all of my samples independently, the true correlation in the population is in fact 0.00.) Can you find the Type I error? What do you think happens to the probability of a Type I error when we work at α = .05 but run many hypothesis tests? (How many tests did we actually run here?)
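Once you have worked out your own answers, you can check them with a short calculation. The sketch below (Python, with variable names of my own choosing) counts the distinct pairs among five variables and then computes the chance of at least one Type I error across that many tests. It treats the tests as independent of one another, which is a simplifying assumption made only for illustration, so take the result as a rough guide rather than an exact figure.

```python
from math import comb

n_variables = 5   # X1 through X5
alpha = 0.05      # per-test Type I error rate

# Each distinct pair of variables gives one correlation to test.
n_tests = comb(n_variables, 2)

# Chance of at least one false rejection, ASSUMING independent tests.
familywise = 1 - (1 - alpha) ** n_tests

print(n_tests, round(familywise, 3))  # 10 0.401
```

Under that assumption, finding one "significant" correlation among the ten is not at all surprising even though every population correlation is 0.00.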
Last revised: 7/13/98