SPSS Learning Module
An overview of statistical tests

1. Introduction and description of data

We will present sample programs for some basic statistical tests in SPSS, including t-tests, chi-square tests, correlation, regression, and analysis of variance. These examples use the auto data file. The program below reads the data and creates a temporary SPSS data file. (To demonstrate how these commands handle missing values, the values of mpg for the three AMC cars have been set to missing. This differs from the data files for other modules, where the AMC cars have valid data for mpg.)

DATA LIST FIXED/
   make  (A17) price 19-23 mpg 25-26 rep78 28 hdroom 30-32 
   trunk 34-35 weight 37-40 length 42-44 turn 46-47 
   displ 49-51 gratio 53-56 foreign 58 .
BEGIN DATA.
AMC Concord        4099    3 2.5 11 2930 186 40 121 3.58 0
AMC Pacer          4749    3 3.0 11 3350 173 40 258 2.53 0
AMC Spirit         3799      3.0 12 2640 168 35 121 3.08 0
Audi 5000          9690 17 5 3.0 15 2830 189 37 131 3.20 1
Audi Fox           6295 23 3 2.5 11 2070 174 36  97 3.70 1
BMW 320i           9735 25 4 2.5 12 2650 177 34 121 3.64 1
Buick Century      4816 20 3 4.5 16 3250 196 40 196 2.93 0
Buick Electra      7827 15 4 4.0 20 4080 222 43 350 2.41 0
Buick LeSabre      5788 18 3 4.0 21 3670 218 43 231 2.73 0
Buick Opel         4453 26   3.0 10 2230 170 34 304 2.87 0
Buick Regal        5189 20 3 2.0 16 3280 200 42 196 2.93 0
Buick Riviera     10372 16 3 3.5 17 3880 207 43 231 2.93 0
Buick Skylark      4082 19 3 3.5 13 3400 200 42 231 3.08 0
Cad. Deville      11385 14 3 4.0 20 4330 221 44 425 2.28 0
Cad. Eldorado     14500 14 2 3.5 16 3900 204 43 350 2.19 0
Cad. Seville      15906 21 3 3.0 13 4290 204 45 350 2.24 0
Chev. Chevette     3299 29 3 2.5  9 2110 163 34 231 2.93 0
Chev. Impala       5705 16 4 4.0 20 3690 212 43 250 2.56 0
Chev. Malibu       4504 22 3 3.5 17 3180 193 31 200 2.73 0
Chev. Monte Carlo  5104 22 2 2.0 16 3220 200 41 200 2.73 0
Chev. Monza        3667 24 2 2.0  7 2750 179 40 151 2.73 0
Chev. Nova         3955 19 3 3.5 13 3430 197 43 250 2.56 0
Datsun 200         6229 23 4 1.5  6 2370 170 35 119 3.89 1
Datsun 210         4589 35 5 2.0  8 2020 165 32  85 3.70 1
Datsun 510         5079 24 4 2.5  8 2280 170 34 119 3.54 1
Datsun 810         8129 21 4 2.5  8 2750 184 38 146 3.55 1
Dodge Colt         3984 30 5 2.0  8 2120 163 35  98 3.54 0
Dodge Diplomat     4010 18 2 4.0 17 3600 206 46 318 2.47 0
Dodge Magnum       5886 16 2 4.0 17 3600 206 46 318 2.47 0
Dodge St. Regis    6342 17 2 4.5 21 3740 220 46 225 2.94 0
Fiat Strada        4296 21 3 2.5 16 2130 161 36 105 3.37 1
Ford Fiesta        4389 28 4 1.5  9 1800 147 33  98 3.15 0
Ford Mustang       4187 21 3 2.0 10 2650 179 43 140 3.08 0
Honda Accord       5799 25 5 3.0 10 2240 172 36 107 3.05 1
Honda Civic        4499 28 4 2.5  5 1760 149 34  91 3.30 1
Linc. Continental 11497 12 3 3.5 22 4840 233 51 400 2.47 0
Linc. Mark V      13594 12 3 2.5 18 4720 230 48 400 2.47 0
Linc. Versailles  13466 14 3 3.5 15 3830 201 41 302 2.47 0
Mazda GLC          3995 30 4 3.5 11 1980 154 33  86 3.73 1
Merc. Bobcat       3829 22 4 3.0  9 2580 169 39 140 2.73 0
Merc. Cougar       5379 14 4 3.5 16 4060 221 48 302 2.75 0
Merc. Marquis      6165 15 3 3.5 23 3720 212 44 302 2.26 0
Merc. Monarch      4516 18 3 3.0 15 3370 198 41 250 2.43 0
Merc. XR-7         6303 14 4 3.0 16 4130 217 45 302 2.75 0
Merc. Zephyr       3291 20 3 3.5 17 2830 195 43 140 3.08 0
Olds 98            8814 21 4 4.0 20 4060 220 43 350 2.41 0
Olds Cutl Supr     5172 19 3 2.0 16 3310 198 42 231 2.93 0
Olds Cutlass       4733 19 3 4.5 16 3300 198 42 231 2.93 0
Olds Delta 88      4890 18 4 4.0 20 3690 218 42 231 2.73 0
Olds Omega         4181 19 3 4.5 14 3370 200 43 231 3.08 0
Olds Starfire      4195 24 1 2.0 10 2730 180 40 151 2.73 0
Olds Toronado     10371 16 3 3.5 17 4030 206 43 350 2.41 0
Peugeot 604       12990 14   3.5 14 3420 192 38 163 3.58 1
Plym. Arrow        4647 28 3 2.0 11 3260 170 37 156 3.05 0
Plym. Champ        4425 34 5 2.5 11 1800 157 37  86 2.97 0
Plym. Horizon      4482 25 3 4.0 17 2200 165 36 105 3.37 0
Plym. Sapporo      6486 26   1.5  8 2520 182 38 119 3.54 0
Plym. Volare       4060 18 2 5.0 16 3330 201 44 225 3.23 0
Pont. Catalina     5798 18 4 4.0 20 3700 214 42 231 2.73 0
Pont. Firebird     4934 18 1 1.5  7 3470 198 42 231 3.08 0
Pont. Grand Prix   5222 19 3 2.0 16 3210 201 45 231 2.93 0
Pont. Le Mans      4723 19 3 3.5 17 3200 199 40 231 2.93 0
Pont. Phoenix      4424 19   3.5 13 3420 203 43 231 3.08 0
Pont. Sunbird      4172 24 2 2.0  7 2690 179 41 151 2.73 0
Renault Le Car     3895 26 3 3.0 10 1830 142 34  79 3.72 1
Subaru             3798 35 5 2.5 11 2050 164 36  97 3.81 1
Toyota Celica      5899 18 5 2.5 14 2410 174 36 134 3.06 1
Toyota Corolla     3748 31 5 3.0  9 2200 165 35  97 3.21 1
Toyota Corona      5719 18 5 2.0 11 2670 175 36 134 3.05 1
Volvo 260         11995 17 5 2.5 14 3170 193 37 163 2.98 1
VW Dasher          7140 23 4 2.5 12 2160 172 36  97 3.74 1
VW Diesel          5397 41 5 3.0 15 2040 155 35  90 3.78 1
VW Rabbit          4697 25 4 3.0 15 1930 155 35  89 3.78 1
VW Scirocco        6850 25 4 2.0 16 1990 156 36  97 3.78 1
END DATA.
FORMATS hdroom (F3.1) gratio (F4.2) .

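The data contain missing values, which were left blank, and the long character variable make, which contains embedded blanks. Values separated only by blanks could not be read reliably in that situation, so fixed-field input was used, with a column range specified for each variable.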
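To make the column layout concrete, here is a minimal Python sketch (not part of the original SPSS program) that slices one data line using the column ranges from the DATA LIST command. The function name parse_fixed and the COLUMNS table are illustrative assumptions, and only the first five variables are included.

```python
# Column ranges copied from the DATA LIST command (1-based, inclusive).
# make (A17) occupies columns 1-17; only the first five variables are shown.
COLUMNS = {
    "make": (1, 17),
    "price": (19, 23),
    "mpg": (25, 26),
    "rep78": (28, 28),
    "hdroom": (30, 32),
}

def parse_fixed(line):
    """Slice one fixed-format record; blank numeric fields become None."""
    record = {}
    for name, (start, end) in COLUMNS.items():
        field = line[start - 1:end].strip()
        if name == "make":
            record[name] = field          # character variable, keep as text
        elif field == "":
            record[name] = None           # blank field = missing value
        else:
            record[name] = float(field)
    return record

# The AMC Spirit line from the data above: mpg and rep78 are blank.
row = parse_fixed("AMC Spirit         3799      3.0 12 2640 168 35 121 3.08 0")
print(row)
```

Because each value is located by position rather than by separator, the blank mpg and rep78 fields are recognized as missing instead of shifting the later values to the left.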

2. T-tests

We can use a t-test to determine whether the mean mpg for domestic cars differs from the mean mpg for foreign cars.

T-TEST
  /GROUPS=foreign(0 1)
  /VARIABLES=mpg .

Here is the output produced by the T-TEST command. The results show that foreign cars have significantly higher gas mileage (mpg) than domestic cars. Note that the overall N is 71 (not 74). This is because mpg was missing for 3 of the observations, so those observations were omitted from the analysis.

t-tests for Independent Samples of FOREIGN

                             Number
 Variable                   of Cases       Mean          SD   SE of Mean
 -----------------------------------------------------------------------
 MPG
 FOREIGN 0                    49        19.7959       4.852         .693
 FOREIGN 1                    22        24.7727       6.611        1.410
 -----------------------------------------------------------------------
          Mean Difference = -4.9768
          Levene's Test for Equality of Variances: F= 1.618  P= .208

       t-test for Equality of Means                                      95%
 Variances   t-value       df    2-Tail Sig     SE of Diff           CI for Diff
 -------------------------------------------------------------------------------
 Equal         -3.56       69          .001          1.398      (-7.766, -2.188)
 Unequal       -3.17    31.58          .003          1.571      (-8.178, -1.776)
 -------------------------------------------------------------------------------

Note that the output provides two t values, one assuming that the variances are Unequal and another assuming that the variances are Equal. Above the t-test output notice "Levene's Test for Equality of Variances", which tests whether the variances are equal.   The test for equal variances has an F value of 1.618, with a p-value of 0.208, which indicates that the variances of the two groups do not significantly differ. Therefore the Equal variance t-test would be the appropriate test to use.  In this case, we would report a t value of -3.56 with a p value of 0.001, concluding that the mean mpg for foreign cars is significantly greater than the mpg for domestic cars.  Had the F test of equal variances been significant, then the Unequal variance t value (-3.17) would have been the appropriate value to use.  This is especially important when the sample sizes for the two groups differ, because when the variances of the two groups differ and the sample sizes of the two groups differ, then the results assuming Equal variances can be quite inaccurate and could differ from the Unequal variance result.
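The two t values can be recomputed from the group sizes, means and standard deviations printed in the output. The Python sketch below is an illustration only, not something SPSS runs. It assumes the Equal row uses the pooled variance and the Unequal row uses the Welch statistic with Satterthwaite degrees of freedom, which is consistent with the printed df of 31.58.

```python
import math

# Group summaries copied from the T-TEST output above.
n0, mean0, sd0 = 49, 19.7959, 4.852   # foreign = 0 (domestic)
n1, mean1, sd1 = 22, 24.7727, 6.611   # foreign = 1

diff = mean0 - mean1

# Equal-variance (pooled) t-test.
df_equal = n0 + n1 - 2
pooled_var = ((n0 - 1) * sd0**2 + (n1 - 1) * sd1**2) / df_equal
se_equal = math.sqrt(pooled_var * (1 / n0 + 1 / n1))
t_equal = diff / se_equal

# Unequal-variance (Welch) t-test with Satterthwaite degrees of freedom.
v0, v1 = sd0**2 / n0, sd1**2 / n1
se_unequal = math.sqrt(v0 + v1)
t_unequal = diff / se_unequal
df_unequal = (v0 + v1) ** 2 / (v0**2 / (n0 - 1) + v1**2 / (n1 - 1))

print(f"Equal:   t = {t_equal:.2f}, df = {df_equal}, SE = {se_equal:.3f}")
print(f"Unequal: t = {t_unequal:.2f}, df = {df_unequal:.2f}, SE = {se_unequal:.3f}")
```

The two standard errors differ because the smaller group (foreign, N = 22) has the larger standard deviation, which is the situation in which the choice between the two rows matters most.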

3. Chi-square tests

We can use crosstabs to examine the repair records of the cars (rep78, where 1 is the worst repair record and 5 is the best repair record) by foreign (foreign coded 1, domestic coded 0). Use the chisq keyword on the /statistics= subcommand to request a chi-square test. This test determines whether these two variables are independent. The program is shown below.

CROSSTABS
  /TABLES=rep78  BY foreign
  /STATISTICS=CHISQ .

The results are shown below, presenting the crosstab first and then following with the chi-square test.

REP78  by  FOREIGN

                    FOREIGN     Page 1 of 1
            Count  |
                   |
                   |                Row
                   |     0|     1| Total
REP78      --------+------+------+
                1  |     2|      |     2
                   |      |      |   2.9
                   +------+------+
                2  |     8|      |     8
                   |      |      |  11.6
                   +------+------+
                3  |    27|     3|    30
                   |      |      |  43.5
                   +------+------+
                4  |     9|     9|    18
                   |      |      |  26.1
                   +------+------+
                5  |     2|     9|    11
                   |      |      |  15.9
                   +------+------+
            Column      48     21     69
             Total    69.6   30.4  100.0

     Chi-Square                Value        DF     Significance
--------------------        -----------     ----    ------------
Pearson                       27.26396        4        .00002
Likelihood Ratio              29.91212        4        .00001
Mantel-Haenszel test for      23.85063        1        .00000
      linear association

Minimum Expected Frequency -     .609
Cells with Expected Frequency < 5 -  4 OF 10 ( 40.0%)

Number of Missing Observations: 5

Notice that SPSS tells us that four of the 10 cells have an expected frequency of less than five. The chi-square approximation may not be trustworthy when cells have expected frequencies this small. In such cases you should use Fisher's exact test, which remains valid under these circumstances. Unfortunately, Fisher's exact test is only available if you have installed the Exact Tests add-on to SPSS.
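The expected frequencies that trigger this warning are not printed in the crosstab, but they follow directly from the row and column totals. The Python sketch below is an illustration, not SPSS output. It rebuilds them from the observed counts and recomputes the Pearson chi-square. The p-value line uses a closed form that holds only for 4 degrees of freedom, which is an assumption specific to this table.

```python
import math

# Observed counts from the crosstab above: rows are rep78 = 1..5,
# columns are foreign = 0, 1 (empty cells in the listing are zero).
observed = [[2, 0], [8, 0], [27, 3], [9, 9], [2, 9]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

min_expected = min(min(row) for row in expected)
n_small = sum(e < 5 for row in expected for e in row)

# For df = 4 the chi-square upper tail is exp(-x/2) * (1 + x/2);
# this shortcut does not apply to other degrees of freedom.
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

print(f"Pearson chi-square = {chi2:.5f}, df = {df}, p = {p_value:.5f}")
print(f"Minimum expected = {min_expected:.3f}, cells below 5 = {n_small}")
```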

4. Correlation

Let's use the correlations command to examine the relationships among price, mpg, and weight.

CORRELATIONS
  /VARIABLES=price mpg weight.

The results of the CORRELATIONS command are shown below. 

             - -  Correlation Coefficients  - -

             PRICE      MPG        WEIGHT
PRICE        1.0000     -.4777      .5386
            (   74)    (   71)    (   74)
            P= .       P= .000    P= .000
MPG          -.4777     1.0000     -.8075
            (   71)    (   71)    (   71)
            P= .000    P= .       P= .000
WEIGHT        .5386     -.8075     1.0000
            (   74)    (   71)    (   74)
            P= .000    P= .000    P= .

(Coefficient / (Cases) / 2-tailed Significance) 
" . " is printed if a coefficient cannot be computed

The output is a correlation matrix for price, mpg, and weight. Each cell has three entries: the correlation coefficient, the number of cases (N), and the p-value. The line below the correlation matrix, (Coefficient / (Cases) / 2-tailed Significance), tells you how to read each cell. The p-value is the two-tailed p-value for the hypothesis test that the correlation is 0.

By looking at the sample sizes, we can see how the correlations command handles missing values. Since mpg had three missing values, all the correlations with mpg have an N of 71. The rest of the correlations were based on an N of 74. This is called pairwise deletion of missing data: SPSS uses the maximum number of non-missing values available for each pair of variables. It is also possible to ask SPSS for correlations based only on the cases having complete data for all of the variables on the /variables= subcommand. This is called listwise deletion of missing data: when any of the variables is missing for a case, the entire case is omitted from the analysis. You can request listwise deletion with the LISTWISE keyword on the /missing= subcommand. This is demonstrated in the program below.

CORR
  /VARIABLES=price mpg weight
  /MISSING=LISTWISE .

Notice that the correlations command can be abbreviated as corr.

The results of this command are shown below. 

            - -  Correlation Coefficients  - -

             PRICE      MPG        WEIGHT

PRICE        1.0000     -.4777      .5418
            (   71)    (   71)    (   71)
            P= .       P= .000    P= .000

MPG          -.4777     1.0000     -.8075
            (   71)    (   71)    (   71)
            P= .000    P= .       P= .000

WEIGHT        .5418     -.8075     1.0000
            (   71)    (   71)    (   71)
            P= .000    P= .000    P= .

(Coefficient / (Cases) / 2-tailed Significance)   
" . " is printed if a coefficient cannot be computed

The N is 71 for all of the correlations in the matrix since /missing=listwise was specified. In some versions of SPSS the N is not presented with each correlation, but rather is presented separately when this subcommand is specified. This is possible since the N for all correlations in the matrix is the same with listwise deletion of missing values.
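The difference between the two deletion rules can be sketched in a few lines of Python. This is an illustration only: it uses just the first seven cars from the data, so the coefficients will not match the SPSS output, and the helper names pearson and correlate are hypothetical.

```python
import math

# First seven cars from the data above as (price, mpg, weight);
# None marks the missing mpg values of the AMC cars.
cars = [
    (4099, None, 2930),
    (4749, None, 3350),
    (3799, None, 2640),
    (9690, 17, 2830),
    (6295, 23, 2070),
    (9735, 25, 2650),
    (4816, 20, 3250),
]
NAMES = ("price", "mpg", "weight")

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlate(rows, i, j):
    """Correlate columns i and j using rows where both values are present."""
    pairs = [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys), len(pairs)

# Listwise deletion: drop any case with a missing value on any variable.
complete = [r for r in cars if None not in r]

for i, j in [(0, 1), (0, 2), (1, 2)]:
    r_pair, n_pair = correlate(cars, i, j)
    r_list, n_list = correlate(complete, i, j)
    print(f"{NAMES[i]}-{NAMES[j]}: pairwise r = {r_pair:.4f} (N = {n_pair}), "
          f"listwise r = {r_list:.4f} (N = {n_list})")
```

Only the price-weight pair changes between the two rules, because it is the only pair that does not involve mpg. The same pattern appears in the SPSS output, where the price-weight correlation moves from .5386 to .5418 while the correlations with mpg stay the same.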

5. Regression

Regression is a technique used to find the best linear prediction of a criterion variable from a set of predictor variables. To predict price from mpg and weight, we can use the regression command (abbreviated reg) as in the example below. The /dependent subcommand names the criterion variable price. The /method subcommand names the predictor variables mpg and weight, and the enter keyword causes both variables to enter the equation at the same time.

REG  
  /DEPENDENT price
  /METHOD=ENTER mpg weight.

You should note the following two points in looking at the output below.

1) Only 71 observations are used instead of 74 because mpg had three missing values.  Reg deletes missing cases using listwise deletion.  If you have a large amount of missing data you may lose too many cases unless you use some method for estimating missing values.

2) Direct your attention to Variables in the Equation and look at the regression coefficients for the predictors. The results show that weight is the only variable that significantly predicts price. The estimated regression coefficient (B) for weight is 1.689685, with a t value of 2.603 and a p-value of 0.0113. One reason that mpg is not significant may be the high correlation between mpg and weight (-.8075 in the correlation output above).

The results are shown below. 

      * * * *   M U L T I P L E   R E G R E S S I O N   * * * *

Listwise Deletion of Missing Data

Equation Number 1    Dependent Variable..   PRICE

Block Number  1.  Method:  Enter      MPG      WEIGHT

Variable(s) Entered on Step Number  1..    WEIGHT
                                    2..    MPG

Multiple R         .54605      Analysis of Variance
R Square           .29817                     DF   Sum of Squares    Mean Square
Adjusted R Sq      .27752      Regression      2   185670655.6187   92835327.809
Standard Error 2535.16029      Residual       68   437038564.8601    6427037.718
                                     F =      14.44450       Signif F =  .0000

------------------ Variables in the Equation ------------------
Variable              B        SE B       Beta         T  Sig T
MPG          -58.668896   87.294000   -.115750     -.672  .5038
WEIGHT         1.689685     .649145    .448293     2.603  .0113
(Constant)  2394.284967 3647.875362                 .656  .5138

End Block Number   1   All requested variables entered.
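Most of the summary figures in the regression output are linked by simple arithmetic. The Python sketch below is an illustration, not SPSS output. It starts from the sums of squares, degrees of freedom, coefficients and standard errors printed above and recomputes R Square, Adjusted R Square, the standard error, F, and the t values.

```python
import math

# Figures copied from the regression output above.
ss_regression, df_regression = 185670655.6187, 2
ss_residual, df_residual = 437038564.8601, 68
coefficients = {            # name: (B, SE B)
    "MPG": (-58.668896, 87.294000),
    "WEIGHT": (1.689685, 0.649145),
    "(Constant)": (2394.284967, 3647.875362),
}

n = df_regression + df_residual + 1      # cases left after listwise deletion
r_square = ss_regression / (ss_regression + ss_residual)
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_value = ms_regression / ms_residual
std_error = math.sqrt(ms_residual)       # standard error of the estimate

# Each t value is the coefficient divided by its standard error.
t_values = {name: b / se for name, (b, se) in coefficients.items()}

print(f"N = {n}, R Square = {r_square:.5f}, Adjusted R Sq = {adj_r_square:.5f}")
print(f"Standard Error = {std_error:.2f}, F = {f_value:.4f}")
for name, t in t_values.items():
    print(f"{name:<12} t = {t:.3f}")
```

The recomputed N of 71 confirms the first point above: the residual degrees of freedom (68) reflect the three cases dropped for missing mpg.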

6. Analysis of variance (and analysis of covariance)

To compare the average mpg among the cars in the different repair groups we use analysis of variance. Use anova to perform an ANOVA comparing mpg among the repair groups. Since there are so few cars with a repair record (rep78) of 1 or 2, we should concentrate on the cars with repair records of 3, 4 and 5. We will use the range specification (3,5) on the /variables subcommand to limit processing to categories three through five. The ANOVA below tests the hypothesis that the average mpg is the same for the three repair groups (rep78). It also produces the means for the three repair groups.

ANOVA
  /VARIABLES=mpg BY rep78(3,5)
  /METHOD=EXPERIM 
  /STATISTICS MEAN .

The results of the ANOVA are shown below. SPSS informs us that it used only 57 of the 74 observations (due to the missing values of mpg and the restriction on the values of rep78). The results suggest that there are significant differences in mpg among the three repair groups (based on the F value of 8.08 with a p-value of 0.001). The means for groups 3, 4 and 5 were 19.43, 21.67 and 27.36.

                      * * *  C E L L   M E A N S  * * *
               MPG
            by REP78

Total Population
    21.67
 (    57)

REP78
        3         4         5
    19.43     21.67     27.36
 (    28)  (    18)  (    11)

            * * *  A N A L Y S I S   O F   V A R I A N C E  * * *

                 MPG
            by   REP78

                 EXPERIMENTAL sums of squares
                 Covariates entered FIRST

                         Sum of              Mean             Sig
Source of Variation     Squares     DF     Square       F    of F

Main Effects            497.264      2    248.632     8.081  .001
   REP78                497.264      2    248.632     8.081  .001

Explained               497.264      2    248.632     8.081  .001

Residual               1661.403     54     30.767

Total                  2158.667     56     38.548

74 cases were processed.
17 cases (23.0 pct) were missing.

The above ANOVA will work both for SPSS 6.1 and SPSS 7.5. 
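The F value in the ANOVA table above can be traced back to the cell means. The Python sketch below is an illustration, not SPSS output. It rebuilds the between-groups sum of squares from the rounded means, so it only approximates the 497.264 that SPSS reports, and then forms F from the sums of squares printed in the table.

```python
# Group sizes and means from the CELL MEANS output above.
groups = {3: (28, 19.43), 4: (18, 21.67), 5: (11, 27.36)}

n_total = sum(n for n, _ in groups.values())
grand_mean = sum(n * m for n, m in groups.values()) / n_total

# Between-groups sum of squares from the rounded means (approximate).
ss_between_approx = sum(n * (m - grand_mean) ** 2 for n, m in groups.values())

# Sums of squares copied from the ANOVA table.
ss_between, ss_residual = 497.264, 1661.403
df_between = len(groups) - 1
df_residual = n_total - len(groups)

ms_between = ss_between / df_between
ms_residual = ss_residual / df_residual
f_value = ms_between / ms_residual

print(f"N = {n_total}, grand mean = {grand_mean:.2f}")
print(f"SS between (from rounded means) = {ss_between_approx:.1f}")
print(f"F = {f_value:.3f} on {df_between} and {df_residual} df")
```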

7. Analysis of variance with GLM (SPSS 7.5 and higher)

In SPSS version 7.5 and later versions you may want to use the glm command instead. The glm command allows the calculation of post hoc tests as well. Since the glm command does not allow the specification of a range, you will have to use the filter command to restrict the range of rep78. An example of the glm command with filtering and the Tukey HSD post hoc test follows.

COMPUTE filt345=(ANY(rep78 ,3,4,5)).
FILTER BY filt345.
EXECUTE .

GLM   mpg  BY rep78    
  /POSTHOC = rep78 ( TUKEY )
  /EMMEANS = TABLES(rep78) .
 
FILTER OFF.
EXECUTE. 

The results just for the Tukey tests produced by this GLM command are shown below (the rest of the output would be identical except for formatting). The group with rep78 of 5 is significantly different both from 3 and from 4. However, the group with rep78 of 3 is not significantly different from rep78 of 4. This output was produced on a PC running SPSS version 7.5.

Post Hoc Tests

REP78

Multiple Comparisons
Dependent Variable: MPG
Tukey HSD

                        Mean
                        Difference                         95% Confidence Interval
(I) REP78   (J) REP78   (I-J)        Std. Error    Sig.    Lower Bound   Upper Bound
3           4             -2.24         1.676      .382       -6.28          1.80
            5             -7.94(*)      1.974      .001      -12.69         -3.18
4           3              2.24         1.676      .382       -1.80          6.28
            5             -5.70(*)      2.123      .026      -10.81          -.58
5           3              7.94(*)      1.974      .001        3.18         12.69
            4              5.70(*)      2.123      .026         .58         10.81
Based on observed means. The error term is Error.
* The mean difference is significant at the .050 level.


Homogeneous Subsets

MPG
Tukey HSD

                      Subset
REP78        N        1         2
3           28      19.43
4           18      21.67
5           11                27.36
Sig.                 .483     1.000
Means for groups in homogeneous subsets are displayed.
Based on Type III Sum of Squares
The error term is Mean Square(Error) = 30.767.
a Uses Harmonic Mean Sample Size = 16.467.
b The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.
c Alpha = .050.
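The standard errors in the Multiple Comparisons table and the harmonic mean sample size in the footnotes can both be recovered from the error mean square and the group sizes. The Python sketch below is an illustration, not SPSS output. It assumes each standard error is the square root of MSE times (1/n_i + 1/n_j), which reproduces the printed values. The differences are taken from the rounded group means, so they can differ from the table in the last digit.

```python
import math
from itertools import combinations

mse = 30.767                             # Mean Square(Error) from the footnotes
sizes = {3: 28, 4: 18, 5: 11}            # N per rep78 group
means = {3: 19.43, 4: 21.67, 5: 27.36}   # rounded group means

differences = {}
std_errors = {}
for i, j in combinations(sizes, 2):
    differences[(i, j)] = means[i] - means[j]
    std_errors[(i, j)] = math.sqrt(mse * (1 / sizes[i] + 1 / sizes[j]))
    print(f"{i} vs {j}: difference = {differences[(i, j)]:.2f}, "
          f"std. error = {std_errors[(i, j)]:.3f}")

# Harmonic mean of the group sizes, used for the homogeneous subsets.
harmonic_n = len(sizes) / sum(1 / n for n in sizes.values())
print(f"Harmonic mean sample size = {harmonic_n:.3f}")
```

The comparison of groups 4 and 5 has the largest standard error because it involves the two smallest groups.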

8. Problems to look out for

9. For more information


These pages are Copyrighted (c) by UCLA Academic Technology Services

