UCLA Academic Technology Services

Mplus Data Analysis Examples
Probit Regression

Examples

Example 1:  Suppose that we are interested in factors that influence whether or not a political candidate wins an election.  Our outcome variable has only two possible values:  win or not win.  We believe that factors such as the amount of money spent on the campaign, the amount of time spent campaigning negatively and whether the candidate is an incumbent affect whether the candidate wins the election.  Because our outcome variable is binary (either the candidate wins or does not win), we need to use a model that handles this feature correctly. 

Example 2:  Some people have heart attacks and others don't.  We would like to see whether exercise, age and gender influence whether or not someone has a heart attack.  Again, we have a binary outcome:  have a heart attack or not. 

Example 3:  Many undergraduates wish to continue their education in graduate school.  In their application to any given graduate program, they include their GRE scores and their GPA from their undergraduate institution.  Some students are graduating from very prestigious institutions, while others are graduating from not-so-prestigious institutions.  Many months after sending in their applications, students receive either a thick or a thin envelope from the graduate program to which they applied:  some were admitted and others were not.

Description of the Data

For our data analysis below, we are going to expand on Example 3 about getting into graduate school.  We have generated hypothetical data, which can be obtained by clicking on probit.dat. You can store this file anywhere you like; our examples read it from D:\probit.dat, so adjust the path on the data statement to match your own location.  (Be sure NOT to include the names of the variables at the top of your text file.  Instead, the variables are named on the variable statement.)  You may want to do your descriptive statistics in a general-use statistics package, such as SAS, Stata or SPSS.  In Mplus, you can obtain a few descriptive statistics.

This hypothetical data set has a binary response (outcome, dependent) variable called admit. There are three predictor variables:  gre, gpa and topnotch.  The last is a binary predictor in which 1 indicates that the undergraduate institution was "top notch" and 0 indicates that it was not. 

NOTE:  This example was done using Mplus version 4.21.  The syntax may not work with earlier versions of Mplus.

  title: Mplus DAE for probit;
  data: file is "D:\probit.dat";
  variable: names are admit gre topnotch gpa;
  categorical = admit;
  analysis:
    type = basic;
  plot: type is plot1;

For this output only, we will display all of the information in the output.  You will want to look at this carefully to be sure that the data were read into Mplus correctly.  You will want to make sure that you have the correct number of observations, and that the categorical and continuous variables have been correctly specified.  We have not used a missing statement because we have no missing data in this data set.

INPUT READING TERMINATED NORMALLY

Mplus DAE for probit;

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                         400

Number of dependent variables                                    4
Number of independent variables                                  0
Number of continuous latent variables                            0

Observed dependent variables

  Continuous
   GRE         TOPNOTCH    GPA

  Binary and ordered categorical (ordinal)
   ADMIT

Estimator                                                    WLSMV
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20
Parameterization                                             DELTA

Input data file(s)
  D:\probit.dat

Input data format  FREE

SUMMARY OF CATEGORICAL DATA PROPORTIONS

    ADMIT
      Category 1    0.683
      Category 2    0.317

RESULTS FOR BASIC ANALYSIS

     ESTIMATED SAMPLE STATISTICS

           MEANS/INTERCEPTS/THRESHOLDS
              ADMIT$1       GRE           TOPNOTCH      GPA
              ________      ________      ________      ________
      1         0.475       587.700         0.162         3.390


           CORRELATION MATRIX (WITH VARIANCES ON THE DIAGONAL)
              ADMIT         GRE           TOPNOTCH      GPA
              ________      ________      ________      ________
 ADMIT
 GRE            0.243     13310.683
 TOPNOTCH       0.167         0.217         0.136
 GPA            0.232         0.384         0.243         0.144

     STANDARD ERRORS FOR ESTIMATED SAMPLE STATISTICS

           S.E. FOR MEANS/INTERCEPTS/THRESHOLDS
              ADMIT$1       GRE           TOPNOTCH      GPA
              ________      ________      ________      ________
      1         0.065         5.805     16598.305         0.019

           S.E. FOR CORRELATION MATRIX (WITH VARIANCES ON THE DIAGONAL)
              ADMIT         GRE           TOPNOTCH      GPA
              ________      ________      ________      ________
 ADMIT
 GRE            0.063      1040.244
 TOPNOTCH       0.061         0.049      6693.099
 GPA            0.060         0.039         0.047         0.012
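
One way to check that the binary variable was read correctly is to reproduce the ADMIT$1 threshold by hand.  The sketch below is a side calculation in Python (our choice of language, not part of the Mplus run).  It assumes that, with no predictors in the model, the threshold is the standard normal quantile of the Category 1 proportion, and that Category 1 is the lower value of admit.

```python
from statistics import NormalDist

# Values copied from the Mplus output above.
prop_category_1 = 0.683     # proportion in Category 1 of ADMIT
reported_threshold = 0.475  # ADMIT$1 under MEANS/INTERCEPTS/THRESHOLDS

# With no predictors, the threshold is the z-value that cuts off
# the Category 1 proportion under a standard normal curve.
implied_threshold = NormalDist().inv_cdf(prop_category_1)

print(f"implied threshold:  {implied_threshold:.3f}")
print(f"reported threshold: {reported_threshold:.3f}")
```

The two values should agree to about two decimal places; the small gap comes from using the proportion as rounded in the output.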


Some Strategies You Might Try

Using the Probit Model

Before running the probit model, check whether any cells created by crosstabulating the categorical predictor (topnotch) with the response variable (admit) are empty or particularly small.  If so, there may be difficulty running the probit model.  (This crosstab should be done in a general-use statistics package.)  In our example, none of the cells is empty (has no cases) or too small, so we will run our probit model.

title: Mplus DAE for probit;
data: file is "D:\probit.dat";
variable: names are admit gre topnotch gpa;
categorical = admit;
model: admit on gre topnotch gpa;
analysis:
type = meanstructure; 
! you need to specify type = meanstructure to get the threshold;
! by default, a weighted least squares estimator is used, which gives you a probit (as opposed to a logit) model;

MODEL RESULTS

                   Estimates     S.E.  Est./S.E.

 ADMIT    ON
    GRE                0.002    0.001      2.407
    TOPNOTCH           0.273    0.177      1.545
    GPA                0.401    0.187      2.143

 Thresholds
    ADMIT$1            2.793    0.650      4.298

The section called MODEL RESULTS shows the coefficients (estimates), their standard errors and the ratio of each estimate to its standard error.  This ratio can be treated as a z-test, where absolute values of about 2 (more precisely, 1.96) and above are statistically significant at the .05 level.  Both gre and gpa are statistically significant, while topnotch is not.  A discussion of the interpretation of the coefficients can be found in the sample write-up section below.  There is no equivalent of an exponentiated coefficient in probit.
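
Mplus prints only the Est./S.E. ratio, not a p-value.  As a rough illustration, the Python sketch below converts the ratios reported above into two-sided p-values.  It assumes the ratio follows a standard normal distribution; the function name two_sided_p is our own, not an Mplus term.

```python
from statistics import NormalDist

# Est./S.E. ratios copied from MODEL RESULTS above.
z_ratios = {"gre": 2.407, "topnotch": 1.545, "gpa": 2.143}

def two_sided_p(z):
    """Two-sided p-value for a z statistic under the standard normal."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

p_values = {name: two_sided_p(z) for name, z in z_ratios.items()}

for name, p in p_values.items():
    print(f"{name:10s} z = {z_ratios[name]:.3f}   p = {p:.3f}")
```

The p-values for gre and gpa fall below .05 and the one for topnotch does not, matching the reading of the ratios given above.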

A probit model can incorporate either an intercept or a threshold (sometimes called a cutpoint).  Instead of reporting the intercept for the model, Mplus reports a threshold.  It is the same as the intercept, except that it has the opposite sign (so the intercept here would be -2.793).  For more information on the differences between intercepts and thresholds, please see http://www.stata.com/support/faqs/stat/oprobit.html .
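
The sign flip can be verified numerically.  The sketch below (Python, used only as a calculator here) computes the probability of the higher category when every predictor equals zero, once from the threshold and once from the intercept.  It assumes the usual probit setup in which the outcome falls in the higher category when a standard normal latent variable exceeds the threshold.

```python
from statistics import NormalDist

threshold = 2.793       # ADMIT$1 from MODEL RESULTS
intercept = -threshold  # same parameter with the sign flipped

phi = NormalDist().cdf  # standard normal cumulative distribution

# Probability of the higher category with all predictors at zero,
# written once with the threshold and once with the intercept.
p_via_threshold = 1.0 - phi(threshold)
p_via_intercept = phi(intercept)

print(f"via threshold: {p_via_threshold:.4f}")
print(f"via intercept: {p_via_intercept:.4f}")
```

Both lines print the same probability, which is why the choice between the two parameterizations does not change the fitted model.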

Sample Write-up of the Analysis

Below is one way of describing the results.  Please note that the coefficients can be discussed in terms of either Z-scores or probit index.  These are equivalent terms.

The Z-score of a person with a zero GRE score and a zero GPA from a non-topnotch school is about -2.8 (the threshold with its sign reversed).  For each one-point increase in GRE score, the Z-score increases by .0015244 (0.002 after rounding to three decimals, as in the output above); for each one-point increase in GPA, the Z-score increases by about .4.
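
To turn a Z-score into a predicted probability, pass it through the standard normal cumulative distribution.  The Python sketch below is illustrative only: the function names are our own, the GRE slope uses the less-rounded value quoted in the write-up, and the model is evaluated at the sample means from the basic analysis.  It also assumes that the higher category of admit means "admitted".

```python
from statistics import NormalDist

# Coefficients from MODEL RESULTS; the GRE slope is the less-rounded
# value quoted in the write-up (.0015244), which prints as 0.002.
COEF = {"gre": 0.0015244, "topnotch": 0.273, "gpa": 0.401}
THRESHOLD = 2.793  # ADMIT$1

def probit_index(gre, topnotch, gpa):
    """Z-score: minus the threshold plus the weighted predictors."""
    return (-THRESHOLD
            + COEF["gre"] * gre
            + COEF["topnotch"] * topnotch
            + COEF["gpa"] * gpa)

def admit_probability(gre, topnotch, gpa):
    """Predicted probability of the higher category of admit."""
    return NormalDist().cdf(probit_index(gre, topnotch, gpa))

# Baseline described in the write-up: all predictors at zero.
baseline_z = probit_index(0, 0, 0)

# Sample means reported in the basic analysis output.
mean_z = probit_index(587.7, 0.162, 3.39)
mean_p = admit_probability(587.7, 0.162, 3.39)

print(f"baseline Z-score:          {baseline_z:.3f}")
print(f"Z-score at sample means:   {mean_z:.3f}")
print(f"probability at the means:  {mean_p:.3f}")
```

The probability at the sample means should land near the observed Category 2 proportion of 0.317, which is a useful sanity check on the sign of the threshold.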

Similarities and differences between logit and probit models

Neither the logit model nor the probit model is linear in the probability of the outcome, which complicates interpretation.  To make the model linear, the predicted probability is transformed by a link function.  In logit regression, the transformation is the logit function, which is the natural log of the odds.  In probit models, the function used is the inverse of the standard normal cumulative distribution function, which returns a z-score.  In practice, this difference is rarely important:  both transformations linearize the model about equally well, and which one you use is largely a matter of personal preference.  Both models need diagnostics afterwards to check that the assumptions of the model have not been violated.  Both are commonly estimated by maximum likelihood, and so require more cases than a similar OLS model; the Mplus run on this page used a weighted least squares estimator instead.  Unlike logit models, probit models do not give you odds ratios.  In general, the logit coefficients are larger than the probit coefficients by a factor of about 1.7.  However, this rule often does not apply when an independent variable has a high standard error (lots of variability).
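
The factor of 1.7 reflects how closely a rescaled logistic curve tracks the standard normal curve.  The Python sketch below is a numerical illustration rather than a derivation: it compares the two cumulative distribution functions on a grid of z-scores from -4 to 4, an arbitrary range chosen for this example.

```python
import math
from statistics import NormalDist

def logistic_cdf(x):
    """Cumulative distribution function of the standard logistic."""
    return 1.0 / (1.0 + math.exp(-x))

probit_cdf = NormalDist().cdf
scale = 1.7  # approximate ratio of logit to probit coefficients

# Grid of z-scores from -4.00 to 4.00 in steps of 0.01.
grid = [i / 100 for i in range(-400, 401)]

# Largest gap between the rescaled logistic and the normal curve.
max_gap = max(abs(logistic_cdf(scale * x) - probit_cdf(x)) for x in grid)

print(f"largest difference in predicted probability: {max_gap:.4f}")
```

The largest gap is on the order of 0.01, which is why the two models usually lead to very similar predicted probabilities.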

These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.