UCLA Academic Technology Services

Stata Data Analysis Examples
Probit Regression

Examples

Example 1:  Suppose that we are interested in factors that influence whether or not a political candidate wins an election.  Our outcome variable has only two possible values:  win or not win.  We believe that factors such as the amount of money spent on the campaign, the amount of time spent campaigning negatively and whether the candidate is an incumbent affect whether the candidate wins the election.  Because our outcome variable is binary (either the candidate wins or does not win), we need to use a model that handles this feature correctly. 

Example 2:  Some people have heart attacks and others don't.  We would like to see if exercise, age and gender influence whether or not someone has a heart attack.  Again, we have a binary outcome:  have a heart attack or not. 

Example 3:  Many undergraduates wish to continue their education in graduate school.  In their application to any given graduate program, they include their GRE scores and their GPA from their undergraduate institution.  Some students are graduating from very prestigious institutions, while others are graduating from not-so-prestigious institutions.  Many months after sending in their applications, students receive either a thick or a thin envelope from the graduate program to which they applied:  some were admitted and others were not.

Description of the Data

For our data analysis below, we are going to expand on our third example about getting into graduate school.  We have generated hypothetical data, which can be obtained from our website:
use http://www.ats.ucla.edu/stat/stata/dae/probit.dta, clear

This hypothetical data set has a 0/1 variable called admit that we will use as our response (i.e., outcome, dependent) variable.  We also have three variables that we will use as predictors:  gre, which is the student's Graduate Record Exam score; gpa, which is the student's grade point average; and topnotch, which is a 0/1 variable where 1 indicates that the undergraduate institution was "top notch" and 0 indicates that it was not. 

summarize gre gpa

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         gre |       400       587.7    115.5165        220        800
         gpa |       400      3.3899    .3805668       2.26          4

tab topnotch

   topnotch |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        335       83.75       83.75
          1 |         65       16.25      100.00
------------+-----------------------------------
      Total |        400      100.00

Some Strategies You Might Try

Using the Probit Model

Before we run our probit model, we will see if any cells (created by the crosstab of our categorical and response variables) are empty or particularly small.  If any are, we may have difficulty running our model. 

tab admit topnotch

           |       topnotch
     admit |         0          1 |     Total
-----------+----------------------+----------
         0 |       238         35 |       273 
         1 |        97         30 |       127 
-----------+----------------------+----------
     Total |       335         65 |       400 

None of the cells is too small or empty (has no cases), so we will run our model.

probit admit gre topnotch gpa

Iteration 0:   log likelihood = -249.98826
Iteration 1:   log likelihood = -238.97735
Iteration 2:   log likelihood = -238.94339
Iteration 3:   log likelihood = -238.94339

Probit regression                                 Number of obs   =        400
                                                  LR chi2(3)      =      22.09
                                                  Prob > chi2     =     0.0001
Log likelihood = -238.94339                       Pseudo R2       =     0.0442

------------------------------------------------------------------------------
       admit |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gre |   .0015244   .0006382     2.39   0.017     .0002736    .0027752
    topnotch |   .2730334   .1795984     1.52   0.128     -.078973    .6250398
         gpa |   .4009853   .1931077     2.08   0.038     .0225012    .7794694
       _cons |  -2.797884   .6475363    -4.32   0.000    -4.067032   -1.528736
------------------------------------------------------------------------------

In the output above, we first see the iteration log.  In general, this is not very interesting, but it does show how quickly the model converged (here, in three iterations).  The final log likelihood (-238.94339) can be used in comparisons of nested models, but we won't show an example of that here.  Also at the top of the output we see that all 400 observations in our data set were used in the analysis.  Fewer observations would have been used if any of our variables had missing values; by default, Stata does a listwise deletion of cases with missing values.  The likelihood ratio chi-square of 22.09 with a p-value of 0.0001 tells us that our model as a whole is statistically significant, as compared to a model with no predictors.  The pseudo-R-squared is also given.  It is a pseudo-R-squared because there is no direct equivalent of an R-squared (from OLS regression) in non-linear models.  There are many different pseudo-R-squareds, but the emphasis should be on the pseudo.
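The header statistics can be checked by hand from the iteration log.  The Python sketch below is only an illustration, not Stata code.  It assumes that the Iteration 0 log likelihood is that of the model with no predictors and that the reported pseudo-R-squared is McFadden's; the function name chi2_sf_3df is our own, and it uses the closed form of the chi-square upper tail that holds for 3 degrees of freedom only.

```python
import math

# Log likelihoods copied from the iteration log above.
ll_null = -249.98826   # Iteration 0 (assumed here: model with no predictors)
ll_full = -238.94339   # final iteration (model with gre, topnotch, gpa)

# Likelihood ratio chi-square: twice the gain in log likelihood.
lr_chi2 = 2 * (ll_full - ll_null)

# McFadden's pseudo-R-squared: proportional gain in log likelihood.
pseudo_r2 = 1 - ll_full / ll_null


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_3df(lr_chi2)

print(f"LR chi2(3)  = {lr_chi2:.2f}")
print(f"Pseudo R2   = {pseudo_r2:.4f}")
print(f"Prob > chi2 = {p_value:.4f}")
```

Rounded to the precision Stata displays, these values should agree with the header of the probit output.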

In the table we see the coefficients, their standard errors, the z-test and associated p-values, and the 95% confidence interval of the coefficients.  Both gre and gpa are statistically significant; topnotch is not.  A discussion of the interpretation of the coefficients can be found in the sample write up section below.
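Each row of the coefficient table follows from just two numbers, the coefficient and its standard error.  As a rough check, the Python sketch below recomputes the z statistic, the two-sided p-value and the 95% confidence interval for each predictor.  The helper names (normal_cdf, wald_summary) are ours, and the inputs are the rounded values printed above, so the last displayed digit may differ slightly.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def wald_summary(coef, se, z_crit=1.959964):
    """Return (z, two-sided p-value, lower, upper) for one coefficient."""
    z = coef / se
    p = 2 * (1 - normal_cdf(abs(z)))
    return z, p, coef - z_crit * se, coef + z_crit * se


# Coefficients and standard errors copied from the probit table above.
rows = {
    "gre": (0.0015244, 0.0006382),
    "topnotch": (0.2730334, 0.1795984),
    "gpa": (0.4009853, 0.1931077),
}

for name, (coef, se) in rows.items():
    z, p, lower, upper = wald_summary(coef, se)
    print(f"{name:>8}  z = {z:5.2f}  p = {p:.3f}  95% CI = [{lower:.7f}, {upper:.7f}]")
```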

Probit has no equivalent of an exponentiated coefficient (such as the odds ratio in logit), so if you find probit coefficients hard to interpret, you may prefer predicted probabilities, which many people find easier to understand.  The commands used below are user-written and need to be downloaded, which you can do by typing findit spost.  We will start with prtab.  This can be used with either a categorical variable or a continuous variable and shows the predicted probability for each of the values of the variable specified.  Although topnotch is not statistically significant, we will use it as an example of a categorical predictor.  As you can see in the output below, the predicted probability of being admitted to the graduate program is .29 if the undergraduate institution was not "top notch" and .39 if it was.  We can also see that the predicted probability of being admitted is only .14 if one's GRE score is 220 and increases to .43 if one's GRE score is 800.  Beneath each table, we can see the values at which the other variables are held; by default, each is held at its mean.

prtab topnotch

probit: Predicted probabilities of positive outcome for admit

----------------------
 topnotch | Prediction
----------+-----------
        0 |     0.2937
        1 |     0.3937
----------------------

         gre  topnotch       gpa
x=     587.7     .1625    3.3899

prtab gre

probit: Predicted probabilities of positive outcome for admit

----------------------
      gre | Prediction
----------+-----------
      220 |     0.1448
      300 |     0.1744
      340 |     0.1905
      360 |     0.1989
      380 |     0.2076
      400 |     0.2164
      420 |     0.2254
      440 |     0.2347
      460 |     0.2442
      480 |     0.2538
      500 |     0.2637
      520 |     0.2737
      540 |     0.2840
      560 |     0.2944
      580 |     0.3050
      600 |     0.3158
      620 |     0.3267
      640 |     0.3378
      660 |     0.3490
      680 |     0.3603
      700 |     0.3718
      720 |     0.3834
      740 |     0.3951
      760 |     0.4068
      780 |     0.4187
      800 |     0.4307
----------------------

         gre  topnotch       gpa
x=     587.7     .1625    3.3899
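The predictions in these tables are the standard normal cumulative distribution evaluated at the probit index, with every variable not being varied held at its mean.  The Python sketch below illustrates that calculation; it is not how prtab itself is implemented.  It uses the rounded coefficients and means printed above, and the names coef, means and predicted_prob are our own.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Coefficients from the probit output and means from the prtab footer.
coef = {"_cons": -2.797884, "gre": 0.0015244, "topnotch": 0.2730334, "gpa": 0.4009853}
means = {"gre": 587.7, "topnotch": 0.1625, "gpa": 3.3899}


def predicted_prob(**overrides):
    """Pr(admit = 1) with variables at their means unless overridden."""
    x = {**means, **overrides}
    index = coef["_cons"] + sum(coef[name] * value for name, value in x.items())
    return normal_cdf(index)


# Analogue of: prtab topnotch
for t in (0, 1):
    print(f"topnotch = {t}: {predicted_prob(topnotch=t):.4f}")

# Analogue of: prtab gre (a few of the values only)
for g in (220, 500, 800):
    print(f"gre = {g}: {predicted_prob(gre=g):.4f}")
```

Because the inputs are rounded, the results may differ from the tables in the fourth decimal place.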

We can use the prvalue command to obtain the predicted probabilities when GPA is set to specific values: 2, 3 and 4, with the other variables held at their means.  As you can see, when one's GPA is 2, the predicted probability of being admitted is only .15, and the probability of not being admitted is .85.  When GPA is increased to 3, the probability of being admitted increases to .26, and when one's GPA is 4, the predicted probability of being admitted is .40.

prvalue , x(gpa=2)

probit: Predictions for admit

Confidence intervals by delta method

                                95% Conf. Interval
  Pr(y=1|x):          0.1456   [ 0.0200,    0.2711]
  Pr(y=0|x):          0.8544   [ 0.7289,    0.9800]

         gre  topnotch       gpa
x=     587.7     .1625         2

prvalue , x(gpa=3)

probit: Predictions for admit

Confidence intervals by delta method

                                95% Conf. Interval
  Pr(y=1|x):          0.2563   [ 0.1910,    0.3217]
  Pr(y=0|x):          0.7437   [ 0.6783,    0.8090]

         gre  topnotch       gpa
x=     587.7     .1625         3

prvalue , x(gpa=4)

probit: Predictions for admit

Confidence intervals by delta method

                                95% Conf. Interval
  Pr(y=1|x):          0.3999   [ 0.2997,    0.5000]
  Pr(y=0|x):          0.6001   [ 0.5000,    0.7003]

         gre  topnotch       gpa
x=     587.7     .1625         4

We can also use the prvalue command to look at specific profiles.  Below we see the predicted probabilities for a student who had a 3.5 GPA and a GRE score of 700, with topnotch held at its mean.  Finally, we see the predicted probabilities for a student with the highest values for all variables.

prvalue , x(gpa=3.5 gre=700)

probit: Predictions for admit

Confidence intervals by delta method

                                95% Conf. Interval
  Pr(y=1|x):          0.3886   [ 0.3202,    0.4570]
  Pr(y=0|x):          0.6114   [ 0.5430,    0.6798]

         gre  topnotch       gpa
x=       700     .1625       3.5

prvalue , x(gpa=max gre=max topnotch=1)

probit: Predictions for admit

Confidence intervals by delta method

                                95% Conf. Interval
  Pr(y=1|x):          0.6174   [ 0.4765,    0.7583]
  Pr(y=0|x):          0.3826   [ 0.2417,    0.5235]

         gre  topnotch       gpa
x=       800         1         4
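The point predictions from prvalue can be reproduced in the same way, and doing so shows why probabilities are not linear in the predictors: each one-point increase in GPA adds the same amount to the probit index but a different amount to the probability.  The Python sketch below is an illustration under the same assumptions as before (rounded coefficients and means; COEF, MEANS, MAXIMA and prob_admit are our own names).  It does not reproduce the delta-method confidence intervals, which need the coefficient covariance matrix that is not shown in the output.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


CONS = -2.797884
COEF = {"gre": 0.0015244, "topnotch": 0.2730334, "gpa": 0.4009853}
MEANS = {"gre": 587.7, "topnotch": 0.1625, "gpa": 3.3899}
MAXIMA = {"gre": 800, "topnotch": 1, "gpa": 4}   # maxima from the summaries above


def prob_admit(profile):
    """Pr(admit = 1) for a profile; unspecified variables stay at their means."""
    x = {**MEANS, **profile}
    index = CONS + sum(COEF[name] * value for name, value in x.items())
    return normal_cdf(index)


# Analogue of: prvalue , x(gpa=2), x(gpa=3), x(gpa=4)
by_gpa = {g: prob_admit({"gpa": g}) for g in (2, 3, 4)}
step_2_to_3 = by_gpa[3] - by_gpa[2]
step_3_to_4 = by_gpa[4] - by_gpa[3]
print(f"gpa 2 -> 3 changes Pr(admit) by {step_2_to_3:.4f}")
print(f"gpa 3 -> 4 changes Pr(admit) by {step_3_to_4:.4f}")

# Analogue of the two specific profiles.
strong = prob_admit({"gpa": 3.5, "gre": 700})
best = prob_admit(MAXIMA)
print(f"gpa=3.5, gre=700: {strong:.4f}")
print(f"all variables at maximum: {best:.4f}")
```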

Sample Write-up of the Analysis

We will use the estout command to create a table of the results that might be more appropriate for publication.  This command is user-written, so type findit estout to download it.

estout, varwidth(12) varlabels(_cons Constant) cells(b(star fmt(%8.2f)) ///
se(par fmt(%8.2f))) ///
stats(ll chi2 r2_p, labels(log_likelihood LR_chi_square r2_pvalue) fmt(%8.2f))

                b/se
gre             0.00*
                (0.00)
topnotch        0.27
                (0.18)
gpa             0.40*
                (0.19)
Constant        -2.80***
                (0.65)
log_likelihood  -238.94
LR_chi_square   22.09
r2_pvalue       0.04

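Below is one way of describing the results.  Please note that the coefficients can be discussed in terms of either Z-scores or the probit index; these are equivalent terms.  Note also that the row labeled r2_pvalue in the table above reports the pseudo-R-squared (the r2_p statistic), not a p-value.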

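The Z-score of a person with a zero GRE score and zero GPA from a non-topnotch school is about -2.8.  Holding the other predictors constant, each one-point increase in GRE score raises the Z-score by .0015, and each one-point increase in GPA raises it by .40.  Having attended a topnotch institution raises the Z-score by .27, but this coefficient is not statistically significant.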
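Written as an equation, the description above corresponds to the fitted probit index and the probability it implies.  The notation below is ours, with the coefficients taken from the probit output and \Phi denoting the standard normal cumulative distribution function.  The worked line uses the profile with every variable at its maximum, which prvalue reported as 0.6174.

```latex
\[
\hat{z} = -2.797884 + 0.0015244\,\mathit{gre} + 0.2730334\,\mathit{topnotch} + 0.4009853\,\mathit{gpa},
\qquad
\Pr(\mathit{admit} = 1 \mid x) = \Phi(\hat{z})
\]
\[
\hat{z}\,\big|_{\mathit{gre}=800,\ \mathit{topnotch}=1,\ \mathit{gpa}=4}
  = -2.797884 + 1.219520 + 0.273033 + 1.603941 \approx 0.2986,
\qquad
\Phi(0.2986) \approx 0.617
\]
```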

Z-scores may not be the simplest metric for your audience to understand.  As we saw above, you can use the prvalue, prtab and other spost commands to obtain predicted probabilities.  These are often useful for helping to tell the "story" of your results.

Similarities and differences between logit and probit models

Neither the logit model nor the probit model is linear in the probability of the outcome, which makes interpretation more difficult.  To make the model linear, a transformation is applied to the probability of the outcome.  In logit regression, the transformation is the logit function, which is the natural log of the odds.  In probit models, the function used is the inverse of the standard normal cumulative distribution function (which yields a z-score).  In practice, this difference isn't too important:  the two models usually give very similar predicted probabilities, and which one you use is largely a matter of personal preference.  Both models need to have diagnostics done afterwards to check that the assumptions of the model have not been violated.  Both methods use maximum likelihood, and so require more cases than a similar OLS model.  Unlike logit models, you don't get odds ratios with probit models.  In general, the logit coefficients are larger than the probit coefficients by a factor of about 1.7.  However, this rule is only approximate and often does not hold when a coefficient has a high standard error (that is, when it is imprecisely estimated).
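The factor of 1.7 reflects how closely a rescaled logistic curve tracks the standard normal curve.  The Python sketch below, which is only an illustration using a grid and names of our own choosing, compares the standard normal cumulative distribution at z with the logistic cumulative distribution at 1.7 times z and reports the largest gap between them.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function (probit)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    """Logistic cumulative distribution function (logit)."""
    return 1.0 / (1.0 + math.exp(-z))


SCALE = 1.7  # rule-of-thumb ratio of logit to probit coefficients

# Evaluate both curves on a grid of index values from -4 to 4.
grid = [i / 100 for i in range(-400, 401)]
max_gap = max(abs(normal_cdf(z) - logistic_cdf(SCALE * z)) for z in grid)

print(f"largest gap in predicted probability: {max_gap:.4f}")
```

On this grid the two curves never differ by as much as one percentage point, which is why the two models usually tell the same story.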

Cautions, Flies in the Ointment

See Also


These pages are Copyrighted (c) by UCLA Academic Technology Services

