Example 2: Some people have heart attacks and others don't. We would like to see whether exercise, age and gender influence whether someone has a heart attack. Again, we have a binary outcome: the person either has a heart attack or does not.
Example 3: Many undergraduates wish to continue their education in graduate school. In their application to any given graduate program, they include their GRE scores and their GPA from their undergraduate institution. Some students are graduating from very prestigious institutions, while others are graduating from not-so-prestigious institutions. Many months after sending in their applications, students receive either a thick or a thin envelope from the graduate program to which they applied: some were admitted and others were not.
This hypothetical data set has a binary response (outcome, dependent) variable called admit. There are three predictor variables: gre, gpa and topnotch, which is a binary predictor in which 1 indicates that the undergraduate institution was "top notch" and 0 indicates that it is not.
NOTE: This example was done using Mplus version 4.21. The syntax may not work with earlier versions of Mplus.
title: Mplus DAE for probit;
data: file is "D:\probit.dat";
variable: names are admit gre topnotch gpa;
categorical = admit;
analysis:
type = basic;
plot: type is plot1;
For this output only, we will display all of the information in the output. You will want to look at this carefully to be sure that the data were read into Mplus correctly. You will want to make sure that you have the correct number of observations, and that the categorical and continuous variables have been correctly specified. We have not used a missing statement because we have no missing data in this data set.
INPUT READING TERMINATED NORMALLY
Mplus DAE for probit;
SUMMARY OF ANALYSIS
Number of groups 1
Number of observations 400
Number of dependent variables 4
Number of independent variables 0
Number of continuous latent variables 0
Observed dependent variables
Continuous
GRE TOPNOTCH GPA
Binary and ordered categorical (ordinal)
ADMIT
Estimator WLSMV
Maximum number of iterations 1000
Convergence criterion 0.500D-04
Maximum number of steepest descent iterations 20
Parameterization DELTA
Input data file(s)
D:\probit.dat
Input data format FREE
SUMMARY OF CATEGORICAL DATA PROPORTIONS
ADMIT
Category 1 0.683
Category 2 0.317
RESULTS FOR BASIC ANALYSIS
ESTIMATED SAMPLE STATISTICS
MEANS/INTERCEPTS/THRESHOLDS
ADMIT$1 GRE TOPNOTCH GPA
________ ________ ________ ________
1 0.475 587.700 0.162 3.390
CORRELATION MATRIX (WITH VARIANCES ON THE DIAGONAL)
ADMIT GRE TOPNOTCH GPA
________ ________ ________ ________
ADMIT
GRE 0.243 13310.683
TOPNOTCH 0.167 0.217 0.136
GPA 0.232 0.384 0.243 0.144
STANDARD ERRORS FOR ESTIMATED SAMPLE STATISTICS
S.E. FOR MEANS/INTERCEPTS/THRESHOLDS
ADMIT$1 GRE TOPNOTCH GPA
________ ________ ________ ________
1 0.065 5.805 16598.305 0.019
S.E. FOR CORRELATION MATRIX (WITH VARIANCES ON THE DIAGONAL)
ADMIT GRE TOPNOTCH GPA
________ ________ ________ ________
ADMIT
GRE 0.063 1040.244
TOPNOTCH 0.061 0.049 6693.099
GPA 0.060 0.039 0.047 0.012
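One quick way to confirm that admit was read as a binary variable is to compare the threshold ADMIT$1 with the category proportions in the output above. Under a probit link, the standard normal CDF evaluated at the threshold should reproduce the proportion in Category 1. The following is a minimal Python sketch of that check; the function and variable names are illustrative and are not part of Mplus.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# ADMIT$1 threshold as reported under MEANS/INTERCEPTS/THRESHOLDS
threshold_admit = 0.475

# Implied proportion of cases in Category 1 (not admitted)
prop_category_1 = normal_cdf(threshold_admit)
print(round(prop_category_1, 3))
```

The printed value should agree closely with the Category 1 proportion of 0.683 shown in the summary of categorical data proportions.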
Before running the probit model, check whether any cells created by crossing the response variable with the categorical predictor are empty or particularly small. If they are, the probit model may be difficult to estimate. (This crosstab should be done in a general-use statistics package.) In our example, none of the cells is empty (has no cases) or too small, so we will run our probit model.
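As a rough illustration of the cell check described above, the Python sketch below tabulates admit by topnotch and flags small cells. The rows are made-up stand-ins for the records in probit.dat, and the cutoff of 5 cases per cell is an assumption chosen for illustration, not a rule taken from this page.

```python
from collections import Counter

# Hypothetical (admit, topnotch) pairs standing in for rows of probit.dat
rows = [(0, 0), (0, 0), (1, 0), (0, 1), (1, 1), (1, 0), (0, 0), (1, 1)]

# Count how many cases fall in each cell of the 2 x 2 table
cell_counts = Counter(rows)

def small_cells(counts, min_count=5):
    """Return the cells of the admit-by-topnotch table with fewer than min_count cases."""
    cells = [(a, t) for a in (0, 1) for t in (0, 1)]
    return {cell: counts.get(cell, 0) for cell in cells
            if counts.get(cell, 0) < min_count}

for (admit, topnotch), n in sorted(cell_counts.items()):
    print(f"admit={admit} topnotch={topnotch} n={n}")
print("cells below cutoff:", small_cells(cell_counts))
```

With the real data, an empty dictionary from small_cells would indicate that no cell falls below the chosen cutoff.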
title: Mplus DAE for probit;
data: file is "D:\probit.dat";
variable: names are admit gre topnotch gpa;
categorical = admit;
model: admit on gre topnotch gpa;
analysis:
type = meanstructure;
! you need to specify type = meanstructure to get the threshold;
! by default, wls is used, which gives you a probit (as opposed to a logit) model;
MODEL RESULTS
Estimates S.E. Est./S.E.
ADMIT ON
GRE 0.002 0.001 2.407
TOPNOTCH 0.273 0.177 1.545
GPA 0.401 0.187 2.143
Thresholds
ADMIT$1 2.793 0.650 4.298
The section called MODEL RESULTS shows the coefficients (estimates), their standard errors and the ratio of each estimate to its standard error. This ratio can be treated as a z-test, where absolute values of about 2 or more (1.96, to be precise) are statistically significant at the .05 level. Both gre and gpa are statistically significant, while topnotch is not. A discussion of the interpretation of the coefficients can be found in the sample write-up section below. There is no equivalent of an exponentiated coefficient in probit.
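The z-test reading of the Est./S.E. column can be made concrete by converting each ratio to a two-sided p-value under the standard normal distribution. This is a minimal Python sketch using the ratios reported in MODEL RESULTS; the helper name two_sided_p is illustrative, and Mplus itself does not print these p-values in the output shown here.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a z statistic under the standard normal."""
    # 2 * (1 - Phi(|z|)) is identical to erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))

# Est./S.E. values copied from the MODEL RESULTS section
z_values = {"gre": 2.407, "topnotch": 1.545, "gpa": 2.143}

p_values = {name: two_sided_p(z) for name, z in z_values.items()}
for name, p in p_values.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: p = {p:.4f} ({verdict} at the .05 level)")
```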
A probit model can incorporate either an intercept or a threshold (sometimes called a cutpoint). Instead of reporting the intercept for the model, Mplus reports a threshold. The threshold is the same as the intercept except that it has the opposite sign, so the intercept here would be -2.793. For more information on the differences between intercepts and thresholds, please see http://www.stata.com/support/faqs/stat/oprobit.html .
Below is one way of describing the results. Please note that the coefficients can be discussed in terms of either the Z-score or the probit index. These are equivalent terms.
The Z-score of a person with a GRE score of zero and a GPA of zero from a non-topnotch school is about -2.8. For each one-point increase in GRE score, the Z-score increases by .0015244; for each one-point increase in GPA, the Z-score increases by about .4.
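To turn the probit index into a predicted probability of admission, pass it through the standard normal CDF. The Python sketch below assumes the coefficients from MODEL RESULTS (with the unrounded GRE coefficient quoted in the write-up) and an intercept equal to the negative of the ADMIT$1 threshold. The function names and the example applicant are illustrative only.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Estimates taken from MODEL RESULTS; the intercept is minus the threshold
INTERCEPT = -2.793
B_GRE = 0.0015244
B_TOPNOTCH = 0.273
B_GPA = 0.401

def probit_index(gre, gpa, topnotch):
    """Linear predictor (Z-score) for one applicant."""
    return INTERCEPT + B_GRE * gre + B_GPA * gpa + B_TOPNOTCH * topnotch

def admit_probability(gre, gpa, topnotch):
    """Predicted probability of admission under the probit model."""
    return normal_cdf(probit_index(gre, gpa, topnotch))

# Example applicant at the sample means of GRE and GPA, non-topnotch school
p_average = admit_probability(gre=587.7, gpa=3.39, topnotch=0)
print(f"predicted probability: {p_average:.3f}")
```

Because the normal CDF is nonlinear, the same one-point change in GPA shifts the probability by different amounts depending on where the applicant starts, which is why the write-up is phrased in terms of the Z-score rather than the probability.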
Neither the logit model nor the probit model is linear in the probability of the outcome, which makes things difficult. To make the model linear, a transformation is applied to the dependent variable. In logit regression, the transformation is the logit function, which is the natural log of the odds. In probit models, the function used is the inverse of the standard normal cumulative distribution (a.k.a. a z-score). In practice, this difference isn't too important: both transformations are equally good at linearizing the model, and which one you use is largely a matter of personal preference. Both models need to have diagnostics done afterwards to check that the assumptions of the model have not been violated. Both models are typically estimated by maximum likelihood, and so require more cases than a similar OLS model. Unlike logit models, you don't get odds ratios with probit models. In general, the logit coefficients are larger than the probit coefficients by a factor of about 1.7. However, this rule often does not apply when an independent variable has a high standard error (lots of variability).
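The factor of about 1.7 reflects how closely a rescaled logistic curve tracks the standard normal CDF. The Python sketch below measures the largest gap between the two curves on a grid of Z-scores; the grid from -5 to 5 in steps of 0.01 is an arbitrary choice made for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function (probit link)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(x):
    """Logistic cumulative distribution function (logit link)."""
    return 1.0 / (1.0 + math.exp(-x))

SCALE = 1.7  # approximate ratio of logit to probit coefficients

# Evaluate both curves on a grid of Z-scores from -5 to 5
grid = [i / 100.0 for i in range(-500, 501)]
max_gap = max(abs(normal_cdf(z) - logistic_cdf(SCALE * z)) for z in grid)
print(f"largest difference in probability: {max_gap:.4f}")
```

A small maximum gap means that a probit index of z and a logit of roughly 1.7 times z imply nearly the same probability, which is where the rule of thumb comes from.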
These pages are Copyrighted (c) by UCLA Academic Technology Services