The aim of this seminar is to help you learn how to visualize main effects for models using logistic regression. It will demonstrate a suite of tools named vibl for visualizing binary logit models (by the way, vibl is pronounced "vibble" and it rhymes with kibble). You can get all of the programs and data files associated with the seminar as shown below.
net from http://www.ats.ucla.edu/stat/stata/ado/analysis
net install vibl
net get vibl
This page also refers to the xi3 and postgr3 commands. If you do not have these, you can download them as shown below.
net from http://www.ats.ucla.edu/stat/stata/ado/analysis
net install xi3
net install postgr3
Some of the sections illustrate interactive use of the viblmdb command and have movies that accompany the sections. These sections start with a link that will look like this.
--- View the movie that accompanies this section ---
You can click on the link and it will bring up a movie showing us interacting with Stata and with verbal (audio) explanations.
Let's look at a model with a single dummy variable as the only predictor.
use http://www.ats.ucla.edu/stat/stata/seminars/stata_vibl/hsbvibl, clear
xi3: regress socst i.academic
i.academic _Iacademic_0-1 (naturally coded; _Iacademic_0 omitted)
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 1, 198) = 42.70
Model | 4068.72633 1 4068.72633 Prob > F = 0.0000
Residual | 18867.4687 198 95.2902458 R-squared = 0.1774
-------------+------------------------------ Adj R-squared = 0.1732
Total | 22936.195 199 115.257261 Root MSE = 9.7617
------------------------------------------------------------------------------
socst | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iacademic_1 | 9.03208 1.382238 6.53 0.000 6.306283 11.75788
_cons | 47.66316 1.001526 47.59 0.000 45.68813 49.63819
------------------------------------------------------------------------------
We can graph the predicted means using the postgr3 command as shown below. Note that there is only one set of adjusted means.
postgr3 academic, table
-----------------------
academic | mean(yhat_)
----------+------------
0 | 47.66316
1 | 56.69524
-----------------------
Say that we now add two covariates to this model, math and science.
xi3: regress socst i.academic math science
i.academic _Iacademic_0-1 (naturally coded; _Iacademic_0 omitted)
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 3, 196) = 37.76
Model | 8400.40598 3 2800.13533 Prob > F = 0.0000
Residual | 14535.789 196 74.1621889 R-squared = 0.3663
-------------+------------------------------ Adj R-squared = 0.3566
Total | 22936.195 199 115.257261 Root MSE = 8.6117
------------------------------------------------------------------------------
socst | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iacademic_1 | 5.171911 1.383241 3.74 0.000 2.443964 7.899858
math | .3268875 .0931793 3.51 0.001 .1431248 .5106503
science | .2549513 .0800324 3.19 0.002 .0971161 .4127865
_cons | 19.26153 3.837613 5.02 0.000 11.69321 26.82984
------------------------------------------------------------------------------
We can use the postgr3 command to get the adjusted means. By adding the table option we also get a table of means as well as the graph of the adjusted means. We also add the ylabel() option to specify the scaling for the y axis. Note that these adjusted means are computed by holding the covariates math and science at their respective means (as indicated by the notes below the command).
postgr3 academic, table ylabel(40 45 to 60)
Variables left asis: _Iacademic_1
Holding math constant at 52.645
Holding science constant at 51.85
Holding math constant at 52.645
Holding science constant at 51.85
Here we see the table of means from the table option.
-----------------------
academic | mean(yhat_)
----------+------------
0 | 49.68975
1 | 54.86166
-----------------------
If we subtract these means we get (54.86166 - 49.68975) = 5.17191 and note how this matches the main effect for academic.
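The arithmetic behind these adjusted means can be checked by hand. The lines below are a minimal sketch, not postgr3's internal code: they plug the coefficients from the regression table and the covariate means reported by postgr3 into the regression equation. The scalar names are chosen here only for illustration.

```stata
* Hand check of the adjusted means (a sketch; scalar names are illustrative)
* Coefficients copied from the regression table above
scalar b_cons     = 19.26153
scalar b_academic = 5.171911
scalar b_math     = .3268875
scalar b_science  = .2549513

* Covariate means as reported by postgr3
scalar m_math    = 52.645
scalar m_science = 51.85

* Adjusted mean for each group, holding math and science at their means
scalar adj0 = b_cons + b_math*m_math + b_science*m_science
scalar adj1 = adj0 + b_academic

display "academic=0: " %9.5f adj0
display "academic=1: " %9.5f adj1
display "difference: " %9.5f (adj1 - adj0)
```

Because the covariate terms are added identically to both groups, they cancel in the subtraction, which is why the difference equals the coefficient for academic whatever values the covariates are held at.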
We can repeat the postgr3 command showing the adjusted means by holding math and science constant at different values. In this example, we hold math constant at 40 and science constant at 40.
postgr3 academic, table x(math=40 science=40) ylabel(40 45 to 60)
Holding math constant at 40
Holding science constant at 40
-----------------------
academic | mean(yhat_)
----------+------------
0 | 42.53508
1 | 47.70699
-----------------------
Although the means are generally lower, note how the difference still is the same, (47.70699 - 42.53508) = 5.17191.
Likewise, we can set the covariates math constant at 60 and science constant at 60.
postgr3 academic, table x(math=60 science=60) ylabel(40 45 to 60)
Holding math constant at 60
Holding science constant at 60
-----------------------
academic | mean(yhat_)
----------+------------
0 | 54.17186
1 | 59.34377
-----------------------
Note how the difference in the means remains the same, (59.34377 - 54.17186) = 5.17191.
Summary
We will repeat the same steps that we showed in the section above, but this time using a logistic regression model and we will generate predicted values that are the predicted logits from the model. Where the outcome from the previous model was socst (a continuous variable), the outcome in this model is honors which is a 0/1 variable.
. xi3: logit honors i.academic, nolog
i.academic _Iacademic_0-1 (naturally coded; _Iacademic_0 omitted)
Logit estimates Number of obs = 200
LR chi2(1) = 29.62
Prob > chi2 = 0.0000
Log likelihood = -123.81064 Pseudo R2 = 0.1068
------------------------------------------------------------------------------
honors | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iacademic_1 | 1.602517 .3063106 5.23 0.000 1.00216 2.202875
_cons | -.8223589 .2227875 -3.69 0.000 -1.259014 -.3857034
------------------------------------------------------------------------------
We can graph the predicted logits using the postgr3 command. Note we add the predict(xb) option to specify that we want the logits (not predicted probabilities). Since there are no covariates, note there is only one predicted logit per group.
postgr3 academic, table predict(xb)

-----------------------
academic | mean(yhat_)
----------+------------
0 | -.8223589
1 | .7801586
-----------------------
Note how the difference in the predicted logits matches the coefficient for the main effect, (.7801586 - -.8223589) = 1.6025175.
Let's now add two covariates to the above model.
xi3: logit honors i.academic math science, nolog
i.academic _Iacademic_0-1 (naturally coded; _Iacademic_0 omitted)
Logit estimates Number of obs = 200
LR chi2(3) = 65.41
Prob > chi2 = 0.0000
Log likelihood = -105.91552 Pseudo R2 = 0.2359
------------------------------------------------------------------------------
honors | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iacademic_1 | 1.193744 .363803 3.28 0.001 .4807031 1.906785
math | .0501096 .025738 1.95 0.052 -.000336 .1005553
science | .072061 .0228077 3.16 0.002 .0273587 .1167632
_cons | -6.979717 1.218391 -5.73 0.000 -9.367719 -4.591715
------------------------------------------------------------------------------
We can graph the predicted logits using postgr3 and we also show them as a table.
postgr3 academic, predict(xb) table ylabel(-3 -2 to 3)
Holding math constant at 52.645
Holding science constant at 51.85
-----------------------
academic | mean(yhat_)
----------+------------
0 | -.6053352
1 | .5884086
-----------------------
As you might expect, the difference in these predicted logits corresponds to the coefficient for the main effect of academic, (.5884086 - -.6053352) = 1.1937438.
postgr3 academic, predict(xb) table x(math=40 science=40) ylabel(-3 -2 to 3)
Holding math constant at 40
Holding science constant at 40
-----------------------
academic | mean(yhat_)
----------+------------
0 | -2.092894
1 | -.8991498
-----------------------
Again, even when holding the covariates at different values, the difference in the adjusted logits corresponds to the coefficient for academic, (-.8991498 - -2.092894) = 1.1937442.
postgr3 academic, predict(xb) table x(math=60 science=60) ylabel(-3 -2 to 3)
Holding math constant at 60
Holding science constant at 60
-----------------------
academic | mean(yhat_)
----------+------------
0 | .350518
1 | 1.544262
-----------------------
Again, note how the difference in the adjusted logits remains the same, (1.544262 - .350518) = 1.193744.
Summary
So far we have seen that the adjusted means in OLS models and adjusted logits in logistic models yield differences in adjusted means and differences in adjusted logits that are the same regardless of the values of the covariates. Let's now repeat these analyses but generating predicted probabilities and adjusted probabilities.
First, we will run the same logit regression with the dummy variable academic in the model.
xi3: logit honors i.academic, nolog
i.academic _Iacademic_0-1 (naturally coded; _Iacademic_0 omitted)
Logit estimates Number of obs = 200
LR chi2(1) = 29.62
Prob > chi2 = 0.0000
Log likelihood = -123.81064 Pseudo R2 = 0.1068
------------------------------------------------------------------------------
honors | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iacademic_1 | 1.602517 .3063106 5.23 0.000 1.00216 2.202875
_cons | -.8223589 .2227875 -3.69 0.000 -1.259014 -.3857034
------------------------------------------------------------------------------
We can graph the predicted probabilities like this. Note since there are no covariates, we have only a single set of predicted probabilities.
postgr3 academic, table
-----------------------
academic | mean(yhat_)
----------+------------
0 | .3052632
1 | .6857143
-----------------------
We can compute the difference in the predicted probabilities as (.6857143 - .3052632) = .3804511.
xi3: logit honors i.academic math science, nolog
i.academic _Iacademic_0-1 (naturally coded; _Iacademic_0 omitted)
Logit estimates Number of obs = 200
LR chi2(3) = 65.41
Prob > chi2 = 0.0000
Log likelihood = -105.91552 Pseudo R2 = 0.2359
------------------------------------------------------------------------------
honors | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iacademic_1 | 1.193744 .363803 3.28 0.001 .4807031 1.906785
math | .0501096 .025738 1.95 0.052 -.000336 .1005553
science | .072061 .0228077 3.16 0.002 .0273587 .1167632
_cons | -6.979717 1.218391 -5.73 0.000 -9.367719 -4.591715
------------------------------------------------------------------------------
Now we will use postgr3 to create the adjusted probabilities.
postgr3 academic, table ylabel(0 .1 to 1)
Holding math constant at 52.645
Holding science constant at 51.85
-----------------------
academic | mean(yhat_)
----------+------------
0 | .353124
1 | .6429999
-----------------------
We can compute the difference in predicted probabilities as (.6429999 - .353124) = .2898759.
postgr3 academic, table x(math=40 science=40) ylabel(0 .1 to 1)
Holding math constant at 40
Holding science constant at 40
-----------------------
academic | mean(yhat_)
----------+------------
0 | .1097894
1 | .2892252
-----------------------
Note how the difference in the adjusted probabilities is not the same as above, (.2892252 - .1097894) = .1794358.
postgr3 academic, table x(math=60 science=60) ylabel(0 .1 to 1)
Holding math constant at 60
Holding science constant at 60
-----------------------
academic | mean(yhat_)
----------+------------
0 | .5867432
1 | .8240834
-----------------------
Note how the difference in the adjusted probabilities yields a different value again, (.8240834 - .5867432) = .2373402.
To recap, here is a table showing the difference in the probabilities and how they changed based on the covariate patterns.
------------------------------------------
Covariates    | Difference in Probabilities
--------------+---------------------------
both at 40    | .179
both at mean  | .289
both at 60    | .237
------------------------------------------
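The contrast between the logit scale and the probability scale can be reproduced by hand with Stata's invlogit() function. This is only a sketch under stated assumptions: the coefficients are typed in from the logit table above, and the scalar names (b0, b1, cc_40, cc_mean, cc_60) are illustrative. The `///` continuation requires running the lines from a do-file.

```stata
* Sketch: difference in logits vs. difference in probabilities
* at the three covariate patterns (coefficients copied from the table)
scalar b0 = -6.979717
scalar b1 = 1.193744

* Covariate part of the linear predictor for each pattern
scalar cc_40   = .0501096*40     + .072061*40
scalar cc_mean = .0501096*52.645 + .072061*51.85
scalar cc_60   = .0501096*60     + .072061*60

foreach s in cc_40 cc_mean cc_60 {
    display "`s': logit diff = " %6.4f ((b0 + b1 + `s') - (b0 + `s')) ///
        "   prob diff = " %6.4f (invlogit(b0 + b1 + `s') - invlogit(b0 + `s'))
}
```

The logit difference is always b1 because the covariate part cancels. The probability difference does not cancel, because invlogit() is nonlinear and the same step of b1 lands on a different part of the curve for each pattern.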
Summary
In our simple model above, we have 2 covariates. Given that the pattern of results depends on the values of the covariates, how can we go about investigating a reasonable set of covariates to understand how our pattern of results depends on the covariates? Here are some options.
Let's re-run our model, this time entering the dummy variable academic directly instead of using xi3.
. logit honors academic math science, nolog
Logit estimates Number of obs = 200
LR chi2(3) = 65.41
Prob > chi2 = 0.0000
Log likelihood = -105.91552 Pseudo R2 = 0.2359
------------------------------------------------------------------------------
honors | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
academic | 1.193744 .363803 3.28 0.001 .4807031 1.906785
math | .0501096 .025738 1.95 0.052 -.000336 .1005553
science | .072061 .0228077 3.16 0.002 .0273587 .1167632
_cons | -6.979717 1.218391 -5.73 0.000 -9.367719 -4.591715
------------------------------------------------------------------------------
So, the statistical model, based on these results, is
Yhat = -6.97 + 1.19*academic + .05*math + .072*science
We can simplify the model to
Yhat = -6.97 + 1.19*academic + Covariate Contribution
where
Covariate Contribution = .05*math + .072*science
So, the Covariate Contribution when math is 40 and science is 60 is
.05*40 + .072*60 = 6.3200
and when math is 60 and science is 46.2 it is approximately the same
.05*60 + .072*46.2 = 6.3264.
To make a more general statement, suppose we are focusing on the variable x1 and we have a model like this.
Yhat = B0 + B1*x1 + B2*x2 + B3*x3 + B4*x4 + B5*x5 etc...
then the covariate contribution would be
Covariate Contribution = B2*x2 + B3*x3 + B4*x4 + B5*x5 etc...
Rather than fretting about the individual values of the covariates, the covariate contribution forms a composite index of the influence of the covariates on the value of Yhat. Using our data, we can calculate the covariate contribution by using a simple generate command, like this.
generate cc = .0501096*math + .072061*science
We can then inspect the covariate contributions using the summarize command.
summarize cc
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
cc | 200 6.374383 1.07328 4.078408 8.946612
Here we use the centile command to get the 10th to 90th percentiles (in increments of 10).
centile cc, centile(10 20 30 40 50 60 70 80 90)
-- Binom. Interp. --
Variable | Obs Percentile Centile [95% Conf. Interval]
-------------+-------------------------------------------------------------
cc | 200 10 4.866447 4.716819 5.115421
| 20 5.325397 5.115421 5.60332
| 30 5.696632 5.441512 5.961735
| 40 6.163605 5.778728 6.346111
| 50 6.398439 6.229934 6.634359
| 60 6.751931 6.467883 6.959954
| 70 7.04013 6.885457 7.12026
| 80 7.248819 7.095433 7.521404
| 90 7.851421 7.550572 8.079131
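Typing the coefficients into generate by hand is error prone when there are many covariates. As an alternative sketch, the covariate contribution can be built from the stored estimates with `_b[varname]` right after fitting the model. The variable name cc2 is hypothetical, and the data file URL is the one used earlier on this page.

```stata
* Sketch: build the covariate contribution from the stored estimates
use http://www.ats.ucla.edu/stat/stata/seminars/stata_vibl/hsbvibl, clear
logit honors academic math science, nolog

* _b[varname] holds the coefficient from the most recent estimation
generate double cc2 = _b[math]*math + _b[science]*science

* Same summaries as above, restricted to the percentiles explored below
summarize cc2
centile cc2, centile(20 50 80)
```

Because cc2 uses the full-precision coefficients rather than the rounded values shown in the table, it may differ from cc in the last few decimal places.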
We can select a range of values for the covariate contributions and explore the adjusted probabilities that are derived from that range of covariate contributions. We could then examine the differences in adjusted probabilities that result from a range of covariate contributions. There are probably many ways you could select an appropriate range, for example by taking the average covariate contribution and adding and subtracting one standard deviation, or choosing quartiles, or choosing percentiles such as the 10th percentile, 50th percentile, 90th percentile. Below, we choose to explore the values based on the 20th, 50th and 80th percentiles.
We could explore graphs of the adjusted probabilities with the covariate contributions ranging from 5.32 to 6.39 to 7.24. While we can specify values for specific covariates in postgr3, we cannot directly specify covariate contribution values. We need some kind of program that will let us visualize main effects like postgr3 but permit us to vary the covariate contribution. We have developed a suite of programs to help with this named vibl (for Visualizing Binary Logistic models).
--- View the movie that accompanies this section ---
The vibl suite of programs allows us to plot the main effects like postgr3, but it also permits us to enter the coefficients for all of the terms in the model and to vary the covariate contribution. If you have not done so already, you can download this suite of programs and the associated data files from our web site like this.
net from http://www.ats.ucla.edu/stat/stata/ado/analysis
net install vibl
net get vibl
Let's start by focusing on the program viblmdb. We can start this program by typing
viblmdb , b0(-6.97) b1(1.19) ccat(6.39) ccmin(5) ccmax(7.5)
or we can type
viblmdb
and once in the program we can use the point and click interface to select these values.
The graph assumes you have a single dummy variable we are focusing on, x1 and the rest of the terms in the model are covariates. Note that we show the Version 8 style graphs in this page for aesthetics, but the default graphs are Version 7 graphs which are used for their great speed.

We are also shown a little table of predicted probabilities in the Stata results window.
x
0 1
--------------------
0.36 (A) 0.65 (B) (B-A) = 0.29
For our given main effect, we have a family of graphs that vary depending on the size of the covariate contribution. We can imagine this family of graphs being printed on different pages of a book, and each page number corresponds to a different covariate contribution. If higher page numbers correspond to higher covariate contributions, then as you flip the pages forward in the book the predicted probabilities would rise and the shape of the main effect might begin to change.
By the way, sometimes it is easier to pass parameters to viblmdb to specify the starting values. For example
viblmdb , b0(-6.97) b1(1.19) ccat(6.39) ccmin(5) ccmax(7.5)
The meaning of these parameters is summarized below.
--- View the movie that accompanies this section ---
You might want to show graphs for a number of covariate contributions side by side so you can compare them. The viblmdb tool will allow you to do that.
Start viblmdb with these parameter values
viblmdb , b0(-6.97) b1(1.19) ccat(6.39) ccmin(5) ccmax(7.5)

In addition, the output window shows tables of probabilities that correspond to these three graphs. We can see that the differences in the predicted probabilities at the three covariate contribution values are .23, .29 and .24, a fairly consistent pattern. As a reminder, x is a stand in for the variable academic with 0 being a non-academic program and 1 being an academic program.
**For CC=5.32**
x
0 1
--------------------
0.16 (A) 0.39 (B) (B-A) = 0.23
**For CC=6.39**
x
0 1
--------------------
0.36 (A) 0.65 (B) (B-A) = 0.29
**For CC=7.24**
x
0 1
--------------------
0.57 (A) 0.81 (B) (B-A) = 0.24
--- View the movie that accompanies this section ---
While it is nice to be able to see three lines for three different covariate contributions at once, this is like viewing three pages from our imaginary flip book. However, you might want to be able to see all of the pages at once. While we could make such a graph three dimensionally, we have found a way to make such a graph two dimensionally, but it can be a bit tricky to interpret.
Click on Type II and then Show Plots and an additional graph appears.

On the x axis of the right graph you can see the covariate contribution ranging from 5 to 7.5 (because those are the values we chose in the dialog box for the min and max CC values). Also, note that three vertical lines are drawn. These correspond to values we chose on the CC list.
The vertical line at 5.32 in the right graph helps us map that graph to the blue line in the left graph. In the right graph, the dashed blue line is for X=0 and the red line is for X=1. The points where the vertical line at CC=5.32 intersects the blue and red lines in the right graph correspond to the blue line in the left graph. Likewise, CC=6.39 in the right graph maps to the red line in the left graph, and the line at CC=7.24 corresponds to the green line in the left graph. So, the right graph contains the predicted probabilities for X=0 and X=1 when CC is equal to 5.32, 6.39 and 7.24, but also all of the points in between and some of the points beyond as well.
The important part of this graph is that you can see that the difference between the red and blue lines (i.e. the effect of X=0 vs X=1) is fairly constant across the levels of CC.
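A plot in the spirit of the Type II graph can also be drawn with Stata's built-in twoway function. This is a rough sketch, not viblmdb's own drawing code: it assumes the same b0 and b1 that were passed to viblmdb, lets x stand for the covariate contribution, and uses axis titles and legend labels chosen for illustration. It uses `///` continuation, so run it from a do-file.

```stata
* Sketch of a Type II style plot: predicted probability against the
* covariate contribution, one curve for X=0 and one for X=1
twoway (function y = invlogit(-6.97 + x),        range(5 7.5)) ///
       (function y = invlogit(-6.97 + 1.19 + x), range(5 7.5)), ///
       xline(5.32 6.39 7.24) ///
       xtitle("Covariate contribution") ytitle("Predicted probability") ///
       legend(order(1 "X=0" 2 "X=1"))
```

The vertical gap between the two curves at any CC value is the difference in predicted probabilities that the tables report as (B-A).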
--- View the movie that accompanies this section ---
While the Type II graph is pretty good at helping us see the predicted probabilities when X=0 and X=1, we might want to see a single line representing the difference between these predicted probabilities across the levels of the covariate contributions. We can do this with a Type III plot.

You can see the line in the Type III plot has a slight upside down U shape. Even so, it is fairly flat indicating that the difference between the two groups in their predicted probabilities is fairly constant across the spectrum of covariate contributions ranging from the 20th percentile (at 5.32) to the 80th percentile (at 7.24).
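The difference curve can be sketched directly as well, again with twoway function and the same assumed b0 and b1. As before, this is an illustration rather than viblmdb's implementation, and the axis titles are made up for the example.

```stata
* Sketch of a Type III style plot: difference in predicted probabilities
* (X=1 minus X=0) against the covariate contribution
twoway function y = invlogit(-6.97 + 1.19 + x) - invlogit(-6.97 + x), ///
    range(5 7.5) xline(5.32 6.39 7.24) ///
    xtitle("Covariate contribution") ///
    ytitle("Difference in predicted probability")
```

The upside down U shape arises because the logistic curve is steepest in the middle, so the step from X=0 to X=1 produces the largest change in probability when the two logits straddle zero.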
--- View the movie that accompanies this section ---
We can click on the up and dn to vary the covariate contribution from 5.3 to 7.2 and view all three graphs at once.



Each time we press Up to increment the CC it is like we are moving to a different page from our flip book corresponding to the different covariate contributions.
--- View the movie that accompanies this section ---
So far we have focused on visualizing the effects for a dummy variable, but viblmdb will allow you to visualize the main effects for a continuous predictor as well. Consider this model.
logit honors write math science, nolog
Logit estimates Number of obs = 200
LR chi2(3) = 66.44
Prob > chi2 = 0.0000
Log likelihood = -105.40164 Pseudo R2 = 0.2396
------------------------------------------------------------------------------
honors | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
write | .0808476 .0242257 3.34 0.001 .0333661 .1283291
math | .0503773 .025635 1.97 0.049 .0001336 .100621
science | .0394677 .0226928 1.74 0.082 -.0050094 .0839448
_cons | -8.960344 1.347136 -6.65 0.000 -11.60068 -6.320005
------------------------------------------------------------------------------
Using the viblmcc command we can get a sense of the range of the covariate contributions.
viblmcc honors write math science
Percentiles for Covaraite Contribution
P1 P10 P20 P30 P40 P50 P60 P70 P80 P90 P99
3.243 3.624 3.903 4.205 4.492 4.69 4.994 5.123 5.356 5.777 6.401
With the summarize command, we get a sense of the range of the writing scores.
summarize write
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
write | 200 52.775 9.478586 31 67
With this information, we can start viblmdb to visualize the main effect of write.
viblmdb
After making these choices, we get this graph.

and we also get this output showing the predicted probabilities at 40 and 50.
**For CC=4.69**
write
40 50
--------------------
0.26 (A) 0.43 (B) (B-A) = 0.17
If we change the CC to 3.90 (the 20th percentile) and press Update Plots, we get this graph and output.

**For CC=3.9**
write
40 50
--------------------
0.13 (A) 0.26 (B) (B-A) = 0.13
Or we can check CC List and specify 3.90 4.69 5.35 (the 20th, 50th, and 80th percentiles of the CC), and we get this graph and output.

**For CC=3.9**
write
40 50
--------------------
0.13 (A) 0.26 (B) (B-A) = 0.13
**For CC=4.69**
write
40 50
--------------------
0.26 (A) 0.43 (B) (B-A) = 0.17
**For CC=5.35**
write
40 50
--------------------
0.40 (A) 0.60 (B) (B-A) = 0.20
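For a continuous predictor, the same calculation applies with b1 multiplied by the chosen value of the predictor. The sketch below assumes rounded coefficients of b0 = -8.96 and b1 = .08 for write. These are an assumption made here, since the values entered in the dialog box are not shown, although they do reproduce the tabled probabilities.

```stata
* Sketch: predicted probabilities at write=40 and write=50
* b0 and b1 are assumed rounded values, not taken from the dialog box
scalar b0 = -8.96
scalar b1 = .08

foreach cc of numlist 3.9 4.69 5.35 {
    scalar pA = invlogit(b0 + b1*40 + `cc')
    scalar pB = invlogit(b0 + b1*50 + `cc')
    display "CC=`cc':  write=40 " %4.2f pA "   write=50 " %4.2f pB
}
```

Note that the covariate contribution here comes from math and science only, because write is the focal predictor and enters through b1.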
We can check Type II and Type III graphs and then we get these graphs. The Type II graph shows two lines, one for write=40 and one for write=50. The Type III graph shows the difference in these two lines.

Note that for the Type II and Type III graphs, you can view many such graphs selecting different values of X to compare (this example compared X of 40 vs 50). This is in contrast to dummy variables, where you can only compare X=1 to X=0.
--- View the movie that accompanies this section ---
Let's start viblmdb again to look at more options.
viblmdb
Let's adjust the values for b0, b1 and cc like this.

**For CC=0**
x
0 1
--------------------
0.50 (A) 0.73 (B) (B-A) = 0.23

**For CC=2**
x
0 1
--------------------
0.88 (A) 0.95 (B) (B-A) = 0.07

**For CC=-2.5**
x
0 1
--------------------
0.08 (A) 0.18 (B) (B-A) = 0.10
--- View the movie that accompanies this section ---
Start viblmdb with these coefficients.
viblmdb , b0(-.5) b1(1) ccmin(-1) ccmax(1)

**For CC=0**
x
0 1
--------------------
0.38 (A) 0.62 (B) (B-A) = 0.24
Assume b1 of 1 is significant, but you are interested in exploring the patterns of the differences in predicted probabilities across the values of the covariate contributions from -1 to 1. At a covariate contribution of 0, the difference in the predicted probabilities is .24, but how would this change across covariate contributions? Say that the median CC is 0 and the 20th percentile is -1 and the 80th percentile is 1.

**For CC=-1**
x
0 1
--------------------
0.18 (A) 0.38 (B) (B-A) = 0.20
**For CC=0**
x
0 1
--------------------
0.38 (A) 0.62 (B) (B-A) = 0.24
**For CC=1**
x
0 1
--------------------
0.62 (A) 0.82 (B) (B-A) = 0.20

Summary. For this pattern of results, the difference in predicted probabilities is consistently positive across the levels of the covariate contribution, holding within a fairly narrow range of .20 to .24.
--- View the movie that accompanies this section ---
Let's examine a different set of coefficients and this time say the covariate contribution ranges from -1 (at the 20th percentile) to 1 (at the median) to 3 (at the 80th percentile). We start with a graph with CC at the median with CC=1.
viblmdb , b0(-1) b1(1.5) ccat(1) ccmin(-1) ccmax(3)
**For CC=1**
x
0 1
--------------------
0.50 (A) 0.82 (B) (B-A) = 0.32
The difference in the predicted probabilities is fairly substantial at .32. But let's see how this holds when the covariate contribution is at -1, 1, and 3.

**For CC=-1**
x
0 1
--------------------
0.12 (A) 0.38 (B) (B-A) = 0.26
**For CC=1**
x
0 1
--------------------
0.50 (A) 0.82 (B) (B-A) = 0.32
**For CC=3**
x
0 1
--------------------
0.88 (A) 0.97 (B) (B-A) = 0.09
Note how the difference in the predicted probabilities changes quite a bit across the different CC values, ranging from .09 to .32.

The Type II and Type III graphs confirm the way the difference in the predicted probabilities change across the levels of the covariate contribution.
Summary The differences in the predicted probabilities in this example depend on the covariate contribution. The difference is greatest when the covariate contribution is low (e.g. between -1 and 1) and diminishes as the CC increases (e.g. increases from 1 to 3). It would seem important to factor in the covariate contribution when interpreting this relationship.
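To see where the difference is largest, it can be evaluated over a grid of covariate contributions. The sketch below uses the b0 and b1 from this example. The scalar names are illustrative, and the closing calculation relies on a general property of the logistic curve: the gap between the X=0 and X=1 probabilities is widest when the two logits are centered on zero, that is, when b0 + CC = -b1/2.

```stata
* Sketch: difference in predicted probabilities over a grid of CC values
* for b0 = -1 and b1 = 1.5 (the coefficients used in this example)
scalar b0 = -1
scalar b1 = 1.5

foreach cc of numlist -1(.5)3 {
    scalar d = invlogit(b0 + b1 + `cc') - invlogit(b0 + `cc')
    display "CC = " %4.1f (`cc') "   (B-A) = " %5.3f d
}

* The gap is widest where b0 + CC = -b1/2
scalar cc_peak = -b0 - b1/2
scalar d_peak  = invlogit(b0 + b1 + cc_peak) - invlogit(b0 + cc_peak)
display "widest gap at CC = " cc_peak ",  (B-A) = " %5.3f d_peak
```

For these coefficients the peak falls at a covariate contribution of .25, which is inside the lower half of the -1 to 3 range and consistent with the pattern described in the summary.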
--- View the movie that accompanies this section ---
Consider the results of this model
viblmdb , b0(-1) b1(1.5) ccat(1) ccmin(-1) ccmax(3)


The output includes the tables of the predicted probabilities and predicted logits, confirming what we see in the graph.
(note, these are in probability scale)
**For CC=-1**
x
0 1
--------------------
0.12 (A) 0.38 (B) (B-A) = 0.26
**For CC=1**
x
0 1
--------------------
0.50 (A) 0.82 (B) (B-A) = 0.32
**For CC=3**
x
0 1
--------------------
0.88 (A) 0.97 (B) (B-A) = 0.09
(note, these are in logit scale)
**For CC=-1**
x
0 1
--------------------
-2.00 (A) -0.50 (B) (B-A) = 1.50
**For CC=1**
x
0 1
--------------------
0.00 (A) 1.50 (B) (B-A) = 1.50
**For CC=3**
x
0 1
--------------------
2.00 (A) 3.50 (B) (B-A) = 1.50
Say that we run this model
viblmdb , b0(-1) b1(1.5) ccat(1) ccmin(-1) ccmax(3)
Say that we want to get a nice looking version 8 graph. We can click on the Version 8 button and then we see a graph that looks like this.

So, you may ask, why do we create version 7 graphs by default instead of version 8 graphs? Even though version 8 graphs look terrific, they are not nearly as fast as version 7 graphs. For the sake of speed, we use version 7 graphs by default, but give you the option of making version 8 graphs in case you desire a good looking graph. However, you might want to customize these graphs further. The next section shows you how.
Say that we use viblmdb like below.
viblmdb , b0(-1) b1(1.5) ccat(1) ccmin(-1) ccmax(3)

Say that we like the graph but would like to tinker with it. We can press the Paste Syntax button and the program will display the following message.
Syntax has been pasted to _viblm_paste_syntax.do file.
The syntax for creating the graph has been placed into the file named _viblm_paste_syntax.do and we can view the file with the type command as shown below.
type _viblm_paste_syntax.do
/*---------------------------------------------
Session starts at 18:59:57 26 Oct 2004.
----------------------------------------------*/
viblmgraph, b0(-1) b1(1.5) ccmin(-1) ccmax(3) xmin(0) xmax(1) xata(0) xatb(1) ab nodraw xname(x) ccat(1) type(1) name(g1, replace)
graph combine g1
You can then edit this file with the do file editor to tailor the graph to your liking. You can see help viblmgraph for more options for customizing these graphs. You can also supply regular Stata graph options at the end of the command. This is discussed further in section 10.
We have seen how we can make graphs for various covariate contributions. Rather than approaching this as a teaching tool, let's see how you can use this as a research tool to create publication quality graphs. Let's use the data file hsbvibl.
use http://www.ats.ucla.edu/stat/stata/seminars/stata_vibl/hsbvibl, clear
Then, say we want to run the following logit model.
logit honors academic math science, nolog
Logit estimates Number of obs = 200
LR chi2(3) = 65.41
Prob > chi2 = 0.0000
Log likelihood = -105.91552 Pseudo R2 = 0.2359
------------------------------------------------------------------------------
honors | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
academic | 1.193744 .363803 3.28 0.001 .4807031 1.906785
math | .0501096 .025738 1.95 0.052 -.000336 .1005553
science | .072061 .0228077 3.16 0.002 .0273587 .1167632
_cons | -6.979717 1.218391 -5.73 0.000 -9.367719 -4.591715
------------------------------------------------------------------------------
We can then use the viblmcc command to compute the covariate contribution creating the variable mycc1.
viblmcc honors academic math science, generate(mycc1)
Saving covariate contribution as mycc1
Percentiles for Covaraite Contribution
P1 P10 P20 P30 P40 P50 P60 P70 P80 P90 P99
4.146 4.866 5.325 5.697 6.164 6.398 6.752 7.04 7.249 7.851 8.629
The program creates the variable mycc1 and also displays percentiles of the covariate contribution: the 1st, then the 10th through the 90th in steps of 10, and then the 99th.
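If you want to see what such a variable contains without relying on viblmcc, one possible approach is to take the linear predictor and subtract the intercept and the focal term. This is a sketch of the idea only, and not necessarily how viblmcc computes it internally. The names xbhat and mycc_manual are hypothetical.

```stata
* Sketch: covariate contribution as "linear predictor minus intercept
* minus the focal term" (works for any number of covariates)
use http://www.ats.ucla.edu/stat/stata/seminars/stata_vibl/hsbvibl, clear
logit honors academic math science, nolog

predict double xbhat, xb
generate double mycc_manual = xbhat - _b[_cons] - _b[academic]*academic

* Percentiles comparable to those displayed by viblmcc
centile mycc_manual, centile(1 10 20 30 40 50 60 70 80 90 99)
```

The advantage of this form is that the list of covariates never has to be typed out, so it carries over unchanged to models with many covariates.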
We can run the viblmcc command and add the graph option and it then calls viblmgraph to display the graph of the predicted probabilities holding the covariate contribution at the 20th, 50th and 80th percentiles.
viblmcc honors academic math science, graph
Percentiles for Covaraite Contribution
P1 P10 P20 P30 P40 P50 P60 P70 P80 P90 P99
4.146 4.866 5.325 5.697 6.164 6.398 6.752 7.04 7.249 7.851 8.629
. viblmgraph , b0(-6.98) b1(1.194) ccat(5.325 6.398 7.249) /// xmin(0) xmax(1) xname(academic) **For CC=5.325** academic 0 1 -------------------- 0.16 (A) 0.39 (B) (B-A) = 0.23 **For CC=6.398** academic 0 1 -------------------- 0.36 (A) 0.65 (B) (B-A) = 0.29 **For CC=7.249** academic 0 1 -------------------- 0.57 (A) 0.81 (B) (B-A) = 0.24
Above we see the output from the viblmgraph command (as a result of
using the graph option) and below we see the graph produced. Three lines
are shown for the 20th, 50th and 80th percentiles of the covariate contribution.
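The A and B probabilities reported by viblmgraph can be checked by hand, since each is the inverse logit of the constant, plus the coefficient times the value of academic, plus the covariate contribution. The sketch below uses the rounded coefficients passed to viblmgraph; the scalar names b0, b1, pA and pB are chosen here for illustration only.

```stata
* Coefficients as rounded in the viblmgraph call above
scalar b0 = -6.98
scalar b1 = 1.194

foreach cc of numlist 5.325 6.398 7.249 {
    scalar pA = invlogit(b0 + b1*0 + `cc')   // academic = 0
    scalar pB = invlogit(b0 + b1*1 + `cc')   // academic = 1
    display "CC = `cc':  A = " %4.2f pA "   B = " %4.2f pB
}
```

The values printed should agree with columns A and B. A difference computed from the unrounded probabilities can differ in the last digit from the displayed (B-A), which may be taken from the rounded values.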

However, we might want to explore the whole range of covariate contributions rather than view a static graph at a few fixed values. Instead of adding the graph option, we can add the db option, which starts viblmdb as shown below. The min and max CC values are set to the 20th and 80th percentiles of the covariate contribution, the coefficients from the model are filled in automatically, and the covariate contribution starts at the median. You can then vary the covariate contribution or view the different types of graphs to better understand your results.
viblmcc honors academic math science, db

The above command is equivalent to typing
viblmdb , b0(-6.98) b1(1.19) ccat(6.39) ccmin(5.32) ccmax(7.25)
Here is a graph where we supply the coefficients.
viblmgraph, b0(-6.98) b1(1.19) ccat(6.39)

Here is a graph where we supply the coefficients and increase the covariate contribution to 7.25.
viblmgraph, b0(-6.98) b1(1.19) ccat(7.25)
Here is a graph where we change the labels for the y axis via ylabel(.5 .6 to 1). You can add graph options at the end of the viblmgraph command.
viblmgraph, b0(-6.98) b1(1.19) ccat(7.25) ylabel(.5 .6 to 1)
Here is the same graph in logit scale and adjusting the labeling of the y axis.
viblmgraph, b0(-6.98) b1(1.19) ccat(7.25) logit ylabel(0 .25 to 1.5)
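On the logit scale the plotted value is just the linear predictor, that is, the constant plus the coefficient times academic plus the covariate contribution. As a quick check, using the same rounded numbers as the command above, the two endpoints of the line can be computed directly:

```stata
* Linear predictor (logit scale) at academic = 0 and academic = 1
display -6.98 + 1.19*0 + 7.25
display -6.98 + 1.19*1 + 7.25
```

Both values fall between 0 and 1.5, which is why ylabel(0 .25 to 1.5) is a sensible labeling for this graph.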
This example shows how you can label the y axis, use a different title for the x axis, add a title, add a note to the graph and so on. As you can see, you can add just about any graph option you desire to customize the graph.
viblmgraph , b0(-6.98) b1(1.19) ccat(6.39) ylabel(.3 .35 to .7) ///
ytitle("Predicted Probability of Honors") ///
xtitle("Academic") xlabel(0 "No" 1 "Yes") ///
title(Predicted Probabilities by Academic) ///
note(Contribution of Covariates at the median)
Here is a simpler version of the same graph but shown as a Type II graph.
viblmgraph , b0(-6.98) b1(1.19) ccat(5.32 6.39 7.25) ccmin(5) ccmax(7.5) type(2)
Here is the same graph but shown as a Type III graph.
viblmgraph , b0(-6.98) b1(1.19) ccat(5.32 6.39 7.25) ccmin(5) ccmax(7.5) type(3)
You can see help viblmgraph to learn more about the syntax for using it.
Although all of the discussion has focused on dummy variables, you can use viblmgraph with continuous variables.
First, let's look at an analysis using xi3 and postgr3 where we focus on the main effect of write, a continuous predictor.
use http://www.ats.ucla.edu/stat/stata/seminars/stata_vibl/hsbvibl, clear
xi3: logit honors write math science, nolog
Logit estimates Number of obs = 200
LR chi2(3) = 66.44
Prob > chi2 = 0.0000
Log likelihood = -105.40164 Pseudo R2 = 0.2396
------------------------------------------------------------------------------
honors | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
write | .0808476 .0242257 3.34 0.001 .0333661 .1283291
math | .0503773 .025635 1.97 0.049 .0001336 .100621
science | .0394677 .0226928 1.74 0.082 -.0050094 .0839448
_cons | -8.960344 1.347136 -6.65 0.000 -11.60068 -6.320005
------------------------------------------------------------------------------
postgr3 write

Now let's consider this using the vibl tools. We can use viblmcc to get a sense of the range of the covariate contribution.
viblmcc honors write math science
Percentiles for Covaraite Contribution
P1 P10 P20 P30 P40 P50 P60 P70 P80 P90 P99
3.243 3.624 3.903 4.205 4.492 4.69 4.994 5.123 5.356 5.777 6.401
And we summarize write to get a sense of the range of the writing scores.
summ write
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
write | 200 52.775 9.478586 31 67
Now we can pull this information together for the viblmgraph command. The coefficients come from the logit output, the ccat() value is the median covariate contribution from the viblmcc command above, and xmin() and xmax() are set to 35 and 65 so that they fall within the observed range of write (31 to 67).
viblmgraph , b0(-8.96) b1(0.08) ccat(4.69) xmin(35) xmax(65)
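For readers who want to see what such a curve amounts to, a roughly comparable plot can be drawn with Stata's built-in twoway function, which graphs an expression in x over a range. This is only a sketch under the assumption that the curve is the inverse logit of the constant, plus the coefficient times write, plus the covariate contribution. It does not reproduce viblmgraph's labeling or its other graph types.

```stata
* Predicted probability as a function of write, CC held at the median (4.69)
twoway function y = invlogit(-8.96 + 0.08*x + 4.69), range(35 65) ///
    ytitle("Predicted probability of honors") xtitle("write")
```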
We then repeat the command for the CC at the 20th, 50th, and 80th percentiles.
viblmgraph , b0(-8.96) b1(0.08) ccat(3.90 4.69 5.35) xmin(35) xmax(65)
We can specify the xata(50) and xatb(60) options to request the predicted probabilities when x is 50 and 60. We do this for the CC at the 20th, 50th, and 80th percentiles, as shown below.
viblmgraph , b0(-8.96) b1(0.08) ccat(3.90 4.69 5.35) ///
xmin(35) xmax(65) xata(50) xatb(60) ab
**For CC=3.9**
x
50 60
--------------------
0.26 (A) 0.44 (B) (B-A) = 0.18
**For CC=4.69**
x
50 60
--------------------
0.43 (A) 0.63 (B) (B-A) = 0.20
**For CC=5.35**
x
50 60
--------------------
0.60 (A) 0.77 (B) (B-A) = 0.17
The change in the predicted probability as write changes from 50 to 60 is fairly constant for these three levels of the covariate contribution. But what if we compare writing scores from 40 to 50? We examine this below.
viblmgraph , b0(-8.96) b1(0.08) ccat(3.90 4.69 5.35) xmin(35) xmax(65) xata(40) xatb(50) ab
**For CC=3.9**
x
40 50
--------------------
0.13 (A) 0.26 (B) (B-A) = 0.13
**For CC=4.69**
x
40 50
--------------------
0.26 (A) 0.43 (B) (B-A) = 0.17
**For CC=5.35**
x
40 50
--------------------
0.40 (A) 0.60 (B) (B-A) = 0.20
You can see that when we compare the predicted probabilities as writing changes from 40 to 50, the change in the predicted probability ranges from .13 to .20. For this range of writing scores, the change in predicted probability depends more strongly on the covariate contribution.
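One way to see why the change depends on the covariate contribution is that the slope of a logistic curve with respect to x equals the coefficient times p times (1 - p), which is largest when p is near .5 and shrinks as p moves toward 0 or 1. The sketch below, using the rounded coefficients from the commands above and scalar names chosen here for illustration (b0, b1, phat), prints the predicted probability and that slope at write = 40, 50 and 60 for each of the three CC values.

```stata
scalar b0 = -8.96
scalar b1 = 0.08

foreach cc of numlist 3.90 4.69 5.35 {
    foreach x of numlist 40 50 60 {
        * Predicted probability at this writing score and CC
        scalar phat = invlogit(b0 + b1*`x' + `cc')
        * Slope of the probability curve: b1 * p * (1 - p)
        display "CC = `cc'  write = `x'  p = " %4.2f phat ///
            "  slope = " %6.4f (b1*phat*(1 - phat))
    }
}
```

The probabilities should line up with the A and B columns in the two sets of tables above. Where they sit well below .5 the slope is flatter, which is consistent with the smaller changes seen there.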
We have used xi3 and postgr3 to compute adjusted predicted values, but these programs should not be treated as a black box. Here we demonstrate how they work by obtaining the same values manually. Since xi3 and postgr3 are the underlying technology beneath vibl and graphengine, you can use the same logic to extend these methods to other kinds of models.
First, consider this model run via xi3.
use http://www.ats.ucla.edu/stat/stata/seminars/stata_vibl/hsbvibl, clear
xi3: logit honors i.academic math science, nolog
i.academic _Iacademic_0-1 (naturally coded; _Iacademic_0 omitted)
Logit estimates Number of obs = 200
LR chi2(3) = 65.41
Prob > chi2 = 0.0000
Log likelihood = -105.91552 Pseudo R2 = 0.2359
------------------------------------------------------------------------------
honors | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iacademic_1 | 1.193744 .363803 3.28 0.001 .4807031 1.906785
math | .0501096 .025738 1.95 0.052 -.000336 .1005553
science | .072061 .0228077 3.16 0.002 .0273587 .1167632
_cons | -6.979717 1.218391 -5.73 0.000 -9.367719 -4.591715
------------------------------------------------------------------------------
Now we generate the adjusted probabilities using postgr3.
postgr3 academic, table
Variables left asis: _Ipublic_1 _Iacademic_1 _Iac1Xpu1
Holding math constant at 52.645
Holding science constant at 51.85
-----------------------
academic | mean(yhat_)
----------+------------
0 | .353124
1 | .6429999
-----------------------
Now we run the logit model manually, without xi3, entering the manually constructed dummy variable academic directly.
logit honors academic math science, nolog
Logit estimates Number of obs = 200
LR chi2(3) = 65.41
Prob > chi2 = 0.0000
Log likelihood = -105.91552 Pseudo R2 = 0.2359
------------------------------------------------------------------------------
honors | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
academic | 1.193744 .363803 3.28 0.001 .4807031 1.906785
math | .0501096 .025738 1.95 0.052 -.000336 .1005553
science | .072061 .0228077 3.16 0.002 .0273587 .1167632
_cons | -6.979717 1.218391 -5.73 0.000 -9.367719 -4.591715
------------------------------------------------------------------------------
We will use the preserve command so we can restore our data back to its original state.
preserve
Now we will replace math with the mean of math, and science with the mean of science.
summarize math
replace math = r(mean)
summarize science
replace science = r(mean)
At this point, the variable math contains the mean of math and science contains the mean of science. Now, when we use the predict command, the predictions will be based on the average value of math and science. So, we issue predict yhat that creates the adjusted probability, holding math and science at their mean.
predict yhat
(option p assumed; Pr(honors))
Now we can graph the results.
graph twoway line yhat academic

Here we show the table of predicted probabilities.
table academic, contents(mean yhat)
----------------------
academic | mean(yhat)
----------+-----------
0 | .353124
1 | .6429999
----------------------
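The same two numbers can also be obtained with plain arithmetic, which makes explicit what predict is doing here. The sketch below copies the coefficients from the logit output and the means of math and science (52.645 and 51.85) from the postgr3 output above; the scalar names xb0 and xb1 are made up for illustration.

```stata
* Linear predictor with math and science held at their means
scalar xb0 = -6.979717 + .0501096*52.645 + .072061*51.85   // academic = 0
scalar xb1 = xb0 + 1.193744                                 // academic = 1

* Convert from the logit scale to predicted probabilities
display "academic = 0: " %8.6f invlogit(xb0)
display "academic = 1: " %8.6f invlogit(xb1)
```

Because the coefficients are copied at their displayed precision, the results should match the table closely, though not necessarily in the last digit.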
We can then use the restore command to restore the data file back to its original state (namely putting math and science back to their original values).
restore
Here is how we made the hsbvibl data file.
use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear
generate honors = socst > 51
generate public = schtyp==1
generate academic = prog==2
gen acpub = academic*public
save hsbvibl, replace
These pages are Copyrighted (c) by UCLA Academic Technology Services