UCLA Academic Technology Services HomeServicesClassesContactJobs
Search

Stata Library
Categorical and Count Data Analysis Utilities

Introduction

Interpreting the results of categorical (logistic, probit, ordered logistic, and multinomial logistic) and count data (poisson and negative binomial) regression analyses can be a difficult and demanding task. J. Scott Long (www.indiana.edu/~jsl650/) and Jeremy Freese (www.ssc.wisc.edu/~jfreese/) have written a marvelous collection of utilities to make the process easier and less daunting. You may also want to check out their book due out August 16, 2001 Long, J.S. & Freese, J (2001) Regression Models for Categorical Dependent Variables. College Station, TX: Stata Press.

Obtaining the Programs

We will be demonstrating the Stata 7 version of the commands.  You can download these programs from within Stata by typing findit spostado (see How can I used the findit command to search for programs and get additional help? for more information about using findit). You can then get more information by typing
help spostado

-------------------------------------------------------------------------------------
 spostado: SPost ado files for post-estimation interpretation of regression
 models for categorical and limited dependent variables.
-------------------------------------------------------------------------------------

 Commands by J. Scott Long and Jeremy Freese. These commands will work with
 the estimation commands: clogit, cloglog, cnreg, intreg, logit, mlogit,
 nbreg, ologit, oprobit, poisson, probit, regress, tobit, zinb,
 zip, and in most cases gologit. The commands are:

    fitstat computes fit statistics following model estimation.

    listcoef lists factor change, percent change coefficients, and
        standardized coefficients. The help option provides details on
        interpretation.

    mlogtest computes a variety of tests useful for the mlogit model.

    mlogview opens a dialog plot for constructing discrete change plots
        and odds ratio plots for mlogit.

    prchange computes the marginal and discrete change coefficients.

    prcounts computes predicted rates and probabilities for count models.

    prgen computes predicted values for a range of values. These values
        can then be plotted.

    prtab computes a table of predicted values for given values of the
        independent variables.

    prvalue computes predicted values for set values of the independent
        variables.

-------------------------------------------------------------------------------

 Authors: J. Scott Long - jslong@indiana.edu - www.indiana.edu/~jsl650
          Jeremy Freese - jfreese@ssc.wisc.edu

Logistic Regression Examples

We will use a dataset called hsblog which was derived from the 1987 High School and Beyond Survey. The response variable is honcomp which indicates whether students werre in honors composition or not. We will use female, read (standardized reading test) and math (standardized math test) as covariates.
use http://www.ats.ucla.edu/stat/stata/webbooks/logistic/hsblog
(highschool and beyond (200 cases))

describe female read math honcomp

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
female          float  %9.0g       fl         
read            float  %9.0g                  reading score
math            float  %9.0g                  math score
honcomp         float  %9.0g                  honors composition

logit honcomp female read math

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -79.367198
Iteration 2:   log likelihood = -75.475342
Iteration 3:   log likelihood = -75.211846
Iteration 4:   log likelihood = -75.209827

Logit estimates                                   Number of obs   =        200
                                                  LR chi2(3)      =      80.87
                                                  Prob > chi2     =     0.0000
Log likelihood = -75.209827                       Pseudo R2       =     0.3496

------------------------------------------------------------------------------
     honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.154801   .4340856     2.66   0.008      .304009    2.005593
        read |   .0752424    .027577     2.73   0.006     .0211924    .1292924
        math |   .1317117   .0324607     4.06   0.000       .06809    .1953335
       _cons |  -13.12749   1.850769    -7.09   0.000    -16.75493    -9.50005
------------------------------------------------------------------------------

logistic honcomp female read math

Logit estimates                                   Number of obs   =        200
                                                  LR chi2(3)      =      80.87
                                                  Prob > chi2     =     0.0000
Log likelihood = -75.209827                       Pseudo R2       =     0.3496

------------------------------------------------------------------------------
     honcomp | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   3.173393   1.377524     2.66   0.008     1.355281    7.430502
        read |   1.078145   .0297321     2.73   0.006     1.021419    1.138023
        math |   1.140779   .0370305     4.06   0.000     1.070462    1.215716
------------------------------------------------------------------------------
These results are pretty straight forward but there are other ways of looking at the coefficients. Let's try the listcoef command with the const option.
listcoef, const

logistic (N=200): Factor Change in Odds 

  Odds of: 1 vs 0

----------------------------------------------------------------------
     honcomp |      b         z     P>|z|    e^b    e^bStdX      SDofX
-------------+--------------------------------------------------------
      female |   1.15480    2.660   0.008   3.1734   1.7798     0.4992
        read |   0.07524    2.728   0.006   1.0781   2.1629    10.2529
        math |   0.13171    4.058   0.000   1.1408   3.4347     9.3684
       _cons | -13.12749   -7.093   0.000
----------------------------------------------------------------------
The column labeled 'b' (above) are the logistic regression coefficients. The column 'z' is the Wald z-test that the coefficient is different from zero and the next column (P>|z|) gives the p-value for the z-test. The column 'e^b' are e raised to the logistic coefficient power (exponentiated coefficients) which are the odds ratios. Next comes 'e^bStdX' which gives the odds ratios for a one standard deviation change in the covariate. Finally, there is a column giving the standard deviation of the covariate (SDofX).

Let's try listcoef again, this time with the std option.

listcoef, const std

logistic (N=200): Unstandardized and Standardized Estimates 

 Observed SD: .4424407
   Latent SD: 2.1498873

  Odds of: 1 vs 0

-------------------------------------------------------------------------------
     honcomp |      b         z     P>|z|    bStdX    bStdY   bStdXY      SDofX
-------------+-----------------------------------------------------------------
      female |   1.15480    2.660   0.008   0.5765   0.5371   0.2682     0.4992
        read |   0.07524    2.728   0.006   0.7715   0.0350   0.3588    10.2529
        math |   0.13171    4.058   0.000   1.2339   0.0613   0.5740     9.3684
       _cons | -13.12749   -7.093   0.000
-------------------------------------------------------------------------------
The columns 'b', 'z' and 'P>|z|' are the same as before. The next column, 'bStdX', gives the amount of change in the log odds for a one standard deviation change in the covariate. The column 'bStdY' gives the amount of change, in standard deviations, in the log odds for a one unit change in the covariate. The next column, 'bStdXY', gives the amount of change, in standard deviations, in the log odds for a one standard deviation change in the covariate. Finally, there is a column giving the standard deviation of the covariate (SDofX).

The fitstat command presents a number of indices for model fit.

fitstat

Measures of Fit for logit of honcomp

Log-Lik Intercept Only:     -115.644     Log-Lik Full Model:          -75.210
D(196):                      150.420     LR(3):                        80.869
                                         Prob > LR:                     0.000
McFadden's R2:                 0.350     McFadden's Adj R2:             0.315
Maximum Likelihood R2:         0.333     Cragg & Uhler's R2:            0.485
McKelvey and Zavoina's R2:     0.524     Efron's R2:                    0.379
Variance of y*:                6.912     Variance of error:             3.290
Count R2:                      0.830     Adj Count R2:                  0.358
AIC:                           0.792     AIC*n:                       158.420
BIC:                        -888.051     BIC':                        -64.974
This table contains the log-likelihood for a model with the intercept only (-115.644) and for the full model (-75.210). These measures are followed by the deviance (150.420) with degrees of freedom. Deviance in logistic regression is analogous to the sum of squared residuals in OLS regression. The second line also contains the likelihood ratio chi-square (80.869) with degrees of freedom (3), which tests whether the model with three covariates predicts better than a model with intercept only. The p-value for this chi-square (0.000) is on the next line.

Next comes a series of pseudo-R2 measures. Unlike OLS regression, logistic regression does not have a single measure that reflects proportion of the variance accounted for. The different R2 measures each have their own interpretation. The item labeled 'Variance of y*' is the variance of the underlying latent dependent variable while 'Variance or error' for logistic regression is p2/3 (which is approximately 3.29).

Finally, there are several information criterion measures including Akaike's Information Criterion (AIC & AIC*n) and Bayesian Information Criterion (BIC & BIC'). These are useful in comparing models with the same response variable but different covariates.

Now, let's move on to another program prchange. prchange computes discrete and marginal change for categorical and count regression models.

prchange

logit: Changes in Predicted Probabilities for honcomp

        min->max      0->1     -+1/2    -+sd/2  MargEfct
female    0.1539    0.1539    0.1592    0.0789    0.1577
  read    0.5079    0.0003    0.0103    0.1058    0.0103
  math    0.7731    0.0000    0.0180    0.1703    0.0180

              0       1
Pr(y|x)  0.8368  0.1632

         female     read     math
    x=     .545    52.23   52.645
sd(x)=   .49922  10.2529  9.36845
For each covariate, prchange computes the min/max change, a zero/one change, a one unit change, a one standard deviation change, and the marginal effect (instantaneous change) in the probability.

With the fromto option, prchange displays the beginning and ending probabilities in addition to the change in probability.

prchange, fromto

logit: Changes in Predicted Probabilities for honcomp

            from:       to:      dif:     from:       to:      dif:     from:
           x=min     x=max  min->max       x=0       x=1      0->1     x-1/2
female    0.0942    0.2481    0.1539    0.0942    0.2481    0.1539    0.0987
  read    0.0305    0.5385    0.5079    0.0038    0.0041    0.0003    0.1582
  math    0.0145    0.7875    0.7731    0.0002    0.0002    0.0000    0.1545

              to:      dif:     from:       to:      dif:          
           x+1/2     -+1/2   x-1/2sd   x+1/2sd    -+sd/2  MargEfct
female    0.2579    0.1592    0.1276    0.2065    0.0789    0.1577
  read    0.1685    0.0103    0.1171    0.2230    0.1058    0.0103
  math    0.1724    0.0180    0.0952    0.2656    0.1703    0.0180

              0       1
Pr(y|x)  0.8368  0.1632

         female     read     math
    x=     .545    52.23   52.645
sd(x)=   .49922  10.2529  9.36845
A great feature of prchange is the ability to display results for specific values of the covariates. Let's look at the prchange for females with reading scores of 60.
prchange, x(female=1 read=60)

logit: Changes in Predicted Probabilities for honcomp

        min->max      0->1     -+1/2    -+sd/2  MargEfct
female    0.2146    0.2146    0.2639    0.1339    0.2697
  read    0.6131    0.0005    0.0176    0.1784    0.0176
  math    0.8757    0.0001    0.0308    0.2811    0.0308

              0       1
Pr(y|x)  0.6281  0.3719

         female     read     math
    x=        1       60   52.645
sd(x)=   .49922  10.2529  9.36845
The prtab command is used to display the predicted probabilities for different values of the covariates. Let's look at female first.
prtab female

logit: Predicted probabilities of positive outcome for honcomp

----------------------
   female | Prediction
----------+-----------
     male |     0.0942
   female |     0.2481
----------------------

    female    read    math
x=    .545   52.23  52.645
The table gives the probability of males and females being in honcomp 1 with the other variables (read and math) held at their mean.

Next we will try the same thing for the variable read while holding female and math at their mean.

prtab read

logit: Predicted probabilities of positive outcome for honcomp

----------------------
reading   |
score     | Prediction
----------+-----------
       28 |     0.0305
       31 |     0.0380
       34 |     0.0472
       35 |     0.0507
       36 |     0.0544
       37 |     0.0584
       39 |     0.0673
       41 |     0.0773
       42 |     0.0829
       43 |     0.0888
       44 |     0.0950
       45 |     0.1017
       46 |     0.1088
       47 |     0.1163
       48 |     0.1243
       50 |     0.1416
       52 |     0.1609
       53 |     0.1713
       54 |     0.1823
       55 |     0.1937
       57 |     0.2183
       60 |     0.2593
       61 |     0.2740
       63 |     0.3049
       65 |     0.3377
       66 |     0.3548
       68 |     0.3899
       71 |     0.4447
       73 |     0.4821
       76 |     0.5385
----------------------

    female    read    math
x=    .545   52.23  52.645
Now, let's combine female and reading into a single table.
prtab read female

logit: Predicted probabilities of positive outcome for honcomp

--------------------------
reading   |     female    
score     |   male  female
----------+---------------
       28 | 0.0165  0.0506
       31 | 0.0206  0.0626
       34 | 0.0257  0.0772
       35 | 0.0277  0.0828
       36 | 0.0297  0.0887
       37 | 0.0320  0.0949
       39 | 0.0370  0.1087
       41 | 0.0428  0.1241
       42 | 0.0459  0.1326
       43 | 0.0494  0.1414
       44 | 0.0530  0.1508
       45 | 0.0569  0.1607
       46 | 0.0611  0.1711
       47 | 0.0656  0.1821
       48 | 0.0703  0.1936
       50 | 0.0808  0.2181
       52 | 0.0927  0.2449
       53 | 0.0992  0.2591
       54 | 0.1062  0.2738
       55 | 0.1135  0.2890
       57 | 0.1296  0.3208
       60 | 0.1572  0.3719
       61 | 0.1675  0.3896
       63 | 0.1895  0.4259
       65 | 0.2137  0.4631
       66 | 0.2266  0.4818
       68 | 0.2541  0.5194
       71 | 0.2992  0.5753
       73 | 0.3316  0.6116
       76 | 0.3834  0.6637
--------------------------

    female    read    math
x=    .545   52.23  52.645
If we wish to add the predicted probabilites to our data we can use the prgen command. Let's generate the predicted probabilities for each of the values of reading while holding the other variables at their mean.
prgen read, gen(nr) from(28) to(76) ncases(49)

logit: Predicted values as read varies from 28 to 76.

    female    read    math
x=    .545   52.23  52.645

list nrx nrp0 nrp1 in 1/50

           nrx       nrp0       nrp1
  1.        28   .9694503   .0305497
  2.        29   .9671414   .0328586
  3.        30   .9646644   .0353356
  4.        31    .962008    .037992
  5.        32   .9591603   .0408397
  6.        33   .9561089    .043891
  7.        34   .9528408   .0471592
  8.        35   .9493423   .0506578
  9.        36   .9455989   .0544011
 10.        37    .941596    .058404
 11.        38   .9373181   .0626819
 12.        39   .9327492   .0672508
 13.        40   .9278729   .0721271
 14.        41   .9226723   .0773277
 15.        42   .9171303   .0828697
 16.        43   .9112293   .0887707
 17.        44   .9049516   .0950484
 18.        45   .8982795   .1017204
 19.        46   .8911955   .1088045
 20.        47   .8836819   .1163181
 21.        48   .8757218   .1242782
 22.        49   .8672988   .1327012
 23.        50   .8583972   .1416027
 24.        51   .8490025   .1509975
 25.        52   .8391013   .1608987
 26.        53   .8286819   .1713181
 27.        54   .8177343   .1822657
 28.        55   .8062507   .1937493
 29.        56   .7942256   .2057744
 30.        57   .7816563   .2183437
 31.        58    .768543    .231457
 32.        59   .7548891   .2451109
 33.        60   .7407016   .2592985
 34.        61   .7259908   .2740092
 35.        62   .7107713   .2892287
 36.        63   .6950616   .3049384
 37.        64   .6788841   .3211159
 38.        65   .6622654   .3377346
 39.        66   .6452361   .3547639
 40.        67   .6278306   .3721694
 41.        68   .6100873   .3899128
 42.        69   .5920476   .4079524
 43.        70   .5737565   .4262435
 44.        71   .5552613   .4447387
 45.        72   .5366117   .4633883
 46.        73   .5178592   .4821408
 47.        74   .4990562   .5009438
 48.        75   .4802559   .5197441
 49.        76   .4615113   .5384887
 50.         .          .          .
The column 'nrx' gives the values of read, 'nrp0' the probability of being in honcomp 0, and 'nrp1' the probability of being in honcomp 1. Note that the column 'nrp1' is the same as the results from prtab read (above), just with more decimal places.

The prvalue command is used to compute the conditional probability being in each of the levels of response variable. In this first example, we will hold all of the covariates at their mean value.

prvalue
 
logit: Predictions for honcomp

  Pr(y=1|x):          0.1632   95% ci: (0.1068,0.2415)
  Pr(y=0|x):          0.8368   95% ci: (0.7585,0.8932)

    female    read    math
x=    .545   52.23  52.645
Next, let's look at the predicted values for males and females while holding reading and math at their means.
prvalue, x(female=0)
 
logit: Predictions for honcomp

  Pr(y=1|x):          0.0942   95% ci: (0.0468,0.1805)
  Pr(y=0|x):          0.9058   95% ci: (0.8195,0.9532)

    female    read    math
x=       0   52.23  52.645

prvalue, x(female=1)
 
logit: Predictions for honcomp

  Pr(y=1|x):          0.2481   95% ci: (0.1604,0.3630)
  Pr(y=0|x):          0.7519   95% ci: (0.6370,0.8396)

    female    read    math
x=       1   52.23  52.645
Now, let's look at the predicted values for males and females when all of the other variables are held are their maximum values.
prvalue, x(female=0) rest(max)
 
logit: Predictions for honcomp

  Pr(y=1|x):          0.9220   95% ci: (0.7812,0.9751)
  Pr(y=0|x):          0.0780   95% ci: (0.0249,0.2188)

    female    read    math
x=       0      76      75

prvalue, x(female=1) rest(max)
 
logit: Predictions for honcomp

  Pr(y=1|x):          0.9740   95% ci: (0.9097,0.9929)
  Pr(y=0|x):          0.0260   95% ci: (0.0071,0.0903)

    female    read    math
x=       1      76      75

Multinomial Logistic Regression

Now let's tray a multinomial logistic regression using prog as the response variable. prog has three levels; general, academic and vocational. We will continue to use female, read, and math as covariates even though female is not significant in any of the equations.
mlogit prog female read math

Iteration 0:   log likelihood = -204.09667
Iteration 1:   log likelihood = -177.18429
Iteration 2:   log likelihood = -175.51578
Iteration 3:   log likelihood = -175.46792
Iteration 4:   log likelihood = -175.46786

Multinomial regression                            Number of obs   =        200
                                                  LR chi2(6)      =      57.26
                                                  Prob > chi2     =     0.0000
Log likelihood = -175.46786                       Pseudo R2       =     0.1403

------------------------------------------------------------------------------
        prog |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
general      |
      female |  -.1565038   .3793112    -0.41   0.680    -.8999401    .5869325
        read |  -.0332741   .0247308    -1.35   0.178    -.0817457    .0151975
        math |    -.06964   .0280339    -2.48   0.013    -.1245854   -.0146946
       _cons |   4.707234   1.332392     3.53   0.000     2.095794    7.318675
-------------+----------------------------------------------------------------
vocation     |
      female |  -.1843515   .3973611    -0.46   0.643     -.963165    .5944619
        read |  -.0572131   .0265751    -2.15   0.031    -.1092994   -.0051268
        math |  -.1180523   .0309737    -3.81   0.000    -.1787597   -.0573449
       _cons |   8.303956   1.491516     5.57   0.000     5.380637    11.22727
------------------------------------------------------------------------------
(Outcome prog==academic is the comparison group)
Next, we will use listcoef, fitstat, and prchange on the mlogit.
listcoef

mlogit (N=200): Factor Change in the Odds of prog 

Variable: female (sd=  .49922)

    Odds comparing|
Group 1 vs Group 2|      b         z     P>|z|     e^b   e^bStdX
------------------+---------------------------------------------
general -vocation |   0.02785    0.066   0.947   1.0282   1.0140
general -academic |  -0.15650   -0.413   0.680   0.8551   0.9248
vocation-general  |  -0.02785   -0.066   0.947   0.9725   0.9862
vocation-academic |  -0.18435   -0.464   0.643   0.8316   0.9121
academic-general  |   0.15650    0.413   0.680   1.1694   1.0813
academic-vocation |   0.18435    0.464   0.643   1.2024   1.0964
----------------------------------------------------------------

Variable: read (sd= 10.2529)

    Odds comparing|
Group 1 vs Group 2|      b         z     P>|z|     e^b   e^bStdX
------------------+---------------------------------------------
general -vocation |   0.02394    0.846   0.397   1.0242   1.2782
general -academic |  -0.03327   -1.345   0.178   0.9673   0.7109
vocation-general  |  -0.02394   -0.846   0.397   0.9763   0.7824
vocation-academic |  -0.05721   -2.153   0.031   0.9444   0.5562
academic-general  |   0.03327    1.345   0.178   1.0338   1.4066
academic-vocation |   0.05721    2.153   0.031   1.0589   1.7979
----------------------------------------------------------------

Variable: math (sd= 9.36845)

    Odds comparing|
Group 1 vs Group 2|      b         z     P>|z|     e^b   e^bStdX
------------------+---------------------------------------------
general -vocation |   0.04841    1.470   0.142   1.0496   1.5739
general -academic |  -0.06964   -2.484   0.013   0.9327   0.5208
vocation-general  |  -0.04841   -1.470   0.142   0.9527   0.6354
vocation-academic |  -0.11805   -3.811   0.000   0.8886   0.3309
academic-general  |   0.06964    2.484   0.013   1.0721   1.9202
academic-vocation |   0.11805    3.811   0.000   1.1253   3.0221
----------------------------------------------------------------

fitstat

Measures of Fit for mlogit of prog

Log-Lik Intercept Only:     -204.097     Log-Lik Full Model:         -175.468
D(192):                      350.936     LR(6):                        57.258
                                         Prob > LR:                     0.000
McFadden's R2:                 0.140     McFadden's Adj R2:             0.101
Maximum Likelihood R2:         0.249     Cragg & Uhler's R2:            0.286
Count R2:                      0.600     Adj Count R2:                  0.158
AIC:                           1.835     AIC*n:                       366.936
BIC:                        -666.341     BIC':                        -25.468

prchange

mlogit: Changes in Predicted Probabilities for prog

female
            Avg|Chg|     general    vocation    academic
    0->1   .02790104  -.02005389  -.02179766   .04185158

read
            Avg|Chg|     general    vocation    academic
Min->Max    .3214145  -.13366224  -.34845948   .48212177
   -+1/2   .00725731  -.00335795  -.00752802   .01088595
  -+sd/2    .0741224  -.03413837  -.07704523   .11118361
MargEfct    .0217727   -.0033582  -.00752815   .01088635

math
            Avg|Chg|     general    vocation    academic
Min->Max   .49183498   -.1889001  -.54885237   .73775247
   -+1/2   .01506265  -.00711139  -.01548259   .02259398
  -+sd/2   .13917912  -.06468265  -.14408605   .20876867
MargEfct   .04519518  -.00711382  -.01548377   .02259759

           general   vocation   academic
Pr(y|x)  .25057834  .20160662  .54781502

         female     read     math
    x=     .545    52.23   52.645
sd(x)=   .49922  10.2529  9.36845
The mlogtest command computes a variety of tests for multinomial logit models. The user selects the tests they want by specifying the appropriate options. For each independent variable, mlogtest can perform either a likelihood-ratio or Wald test of the null hypothesis that the coefficients of the variable equal zero across all equations. We will use mlogtest with the all option.
mlogtest, all

**** Likelihood-ratio tests for independent variables

 Ho: All coefficients associated with given variable(s) are 0.

        prog |       chi2   df   P>chi2
-------------+-------------------------
      female |      0.275    2    0.872
        read |      5.121    2    0.077
        math |     17.874    2    0.000
---------------------------------------

**** Wald tests for independent variables

 Ho: All coefficients associated with given variable(s) are 0.

        prog |       chi2   df   P>chi2
-------------+-------------------------
      female |      0.274    2    0.872
        read |      4.928    2    0.085
        math |     15.780    2    0.000
---------------------------------------

**** Hausman tests of IIA assumption

 Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.

 Omitted |      chi2   df   P>chi2   evidence
---------+------------------------------------
 general |   -21.017    4     ---    for Ho    
vocation |     0.630    4    0.960   for Ho    
----------------------------------------------
 Note: If chi2<0, the estimated model does not
 meet asymptotic assumptions of the test.

**** Small-Hsiao tests of IIA assumption

 Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.

 Omitted |  lnL(full)  lnL(omit)    chi2   df   P>chi2   evidence
---------+---------------------------------------------------------
 general |    -41.797    -39.076   5.442    4    0.245   for Ho    
vocation |    -46.130    -44.449   3.364    4    0.499   for Ho    
-------------------------------------------------------------------

**** Wald tests for combining outcome categories

 Ho: All coefficients except intercepts associated with given pair
     of outcomes are 0 (i.e., categories can be collapsed).

Categories tested |      chi2   df   P>chi2
------------------+------------------------
 general-vocation |     5.716    3    0.126
 general-academic |    17.157    3    0.001
vocation-academic |    35.665    3    0.000
-------------------------------------------

**** LR tests for combining outcome categories

 Ho: All coefficients except intercepts associated with given pair
     of outcomes are 0 (i.e., categories can be collapsed).

Categories tested |      chi2   df   P>chi2
------------------+------------------------
 general-vocation |     6.035    3    0.110
 general-academic |    20.262    3    0.000
vocation-academic |    52.127    3    0.000
------------------------------------------

Poisson and Negative Binomial Regression

To illustrate the utilities with count data we will run poisson regression with the lahigh dataset. The response variable will be daysabs, the number of days absent during the school year. The covariates will be gender (1=female, 2=male) and langnce, language normal curve equivalence score.
use http://www.ats.ucla.edu/stat/stata/notes/lahigh

poisson daysabs gender langnce

Iteration 0:   log likelihood = -1549.8567  
Iteration 1:   log likelihood = -1549.8567  

Poisson regression                                Number of obs   =        316
                                                  LR chi2(2)      =     171.50
                                                  Prob > chi2     =     0.0000
Log likelihood = -1549.8567                       Pseudo R2       =     0.0524

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |  -.4093528   .0482192    -8.49   0.000    -.5038606   -.3148449
     langnce |    -.01467   .0012934   -11.34   0.000    -.0172051   -.0121349
       _cons |   3.056329   .1005107    30.41   0.000     2.859332    3.253327
------------------------------------------------------------------------------

fitstat

Measures of Fit for poisson of daysabs

Log-Lik Intercept Only:    -1635.608     Log-Lik Full Model:        -1549.857
D(313):                     3099.713     LR(2):                       171.503
                                         Prob > LR:                     0.000
McFadden's R2:                 0.052     McFadden's Adj R2:             0.051
Maximum Likelihood R2:         0.419     Cragg & Uhler's R2:            0.419
AIC:                           9.828     AIC*n:                      3105.713
BIC:                        1298.166     BIC':                       -159.991

nbreg daysabs gender langnce

Fitting comparison Poisson model:

Iteration 0:   log likelihood = -1549.8567  
Iteration 1:   log likelihood = -1549.8567  

Fitting constant-only model:

Iteration 0:   log likelihood = -897.78991  
Iteration 1:   log likelihood = -891.24455  
Iteration 2:   log likelihood = -891.24271  
Iteration 3:   log likelihood = -891.24271  

Fitting full model:

Iteration 0:   log likelihood = -881.55269  
Iteration 1:   log likelihood =  -880.9306  
Iteration 2:   log likelihood =  -880.9274  
Iteration 3:   log likelihood =  -880.9274  

Negative binomial regression                      Number of obs   =        316
                                                  LR chi2(2)      =      20.63
                                                  Prob > chi2     =     0.0000
Log likelihood =  -880.9274                       Pseudo R2       =     0.0116

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |  -.4312069   .1396913    -3.09   0.002    -.7049968   -.1574169
     langnce |  -.0156493   .0039485    -3.96   0.000    -.0233882   -.0079104
       _cons |   3.134647   .3187701     9.83   0.000     2.509869    3.759425
-------------+----------------------------------------------------------------
    /lnalpha |     .25394    .095509                      .0667457    .4411342
-------------+----------------------------------------------------------------
       alpha |   1.289094   .1231201                      1.069024    1.554469
------------------------------------------------------------------------------
Likelihood ratio test of alpha=0:  chibar2(01) = 1337.86 Prob>=chibar2 = 0.000

fitstat

Measures of Fit for nbreg of daysabs

Log-Lik Intercept Only:     -891.243     Log-Lik Full Model:         -880.927
D(312):                     1761.855     LR(2):                        20.631
                                         Prob > LR:                     0.000
McFadden's R2:                 0.012     McFadden's Adj R2:             0.007
Maximum Likelihood R2:         0.063     Cragg & Uhler's R2:            0.063
AIC:                           5.601     AIC*n:                      1769.855
BIC:                         -33.937     BIC':                         -9.119
It is clear from the fitstat results and the likelihood ratio test that alpha=0 that the negative binomial regression model fits much better than the poisson regression model, so we will keep working the nbreg model.

Let's start with the listcoef and prchange commands.

listcoef

nbreg (N=316): Factor Change in Expected Count 

 Observed SD: 7.4490028

----------------------------------------------------------------------
     daysabs |      b         z     P>|z|    e^b    e^bStdX      SDofX
-------------+--------------------------------------------------------
      gender |  -0.43121   -3.087   0.002   0.6497   0.8058     0.5006
     langnce |  -0.01565   -3.963   0.000   0.9845   0.7552    17.9392
-------------+--------------------------------------------------------
    ln alpha |   0.25394
       alpha |   1.28909   SE(alpha) = 0.12312  
----------------------------------------------------------------------
 LR test of alpha=0: 1337.86  Prob>=LRX2 = 0.000
----------------------------------------------------------------------

prchange

nbreg: Changes in Predicted Rate for daysabs

         min->max      0->1     -+1/2    -+sd/2  MargEfct
 gender   -2.3892   -3.6772   -2.4022   -1.1957   -2.3837
langnce   -9.3413   -0.1879   -0.0865   -1.5570   -0.0865

exp(xb):   5.5280

         gender  langnce
    x=  1.48734  50.0638
sd(x)=  .500633  17.9392
Next, we will try the prcount command. prcounts computes the predicted rate and probabilities of counts from 0 through the specified maximum count based on the last estimates from the count models poisson, nbreg, zip, zinb. We will include the plot option so that we can plot the observed counts versus the counts predicted by the negative binomial regression.
prcounts nb, plot

summ nbrate-nbprgt

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      nbrate |     316    5.826237   1.952521   2.060741   14.69753
       nbpr0 |     316    .2007168   .0438959   .0980942   .3657711
       nbpr1 |     316    .1346654   .0235343   .0722805   .2061429
       nbpr2 |     316    .1036036   .0138286   .0609582   .1329723
       nbpr3 |     316    .0832123   .0079426   .0535737   .0936423
       nbpr4 |     316     .068304   .0043128   .0480348    .072282
       nbpr5 |     316     .056829   .0025965   .0425366   .0588553
       nbpr6 |     316    .0477302   .0026664   .0297483   .0496365
       nbpr7 |     316    .0403748   .0033407   .0209201    .042915
       nbpr8 |     316    .0343472   .0039261   .0147727   .0377968
       nbpr9 |     316    .0293571   .0043162   .0104651   .0337698
       nbcu0 |     316    .2007168   .0438959   .0980942   .3657711
       nbcu1 |     316    .3353822   .0673746   .1703747    .571914
       nbcu2 |     316    .4389857   .0810372   .2313329   .7048863
       nbcu3 |     316    .5221981   .0886133   .2849067   .7942708
       nbcu4 |     316     .590502   .0921422   .3329415    .855569
       nbcu5 |     316     .647331   .0929128   .3765217   .8981056
       nbcu6 |     316    .6950612   .0917955   .4163698   .9278539
       nbcu7 |     316    .7354359   .0893983   .4530075    .948774
       nbcu8 |     316    .7697832   .0861544   .4868328   .9635468
       nbcu9 |     316    .7991402   .0823758   .5181618   .9740119
      nbprgt |     316    .2008598   .0823758   .0259881   .4818382


list daysabs nbrate in 1/30

       daysabs     nbrate
  1.         0   7.965481
  2.         4   4.220754
  3.         9   14.69753
  4.         7   4.852018
  5.         2   5.112275
  6.         1   7.821788
  7.         0    4.19794
  8.        18   9.475632
  9.         3   7.422179
 10.         3   6.660331
 11.         3   5.352007
 12.        23   6.884264
 13.         0   4.852018
 14.         3   5.112275
 15.        17    3.46986
 16.         4   5.634378
 17.         0   7.297377
 18.         1   5.466737
 19.         3   7.359331
 20.         3   8.193631
 21.         2   4.919608
 22.         0   3.171716
 23.         2   5.352007
 24.         0   6.120268
 25.         3     9.7476
 26.         3   6.715691
 27.        19   5.959813
 28.         8   8.193631
 29.        34   9.010157
 30.        13   7.683804
 
graph nbpreq nbobeq nbval, c(ll)
 
The graph of the observed and predicted counts backs up our contention that the negative binomial regression provides a good fit.

How to cite this page

Report an error on this page

UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California