### Stata Library Categorical and Count Data Analysis Utilities

#### Introduction

Interpreting the results of categorical (logistic, probit, ordered logistic, and multinomial logistic) and count data (Poisson and negative binomial) regression analyses can be a difficult and demanding task. J. Scott Long (www.indiana.edu/~jsl650/) and Jeremy Freese (www.ssc.wisc.edu/~jfreese/) have written a marvelous collection of utilities to make the process easier and less daunting. You may also want to check out their book, due out August 16, 2001: Long, J. S., & Freese, J. (2001). Regression Models for Categorical Dependent Variables Using Stata. College Station, TX: Stata Press.

#### Obtaining the Programs

We will be demonstrating the Stata 7 version of the commands. You can download these programs from within Stata by typing findit spostado (see How can I use the findit command to search for programs and get additional help? for more information about using findit). You can then get more information by typing
help spostado

-------------------------------------------------------------------------------------
Commands for the post-estimation interpretation of regression
models for categorical and limited dependent variables.
-------------------------------------------------------------------------------------

Commands by J. Scott Long and Jeremy Freese. These commands will work with
the estimation commands: clogit, cloglog, cnreg, intreg, logit, mlogit,
nbreg, ologit, oprobit, poisson, probit, regress, tobit, zinb,
zip, and in most cases gologit. The commands are:

fitstat computes fit statistics following model estimation.

listcoef lists factor change, percent change coefficients, and
standardized coefficients. The help option provides details on
interpretation.

mlogtest computes a variety of tests useful for the mlogit model.

mlogview opens a dialog box for constructing discrete change plots
and odds ratio plots for mlogit.

prchange computes the marginal and discrete change coefficients.

prcounts computes predicted rates and probabilities for count models.

prgen computes predicted values for a range of values. These values
can then be plotted.

prtab computes a table of predicted values for given values of the
independent variables.

prvalue computes predicted values for set values of the independent
variables.

-------------------------------------------------------------------------------

Authors: J. Scott Long - jslong@indiana.edu - www.indiana.edu/~jsl650
Jeremy Freese - jfreese@ssc.wisc.edu

#### Logistic Regression Examples

We will use a dataset called hsblog, which was derived from the 1987 High School and Beyond survey. The response variable is honcomp, which indicates whether or not a student was in honors composition. We will use female, read (standardized reading test), and math (standardized math test) as covariates.
use http://www.ats.ucla.edu/stat/stata/webbooks/logistic/hsblog
(highschool and beyond (200 cases))

storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
female          float  %9.0g       fl
math            float  %9.0g                  math score
honcomp         float  %9.0g                  honors composition

logit honcomp female read math

Iteration 0:   log likelihood = -115.64441
Iteration 1:   log likelihood = -79.367198
Iteration 2:   log likelihood = -75.475342
Iteration 3:   log likelihood = -75.211846
Iteration 4:   log likelihood = -75.209827

Logit estimates                                   Number of obs   =        200
LR chi2(3)      =      80.87
Prob > chi2     =     0.0000
Log likelihood = -75.209827                       Pseudo R2       =     0.3496

------------------------------------------------------------------------------
honcomp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
female |   1.154801   .4340856     2.66   0.008      .304009    2.005593
read |   .0752424    .027577     2.73   0.006     .0211924    .1292924
math |   .1317117   .0324607     4.06   0.000       .06809    .1953335
_cons |  -13.12749   1.850769    -7.09   0.000    -16.75493    -9.50005
------------------------------------------------------------------------------

The same model can also be displayed in terms of odds ratios.

Logit estimates                                   Number of obs   =        200
LR chi2(3)      =      80.87
Prob > chi2     =     0.0000
Log likelihood = -75.209827                       Pseudo R2       =     0.3496

------------------------------------------------------------------------------
honcomp | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
female |   3.173393   1.377524     2.66   0.008     1.355281    7.430502
read |   1.078145   .0297321     2.73   0.006     1.021419    1.138023
math |   1.140779   .0370305     4.06   0.000     1.070462    1.215716
------------------------------------------------------------------------------
These results are pretty straightforward, but there are other ways of looking at the coefficients. Let's try the listcoef command with the const option.
listcoef, const

logistic (N=200): Factor Change in Odds

Odds of: 1 vs 0

----------------------------------------------------------------------
honcomp |      b         z     P>|z|    e^b    e^bStdX      SDofX
-------------+--------------------------------------------------------
female |   1.15480    2.660   0.008   3.1734   1.7798     0.4992
read |   0.07524    2.728   0.006   1.0781   2.1629    10.2529
math |   0.13171    4.058   0.000   1.1408   3.4347     9.3684
_cons | -13.12749   -7.093   0.000
----------------------------------------------------------------------
The column labeled 'b' (above) contains the logistic regression coefficients. The column 'z' gives the Wald z-test statistic that the coefficient differs from zero, and the next column (P>|z|) gives the p-value for that z-test. The column 'e^b' contains e raised to the power of the coefficient (the exponentiated coefficients), which are the odds ratios. Next comes 'e^bStdX', which gives the factor change in the odds for a one standard deviation change in the covariate. Finally, there is a column giving the standard deviation of the covariate (SDofX).
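The relationship between the coefficient column and the two odds-ratio columns can be checked directly: e^b exponentiates the raw coefficient, and e^bStdX exponentiates the coefficient multiplied by the covariate's standard deviation. A minimal Python sketch using the values from the table above:

```python
import math

# coefficients and covariate SDs taken from the listcoef output above
b_read, sd_read = 0.0752424, 10.2529
b_math, sd_math = 0.1317117, 9.36845

or_read = math.exp(b_read)               # e^b: factor change in odds per unit of read
or_read_sd = math.exp(b_read * sd_read)  # e^bStdX: factor change per SD of read
or_math = math.exp(b_math)
or_math_sd = math.exp(b_math * sd_math)

print(round(or_read, 4), round(or_read_sd, 4))  # 1.0781 2.1629, matching the table
print(round(or_math, 4), round(or_math_sd, 4))  # 1.1408 3.4347
```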

Let's try listcoef again, this time with the std option.

listcoef, const std

logistic (N=200): Unstandardized and Standardized Estimates

Observed SD: .4424407
Latent SD: 2.1498873

Odds of: 1 vs 0

-------------------------------------------------------------------------------
honcomp |      b         z     P>|z|    bStdX    bStdY   bStdXY      SDofX
-------------+-----------------------------------------------------------------
female |   1.15480    2.660   0.008   0.5765   0.5371   0.2682     0.4992
read |   0.07524    2.728   0.006   0.7715   0.0350   0.3588    10.2529
math |   0.13171    4.058   0.000   1.2339   0.0613   0.5740     9.3684
_cons | -13.12749   -7.093   0.000
-------------------------------------------------------------------------------
The columns 'b', 'z' and 'P>|z|' are the same as before. The next column, 'bStdX', gives the amount of change in the log odds for a one standard deviation change in the covariate. The column 'bStdY' gives the amount of change, in standard deviations, in the log odds for a one unit change in the covariate. The next column, 'bStdXY', gives the amount of change, in standard deviations, in the log odds for a one standard deviation change in the covariate. Finally, there is a column giving the standard deviation of the covariate (SDofX).
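These standardized coefficients are simple transformations of b using the covariate SDs and the latent-outcome SD shown in the listcoef header. A quick Python check using the values from the table (the latent SD of 2.1498873 is taken from the output above):

```python
# values from the listcoef, std output above
b_read, sd_read = 0.0752424, 10.2529
sd_latent = 2.1498873  # the "Latent SD" reported in the header

b_std_x = b_read * sd_read               # log odds change per SD of read
b_std_y = b_read / sd_latent             # latent-y SDs per unit of read
b_std_xy = b_read * sd_read / sd_latent  # latent-y SDs per SD of read

print(round(b_std_x, 4), round(b_std_y, 4), round(b_std_xy, 4))
# 0.7715 0.035 0.3588, matching the bStdX, bStdY, and bStdXY columns
```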

The fitstat command presents a number of indices for model fit.

fitstat

Measures of Fit for logit of honcomp

Log-Lik Intercept Only:     -115.644     Log-Lik Full Model:          -75.210
D(196):                      150.420     LR(3):                        80.869
Prob > LR:                     0.000
Maximum Likelihood R2:         0.333     Cragg & Uhler's R2:            0.485
McKelvey and Zavoina's R2:     0.524     Efron's R2:                    0.379
Variance of y*:                6.912     Variance of error:             3.290
Count R2:                      0.830     Adj Count R2:                  0.358
AIC:                           0.792     AIC*n:                       158.420
BIC:                        -888.051     BIC':                        -64.974
This table contains the log-likelihood for a model with the intercept only (-115.644) and for the full model (-75.210). These measures are followed by the deviance (150.420) and its degrees of freedom (196). Deviance in logistic regression is analogous to the sum of squared residuals in OLS regression. The second line also contains the likelihood-ratio chi-square (80.869) with its degrees of freedom (3), which tests whether the model with three covariates predicts better than a model with the intercept only. The p-value for this chi-square (0.000) is on the next line.

Next comes a series of pseudo-R2 measures. Unlike OLS regression, logistic regression does not have a single measure that reflects the proportion of variance accounted for; the different R2 measures each have their own interpretation. The item labeled 'Variance of y*' is the variance of the underlying latent dependent variable, while 'Variance of error' for logistic regression is fixed at π²/3 (approximately 3.29).

Finally, there are several information criterion measures including Akaike's Information Criterion (AIC & AIC*n) and Bayesian Information Criterion (BIC & BIC'). These are useful in comparing models with the same response variable but different covariates.
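Most of these fit measures can be reproduced from just the two log-likelihoods, the sample size, and the number of parameters. A sketch in Python using the standard definitions (which reproduce the values fitstat prints; note that fitstat's "AIC" is the per-observation version):

```python
import math

ll0, llf = -115.64441, -75.209827  # intercept-only and full-model log-likelihoods
n, k, df_model = 200, 4, 3         # observations, parameters (incl. constant), covariates

lr = 2 * (llf - ll0)                      # likelihood-ratio chi-square: 80.869
deviance = -2 * llf                       # 150.420
pseudo_r2 = 1 - llf / ll0                 # McFadden's pseudo-R2: 0.3496
ml_r2 = 1 - math.exp(-lr / n)             # maximum likelihood R2: 0.333
aic = (deviance + 2 * k) / n              # AIC: 0.792 (AIC*n = 158.420)
bic = deviance - (n - k) * math.log(n)    # BIC: -888.051
bic_prime = -lr + df_model * math.log(n)  # BIC': -64.974
```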

Now, let's move on to another program, prchange, which computes discrete and marginal change for categorical and count regression models.

prchange

logit: Changes in Predicted Probabilities for honcomp

min->max      0->1     -+1/2    -+sd/2  MargEfct
female    0.1539    0.1539    0.1592    0.0789    0.1577
read    0.5079    0.0003    0.0103    0.1058    0.0103
math    0.7731    0.0000    0.0180    0.1703    0.0180

0       1
Pr(y|x)  0.8368  0.1632

          female     read     math
x=          .545    52.23   52.645
sd(x)=    .49922  10.2529  9.36845
For each covariate, prchange computes the change in predicted probability as the covariate moves from its minimum to its maximum (min->max), from 0 to 1 (0->1), across a one-unit interval centered on its mean (-+1/2), across a one standard deviation interval centered on its mean (-+sd/2), and the marginal effect, i.e., the instantaneous rate of change (MargEfct).
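Each of these quantities can be recovered from the fitted logit equation. For example, the 0->1 change for female is the difference in predicted probability at female=0 and female=1 with read and math held at their means. A minimal sketch in Python (the helper function logit_prob is illustrative, not part of SPost):

```python
import math

def logit_prob(female, read, math_score):
    """Predicted probability of honcomp=1 from the fitted logit coefficients."""
    xb = -13.12749 + 1.154801 * female + 0.0752424 * read + 0.1317117 * math_score
    return 1 / (1 + math.exp(-xb))

read_mean, math_mean = 52.23, 52.645  # covariate means reported by prchange

p0 = logit_prob(0, read_mean, math_mean)  # 0.0942
p1 = logit_prob(1, read_mean, math_mean)  # 0.2481
print(round(p1 - p0, 4))  # 0.1539, the 0->1 change reported for female
```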

With the fromto option, prchange displays the beginning and ending probabilities in addition to the change in probability.

prchange, fromto

logit: Changes in Predicted Probabilities for honcomp

from:       to:      dif:     from:       to:      dif:     from:
x=min     x=max  min->max       x=0       x=1      0->1     x-1/2
female    0.0942    0.2481    0.1539    0.0942    0.2481    0.1539    0.0987
read    0.0305    0.5385    0.5079    0.0038    0.0041    0.0003    0.1582
math    0.0145    0.7875    0.7731    0.0002    0.0002    0.0000    0.1545

to:      dif:     from:       to:      dif:
x+1/2     -+1/2   x-1/2sd   x+1/2sd    -+sd/2  MargEfct
female    0.2579    0.1592    0.1276    0.2065    0.0789    0.1577
read    0.1685    0.0103    0.1171    0.2230    0.1058    0.0103
math    0.1724    0.0180    0.0952    0.2656    0.1703    0.0180

0       1
Pr(y|x)  0.8368  0.1632

          female     read     math
x=          .545    52.23   52.645
sd(x)=    .49922  10.2529  9.36845
A great feature of prchange is the ability to display results for specific values of the covariates. Let's look at prchange for females with a reading score of 60.
prchange, x(female=1 read=60)

logit: Changes in Predicted Probabilities for honcomp

min->max      0->1     -+1/2    -+sd/2  MargEfct
female    0.2146    0.2146    0.2639    0.1339    0.2697
read    0.6131    0.0005    0.0176    0.1784    0.0176
math    0.8757    0.0001    0.0308    0.2811    0.0308

0       1
Pr(y|x)  0.6281  0.3719

          female     read     math
x=             1       60   52.645
sd(x)=    .49922  10.2529  9.36845
The prtab command is used to display the predicted probabilities for different values of the covariates. Let's look at female first.
prtab female

logit: Predicted probabilities of positive outcome for honcomp

----------------------
female | Prediction
----------+-----------
male |     0.0942
female |     0.2481
----------------------

       female    read    math
x=       .545   52.23  52.645
The table gives the probability of males and females being in honors composition (honcomp = 1) with the other variables (read and math) held at their means.

Next we will try the same thing for the variable read while holding female and math at their mean.

prtab read

logit: Predicted probabilities of positive outcome for honcomp

----------------------
score     | Prediction
----------+-----------
28 |     0.0305
31 |     0.0380
34 |     0.0472
35 |     0.0507
36 |     0.0544
37 |     0.0584
39 |     0.0673
41 |     0.0773
42 |     0.0829
43 |     0.0888
44 |     0.0950
45 |     0.1017
46 |     0.1088
47 |     0.1163
48 |     0.1243
50 |     0.1416
52 |     0.1609
53 |     0.1713
54 |     0.1823
55 |     0.1937
57 |     0.2183
60 |     0.2593
61 |     0.2740
63 |     0.3049
65 |     0.3377
66 |     0.3548
68 |     0.3899
71 |     0.4447
73 |     0.4821
76 |     0.5385
----------------------

       female    read    math
x=       .545   52.23  52.645
Now, let's combine female and reading into a single table.
prtab read female

logit: Predicted probabilities of positive outcome for honcomp

--------------------------
score     |   male  female
----------+---------------
28 | 0.0165  0.0506
31 | 0.0206  0.0626
34 | 0.0257  0.0772
35 | 0.0277  0.0828
36 | 0.0297  0.0887
37 | 0.0320  0.0949
39 | 0.0370  0.1087
41 | 0.0428  0.1241
42 | 0.0459  0.1326
43 | 0.0494  0.1414
44 | 0.0530  0.1508
45 | 0.0569  0.1607
46 | 0.0611  0.1711
47 | 0.0656  0.1821
48 | 0.0703  0.1936
50 | 0.0808  0.2181
52 | 0.0927  0.2449
53 | 0.0992  0.2591
54 | 0.1062  0.2738
55 | 0.1135  0.2890
57 | 0.1296  0.3208
60 | 0.1572  0.3719
61 | 0.1675  0.3896
63 | 0.1895  0.4259
65 | 0.2137  0.4631
66 | 0.2266  0.4818
68 | 0.2541  0.5194
71 | 0.2992  0.5753
73 | 0.3316  0.6116
76 | 0.3834  0.6637
--------------------------

       female    read    math
x=       .545   52.23  52.645
If we wish to add the predicted probabilities to our data, we can use the prgen command. Let's generate the predicted probabilities for each value of read while holding the other variables at their means.
prgen read, gen(nr) from(28) to(76) ncases(49)

logit: Predicted values as read varies from 28 to 76.

       female    read    math
x=       .545   52.23  52.645

list nrx nrp0 nrp1 in 1/50

nrx       nrp0       nrp1
1.        28   .9694503   .0305497
2.        29   .9671414   .0328586
3.        30   .9646644   .0353356
4.        31    .962008    .037992
5.        32   .9591603   .0408397
6.        33   .9561089    .043891
7.        34   .9528408   .0471592
8.        35   .9493423   .0506578
9.        36   .9455989   .0544011
10.        37    .941596    .058404
11.        38   .9373181   .0626819
12.        39   .9327492   .0672508
13.        40   .9278729   .0721271
14.        41   .9226723   .0773277
15.        42   .9171303   .0828697
16.        43   .9112293   .0887707
17.        44   .9049516   .0950484
18.        45   .8982795   .1017204
19.        46   .8911955   .1088045
20.        47   .8836819   .1163181
21.        48   .8757218   .1242782
22.        49   .8672988   .1327012
23.        50   .8583972   .1416027
24.        51   .8490025   .1509975
25.        52   .8391013   .1608987
26.        53   .8286819   .1713181
27.        54   .8177343   .1822657
28.        55   .8062507   .1937493
29.        56   .7942256   .2057744
30.        57   .7816563   .2183437
31.        58    .768543    .231457
32.        59   .7548891   .2451109
33.        60   .7407016   .2592985
34.        61   .7259908   .2740092
35.        62   .7107713   .2892287
36.        63   .6950616   .3049384
37.        64   .6788841   .3211159
38.        65   .6622654   .3377346
39.        66   .6452361   .3547639
40.        67   .6278306   .3721694
41.        68   .6100873   .3899128
42.        69   .5920476   .4079524
43.        70   .5737565   .4262435
44.        71   .5552613   .4447387
45.        72   .5366117   .4633883
46.        73   .5178592   .4821408
47.        74   .4990562   .5009438
48.        75   .4802559   .5197441
49.        76   .4615113   .5384887
50.         .          .          .
The column 'nrx' gives the values of read, 'nrp0' the probability of being in honcomp 0, and 'nrp1' the probability of being in honcomp 1. Note that the column 'nrp1' is the same as the results from prtab read (above), just with more decimal places.
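The values prgen produces are just the fitted logistic curve evaluated along read with the other covariates held at their means. For example, nrp1 in row 33 (read = 60) can be reproduced directly from the logit coefficients:

```python
import math

# logit coefficients, with female and math held at their sample means
xb = -13.12749 + 1.154801 * 0.545 + 0.0752424 * 60 + 0.1317117 * 52.645
p1 = 1 / (1 + math.exp(-xb))
print(round(p1, 4))  # 0.2593, matching nrp1 at read = 60
```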

The prvalue command computes the predicted probability of being in each level of the response variable. In this first example, we will hold all of the covariates at their mean values.

prvalue

logit: Predictions for honcomp

Pr(y=1|x):          0.1632   95% ci: (0.1068,0.2415)
Pr(y=0|x):          0.8368   95% ci: (0.7585,0.8932)

       female    read    math
x=       .545   52.23  52.645
Next, let's look at the predicted values for males and females while holding reading and math at their means.
prvalue, x(female=0)

logit: Predictions for honcomp

Pr(y=1|x):          0.0942   95% ci: (0.0468,0.1805)
Pr(y=0|x):          0.9058   95% ci: (0.8195,0.9532)

       female    read    math
x=          0   52.23  52.645

prvalue, x(female=1)

logit: Predictions for honcomp

Pr(y=1|x):          0.2481   95% ci: (0.1604,0.3630)
Pr(y=0|x):          0.7519   95% ci: (0.6370,0.8396)

       female    read    math
x=          1   52.23  52.645
Now, let's look at the predicted values for males and females when all of the other variables are held at their maximum values.
prvalue, x(female=0) rest(max)

logit: Predictions for honcomp

Pr(y=1|x):          0.9220   95% ci: (0.7812,0.9751)
Pr(y=0|x):          0.0780   95% ci: (0.0249,0.2188)

       female    read    math
x=          0      76      75

prvalue, x(female=1) rest(max)

logit: Predictions for honcomp

Pr(y=1|x):          0.9740   95% ci: (0.9097,0.9929)
Pr(y=0|x):          0.0260   95% ci: (0.0071,0.0903)

       female    read    math
x=          1      76      75

#### Multinomial Logistic Regression

Now let's try a multinomial logistic regression using prog as the response variable. prog has three levels: general, academic, and vocational. We will continue to use female, read, and math as covariates even though female is not significant in any of the equations.
mlogit prog female read math

Iteration 0:   log likelihood = -204.09667
Iteration 1:   log likelihood = -177.18429
Iteration 2:   log likelihood = -175.51578
Iteration 3:   log likelihood = -175.46792
Iteration 4:   log likelihood = -175.46786

Multinomial regression                            Number of obs   =        200
LR chi2(6)      =      57.26
Prob > chi2     =     0.0000
Log likelihood = -175.46786                       Pseudo R2       =     0.1403

------------------------------------------------------------------------------
prog |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
general      |
female |  -.1565038   .3793112    -0.41   0.680    -.8999401    .5869325
read |  -.0332741   .0247308    -1.35   0.178    -.0817457    .0151975
math |    -.06964   .0280339    -2.48   0.013    -.1245854   -.0146946
_cons |   4.707234   1.332392     3.53   0.000     2.095794    7.318675
-------------+----------------------------------------------------------------
vocation     |
female |  -.1843515   .3973611    -0.46   0.643     -.963165    .5944619
read |  -.0572131   .0265751    -2.15   0.031    -.1092994   -.0051268
math |  -.1180523   .0309737    -3.81   0.000    -.1787597   -.0573449
_cons |   8.303956   1.491516     5.57   0.000     5.380637    11.22727
------------------------------------------------------------------------------
(Outcome prog==academic is the comparison group)
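With academic as the comparison group, predicted probabilities come from a softmax over the outcome-specific linear predictors, with the baseline outcome's predictor fixed at zero. As a check, a minimal Python sketch evaluating the fitted model at the covariate means (female = .545, read = 52.23, math = 52.645):

```python
import math

female, read, math_score = 0.545, 52.23, 52.645  # sample means

# linear predictors for the two non-baseline outcomes (coefficients from the mlogit output)
xb_general  = 4.707234 - 0.1565038 * female - 0.0332741 * read - 0.0696400 * math_score
xb_vocation = 8.303956 - 0.1843515 * female - 0.0572131 * read - 0.1180523 * math_score

denom = 1 + math.exp(xb_general) + math.exp(xb_vocation)  # baseline contributes exp(0) = 1
p_general  = math.exp(xb_general) / denom   # ~0.2506
p_vocation = math.exp(xb_vocation) / denom  # ~0.2016
p_academic = 1 / denom                      # ~0.5478
```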
Next, we will use listcoef, fitstat, and prchange on the mlogit.
listcoef

mlogit (N=200): Factor Change in the Odds of prog

Variable: female (sd=  .49922)

Odds comparing|
Group 1 vs Group 2|      b         z     P>|z|     e^b   e^bStdX
------------------+---------------------------------------------
general -vocation |   0.02785    0.066   0.947   1.0282   1.0140
general -academic |  -0.15650   -0.413   0.680   0.8551   0.9248
vocation-general  |  -0.02785   -0.066   0.947   0.9725   0.9862
vocation-academic |  -0.18435   -0.464   0.643   0.8316   0.9121
academic-general  |   0.15650    0.413   0.680   1.1694   1.0813
academic-vocation |   0.18435    0.464   0.643   1.2024   1.0964
----------------------------------------------------------------

Variable: read (sd= 10.2529)

Odds comparing|
Group 1 vs Group 2|      b         z     P>|z|     e^b   e^bStdX
------------------+---------------------------------------------
general -vocation |   0.02394    0.846   0.397   1.0242   1.2782
general -academic |  -0.03327   -1.345   0.178   0.9673   0.7109
vocation-general  |  -0.02394   -0.846   0.397   0.9763   0.7824
vocation-academic |  -0.05721   -2.153   0.031   0.9444   0.5562
academic-general  |   0.03327    1.345   0.178   1.0338   1.4066
academic-vocation |   0.05721    2.153   0.031   1.0589   1.7979
----------------------------------------------------------------

Variable: math (sd= 9.36845)

Odds comparing|
Group 1 vs Group 2|      b         z     P>|z|     e^b   e^bStdX
------------------+---------------------------------------------
general -vocation |   0.04841    1.470   0.142   1.0496   1.5739
general -academic |  -0.06964   -2.484   0.013   0.9327   0.5208
vocation-general  |  -0.04841   -1.470   0.142   0.9527   0.6354
vocation-academic |  -0.11805   -3.811   0.000   0.8886   0.3309
academic-general  |   0.06964    2.484   0.013   1.0721   1.9202
academic-vocation |   0.11805    3.811   0.000   1.1253   3.0221
----------------------------------------------------------------
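Each contrast in these tables is formed by differencing the mlogit coefficients for the two outcomes. For example, the general-vocation row for math is b_general minus b_vocation, and the last two columns exponentiate that contrast, optionally scaled by the SD of math. A quick Python check:

```python
import math

b_general, b_vocation = -0.0696400, -0.1180523  # math coefficients from the mlogit output
sd_math = 9.36845

b_contrast = b_general - b_vocation         # 0.04841, the general-vocation row for math
factor = math.exp(b_contrast)               # e^b: 1.0496
factor_sd = math.exp(b_contrast * sd_math)  # e^bStdX: 1.5739
```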

fitstat

Measures of Fit for mlogit of prog

Log-Lik Intercept Only:     -204.097     Log-Lik Full Model:         -175.468
D(192):                      350.936     LR(6):                        57.258
Prob > LR:                     0.000
Maximum Likelihood R2:         0.249     Cragg & Uhler's R2:            0.286
Count R2:                      0.600     Adj Count R2:                  0.158
AIC:                           1.835     AIC*n:                       366.936
BIC:                        -666.341     BIC':                        -25.468

prchange

mlogit: Changes in Predicted Probabilities for prog

              Avg|Chg|     general    vocation    academic
female
    0->1     .02790104  -.02005389  -.02179766   .04185158

read
Min->Max      .3214145  -.13366224  -.34845948   .48212177
   -+1/2     .00725731  -.00335795  -.00752802   .01088595
  -+sd/2      .0741224  -.03413837  -.07704523   .11118361
MargEfct      .0217727   -.0033582  -.00752815   .01088635

math
Min->Max     .49183498   -.1889001  -.54885237   .73775247
   -+1/2     .01506265  -.00711139  -.01548259   .02259398
  -+sd/2     .13917912  -.06468265  -.14408605   .20876867
MargEfct     .04519518  -.00711382  -.01548377   .02259759

            general   vocation   academic
Pr(y|x)   .25057834  .20160662  .54781502

            female      read      math
x=            .545     52.23    52.645
sd(x)=      .49922   10.2529   9.36845
The mlogtest command computes a variety of tests for multinomial logit models. The user selects the tests they want by specifying the appropriate options. For each independent variable, mlogtest can perform either a likelihood-ratio or Wald test of the null hypothesis that the coefficients of the variable equal zero across all equations. We will use mlogtest with the all option.
mlogtest, all

**** Likelihood-ratio tests for independent variables

Ho: All coefficients associated with given variable(s) are 0.

prog |       chi2   df   P>chi2
-------------+-------------------------
female |      0.275    2    0.872
math |     17.874    2    0.000
---------------------------------------

**** Wald tests for independent variables

Ho: All coefficients associated with given variable(s) are 0.

prog |       chi2   df   P>chi2
-------------+-------------------------
female |      0.274    2    0.872
math |     15.780    2    0.000
---------------------------------------

**** Hausman tests of IIA assumption

Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.

Omitted |      chi2   df   P>chi2   evidence
---------+------------------------------------
general |   -21.017    4     ---    for Ho
vocation |     0.630    4    0.960   for Ho
----------------------------------------------
Note: If chi2<0, the estimated model does not
meet asymptotic assumptions of the test.

**** Small-Hsiao tests of IIA assumption

Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.

Omitted |  lnL(full)  lnL(omit)    chi2   df   P>chi2   evidence
---------+---------------------------------------------------------
general |    -41.797    -39.076   5.442    4    0.245   for Ho
vocation |    -46.130    -44.449   3.364    4    0.499   for Ho
-------------------------------------------------------------------

**** Wald tests for combining outcome categories

Ho: All coefficients except intercepts associated with given pair
of outcomes are 0 (i.e., categories can be collapsed).

Categories tested |      chi2   df   P>chi2
------------------+------------------------
general-vocation |     5.716    3    0.126
-------------------------------------------

**** LR tests for combining outcome categories

Ho: All coefficients except intercepts associated with given pair
of outcomes are 0 (i.e., categories can be collapsed).

Categories tested |      chi2   df   P>chi2
------------------+------------------------
general-vocation |     6.035    3    0.110
------------------------------------------

#### Poisson and Negative Binomial Regression

To illustrate the utilities with count data, we will run a Poisson regression using the lahigh dataset. The response variable will be daysabs, the number of days absent during the school year. The covariates will be gender (1=female, 2=male) and langnce, the language normal curve equivalent score.
use http://www.ats.ucla.edu/stat/stata/notes/lahigh

poisson daysabs gender langnce

Iteration 0:   log likelihood = -1549.8567
Iteration 1:   log likelihood = -1549.8567

Poisson regression                                Number of obs   =        316
LR chi2(2)      =     171.50
Prob > chi2     =     0.0000
Log likelihood = -1549.8567                       Pseudo R2       =     0.0524

------------------------------------------------------------------------------
daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
gender |  -.4093528   .0482192    -8.49   0.000    -.5038606   -.3148449
langnce |    -.01467   .0012934   -11.34   0.000    -.0172051   -.0121349
_cons |   3.056329   .1005107    30.41   0.000     2.859332    3.253327
------------------------------------------------------------------------------

fitstat

Measures of Fit for poisson of daysabs

Log-Lik Intercept Only:    -1635.608     Log-Lik Full Model:        -1549.857
D(313):                     3099.713     LR(2):                       171.503
Prob > LR:                     0.000
Maximum Likelihood R2:         0.419     Cragg & Uhler's R2:            0.419
AIC:                           9.828     AIC*n:                      3105.713
BIC:                        1298.166     BIC':                       -159.991

nbreg daysabs gender langnce

Fitting comparison Poisson model:

Iteration 0:   log likelihood = -1549.8567
Iteration 1:   log likelihood = -1549.8567

Fitting constant-only model:

Iteration 0:   log likelihood = -897.78991
Iteration 1:   log likelihood = -891.24455
Iteration 2:   log likelihood = -891.24271
Iteration 3:   log likelihood = -891.24271

Fitting full model:

Iteration 0:   log likelihood = -881.55269
Iteration 1:   log likelihood =  -880.9306
Iteration 2:   log likelihood =  -880.9274
Iteration 3:   log likelihood =  -880.9274

Negative binomial regression                      Number of obs   =        316
LR chi2(2)      =      20.63
Prob > chi2     =     0.0000
Log likelihood =  -880.9274                       Pseudo R2       =     0.0116

------------------------------------------------------------------------------
daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
gender |  -.4312069   .1396913    -3.09   0.002    -.7049968   -.1574169
langnce |  -.0156493   .0039485    -3.96   0.000    -.0233882   -.0079104
_cons |   3.134647   .3187701     9.83   0.000     2.509869    3.759425
-------------+----------------------------------------------------------------
/lnalpha |     .25394    .095509                      .0667457    .4411342
-------------+----------------------------------------------------------------
alpha |   1.289094   .1231201                      1.069024    1.554469
------------------------------------------------------------------------------
Likelihood ratio test of alpha=0:  chibar2(01) = 1337.86 Prob>=chibar2 = 0.000

fitstat

Measures of Fit for nbreg of daysabs

Log-Lik Intercept Only:     -891.243     Log-Lik Full Model:         -880.927
D(312):                     1761.855     LR(2):                        20.631
Prob > LR:                     0.000
Maximum Likelihood R2:         0.063     Cragg & Uhler's R2:            0.063
AIC:                           5.601     AIC*n:                      1769.855
BIC:                         -33.937     BIC':                         -9.119
It is clear from the fitstat results and the likelihood-ratio test of alpha=0 that the negative binomial regression model fits much better than the Poisson regression model, so we will continue working with the nbreg model.
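The likelihood-ratio test of alpha=0 reported beneath the nbreg output is simply twice the difference between the negative binomial and Poisson log-likelihoods, which is easy to confirm:

```python
ll_poisson = -1549.8567  # log-likelihood from the poisson output
ll_nbreg = -880.9274     # log-likelihood from the nbreg output

lr_alpha = 2 * (ll_nbreg - ll_poisson)
print(round(lr_alpha, 2))  # 1337.86, matching the reported chibar2(01)
```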

listcoef

nbreg (N=316): Factor Change in Expected Count

Observed SD: 7.4490028

----------------------------------------------------------------------
daysabs |      b         z     P>|z|    e^b    e^bStdX      SDofX
-------------+--------------------------------------------------------
gender |  -0.43121   -3.087   0.002   0.6497   0.8058     0.5006
langnce |  -0.01565   -3.963   0.000   0.9845   0.7552    17.9392
-------------+--------------------------------------------------------
ln alpha |   0.25394
alpha |   1.28909   SE(alpha) = 0.12312
----------------------------------------------------------------------
LR test of alpha=0: 1337.86  Prob>=LRX2 = 0.000
----------------------------------------------------------------------

prchange

nbreg: Changes in Predicted Rate for daysabs

min->max      0->1     -+1/2    -+sd/2  MargEfct
gender   -2.3892   -3.6772   -2.4022   -1.1957   -2.3837
langnce   -9.3413   -0.1879   -0.0865   -1.5570   -0.0865

exp(xb):   5.5280

gender  langnce
x=  1.48734  50.0638
sd(x)=  .500633  17.9392
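For count models, prchange works with the predicted rate exp(xb). The baseline rate exp(xb) = 5.5280 at the covariate means, as well as the factor change in the rate for gender, can be reproduced from the nbreg coefficients:

```python
import math

gender_mean, langnce_mean = 1.48734, 50.0638  # means reported by prchange

xb = 3.134647 - 0.4312069 * gender_mean - 0.0156493 * langnce_mean
rate = math.exp(xb)                   # ~5.5280, matching exp(xb) above
factor_gender = math.exp(-0.4312069)  # ~0.6497, factor change in the rate per unit of gender
```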
Next, we will try the prcounts command. prcounts computes the predicted rate and the predicted probabilities of counts from 0 through a specified maximum count, based on the most recent estimates from the count models poisson, nbreg, zip, or zinb. We will include the plot option so that we can plot the observed counts against the counts predicted by the negative binomial regression.
prcounts nb, plot

summ nbrate-nbprgt

Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
nbrate |     316    5.826237   1.952521   2.060741   14.69753
nbpr0 |     316    .2007168   .0438959   .0980942   .3657711
nbpr1 |     316    .1346654   .0235343   .0722805   .2061429
nbpr2 |     316    .1036036   .0138286   .0609582   .1329723
nbpr3 |     316    .0832123   .0079426   .0535737   .0936423
nbpr4 |     316     .068304   .0043128   .0480348    .072282
nbpr5 |     316     .056829   .0025965   .0425366   .0588553
nbpr6 |     316    .0477302   .0026664   .0297483   .0496365
nbpr7 |     316    .0403748   .0033407   .0209201    .042915
nbpr8 |     316    .0343472   .0039261   .0147727   .0377968
nbpr9 |     316    .0293571   .0043162   .0104651   .0337698
nbcu0 |     316    .2007168   .0438959   .0980942   .3657711
nbcu1 |     316    .3353822   .0673746   .1703747    .571914
nbcu2 |     316    .4389857   .0810372   .2313329   .7048863
nbcu3 |     316    .5221981   .0886133   .2849067   .7942708
nbcu4 |     316     .590502   .0921422   .3329415    .855569
nbcu5 |     316     .647331   .0929128   .3765217   .8981056
nbcu6 |     316    .6950612   .0917955   .4163698   .9278539
nbcu7 |     316    .7354359   .0893983   .4530075    .948774
nbcu8 |     316    .7697832   .0861544   .4868328   .9635468
nbcu9 |     316    .7991402   .0823758   .5181618   .9740119
nbprgt |     316    .2008598   .0823758   .0259881   .4818382
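The probabilities prcounts generates follow the negative binomial (NB2) probability mass function with mean mu = exp(xb) and overdispersion alpha, the parameterization nbreg uses. In particular, Pr(y=0) = (1 + alpha*mu)^(-1/alpha); evaluating it at the largest predicted rate reproduces the minimum of nbpr0 in the table above:

```python
alpha = 1.289094  # overdispersion estimate from the nbreg output
mu = 14.69753     # largest predicted rate (the maximum of nbrate above)

p0 = (1 + alpha * mu) ** (-1 / alpha)  # NB2 probability of a zero count
print(round(p0, 4))  # 0.0981, matching the minimum of nbpr0
```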

list daysabs nbrate in 1/30

daysabs     nbrate
1.         0   7.965481
2.         4   4.220754
3.         9   14.69753
4.         7   4.852018
5.         2   5.112275
6.         1   7.821788
7.         0    4.19794
8.        18   9.475632
9.         3   7.422179
10.         3   6.660331
11.         3   5.352007
12.        23   6.884264
13.         0   4.852018
14.         3   5.112275
15.        17    3.46986
16.         4   5.634378
17.         0   7.297377
18.         1   5.466737
19.         3   7.359331
20.         3   8.193631
21.         2   4.919608
22.         0   3.171716
23.         2   5.352007
24.         0   6.120268
25.         3     9.7476
26.         3   6.715691
27.        19   5.959813
28.         8   8.193631
29.        34   9.010157
30.        13   7.683804

twoway scatter nbpreq nbobeq nbval, c(l l)


The graph of the observed and predicted counts backs up our contention that the negative binomial regression provides a good fit.

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.