UCLA Academic Technology Services HomeServicesClassesContactJobs
Search

Stata Textbook Examples
An Introduction to Categorical Analysis by Alan Agresti
Chapter 7: Building and Applying Logit and Log-linear Models

This chapter makes extensive use of the xi3 and fitstat program, which is not part of base Stata. Prior to using either the xi3 or fitstat command, they need to be downloaded by typing either findit xi3 or findit fitstat in the command line (see How can I use the findit command to search for programs and get additional help? for more information about using findit).

Table 7.2 on page 178 using data on usage of alcohol, cigarettes and marijuana.
use http://www.ats.ucla.edu/stat/stata/examples/icda/drug, clear

*MODEL 1
quietly xi3: glm count i.gender*i.race i.alcohol i.smoke i.marij, fam(poi) 
di "deviance = " e(deviance) "  df=" e(df)
deviance = 1325.1408  df=25

*MODEL 2
quietly xi3: glm count i.gender*i.race i.gender*i.alcohol i.gender*i.smoke ///
         i.gender*i.marij i.race*i.alcohol i.race*i.smoke i.race*i.marij ///
         i.alcohol*i.smoke i.alcohol*i.marij i.smoke*i.marij, fam(poi) 
di "deviance = " e(deviance) "  df=" e(df)
deviance = 15.340343  df=16

*MODEL 3
quietly xi3: glm count i.gender*i.race*i.alcohol i.gender*i.race*i.smoke ///
          i.gender*i.race*i.marij i.gender*i.alcohol*i.smoke ///
          i.gender*i.alcohol*i.marij i.gender*i.smoke*i.marij ///
          i.race*i.alcohol*i.smoke i.race*i.alcohol*i.marij ///
          */ i.race*i.smoke*i.marij i.alcohol*i.smoke*i.marij,  fam(poi) 
di "deviance = " e(deviance) "  df=" e(df)
deviance = 5.2720008  df=6
 
*MODEL 4a
quietly xi3: glm count i.gender*i.race i.gender*i.alcohol i.gender*i.smoke ///
         i.gender*i.marij i.race*i.alcohol i.race*i.smoke i.race*i.marij ///
         i.alcohol*i.marij i.smoke*i.marij, fam(poi) 
di "deviance = " e(deviance) "  df=" e(df)
deviance = 201.19931  df=17
 
*MODEL 4b
quietly xi3: glm count i.gender*i.race i.gender*i.alcohol i.gender*i.smoke ///
         i.gender*i.marij i.race*i.alcohol i.race*i.smoke i.race*i.marij ///
         i.alcohol*i.smoke i.smoke*i.marij, fam(poi)
di "deviance = " e(deviance) "  df=" e(df)
deviance = 106.958  df=17

*MODEL 4c
quietly xi3: glm count i.gender*i.race i.gender*i.alcohol i.gender*i.smoke ///
         i.gender*i.marij i.race*i.alcohol i.race*i.smoke i.race*i.marij ///
         i.alcohol*i.smoke i.alcohol*i.marij, fam(poi) 
di "deviance = " e(deviance) "  df=" e(df)
deviance = 513.47218  df=17

*MODEL 4d
quietly xi3: glm count i.gender*i.race i.gender*i.smoke ///
         i.gender*i.marij i.race*i.alcohol i.race*i.smoke i.race*i.marij ///
         i.alcohol*i.smoke i.alcohol*i.marij i.smoke*i.marij, fam(poi) 
di "deviance = " e(deviance) "  df=" e(df)
deviance = 18.716951  df=17

*MODEL 4e
quietly xi3: glm count i.gender*i.race i.gender*i.alcohol i.gender*i.smoke ///
         i.gender*i.marij  i.race*i.smoke i.race*i.marij ///
         i.alcohol*i.smoke i.alcohol*i.marij i.smoke*i.marij, fam(poi) 
di "deviance = " e(deviance) "  df=" e(df)
deviance = 20.320861  df=17

*MODEL 4f
quietly xi3: glm count i.gender*i.race i.gender*i.alcohol ///
         i.gender*i.marij i.race*i.alcohol i.race*i.smoke i.race*i.marij ///
         i.alcohol*i.smoke i.alcohol*i.marij i.smoke*i.marij, fam(poi) 
di "deviance = " e(deviance) "  df=" e(df)
deviance = 16.317184  df=17

*MODEL 4g
quietly xi3: glm count i.gender*i.race i.gender*i.alcohol i.gender*i.smoke ///
         i.gender*i.marij i.race*i.alcohol i.race*i.marij ///
         i.alcohol*i.smoke i.alcohol*i.marij i.smoke*i.marij, fam(poi) 
di "deviance = " e(deviance) "  df=" e(df)
deviance = 15.783467  df=17

*MODEL 4h 
quietly xi3: glm count i.gender*i.race i.gender*i.alcohol i.gender*i.smoke ///
         i.race*i.alcohol i.race*i.smoke i.race*i.marij ///
         i.alcohol*i.smoke i.alcohol*i.marij i.smoke*i.marij, fam(poi) 
di "deviance = " e(deviance) "  df=" e(df)
deviance = 25.161008  df=17

*MODEL 4i
quietly xi3: glm count i.gender*i.race i.gender*i.alcohol i.gender*i.smoke ///
         i.gender*i.marij i.race*i.alcohol i.race*i.smoke ///
         i.alcohol*i.smoke i.alcohol*i.marij i.smoke*i.marij, fam(poi) 
di "deviance = " e(deviance) "  df=" e(df)
deviance = 18.928935  df=17

*MODEL 5
quietly xi3: glm count i.alcohol*i.smoke i.alcohol*i.marij ///
         i.smoke*i.marij i.alcohol*i.gender i.alcohol*i.race ///
         i.gender*i.marij i.gender*i.race i.marij*i.race, fam(poi) 
di "deviance = " e(deviance) "  df=" e(df)
deviance = 16.73504  df=18

*MODEL 6
quietly xi3: glm count i.alcohol*i.smoke i.alcohol*i.marij ///
         i.smoke*i.marij i.alcohol*i.gender i.alcohol*i.race ///
         i.gender*i.marij i.gender*i.race, fam(poi) 
di "deviance = " e(deviance) "  df=" e(df)
deviance = 19.908587  df=19

*MODEL 7
qui xi3: glm count i.alcohol*i.smoke i.alcohol*i.marij ///
         i.smoke*i.marij i.alcohol*i.gender i.alcohol*i.race ///
         i.gender*i.race, fam(poi) 
di "deviance = " e(deviance) "  df=" e(df)
deviance = 28.80508  df=20
Table 7.3 on page 181.
use http://www.ats.ucla.edu/stat/stata/examples/icda/sex, clear

xi: glm count i.premar i.birth, fam(poi)

i.premar          _Ipremar_1-4        (naturally coded; _Ipremar_1 omitted)
i.birth           _Ibirth_1-4         (naturally coded; _Ibirth_1 omitted)
Generalized linear models                          No. of obs      =        16
Optimization     : ML: Newton-Raphson              Residual df     =         9
                                                   Scale parameter =         1
Deviance         =  127.6529373                    (1/df) Deviance =  14.18366
Pearson          =  128.6835978                    (1/df) Pearson  =  14.29818
Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM
Log likelihood   =  -109.164565                    AIC             =  14.52057
BIC              =  102.6996388
------------------------------------------------------------------------------
       count |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _Ipremar_2 |  -.9767888   .1216605    -8.03   0.000    -1.215239   -.7383387
  _Ipremar_3 |  -.3446024   .0988072    -3.49   0.000     -.538261   -.1509438
  _Ipremar_4 |   .5092049   .0805088     6.32   0.000     .3514105    .6669993
   _Ibirth_2 |   .1885912   .1072271     1.76   0.079      -.02157    .3987523
   _Ibirth_3 |   .7118393   .0968283     7.35   0.000     .5220592    .9016194
   _Ibirth_4 |   .4565487   .1013576     4.50   0.000     .2578914    .6552061
       _cons |   3.747418   .0962184    38.95   0.000     3.558834    3.936003
------------------------------------------------------------------------------

predict pred
(option mu assumed; predicted mean count)

predict resid, p 
predict h, h 
gen aresid = resid/sqrt(1-h) 
drop h
table premar birth, cont(mean count mean pred mean aresid)

------------------------------------------------------
          |                   birth                   
   premar |         1          2          3          4
----------+-------------------------------------------
        1 |        81         68         60         38
          |  42.41145   51.21382   86.42332    66.9514
          |  7.603182   3.076708  -4.116706   -4.83965
          | 
        2 |        24         26         29         14
          |  15.96868   19.28294   32.53996   25.20842
          |  2.328322   1.811479  -.8114836  -2.756819
          | 
        3 |        18         41         74         42
          |   30.0486    36.2851    61.2311    47.4352
          | -2.681746   .9762287   2.247295  -1.026372
          | 
        4 |        36         57        161        157
          |  70.57127   85.21814   143.8056    111.405
          | -6.063329  -4.603856   2.384558   6.784545
------------------------------------------------------
Table 7.4 on page 184.
gen asso = premar*birth
xi: glm count i.premar i.birth asso, fam(poi)

i.premar          _Ipremar_1-4        (naturally coded; _Ipremar_1 omitted)
i.birth           _Ibirth_1-4         (naturally coded; _Ibirth_1 omitted)
Generalized linear models                          No. of obs      =        16
Optimization     : ML: Newton-Raphson              Residual df     =         8
                                                   Scale parameter =         1
Deviance         =  11.53369221                    (1/df) Deviance =  1.441712
Pearson          =   11.5084629                    (1/df) Pearson  =  1.438558
Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM
Log likelihood   =  -51.1049425                    AIC             =  7.388118
BIC              = -10.64701757
------------------------------------------------------------------------------
       count |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _Ipremar_2 |  -1.645964   .1347333   -12.22   0.000    -1.910037   -1.381892
  _Ipremar_3 |  -1.770023   .1646445   -10.75   0.000     -2.09272   -1.447326
  _Ipremar_4 |  -1.753685   .2343172    -7.48   0.000    -2.212938   -1.294432
   _Ibirth_2 |   -.464105   .1195237    -3.88   0.000    -.6983672   -.2298428
   _Ibirth_3 |  -.7245224   .1620067    -4.47   0.000     -1.04205    -.406995
   _Ibirth_4 |  -1.879664   .2491019    -7.55   0.000    -2.367895   -1.391433
        asso |   .2858355    .028238    10.12   0.000     .2304901    .3411809
       _cons |   4.106841   .0895105    45.88   0.000     3.931404    4.282279
------------------------------------------------------------------------------

predict pred
(option mu assumed; predicted mean count)

table  premar birth, cont(mean count mean pred)

--------------------------------------------------
          |                 birth                 
   premar |        1         2         3         4
----------+---------------------------------------
        1 |       81        68        60        38
          | 80.85658  67.65406  69.39574  29.09363
          | 
        2 |       24        26        29        14
          | 20.75004   23.1065   31.5435  17.59996
          | 
        3 |       18        41        74        42
          |  24.3937  36.15178  65.68137  48.77315
          | 
        4 |       36        57       161       157
          | 32.99969  65.08765  157.3794  155.5333
--------------------------------------------------
Table 7.5 and calculation on page 186.
use http://www.ats.ucla.edu/stat/stata/examples/icda/cmh, clear
The command tab3way can be downloaded from the internet by typing findit tab3way in the command line (see How can I use the findit command to search for programs and get additional help? for more information about using findit).
tab3way  income satisf female [fw=count]

Frequency weights are based on the expression: count
Table entries are cell frequencies
Missing categories ignored
------------------------------------------------------------
          |                female and satisf                
          | ---------- M ---------    ---------- F ---------
   income |    1     3     4     5       1     3     4     5
----------+-------------------------------------------------
        3 |    1     1     2     1       1     3    11     2
       10 |          3     5     1       2     3    17     3
       20 |                7     3             1     8     5
       35 |          1     9     6             2     4     2
------------------------------------------------------------

quietly xi: poisson count i.female*i.income i.female*i.satisf 
fitstat, saving(m0)

Measures of Fit for poisson of count
Log-Lik Intercept Only:      -96.116     Log-Lik Full Model:          -48.006
D(18):                        96.012     LR(13):                       96.220
                                         Prob > LR:                     0.000
McFadden's R2:                 0.501     McFadden's Adj R2:             0.355
Maximum Likelihood R2:         0.951     Cragg & Uhler's R2:            0.953
AIC:                           3.875     AIC*n:                       124.012
BIC:                          33.628     BIC':                        -51.165
(Indices saved in matrix fs_m0)

quietly xi: poisson count i.income*i.satisf i.female*i.income i.female*i.satisf 
fitstat, using(m0)

Measures of Fit for poisson of count
                             Current            Saved       Difference
Model:                       poisson          poisson
N:                                32               32                0
Log-Lik Intercept Only:      -96.116          -96.116            0.000
Log-Lik Full Model:          -41.868          -48.006            6.137
D:                            83.737(9)        96.012(18)       12.275(9)
LR:                          108.495(22)       96.220(13)       12.275(9)
Prob > LR:                     0.000            0.000            0.198
McFadden's R2:                 0.564            0.501            0.064
McFadden's Adj R2:             0.325            0.355           -0.030
Maximum Likelihood R2:         0.966            0.951            0.016
Cragg & Uhler's R2:            0.969            0.953            0.016
AIC:                           4.054            3.875            0.179
AIC*n:                       129.737          124.012            5.725
BIC:                          52.545           33.628           18.917
BIC':                        -32.248          -51.165           18.917
Difference of   18.917 in BIC' provides very strong support for saved model.
Note: p-value for difference in LR is only valid if models are nested.
Section 7.3.4.
recode income 3 = 1 10 = 2 20 = 3 35 = 4, gen(score1)
(32 differences between income and score)

gen score2 = 1
replace score2 = satisf - 1 if satisf ~=1
(24 real changes made)

gen asso=score1*score2
xi3: poisson count i.female*i.income i.female*i.satisf asso

Poisson regression                                Number of obs   =         32
                                                  LR chi2(14)     =     103.26
                                                  Prob > chi2     =     0.0000
Log likelihood = -44.485002                       Pseudo R2       =     0.5372
------------------------------------------------------------------------------
       count |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _Ifemale_1 |   1.395172   1.208614     1.15   0.248    -.9736688    3.764013
 _Iincome_10 |  -.5137616   .6911784    -0.74   0.457    -1.868446    .8409231
 _Iincome_20 |  -1.586161   1.030348    -1.54   0.124    -3.605606    .4332843
 _Iincome_35 |  -2.360276   1.476453    -1.60   0.110     -5.25407    .5335184
  _Ife1Xin10 |    -.19793   .6440272    -0.31   0.759      -1.4602     1.06434
  _Ife1Xin20 |  -.8761533   .6673139    -1.31   0.189    -2.184064    .4317579
  _Ife1Xin35 |  -1.894792   .6889112    -2.75   0.006    -3.245033   -.5445512
  _Isatisf_3 |   .7452832    1.12599     0.66   0.508    -1.461616    2.952182
  _Isatisf_4 |   1.233372   1.207738     1.02   0.307    -1.133751    3.600495
  _Isatisf_5 |  -.7053711   1.552878    -0.45   0.650    -3.748956    2.338213
   _Ife1Xsa3 |  -.3253637    1.28563    -0.25   0.800    -2.845153    2.194425
   _Ife1Xsa4 |  -.1124099   1.201721    -0.09   0.925    -2.467741    2.242921
   _Ife1Xsa5 |  -.3043782   1.271368    -0.24   0.811    -2.796214    2.187458
        asso |   .3878149   .1547002     2.51   0.012     .0846081    .6910216
       _cons |  -1.354202   1.054484    -1.28   0.199    -3.420953    .7125493
------------------------------------------------------------------------------

fitstat, using(m0)

Measures of Fit for poisson of count
                             Current            Saved       Difference
Model:                       poisson          poisson
N:                                32               32                0
Log-Lik Intercept Only:      -96.116          -96.116            0.000
Log-Lik Full Model:          -44.485          -48.006            3.521
D:                            88.970(17)       96.012(18)        7.042(1)
LR:                          103.261(14)       96.220(13)        7.042(1)
Prob > LR:                     0.000            0.000            0.008
McFadden's R2:                 0.537            0.501            0.037
McFadden's Adj R2:             0.381            0.355            0.026
Maximum Likelihood R2:         0.960            0.951            0.010
Cragg & Uhler's R2:            0.963            0.953            0.010
AIC:                           3.718            3.875           -0.158
AIC*n:                       118.970          124.012           -5.042
BIC:                          30.052           33.628           -3.576
BIC':                        -54.741          -51.165           -3.576
Difference of    3.576 in BIC' provides positive support for current model.
Note: p-value for difference in LR is only valid if models are nested.

test asso

 ( 1)  [count]asso = 0
           chi2(  1) =    6.28
         Prob > chi2 =    0.0122

How to cite this page

Report an error on this page

UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California