UCLA Academic Technology Services

Stata Textbook Examples
Applied Survival Analysis by Hosmer and Lemeshow
Chapter 6: Assessment of Model Adequacy

The data files used for the examples in this text can be downloaded as a zip file from the Wiley FTP website or the Stata website. You can then use a program such as WinZip to unzip the data files. If you need assistance getting data into Stata, please see our Stata Class Notes, especially the unit on Entering Data. (NOTE: The *.dat files are the data files, and the *.txt files contain the codebook information.)
Generating the variables needed for the model shown in Table 5.11.
use uis, clear

fracgen ndrugtx -1 -1
gen ivhx_3 = (ivhx == 3)
gen ndrugfp1 = ndrugt_1
gen ndrugfp2 = ndrugt_2
gen agesite = age*site
gen racesite = race*site
To use the stcox command, we first need to stset the data.
stset time, failure(censor)

     failure event:  censor ~= 0 & censor ~= .
obs. time interval:  (0, time]
 exit on or before:  failure

------------------------------------------------------------------------------
      628  total obs.
        0  exclusions
------------------------------------------------------------------------------
      628  obs. remaining, representing
      508  failures in single record/single failure data
   147394  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =      1172
Table 6.4, page 213.

Using the model shown in Table 5.11 for the UIS data, as a first step in assessing the proportional hazards assumption we test the main-effect predictors (age, becktota, ndrugfp1, ivhx_3, race, treat and site) for proportionality by adding their interactions with log(time) to the model. This can be done with the stcox command by using the tvc() and texp() options. The tvc() option specifies the variables that vary continuously with respect to time, i.e., time-varying covariates; texp() is used in conjunction with tvc() to specify the function of analysis time by which those variables are multiplied. In this example, texp(ln(_t)) multiplies each variable listed in tvc() by the logarithm of analysis time. Running lrtest, saving(0) immediately after estimation saves this model's log likelihood so that, once the model without the interactions has been fitted, all seven interaction terms can be tested at once with a likelihood ratio test (see page 213).
stcox age becktota ndrugfp1 ndrugfp2 ivhx_3 race treat site agesite ///
	racesite, nohr nolog noshow tvc(age becktota ndrugfp1 ivhx_3 race treat site) ///
	texp( ln(_t) )
lrtest, saving(0)

Cox regression -- Breslow method for ties

No. of subjects =          575                     Number of obs   =       575
No. of failures =          464
Time at risk    =       138900
                                                   LR chi2(17)     =     72.67
Log likelihood  =   -2627.6492                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |
          _d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
rh           |
         age |  -.0503561   .0415478    -1.21   0.226    -.1317883     .031076
    becktota |    .042473    .024727     1.72   0.086    -.0059911     .090937
    ndrugfp1 |  -.4986301   .1496103    -3.33   0.001    -.7918608   -.2053994
    ndrugfp2 |  -.2138032   .0486972    -4.39   0.000    -.3092479   -.1183584
      ivhx_3 |   .3705803   .5472171     0.68   0.498    -.7019454    1.443106
        race |  -1.017344   .6218219    -1.64   0.102    -2.236093    .2014043
       treat |  -.8566666    .482856    -1.77   0.076    -1.803047    .0897138
        site |   -1.21759   .7359162    -1.65   0.098    -2.659959    .2247794
     agesite |   .0327811   .0161217     2.03   0.042     .0011832     .064379
    racesite |   .8699072    .248368     3.50   0.000     .3831148      1.3567
-------------+----------------------------------------------------------------
t            |
         age |   .0017404   .0084981     0.20   0.838    -.0149156    .0183964
    becktota |  -.0071221   .0051428    -1.38   0.166    -.0172018    .0029577
    ndrugfp1 |  -.0155999   .0175918    -0.89   0.375    -.0500792    .0188795
      ivhx_3 |  -.0300747   .1133362    -0.27   0.791    -.2522096    .1920603
        race |   .1134445   .1249742     0.91   0.364    -.1315004    .3583893
       treat |    .127591   .0997829     1.28   0.201    -.0679799    .3231619
        site |  -.0226427   .1137032    -0.20   0.842    -.2454969    .2002115
------------------------------------------------------------------------------

note: second equation contains variables that continuously vary with respect to time; variables
      are interacted with current values of ln(_t).
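As a cross-check on what tvc() and texp() do, the same model can be fitted by building the interactions by hand. This sketch is our own illustration, not part of the textbook examples. It assumes the subject identifier in uis.dta is named id, splits each subject's follow-up at every observed failure time with stsplit (so that ln(_t) on each split record equals the log of the failure time at which that record enters a risk set), and creates the x*ln(t) products explicitly; the *_lnt variable names are hypothetical. preserve and restore leave the unsplit data and the original stset settings in place for the commands that follow. If the equivalence holds, the interaction coefficients and the log likelihood should match the tvc() output above.

```stata
* Hand-built version of tvc(...) texp(ln(_t)); a sketch assuming the id variable is "id".
preserve
* stsplit requires data stset with an id() variable.
stset time, failure(censor) id(id)
* Split every subject's record at each observed failure time.
stsplit, at(failures)
* On each split record _t is the end of the interval, i.e. the failure time
* at which the record contributes to a risk set.
foreach v in age becktota ndrugfp1 ivhx_3 race treat site {
    generate double `v'_lnt = `v' * ln(_t)
}
stcox age becktota ndrugfp1 ndrugfp2 ivhx_3 race treat site agesite racesite ///
    age_lnt becktota_lnt ndrugfp1_lnt ivhx_3_lnt race_lnt treat_lnt site_lnt, ///
    nohr nolog
* Return to the unsplit data and the original stset settings.
restore
```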
Next we fit the Cox model without the interactions and perform the likelihood ratio test against the model saved above.
quietly stcox age becktota ndrugfp1 ndrugfp2 ivhx_3 race treat site agesite racesite
lrtest, using(0)  

Cox:  likelihood-ratio test                           chi2(7)     =       5.54
                                                      Prob > chi2 =     0.5947
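The saving()/using() form of lrtest used above is the older syntax. Below is a minimal sketch of the same test with estimates store, assuming Stata 9 or later and the variables and stset settings created at the top of this page; the stored-estimate names full and reduced are our own. The statistic is LR = 2*(ll_full - ll_reduced), referred to a chi-squared distribution with 7 degrees of freedom, one for each interaction with ln(_t).

```stata
* Full model: main effects plus the seven interactions with ln(_t).
quietly stcox age becktota ndrugfp1 ndrugfp2 ivhx_3 race treat site agesite ///
    racesite, tvc(age becktota ndrugfp1 ivhx_3 race treat site) texp(ln(_t))
estimates store full

* Reduced model: the same main effects without the time interactions.
quietly stcox age becktota ndrugfp1 ndrugfp2 ivhx_3 race treat site agesite racesite
estimates store reduced

* Likelihood ratio test of the nested models; r(chi2), r(df) and r(p) are saved.
lrtest full reduced
display "LR = 2*(ll_full - ll_reduced) = " %6.2f r(chi2) " on " r(df) " df"
```

With named estimates, either model can be replayed later with estimates replay, which the numbered saving() slots do not offer as conveniently.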
Figure 6.4a, page 215.

Graphs of the scaled Schoenfeld residuals and their lowess smooths obtained from the model in Table 5.11 for the predictors age, becktota, ndrugfp1, ivhx_3, race, treat, site and racesite. The stphtest command (estat phtest in Stata 9) requires that you run stcox with the scaledsch() option (abbreviated sca()), which saves the scaled Schoenfeld residuals. The easiest way to specify this option is scaledsch(stub*), where stub is a short name of your choosing; Stata then creates variables stub1, stub2, and so on. Alternatively, you may name each variable explicitly, in which case you must specify exactly as many variables in scaledsch() as there are predictors in the model. The log option plots the residuals against the logarithm of analysis time, and yline(0) adds a reference line at zero.
quietly stcox age becktota ndrugfp1 ndrugfp2 ivhx_3 race treat site agesite racesite, nohr sca(sca*)  

* Stata 8 code.
stphtest, log plot(age) yline(0)

* Stata 9 code.
estat phtest, log plot(age) yline(0)

* Stata 8 code.
stphtest, log plot(becktota) yline(0)

* Stata 9 code.
estat phtest, log plot(becktota) yline(0)

* Stata 8 code.
stphtest, log plot(ndrugfp1) yline(0)

* Stata 9 code.
estat phtest, log plot(ndrugfp1) yline(0)

* Stata 8 code.
stphtest, log plot(ivhx_3) yline(0)

* Stata 9 code.
estat phtest, log plot(ivhx_3) yline(0)

* Stata 8 code.
stphtest, log plot(race) yline(0)

* Stata 9 code.
estat phtest, log plot(race) yline(0)

* Stata 8 code.
stphtest, log plot(treat) yline(0)

* Stata 9 code.
estat phtest, log plot(treat) yline(0)

* Stata 8 code.
stphtest, log plot(site) yline(0)

* Stata 9 code.
estat phtest, log plot(site) yline(0)

* Stata 8 code.
stphtest, log plot(racesite) yline(0)

* Stata 9 code.
estat phtest, log plot(racesite) yline(0)
We then drop the scaled Schoenfeld residual variables that were generated.
drop sca1-sca10
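Rather than typing one estat phtest command per predictor, the plots can be produced in a loop, and estat phtest with the detail option reports the numerical tests that accompany them: a test for each covariate based on the correlation between its scaled Schoenfeld residuals and ln(_t), plus a global test. This is a sketch in Stata 9 syntax; the stub names sch and sca and the graph names ph_* are our own choices. Both schoenfeld() and scaledsch() are requested as a precaution, on the assumption that the detailed tests may need the unscaled residuals as well.

```stata
* Refit the Table 5.11 model, saving both sets of Schoenfeld residuals (Stata 9 syntax).
quietly stcox age becktota ndrugfp1 ndrugfp2 ivhx_3 race treat site agesite racesite, ///
    nohr schoenfeld(sch*) scaledsch(sca*)

* One named graph per predictor: scaled residuals against ln(_t) with a lowess smooth.
foreach v in age becktota ndrugfp1 ivhx_3 race treat site racesite {
    estat phtest, log plot(`v') yline(0) name(ph_`v', replace)
}

* Covariate-specific and global tests of proportional hazards on the ln(_t) scale.
estat phtest, log detail
scalar ph_global_df = r(df)
scalar ph_global_p  = r(p)

* Remove the residual variables created above.
drop sch1-sch10 sca1-sca10
```

Naming each graph keeps all eight plots open at once instead of each one replacing the previous plot in the Graph window.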
Figure 6.5, page 217.

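Graphs of the score residuals computed from the model in Table 5.11 for age, becktota, ndrugfp1 and agesite. The score residuals are obtained with the esr() option of stcox; esr(scr*) creates one variable per predictor, in the order the predictors appear in the model, so scr1 corresponds to age, scr2 to becktota, scr3 to ndrugfp1 and scr9 to agesite. The residuals for ndrugfp1 are plotted against the original ndrugtx, and those for agesite against age.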
quietly stcox age becktota ndrugfp1 ndrugfp2 ivhx_3 race treat site agesite racesite, ///	
	nohr nolog esr(scr*)

graph twoway scatter scr1 age, ylabel(-26.26 30.76) xlabel(20 56)
graph twoway scatter scr2 becktota, ylabel(-54.25 31.37) xlabel(0 54)
graph twoway scatter scr3 ndrugtx, ylabel(-15.20 8.51) xlabel(0 40)
graph twoway scatter scr9 age, ylabel(-68.62 36.21) xlabel(20 56)
We drop the extra variables that we have created.
drop scr1-scr10 
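A quick sanity check on the score residuals: for each covariate they sum to that covariate's component of the score vector, which is zero at the maximum partial likelihood estimate, so each scr variable should sum to approximately zero. The sketch below is our own and assumes the same model and variables as above; it refits the model, labels each scr variable with its coefficient name from e(b), and prints the sums. The scalar maxabssum is a hypothetical helper that records the largest absolute sum.

```stata
* Refit the Table 5.11 model and save the efficient score residuals again.
quietly stcox age becktota ndrugfp1 ndrugfp2 ivhx_3 race treat site agesite racesite, ///
    nohr esr(scr*)

* scr1-scr10 follow the order of the coefficients in e(b).
local names : colnames e(b)
scalar maxabssum = 0
forvalues j = 1/10 {
    local nm : word `j' of `names'
    quietly summarize scr`j'
    display "`nm'" _col(14) "sum of score residuals = " %10.3e r(sum)
    scalar maxabssum = max(scalar(maxabssum), abs(r(sum)))
}

* Remove the residual variables created above.
drop scr1-scr10
```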

These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California