UCLA Academic Technology Services HomeServicesClassesContactJobs
Search

Stata Textbook Examples
Applied Survival Analysis by Hosmer and Lemeshow
Chapter 7: Extensions of the Proportional Hazard Model

The data files used for the examples in this text can be downloaded in a zip file from the Wiley FTP website or the Stata Web site.  You can then use a program such as WinZip to unzip the data files.  If you need assistance getting data into Stata, please see our Stata Class Notes, especially the unit on Entering Data.  (NOTE:  The *.dat files are the data files, and the *.txt files contain the codebook information.)
We are using the UIS data and we will be considering the model specified in Table 5.11. First we create an id variable and then we stset the data in order to run the Cox model.
use uis, clear

gen id = _n
stset time, failure(censor) id(id)
Creating the dummy variables and the fractional polynomial variable used in the model in Table 5.11.
gen ivhx3 = ivhx==3
gen agsi = age*site
gen rasi = race*site

fracgen ndrugtx -1 -1 
rename ndrugt_1 fp1
rename  ndrugt_2 fp2
Table 7.1, page 245. The proportional hazard model stratified by site. The bases option is used to obtain the baseline survivorship function.
stcox age becktota fp1 fp2 ivhx3 race treat agsi rasi, nohr bases(bs0) strata(site)

                                                   LR chi2(9)      =     66.18
Log likelihood  =   -2349.7757                     Prob > chi2     =    0.0000
------------------------------------------------------------------------------
          _t |
          _d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0413808   .0099136    -4.17   0.000    -.0608111   -.0219504
    becktota |   .0087201   .0049705     1.75   0.079    -.0010218    .0184621
         fp1 |  -.5726626     .12524    -4.57   0.000    -.8181285   -.3271967
         fp2 |  -.2135928   .0486094    -4.39   0.000    -.3088654   -.1183202
       ivhx3 |   .2313393   .1087515     2.13   0.033     .0181902    .4444883
        race |   -.463756   .1348757    -3.44   0.001    -.7281075   -.1994044
       treat |  -.2486273   .0944097    -2.63   0.008     -.433667   -.0635876
        agsi |   .0328459   .0161159     2.04   0.042     .0012592    .0644325
        rasi |   .8469653   .2479214     3.42   0.001     .3610484    1.332882
------------------------------------------------------------------------------
                                                            Stratified by site
Calculations in the middle of page 246.
gen bs00 = bs0^exp(-2.0079) if site==0
gen bs10 = bs0^exp(-0.8903) if site==1
gen bs01 = bs0^exp(-2.0079 - .2486) if site==0
gen bs11 = bs0^exp(-0.8903 -0.2486) if site==1
Figure 7.1, page 247. Graphs of the modified risk-score adjusted stratum-specific survivorship functions for treatment by site.
graph twoway scatter bs00 bs10 bs01 bs11 time, ///
	msymbol(i i i i) connect(J J J J) sort ylabel(0(.25)1) xlabel(0(220)880)
Create dataset that will include the dichotomous time-dependent variable off_trt which is equal to zero when time < los and equal to 1 for time >= los.  In order to accomplish this we need to return to the original UIS data set. For more information about how to create categorical time-dependent variable in Stata please see the supplementary notes.
use uis, clear

gen id = _n
summ los 
gen enter = 404 - los
gen exit = time + (404 - los)
stset exit, enter(t enter) failure(censor) id(id)
stsplit off_trt, at(0, 404)
replace off_trt = 1 if off_trt ==404
gen t1 = _t - _t0
rename _t oldt
rename _t0 oldt0

stset oldt, origin(oldt0) failure(censor) id(id)
Creating all the variable used in the model in Table 5.11.
gen ivhx3 = ivhx==3
gen agsi = age*site
gen rasi = race*site

fracgen ndrugtx -1 -1 
rename ndrugt_1 fp1
rename  ndrugt_2 fp2
Table 7.3, page 252. Proportional hazard model including the dichotomous time-dependent variable off_trt.
stcox age becktota fp1 fp2 ivhx3 race treat site agsi rasi off_trt, nohr

Log likelihood  =   -2461.6565                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |
          _d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -.037877   .0100552    -3.77   0.000    -.0575849   -.0181691
    becktota |   .0079745   .0049148     1.62   0.105    -.0016584    .0176073
         fp1 |  -.6086314   .1283462    -4.74   0.000    -.8601853   -.3570775
         fp2 |  -.2255842   .0495962    -4.55   0.000     -.322791   -.1283774
       ivhx3 |   .2746718   .1089458     2.52   0.012     .0611419    .4882016
        race |  -.5169569   .1344993    -3.84   0.000    -.7805707   -.2533432
       treat |   .0193972   .0961311     0.20   0.840    -.1690164    .2078108
        site |  -.9690888   .5158775    -1.88   0.060     -1.98019    .0420125
        agsi |   .0363502   .0158037     2.30   0.021     .0053755    .0673249
        rasi |   .5109338    .256902     1.99   0.047     .0074152    1.014452
     off_trt |   2.571152   .1567619    16.40   0.000     2.263904      2.8784
------------------------------------------------------------------------------
In order to run the Cox model with delayed entry we use the original UIS data and stset it again using the los variable as the entry time by using the time0 option in the stset command.
use uis, clear

stset time, failure(censor) time0(los)
Creating all the variable used in the model in Table 5.11.
gen ivhx3 = ivhx==3
gen agsi = age*site
gen rasi = race*site

fracgen ndrugtx -1 -1 
rename ndrugt_1 fp1
rename  ndrugt_2 fp2
Table 7.4, page 255. The proportional hazard model based on a model using delayed entry of subjects.
stcox age becktota fp1 fp2 ivhx3 race treat site agsi rasi, nohr

Log likelihood  =   -1910.1611                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |
          _d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0331928   .0109011    -3.04   0.002    -.0545586    -.011827
    becktota |   .0045483   .0053431     0.85   0.395    -.0059239    .0150205
         fp1 |  -.5460619    .142715    -3.83   0.000    -.8257783   -.2663456
         fp2 |  -.2038103    .054865    -3.71   0.000    -.3113438   -.0962769
       ivhx3 |   .2150811   .1185216     1.81   0.070     -.017217    .4473791
        race |   -.494499   .1424449    -3.47   0.001    -.7736858   -.2153122
       treat |   .1395693   .1050129     1.33   0.184    -.0662521    .3453908
        site |  -.9601262   .5559031    -1.73   0.084    -2.049676    .1294238
        agsi |   .0398465   .0171074     2.33   0.020     .0063166    .0733763
        rasi |    .220328   .2896135     0.76   0.447     -.347304      .78796
------------------------------------------------------------------------------
Creating the long data set described on page 262 from the UIS data set.
use uis, clear

gen id = _n
gen month = time/ 30.41667
gen inter = recode( month, 6, 12,18, 24)
gen count = inter/6
sort id
expand count
sort id
quietly by id: gen intnew = _n
quietly by id: gen censor1 = cond( _n == _N, censor, 0)
tab intnew, gen(int)
Table 7.5, page 263.
list id month inter censor1 age if id==1|id==2|id==3|id==4|id==5|id==31| id==388, nodi nolabel

             id      month      inter    censor1        age
   1.         1   6.180821         12          0         39
   2.         1   6.180821         12          1         39
   3.         2   .8547944          6          1         33
   4.         3   6.805479         12          0         33
   5.         3   6.805479         12          1         33
   6.         4   4.734246          6          1         32
   7.         5   18.11507         24          0         24
   8.         5   18.11507         24          0         24
   9.         5   18.11507         24          0         24
  10.         5   18.11507         24          0         24
  55.        31   17.19452         18          0         39
  56.        31   17.19452         18          0         39
  57.        31   17.19452         18          0         39
 696.       388   18.67397         24          0         43
 697.       388   18.67397         24          0         43
 698.       388   18.67397         24          0         43
 699.       388   18.67397         24          1         43
Creating all the variable used in the model in Table 5.11.
gen ivhx3 = ivhx==3
gen agsi = age*site
gen rasi = race*site

fracgen ndrugtx -1 -1 
rename ndrugt_1 fp1
rename  ndrugt_2 fp2
Table 7.6, page 264. Interval censored proportional hazard model.

Since we are modeling the variable censor1 which is dichotomous we use the binomial distribution in the family option. The link function is the complementary log-log function.
glm  censor1 age becktota fp1 fp2 ivhx3 race treat site agsi rasi /// 
	int1 int2 int3 int4, nocons nolog family(bin) link(cl)

Generalized linear models                          No. of obs      =      1069
Optimization     : ML: Newton-Raphson              Residual df     =      1055
                                                   Scale param     =         1
Deviance         =  1296.932997                    (1/df) Deviance =   1.22932
Pearson          =  1057.610631                    (1/df) Pearson  =  1.002475

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(-ln(1-u))             [Complementary log-log]
Standard errors  : OIM

Log likelihood   = -648.4664986                    AIC             =  1.239413
BIC              =  1199.290293

------------------------------------------------------------------------------
     censor1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0397573   .0101431    -3.92   0.000    -.0596373   -.0198773
    becktota |   .0061059   .0051259     1.19   0.234    -.0039407    .0161525
         fp1 |  -.5310227   .1295652    -4.10   0.000    -.7849659   -.2770795
         fp2 |  -.1967928   .0502341    -3.92   0.000    -.2952498   -.0983357
       ivhx3 |   .2355583   .1119112     2.10   0.035     .0162163    .4549003
        race |  -.4442589   .1378599    -3.22   0.001    -.7144594   -.1740584
       treat |  -.2351556   .0966823    -2.43   0.015    -.4246494   -.0456618
        site |  -1.219912   .5440598    -2.24   0.025    -2.286249    -.153574
        agsi |   .0289788   .0164471     1.76   0.078     -.003257    .0612145
        rasi |   .8248473   .2535759     3.25   0.001     .3278477    1.321847
        int1 |   1.827326   .4366061     4.19   0.000     .9715935    2.683058
        int2 |   1.745001   .4504445     3.87   0.000     .8621463    2.627856
        int3 |   .9035544    .479056     1.89   0.059    -.0353782    1.842487
        int4 |  -.8158451   .7317465    -1.11   0.265    -2.250042    .6183517
------------------------------------------------------------------------------
Creating the data set and the variables to be used in Figure 7.2, page 267. We no longer need to use the long data set so we return to the original UIS data set. We also need to create the variables use in the model in Table 5.11.
use uis, clear

gen ivhx3 = ivhx==3
gen agsi = age*site
gen rasi = race*site
fracgen ndrugtx -1 -1 
rename ndrugt_1 fp1
rename ndrugt_2 fp2
We have to generate the modified risk score, called rm in this example, described on pages 144-145. We then calculate the median of rm which is used to generate the modified risk-score adjusted survivorship functions in Figure 7.2.
gen rm = -.0397573*age + .0061059*becktota+ -.5310227*fp1 + -.1967928*fp2 + ///
2355583*ivhx3+ -.4442589*race + -1.219912*site+ .0289788*agsi + .8248473*rasi 

sum rm, detail
We use the estimates from the int1-int4 variables in the model in Table 7.6 to calculate the baseline survivorship function called s0. The modified risk-score adjusted survivorship functions for treat = 0, short treatment, is then equal to s0 raised to the exponential of the median of rm. For treat = 1, the long treatment, it is equal to s0 raised to the exponential of the median of rm plus the coefficient estimate of treat in the model in Table 7.6.
gen month = time/ 30.41667
gen inter = recode( month, 6, 12,18, 24)

scalar c1 = exp( -exp(1.827326 ))
scalar c2 = c1*exp( -exp(1.745001 ))
scalar c3 = c2*exp(-exp(.9035544  ))
scalar c4 = c3*exp( -exp(-.8158451  ))
di c1-c4

gen s0 = c1*(inter==6) + c2*(inter==12) + c3*(inter==18) + c4*(inter==4)
gen s0_short = s0^exp(-2.027)
gen s0_long = s0^exp(-2.027 - .235)
In order to make the graph look like the one in the book we need to create an extra observation where time will equal zero and both survival functions will equal 1. This added observation is purely for cosmetic purposes and can be left out. The count command tells us how many observations we have in the dataset. The set obs 629 creates an extra observation added to the end of the data set which in this case makes it observation number 629. Then we specify that for this observation time=0 and both survival functions equal 1.
count
set obs 629
replace s0_short = 1 if _n == 629
replace s0_long = 1 if _n == 629
replace inter = 0 if _n == 629
Figure 7.2, page 267.
graph twoway scatter s0_short s0_long inter, ///
	connect(J J) sort ylabel(0(.25)1) xlabel(0(6)24)

How to cite this page

Report an error on this page

UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California