|
|
|
||||
|
|
|||||
The data files used for the examples in this text can be downloaded in a zip file from the Wiley FTP website or the Stata Web site. You can then use a program such as WinZip to unzip the data files. If you need assistance getting data into Stata, please see our Stata Class Notes, especially the unit on Entering Data. (NOTE: The *.dat files are the data files, and the *.txt files contain the codebook information.)
We are using the UIS data and we will be considering the model specified in Table 5.11. First we create an id variable and then we stset the data in order to run the Cox model.
use uis, clear gen id = _n stset time, failure(censor) id(id)
Creating the dummy variables and the fractional polynomial variable used in the model in Table 5.11.
gen ivhx3 = ivhx==3 gen agsi = age*site gen rasi = race*site fracgen ndrugtx -1 -1 rename ndrugt_1 fp1 rename ndrugt_2 fp2
Table 7.1, page 245. The proportional hazard model stratified by site. The bases option is used to obtain the baseline survivorship function.
stcox age becktota fp1 fp2 ivhx3 race treat agsi rasi, nohr bases(bs0) strata(site)
LR chi2(9) = 66.18
Log likelihood = -2349.7757 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_t |
_d | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | -.0413808 .0099136 -4.17 0.000 -.0608111 -.0219504
becktota | .0087201 .0049705 1.75 0.079 -.0010218 .0184621
fp1 | -.5726626 .12524 -4.57 0.000 -.8181285 -.3271967
fp2 | -.2135928 .0486094 -4.39 0.000 -.3088654 -.1183202
ivhx3 | .2313393 .1087515 2.13 0.033 .0181902 .4444883
race | -.463756 .1348757 -3.44 0.001 -.7281075 -.1994044
treat | -.2486273 .0944097 -2.63 0.008 -.433667 -.0635876
agsi | .0328459 .0161159 2.04 0.042 .0012592 .0644325
rasi | .8469653 .2479214 3.42 0.001 .3610484 1.332882
------------------------------------------------------------------------------
Stratified by site
Calculations in the middle of page 246.
gen bs00 = bs0^exp(-2.0079) if site==0 gen bs10 = bs0^exp(-0.8903) if site==1 gen bs01 = bs0^exp(-2.0079 - .2486) if site==0 gen bs11 = bs0^exp(-0.8903 -0.2486) if site==1
Figure 7.1, page 247. Graphs of the modified risk-score adjusted stratum-specific survivorship functions for treatment by site.
graph twoway scatter bs00 bs10 bs01 bs11 time, /// msymbol(i i i i) connect(J J J J) sort ylabel(0(.25)1) xlabel(0(220)880)
Create dataset that will include the dichotomous time-dependent variable off_trt which is equal to zero when time < los and equal to 1 for time >= los. In order to accomplish this we need to return to the original UIS data set. For more information about how to create categorical time-dependent variable in Stata please see the supplementary notes.
use uis, clear gen id = _n summ los gen enter = 404 - los gen exit = time + (404 - los) stset exit, enter(t enter) failure(censor) id(id) stsplit off_trt, at(0, 404) replace off_trt = 1 if off_trt ==404 gen t1 = _t - _t0 rename _t oldt rename _t0 oldt0 stset oldt, origin(oldt0) failure(censor) id(id)
Creating all the variable used in the model in Table 5.11.
gen ivhx3 = ivhx==3 gen agsi = age*site gen rasi = race*site fracgen ndrugtx -1 -1 rename ndrugt_1 fp1 rename ndrugt_2 fp2
Table 7.3, page 252. Proportional hazard model including the dichotomous time-dependent variable off_trt.
stcox age becktota fp1 fp2 ivhx3 race treat site agsi rasi off_trt, nohr
Log likelihood = -2461.6565 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_t |
_d | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | -.037877 .0100552 -3.77 0.000 -.0575849 -.0181691
becktota | .0079745 .0049148 1.62 0.105 -.0016584 .0176073
fp1 | -.6086314 .1283462 -4.74 0.000 -.8601853 -.3570775
fp2 | -.2255842 .0495962 -4.55 0.000 -.322791 -.1283774
ivhx3 | .2746718 .1089458 2.52 0.012 .0611419 .4882016
race | -.5169569 .1344993 -3.84 0.000 -.7805707 -.2533432
treat | .0193972 .0961311 0.20 0.840 -.1690164 .2078108
site | -.9690888 .5158775 -1.88 0.060 -1.98019 .0420125
agsi | .0363502 .0158037 2.30 0.021 .0053755 .0673249
rasi | .5109338 .256902 1.99 0.047 .0074152 1.014452
off_trt | 2.571152 .1567619 16.40 0.000 2.263904 2.8784
------------------------------------------------------------------------------
In order to run the Cox model with delayed entry we use the original UIS data and stset it again using the los variable as the entry time by using the time0 option in the stset command.
use uis, clear stset time, failure(censor) time0(los)
Creating all the variable used in the model in Table 5.11.
gen ivhx3 = ivhx==3 gen agsi = age*site gen rasi = race*site fracgen ndrugtx -1 -1 rename ndrugt_1 fp1 rename ndrugt_2 fp2
Table 7.4, page 255. The proportional hazard model based on a model using delayed entry of subjects.
stcox age becktota fp1 fp2 ivhx3 race treat site agsi rasi, nohr
Log likelihood = -1910.1611 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_t |
_d | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | -.0331928 .0109011 -3.04 0.002 -.0545586 -.011827
becktota | .0045483 .0053431 0.85 0.395 -.0059239 .0150205
fp1 | -.5460619 .142715 -3.83 0.000 -.8257783 -.2663456
fp2 | -.2038103 .054865 -3.71 0.000 -.3113438 -.0962769
ivhx3 | .2150811 .1185216 1.81 0.070 -.017217 .4473791
race | -.494499 .1424449 -3.47 0.001 -.7736858 -.2153122
treat | .1395693 .1050129 1.33 0.184 -.0662521 .3453908
site | -.9601262 .5559031 -1.73 0.084 -2.049676 .1294238
agsi | .0398465 .0171074 2.33 0.020 .0063166 .0733763
rasi | .220328 .2896135 0.76 0.447 -.347304 .78796
------------------------------------------------------------------------------
Creating the long data set described on page 262 from the UIS data set.
use uis, clear gen id = _n gen month = time/ 30.41667 gen inter = recode( month, 6, 12,18, 24) gen count = inter/6 sort id expand count sort id quietly by id: gen intnew = _n quietly by id: gen censor1 = cond( _n == _N, censor, 0) tab intnew, gen(int)
Table 7.5, page 263.
list id month inter censor1 age if id==1|id==2|id==3|id==4|id==5|id==31| id==388, nodi nolabel
id month inter censor1 age
1. 1 6.180821 12 0 39
2. 1 6.180821 12 1 39
3. 2 .8547944 6 1 33
4. 3 6.805479 12 0 33
5. 3 6.805479 12 1 33
6. 4 4.734246 6 1 32
7. 5 18.11507 24 0 24
8. 5 18.11507 24 0 24
9. 5 18.11507 24 0 24
10. 5 18.11507 24 0 24
55. 31 17.19452 18 0 39
56. 31 17.19452 18 0 39
57. 31 17.19452 18 0 39
696. 388 18.67397 24 0 43
697. 388 18.67397 24 0 43
698. 388 18.67397 24 0 43
699. 388 18.67397 24 1 43
Creating all the variable used in the model in Table 5.11.
gen ivhx3 = ivhx==3 gen agsi = age*site gen rasi = race*site fracgen ndrugtx -1 -1 rename ndrugt_1 fp1 rename ndrugt_2 fp2
Table 7.6, page 264. Interval censored proportional hazard model.
Since we are modeling the variable censor1 which is dichotomous we use the binomial distribution in the family option. The link function is the complementary log-log function.
glm censor1 age becktota fp1 fp2 ivhx3 race treat site agsi rasi ///
int1 int2 int3 int4, nocons nolog family(bin) link(cl)
Generalized linear models No. of obs = 1069
Optimization : ML: Newton-Raphson Residual df = 1055
Scale param = 1
Deviance = 1296.932997 (1/df) Deviance = 1.22932
Pearson = 1057.610631 (1/df) Pearson = 1.002475
Variance function: V(u) = u*(1-u) [Bernoulli]
Link function : g(u) = ln(-ln(1-u)) [Complementary log-log]
Standard errors : OIM
Log likelihood = -648.4664986 AIC = 1.239413
BIC = 1199.290293
------------------------------------------------------------------------------
censor1 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | -.0397573 .0101431 -3.92 0.000 -.0596373 -.0198773
becktota | .0061059 .0051259 1.19 0.234 -.0039407 .0161525
fp1 | -.5310227 .1295652 -4.10 0.000 -.7849659 -.2770795
fp2 | -.1967928 .0502341 -3.92 0.000 -.2952498 -.0983357
ivhx3 | .2355583 .1119112 2.10 0.035 .0162163 .4549003
race | -.4442589 .1378599 -3.22 0.001 -.7144594 -.1740584
treat | -.2351556 .0966823 -2.43 0.015 -.4246494 -.0456618
site | -1.219912 .5440598 -2.24 0.025 -2.286249 -.153574
agsi | .0289788 .0164471 1.76 0.078 -.003257 .0612145
rasi | .8248473 .2535759 3.25 0.001 .3278477 1.321847
int1 | 1.827326 .4366061 4.19 0.000 .9715935 2.683058
int2 | 1.745001 .4504445 3.87 0.000 .8621463 2.627856
int3 | .9035544 .479056 1.89 0.059 -.0353782 1.842487
int4 | -.8158451 .7317465 -1.11 0.265 -2.250042 .6183517
------------------------------------------------------------------------------
Creating the data set and the variables to be used in Figure 7.2, page 267. We no longer need to use the long data set so we return to the original UIS data set. We also need to create the variables use in the model in Table 5.11.
use uis, clear gen ivhx3 = ivhx==3 gen agsi = age*site gen rasi = race*site fracgen ndrugtx -1 -1 rename ndrugt_1 fp1 rename ndrugt_2 fp2
We have to generate the modified risk score, called rm in this example, described on pages 144-145. We then calculate the median of rm which is used to generate the modified risk-score adjusted survivorship functions in Figure 7.2.
gen rm = -.0397573*age + .0061059*becktota+ -.5310227*fp1 + -.1967928*fp2 + /// 2355583*ivhx3+ -.4442589*race + -1.219912*site+ .0289788*agsi + .8248473*rasi sum rm, detail
We use the estimates from the int1-int4 variables in the model in Table 7.6 to calculate the baseline survivorship function called s0. The modified risk-score adjusted survivorship functions for treat = 0, short treatment, is then equal to s0 raised to the exponential of the median of rm. For treat = 1, the long treatment, it is equal to s0 raised to the exponential of the median of rm plus the coefficient estimate of treat in the model in Table 7.6.
gen month = time/ 30.41667 gen inter = recode( month, 6, 12,18, 24) scalar c1 = exp( -exp(1.827326 )) scalar c2 = c1*exp( -exp(1.745001 )) scalar c3 = c2*exp(-exp(.9035544 )) scalar c4 = c3*exp( -exp(-.8158451 )) di c1-c4 gen s0 = c1*(inter==6) + c2*(inter==12) + c3*(inter==18) + c4*(inter==4) gen s0_short = s0^exp(-2.027) gen s0_long = s0^exp(-2.027 - .235)
In order to make the graph look like the one in the book we need to create an extra observation where time will equal zero and both survival functions will equal 1. This added observation is purely for cosmetic purposes and can be left out. The count command tells us how many observations we have in the dataset. The set obs 629 creates an extra observation added to the end of the data set which in this case makes it observation number 629. Then we specify that for this observation time=0 and both survival functions equal 1.
count set obs 629 replace s0_short = 1 if _n == 629 replace s0_long = 1 if _n == 629 replace inter = 0 if _n == 629
Figure 7.2, page 267.
graph twoway scatter s0_short s0_long inter, /// connect(J J) sort ylabel(0(.25)1) xlabel(0(6)24)
UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services