UCLA Academic Technology Services

Stata Textbook Examples
Applied Survival Analysis by Hosmer and Lemeshow
Chapter 1: Introduction

The data files used for the examples in this text can be downloaded in a zip file from the Wiley FTP website or the Stata Web site.  You can then use a program such as WinZip to unzip the data files.  If you need assistance getting data into Stata, please see our Stata Class Notes, especially the unit on Entering Data.  (NOTE:  The *.dat files are the data files, and the *.txt files contain the codebook information.)
use hivdata, clear
Table 1.1, page 4.

Note: The variable Censor in the book is called died in the dataset.
list

            id    entdate    enddate       time        age       drug       died
  1.         1    15may90    14oct90          5         46          0          1
  2.         2    19sep89    20mar90          6         35          1          0
  3.         3    21apr91    20dec91          8         30          1          1
  4.         4    03jan91    04apr91          3         30          1          1
  5.         5    18sep89    19jul91         22         36          0          1
  6.         6    18mar91    17apr91          1         32          1          0
  7.         7    11nov89    11jun90          7         36          1          1
  8.         8    25nov89    25aug90          9         31          1          1
  9.         9    11feb91    13may91          3         48          0          1
 10.        10    11aug89    11aug90         12         47          0          1
[remainder of data omitted]
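Only the first ten rows are shown above. A minimal sketch for reproducing just those rows and checking how died is coded, assuming hivdata is still in memory:

```stata
* Show only the first ten observations rather than the full dataset.
list in 1/10

* Tabulate the event indicator to see how many observations
* fall in each category before declaring the failure event.
tabulate died
```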
Figure 1.1, page 6.
graph twoway (scatter time age if died == 1, msymbol(X)) (scatter time age if died == 0, msymbol(Oh)), ///
	ylabel(0(10)60) xlabel(15(5)55) legend(off)
Figure 1.2, page 7.
generate age2 = 1000/age
graph twoway (scatter time age2 if died == 1, msymbol(X)) (scatter time age2 if died == 0, msymbol(Oh)), ///
	ylabel(0(10)60) xlabel(15(5)55) legend(off)
Table 1.2, page 14.
stset time, failure(died==0)

     failure event:  died == 0
obs. time interval:  (0, time]
 exit on or before:  failure

------------------------------------------------------------------------------
      100  total obs.
        0  exclusions
------------------------------------------------------------------------------
      100  obs. remaining, representing
       80  failures in single record/single failure data
     1136  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        60

streg age, dist(exp) nohr

         failure _d:  died == 0
   analysis time _t:  time

Iteration 0:   log likelihood = -157.25531  
Iteration 1:   log likelihood = -142.27706  
Iteration 2:   log likelihood = -140.00561  
Iteration 3:   log likelihood = -140.00523  
Iteration 4:   log likelihood = -140.00523  

Exponential regression -- log relative-hazard form 

No. of subjects =          100                     Number of obs   =       100
No. of failures =           80
Time at risk    =         1136
                                                   LR chi2(1)      =     34.50
Log likelihood  =   -140.00523                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0939301   .0157719     5.96   0.000     .0630179    .1248424
       _cons |  -5.859021   .5852682   -10.01   0.000    -7.006126   -4.711917
------------------------------------------------------------------------------
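The nohr option reports coefficients in the log relative-hazard metric. For the exponential model, the same fit can also be written in the accelerated failure-time (log-time) metric, where each coefficient is the negative of its log-hazard counterpart. A minimal sketch, assuming the data are still stset as above; it uses the time option of streg:

```stata
* Refit the exponential model in the log-time (accelerated
* failure-time) metric. Assumes the data are already stset.
streg age, dist(exp) time

* The coefficients should be the negatives of the nohr values
* above: roughly -.0939 for age and 5.859 for _cons.
```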
Figure 1.3, page 16.
predict xb, xb
generate t = exp(-xb)
graph twoway (scatter time age if died == 1, msymbol(X)) (scatter time age if died == 0, msymbol(Oh)) ///
         (line t age, sort), ylabel(0(10)60) xlabel(15(5)55) legend(off)
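The plotted line is exp(-xb). Under the exponential model the hazard is exp(xb), so exp(-xb) is the model's mean survival time at a given age. As a quick check by hand, the sketch below plugs the coefficients from Table 1.2 into that expression for a subject aged 30; the age is chosen only for illustration.

```stata
* Predicted mean survival time (in months) at age 30, using the
* coefficients reported in the streg output for Table 1.2.
display exp(-(-5.859021 + .0939301*30))
* about 20.9
```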
Figure 1.4, page 19.
clear
input subj tp censored str11 datestr
1 1 0 "1 jan 1990"
1 2 0 "1 mar 1991"
2 1 1 "1 feb 1990"
2 2 1 "1 feb 1991"
3 1 1 "1 jun 1990"
3 2 1 "31 dec 1991"
4 1 0 "1 sep 1990"
4 2 0 "1 apr 1991"
end

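* Convert the string dates to Stata daily dates (Stata 10 and later expect the uppercase mask "DMY")
generate date = date(datestr, "DMY")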
format date %dmy
sort subj tp
graph twoway (scatter subj date, connect(L) msymbol(oh)) ///
	(scatter subj date if censored == 0, msymbol(X))
Figure 1.5, page 20.
generate time = 0 if tp==1
replace time = (date-date[_n-1])/30.5 if tp==2
graph twoway (scatter subj time, connect(L) msymbol(oh)) ///
	(scatter subj time if censored == 0, msymbol(X))
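The time variable above is the number of days between a subject's two dates divided by 30.5, giving approximate months. A short worked check for subject 1, assuming a Stata version in which date() accepts the "DMY" mask (Stata 10 or later):

```stata
* Days between subject 1's two dates.
display date("1 mar 1991", "DMY") - date("1 jan 1990", "DMY")
* 424

* The same interval in months, using the 30.5 days-per-month divisor.
display (date("1 mar 1991", "DMY") - date("1 jan 1990", "DMY")) / 30.5
* about 13.9
```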

These pages are Copyrighted (c) by UCLA Academic Technology Services

