use http://www.ats.ucla.edu/stat/stata/library/cancer
describe
Contains data from cancer.dta
obs: 48 Patient Survival in Drug Trial
vars: 7 2 Jan 1904 13:58
size: 1,248 (99.1% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
id float %9.0g
studytime int %8.0g Months to death or end of exp.
died int %8.0g 1 if patient died
drug float %9.0g
age int %8.0g Patient's age at start of exp.
distime float %9.0g
censor float %9.0g
-------------------------------------------------------------------------------
tab distime
distime | Freq. Percent Cum.
------------+-----------------------------------
1 | 11 22.92 22.92
2 | 13 27.08 50.00
3 | 6 12.50 62.50
4 | 8 16.67 79.17
5 | 4 8.33 87.50
6 | 6 12.50 100.00
------------+-----------------------------------
Total | 48 100.00
univar age
-------------- Quantiles --------------
Variable n Mean S.D. Min .25 Mdn .75 Max
-------------------------------------------------------------------------------
age 48 55.88 5.66 47.00 50.50 56.00 60.00 67.00
-------------------------------------------------------------------------------
list distime drug age died censor if id==5
distime drug age died censor
5. 1 0 56 1 0
list distime drug age died censor if id==10
distime drug age died censor
10. 2 0 58 0 1
list distime drug age died censor if id==20
distime drug age died censor
20. 4 0 52 1 0
Patient 5 (56 years old, did not receive a drug treatment) was observed for one
time period and died, so the observation for this patient was not censored. Patient 10
(58, no drug) was observed for two time periods and did not die, i.e., the observation
was censored. Finally, patient 20 (52, no drug)
was observed for four time periods and died (not censored).
In this dataset there is one observation for each patient. To do discrete time survival analysis we need as many observations for each patient as there are time periods in which that patient was observed. For patients who die we need a response variable that is zero in every period until the last one, when it is coded one. For patients who do not die the response variable will be zero for every observation.
A collection of Stata commands written by Alexis Dinno (Harvard School of Public Health) will help us with the analysis. You can download this family of commands from within Stata by typing findit dthaz (see How can I use the findit command to search for programs and get additional help? for more information about using findit).
The command that we are interested in is prsnperd, which creates the type of dataset that we want. prsnperd needs a variable that indicates whether the observation is censored, which in our dataset is the variable censor. prsnperd creates the following variables: _period, which is the time period; _Y, which is the response variable; and _d1 through _d6, which are the dummy-coded time periods. Here is what the result looks like.
prsnperd id distime censor
list id _period _Y if id==5
id _period _Y
5. 5 1 1
list id _period _Y if id==10
id _period _Y
11. 10 1 0
12. 10 2 0
list id _period _Y if id==20
id _period _Y
35. 20 1 0
36. 20 2 0
37. 20 3 0
38. 20 4 1
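The restructuring shown in these listings can also be sketched outside Stata. The following is a minimal Python illustration of the idea, not Dinno's implementation; the function name to_person_period and the tuple layout are assumptions made for this example. It expands one record per patient into one row per observed period and sets the response to one only in the final period of a patient who died.

```python
def to_person_period(records):
    """Expand (id, distime, censor) records into (id, period, y) rows.

    y is 1 only in a patient's last period, and only if the patient
    was not censored (censor == 0); otherwise y is 0.
    """
    rows = []
    for pid, distime, censor in records:
        for period in range(1, distime + 1):
            died_now = (period == distime) and (censor == 0)
            rows.append((pid, period, 1 if died_now else 0))
    return rows


# The three patients listed earlier: (id, distime, censor).
patients = [(5, 1, 0), (10, 2, 1), (20, 4, 0)]
for row in to_person_period(patients):
    print(row)

# Frequencies from "tab distime": periods observed -> number of patients.
distime_freq = {1: 11, 2: 13, 3: 6, 4: 8, 5: 4, 6: 6}
total_rows = sum(t * n for t, n in distime_freq.items())
print(total_rows)
```

The final figure is the number of person-period rows implied by the tab distime frequencies (143), which is the number of observations the expanded dataset should contain.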
Now we can do the discrete time survival analysis using the logit command.
We will run logit with and without the cluster option; both runs use the nocons option. The
nocons option is used so that the dummy variables for all of the time periods can
be included.
logit _Y drug age _d1-_d6, cluster(id) nocons
Logit estimates Number of obs = 143
Wald chi2(8) = 45.39
Log likelihood = -55.65503 Prob > chi2 = 0.0000
(standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
| Robust
_Y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
drug | -3.024052 .6859866 -4.41 0.000 -4.368561 -1.679543
age | .1607128 .0497324 3.23 0.001 .063239 .2581866
_d1 | -9.309867 2.754574 -3.38 0.001 -14.70873 -3.911001
_d2 | -8.335442 2.641359 -3.16 0.002 -13.51241 -3.158473
_d3 | -8.326742 2.533321 -3.29 0.001 -13.29196 -3.361525
_d4 | -7.071942 2.564526 -2.76 0.006 -12.09832 -2.045564
_d5 | -7.19799 2.490689 -2.89 0.004 -12.07965 -2.316328
_d6 | -7.622593 2.722941 -2.80 0.005 -12.95946 -2.285726
------------------------------------------------------------------------------
logit _Y drug age _d1-_d6, nocons
Logit estimates Number of obs = 143
LR chi2(8) = .
Log likelihood = -55.65503 Prob > chi2 = .
------------------------------------------------------------------------------
_Y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
drug | -3.024052 .6347086 -4.76 0.000 -4.268058 -1.780046
age | .1607128 .051414 3.13 0.002 .0599433 .2614823
_d1 | -9.309867 2.922645 -3.19 0.001 -15.03815 -3.581589
_d2 | -8.335442 2.780394 -3.00 0.003 -13.78491 -2.885969
_d3 | -8.326742 2.823744 -2.95 0.003 -13.86118 -2.792306
_d4 | -7.071942 2.734906 -2.59 0.010 -12.43226 -1.711624
_d5 | -7.19799 2.811519 -2.56 0.010 -12.70847 -1.687513
_d6 | -7.622593 2.988678 -2.55 0.011 -13.48029 -1.764892
------------------------------------------------------------------------------
logit, or
Logit estimates Number of obs = 143
LR chi2(8) = .
Log likelihood = -55.65503 Prob > chi2 = .
------------------------------------------------------------------------------
_Y | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
drug | .0486039 .0308493 -4.76 0.000 .014009 .1686304
age | 1.174348 .0603779 3.13 0.002 1.061776 1.298854
_d1 | .0000905 .0002646 -3.19 0.001 2.94e-07 .0278315
_d2 | .0002399 .0006669 -3.00 0.003 1.03e-06 .0558007
_d3 | .000242 .0006832 -2.95 0.003 9.55e-07 .0612797
_d4 | .0008486 .0023208 -2.59 0.010 3.99e-06 .1805723
_d5 | .0007481 .0021033 -2.56 0.010 3.03e-06 .184979
_d6 | .0004893 .0014623 -2.55 0.011 1.40e-06 .1712052
------------------------------------------------------------------------------
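The odds ratios reported by logit, or are the exponentiated coefficients, and their confidence limits are the exponentiated limits from the coefficient table. A small check in Python (illustrative only; the variable names are assumptions, and the numbers are copied from the non-clustered output above):

```python
import math

# Coefficient and 95% confidence limits from the logit output above.
coefs = {
    "drug": (-3.024052, -4.268058, -1.780046),
    "age": (0.1607128, 0.0599433, 0.2614823),
}

odds_ratios = {}
for name, (b, lo, hi) in coefs.items():
    # Exponentiating moves from the log-odds scale to the odds scale.
    odds_ratios[name] = (math.exp(b), math.exp(lo), math.exp(hi))
    print(name, [round(v, 4) for v in odds_ratios[name]])
```

For age, an odds ratio of about 1.17 means that each additional year of age multiplies the odds of dying in a given period by about 1.17, holding drug and time period fixed.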
Both drug and age are significant, with older patients more likely to
die and those on drug therapy less likely. It is useful to look at the hazard function
(and survival function) to see how the risk changes over time. The dthaz command (from Dinno)
will produce a table with hazard and survival values for each time period. We will
specify the function for drug=1 (drug therapy) and age=56 (the median age).
dthaz drug age, specify(1 56)
Discrete-Time Estimation of Conditional Hazard and Survival Probabilities
------------------------------------------------------------------------------
Time Parameterization: Fully Discrete
Additional predictors specified as:
drug = 1
age = 56
-----------------------------------------
Period p(Hazard) p(Survival)
(T_j) ^H(T_j) ^S(T_j)
-----------------------------------------
0 -- 1
1 0.0344 0.9656
2 0.0863 0.8822
3 0.0870 0.8055
4 0.2505 0.6037
5 0.2276 0.4663
6 0.1616 0.3910
-----------------------------------------
Logit Link (assumes proportional odds)
Notice that the hazard peaks in time period four (0.2505) and then declines.
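The hazard and survival columns can be approximately reproduced by hand from the logit coefficients: in each period the hazard is the inverse logit of that period's dummy coefficient plus the covariate terms, and survival is the running product of one minus the hazard. The sketch below is written in Python rather than Stata and is not the dthaz implementation; the function and variable names are assumptions for illustration.

```python
import math

# Coefficients copied from the logit output above.
B_DRUG, B_AGE = -3.024052, 0.1607128
B_PERIOD = [-9.309867, -8.335442, -8.326742, -7.071942, -7.19799, -7.622593]


def hazard_survival(drug, age):
    """Return per-period hazards and cumulative survival probabilities."""
    hazards, survival = [], []
    s = 1.0
    for b_j in B_PERIOD:
        xb = b_j + B_DRUG * drug + B_AGE * age
        h = 1.0 / (1.0 + math.exp(-xb))  # inverse logit
        s *= 1.0 - h                     # must also survive this period
        hazards.append(h)
        survival.append(s)
    return hazards, survival


hazards, survival = hazard_survival(drug=1, age=56)
for j, (h, s) in enumerate(zip(hazards, survival), start=1):
    print(j, round(h, 4), round(s, 4))
```

Calling hazard_survival(drug=0, age=56) gives the corresponding values for untreated patients of the same age, which makes the size of the drug effect easier to see on the probability scale.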
These pages are Copyrighted (c) by UCLA Academic Technology Services