Stata Library
Discrete Time Survival Analysis

Discrete-time survival analysis treats time not as a continuous variable but as divided into discrete units, or periods. Discrete-time data can be analyzed with logistic regression that includes an indicator variable for each time period. We illustrate discrete-time survival analysis using the cancer.dta dataset.
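As a sketch of the model behind this approach, in our own notation (none of these symbols appear in the Stata output): let h_j(x) be the probability that a patient with covariates x dies in period j, given that the patient was still alive at the start of period j, and let D_1, ..., D_J be indicator variables for the J periods. The logit of the hazard gets a separate intercept for each period, so no overall constant is needed. Survival to the end of period j is the product of the per-period probabilities of not dying:

```latex
\operatorname{logit} h_j(\mathbf{x})
  = \ln\frac{h_j(\mathbf{x})}{1 - h_j(\mathbf{x})}
  = \alpha_1 D_1 + \alpha_2 D_2 + \cdots + \alpha_J D_J + \boldsymbol{\beta}^{\top}\mathbf{x},
\qquad
S_j(\mathbf{x}) = \prod_{k=1}^{j} \bigl(1 - h_k(\mathbf{x})\bigr)
```

Here the alpha coefficients describe how the baseline hazard changes from period to period, and beta describes the covariate effects, which are assumed to be the same in every period.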

Cancer Example

After reading in the dataset, we describe the variables, tabulate distime, summarize age, and list several variables for patients 5, 10, and 20.
use http://www.ats.ucla.edu/stat/stata/library/cancer

describe

Contains data from cancer.dta
  obs:            48                          Patient Survival in Drug Trial
 vars:             7                          2 Jan 1904 13:58
 size:         1,248 (99.1% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g                  
studytime       int    %8.0g                  Months to death or end of exp.
died            int    %8.0g                  1 if patient died
drug            float  %9.0g                  
age             int    %8.0g                  Patient's age at start of exp.
distime         float  %9.0g                  
censor          float  %9.0g                  
-------------------------------------------------------------------------------

tab distime 

    distime |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         11       22.92       22.92
          2 |         13       27.08       50.00
          3 |          6       12.50       62.50
          4 |          8       16.67       79.17
          5 |          4        8.33       87.50
          6 |          6       12.50      100.00
------------+-----------------------------------
      Total |         48      100.00

univar age
                                        -------------- Quantiles --------------
Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
-------------------------------------------------------------------------------
     age      48    55.88     5.66    47.00    50.50    56.00    60.00    67.00
-------------------------------------------------------------------------------
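As far as we know, univar is a user-written command rather than part of official Stata. If it is not installed, the built-in summarize command with the detail option reports the same kind of information, including the median age that is used later in the analysis. The sketch below assumes the dataset is still reachable at the URL used above; the UCLA site has since been reorganized, so the address may need updating. summarize's percentile rule may not match univar's exactly, so small differences in the quartiles are possible.

```stata
* Sketch: built-in alternative to the user-written univar command.
* Assumes the cancer dataset can still be loaded from the URL used above.
use http://www.ats.ucla.edu/stat/stata/library/cancer, clear

* Mean, standard deviation, and percentiles of age
summarize age, detail

* The median is saved in r(p50); the quartiles are in r(p25) and r(p75)
display "median age = " r(p50)
```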

list distime drug age died censor if id==5

       distime       drug       age      died     censor
  5.         1          0        56         1          0
  
list distime drug age died censor if id==10

       distime       drug       age      died     censor
 10.         2          0        58         0          1

list distime drug age died censor if id==20

       distime       drug       age      died     censor
 20.         4          0        52         1          0

Patient 5 (56 years old, no drug treatment) was observed for one time period and died, so this observation is not censored. Patient 10 (58, no drug) was observed for two time periods and did not die, so this observation is censored. Finally, patient 20 (52, no drug) was observed for four time periods and died (not censored).

In this dataset there is one observation for each patient. To do discrete-time survival analysis we need one observation for each time period in which a patient was observed (a person-period dataset). For patients who die, we need a response variable that is zero in every period except the last one, where it is coded one. For patients who don't die, the response variable is zero in every period.
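The same person-period file can be built by hand with official Stata commands. This is useful if the user-written commands below are unavailable, and it makes the construction explicit. The sketch assumes that the dataset loads from the URL used above and that censor == 0 marks a death, as in the patient listings. The new variable names period, y, and d1-d6 are hypothetical and chosen only for this illustration. vce(cluster id) is the newer spelling of the cluster(id) option used later on this page.

```stata
* Sketch: build the person-period file by hand, as an alternative to prsnperd.
* The new variable names period, y and d1-d6 are hypothetical.
use http://www.ats.ucla.edu/stat/stata/library/cancer, clear

* One record per period observed: a patient with distime == 4 gets 4 records
expand distime
bysort id: generate period = _n

* Response: 1 only in the last period, and only for uncensored patients (deaths)
generate y = (period == distime) & (censor == 0)

* Indicator (dummy) variables for the six periods, named d1 ... d6
tabulate period, generate(d)

list id period y if id == 20

* Discrete-time logit: no constant, one intercept per period
logit y drug age d1-d6, nocons vce(cluster id)
```

expand copies each patient's row distime times. Because the copies within a patient are identical, numbering them with _n inside each id is safe.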

A collection of Stata commands written by Alexis Dinno (Harvard School of Public Health) will help us with the analysis. You can download this family of commands from within Stata by typing findit dthaz (see the FAQ "How can I use the findit command to search for programs and get additional help?" for more information about using findit).

The command we need is prsnperd, which creates the person-period dataset described above. In addition to the patient identifier and the time variable, prsnperd needs a variable that indicates whether each observation is censored; in our dataset this is the variable censor. prsnperd creates the following variables: _period (the time period), _Y (the response variable), and _d1 through _d6 (indicator variables for the six time periods). Here is what the new data look like for the three patients listed earlier.

prsnperd id distime censor

list id _period _Y if id==5

            id    _period         _Y
  5.         5          1          1

list id _period _Y if id==10

            id    _period         _Y
 11.        10          1          0
 12.        10          2          0

list id _period _Y if id==20

            id    _period         _Y
 35.        20          1          0
 36.        20          2          0
 37.        20          3          0
 38.        20          4          1
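Each patient contributes one record per period observed, so the person-period file has as many records as the sum of distime over all 48 patients. This can be computed directly from the frequency table of distime shown earlier:

```latex
\sum_{i=1}^{48} \text{distime}_i
  = 11(1) + 13(2) + 6(3) + 8(4) + 4(5) + 6(6)
  = 11 + 26 + 18 + 32 + 20 + 36
  = 143
```

This matches the "Number of obs = 143" reported by the logit models below.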

Now we can fit the discrete-time survival model with the logit command. We run logit twice, with and without the cluster(id) option, which adjusts the standard errors for the repeated observations on each patient. Both runs use the nocons option, which suppresses the constant so that dummy variables for all six time periods can be included.
logit _Y drug age _d1-_d6, cluster(id) nocons

Logit estimates                                   Number of obs   =        143
                                                  Wald chi2(8)    =      45.39
Log likelihood =  -55.65503                       Prob > chi2     =     0.0000

                               (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |               Robust
          _Y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        drug |  -3.024052   .6859866    -4.41   0.000    -4.368561   -1.679543
         age |   .1607128   .0497324     3.23   0.001      .063239    .2581866
         _d1 |  -9.309867   2.754574    -3.38   0.001    -14.70873   -3.911001
         _d2 |  -8.335442   2.641359    -3.16   0.002    -13.51241   -3.158473
         _d3 |  -8.326742   2.533321    -3.29   0.001    -13.29196   -3.361525
         _d4 |  -7.071942   2.564526    -2.76   0.006    -12.09832   -2.045564
         _d5 |   -7.19799   2.490689    -2.89   0.004    -12.07965   -2.316328
         _d6 |  -7.622593   2.722941    -2.80   0.005    -12.95946   -2.285726
------------------------------------------------------------------------------

logit _Y drug age _d1-_d6, nocons

Logit estimates                                   Number of obs   =        143
                                                  LR chi2(8)      =          .
Log likelihood =  -55.65503                       Prob > chi2     =          .

------------------------------------------------------------------------------
          _Y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        drug |  -3.024052   .6347086    -4.76   0.000    -4.268058   -1.780046
         age |   .1607128    .051414     3.13   0.002     .0599433    .2614823
         _d1 |  -9.309867   2.922645    -3.19   0.001    -15.03815   -3.581589
         _d2 |  -8.335442   2.780394    -3.00   0.003    -13.78491   -2.885969
         _d3 |  -8.326742   2.823744    -2.95   0.003    -13.86118   -2.792306
         _d4 |  -7.071942   2.734906    -2.59   0.010    -12.43226   -1.711624
         _d5 |   -7.19799   2.811519    -2.56   0.010    -12.70847   -1.687513
         _d6 |  -7.622593   2.988678    -2.55   0.011    -13.48029   -1.764892
------------------------------------------------------------------------------
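Dropping the constant is only one way to parameterize the period effects. An equivalent model keeps the constant and omits one period dummy, so that period 1 becomes the reference category. The sketch below assumes the dataset loads from the URL used above and that prsnperd is installed. It should give the same log likelihood and the same drug and age coefficients as the nocons model. The constant then estimates the period-1 intercept, and each remaining _d coefficient is a difference from period 1.

```stata
* Sketch: same model, but with a constant and period 1 as the reference category.
use http://www.ats.ucla.edu/stat/stata/library/cancer, clear
prsnperd id distime censor

* Omit _d1 and keep the constant; vce(cluster id) is the newer form of cluster(id)
logit _Y drug age _d2-_d6, vce(cluster id)

* _cons now plays the role of the _d1 coefficient in the nocons model, and
* _cons + _b[_d4] recovers the _d4 coefficient from the nocons model
display _b[_cons]
display _b[_cons] + _b[_d4]
```

The nocons form is convenient when you want each period's intercept directly, which is what dthaz works with later on. The reference-category form is convenient when you want to test whether the hazard differs from the first period.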

logit, or

Logit estimates                                   Number of obs   =        143
                                                  LR chi2(8)      =          .
Log likelihood =  -55.65503                       Prob > chi2     =          .

------------------------------------------------------------------------------
          _Y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        drug |   .0486039   .0308493    -4.76   0.000      .014009    .1686304
         age |   1.174348   .0603779     3.13   0.002     1.061776    1.298854
         _d1 |   .0000905   .0002646    -3.19   0.001     2.94e-07    .0278315
         _d2 |   .0002399   .0006669    -3.00   0.003     1.03e-06    .0558007
         _d3 |    .000242   .0006832    -2.95   0.003     9.55e-07    .0612797
         _d4 |   .0008486   .0023208    -2.59   0.010     3.99e-06    .1805723
         _d5 |   .0007481   .0021033    -2.56   0.010     3.03e-06     .184979
         _d6 |   .0004893   .0014623    -2.55   0.011     1.40e-06    .1712052
------------------------------------------------------------------------------
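The odds ratios are the exponentiated logit coefficients. For example, using the coefficients from the models above (the 10-year contrast is our own illustration):

```latex
\widehat{\mathrm{OR}}_{\text{drug}} = e^{-3.024052} \approx 0.0486,
\qquad
\widehat{\mathrm{OR}}_{\text{age}} = e^{0.1607128} \approx 1.174,
\qquad
\widehat{\mathrm{OR}}_{\text{age},\,+10\ \text{years}} = e^{10 \times 0.1607128} \approx 4.99
```

Among patients still alive at the start of a period, drug therapy multiplies the odds of dying in that period by about 0.05, and each additional year of age multiplies them by about 1.17. The values reported for _d1 through _d6 are not comparisons between groups. They are the baseline odds of death in each period for a patient with drug = 0 and age = 0, which is an extrapolation far outside the data, so they are not of direct interest.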

The coefficient estimates are identical in the two runs; clustering on id changes only the standard errors. Both drug and age are significant: older patients are more likely to die, and patients on drug therapy are less likely to die. To see how these effects play out over time, it is useful to look at the hazard function (and the survival function). The dthaz command (also by Dinno) produces a table of hazard and survival values for each time period. We specify the function for drug=1 (drug therapy) and age=56 (the median age).
dthaz drug age, specify(1 56)

Discrete-Time Estimation of Conditional Hazard and Survival Probabilities
------------------------------------------------------------------------------
Time Parameterization: Fully Discrete

Additional predictors specified as:
drug = 1
age = 56


-----------------------------------------
   Period      p(Hazard)   p(Survival)
   (T_j)       ^H(T_j)     ^S(T_j)
-----------------------------------------
     0            --        1
     1          0.0344      0.9656
     2          0.0863      0.8822
     3          0.0870      0.8055
     4          0.2505      0.6037
     5          0.2276      0.4663
     6          0.1616      0.3910
-----------------------------------------
Logit Link (assumes proportional odds)

Notice that the hazard peaks in time period four (about 0.25) and then declines, while the estimated survival probability falls to about 0.39 by the end of period six.
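The dthaz table can be reproduced by hand from the logit coefficients, which shows where its numbers come from. For each period, the hazard is the inverse logit of that period's intercept plus the covariate terms, and survival is the running product of one minus the hazard. This is a sketch: it assumes the dataset loads from the URL used above and that prsnperd is installed. We have not checked dthaz's internals, but spot checks of several periods by arithmetic agree with its table up to rounding. The local macro names drugval and ageval are our own.

```stata
* Sketch: reproduce the dthaz hazard and survival table from the logit fit.
use http://www.ats.ucla.edu/stat/stata/library/cancer, clear
prsnperd id distime censor
quietly logit _Y drug age _d1-_d6, nocons

local drugval = 1     // drug therapy
local ageval  = 56    // median age
local S = 1           // survival before period 1
forvalues j = 1/6 {
    * Hazard in period j: inverse logit of the linear predictor
    local h = invlogit(_b[_d`j'] + _b[drug]*`drugval' + _b[age]*`ageval')
    * Survival to the end of period j: S_j = S_(j-1) * (1 - h_j)
    local S = `S' * (1 - `h')
    display "period `j'  hazard = " %6.4f `h' "   survival = " %6.4f `S'
}
```

Setting drugval to 0 gives the corresponding curve for patients without drug therapy. Because the model has no interaction between drug and the period dummies, the two curves differ by the same amount on the logit scale in every period.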
