UCLA Academic Technology Services HomeServicesClassesContactJobs
Stat Computing > Seminars > SurvivalSAS
Search

Statistical Computing Seminars
Survival Analysis with SAS

The SAS program on which the seminar is based.
The uis_small data file for the seminar.

Background for Survival Analysis

The goal of this seminar is to give a brief introduction to the topic of survival analysis. We will be using a smaller and slightly modified  version of the UIS data set from the book "Applied Survival Analysis" by Hosmer and Lemeshow. We strongly encourage everyone who is interested in learning survival analysis to read this text as it is a very good and thorough introduction to the topic.

Survival analysis is just another name for time to event analysis. The term survival analysis is used predominately in biomedical sciences where the interest is in observing time to death either of patients or of laboratory animals. Time to event analysis has also been used widely in the social sciences where interest is on analyzing time to events such as job changes, marriage, birth of children and so forth. The engineering sciences have also contributed to the development of survival analysis which is called "reliability analysis" or "failure time analysis" in this field, since the main focus is in modeling the time it takes for machines or electronic components to break down. The developments from these diverse fields have for the most part been consolidated into the field of "survival analysis". For more background please refer to the excellent discussion in Chapter 1 of Event History Analysis by Paul Allison.

There are certain aspects of survival analysis data, such as censoring and non-normality, that generate great difficulty when trying to analyze the data using traditional statistical models such as multiple linear regression. The non-normality aspect of the data violates the normality assumption of most commonly used statistical model such as regression or ANOVA, etc.  A censored observation is defined as an observation with incomplete information.  There are four different types of censoring possible: right truncation, left truncation, right censoring and left censoring.  We will focus exclusively on right censoring for a number of reasons.  Most data used in analyses have only right censoring.  Furthermore, right censoring is the most easily understood of all the four types of censoring and if a researcher can understand the concept of right censoring thoroughly it becomes much easier to understand the other three types.  When an observation is right censored it means that the information is incomplete because the subject did not have an event during the time that the subject was part of the study.  The point of survival analysis is to follow subjects over time and observe at which point in time they experience the event of interest. It often happens that the study does not span enough time in order to observe the event for all the subjects in the study. This could be due to a number of reasons. Perhaps subjects drop out of the study for reasons unrelated to the study (i.e. patients moving to another area and leaving no forwarding address). The common feature of all of these examples is that if the subject had been able to stay in the study then it would have been possible to observe the time of the event eventually. 

It is important to understand the difference between calendar time and time in the study. It is very common for subjects to enter the study continuously throughout the length of the study. This situation is reflected in the first graph where we can see the staggered entry of four subjects. The subjects in red were censored and the subjects in blue experienced an event. It would appear that subject 3 dropped out after only a short time (hit by a bus, very tragic) and that subject 4 did not experience an event by the time the study ended but if the study had gone on longer (had more funding) we would have know the time when this subject would have experienced an event.  Thus, in calendar time both the entry and the exit time of the subjects are staggered and can occur at any time throughout the course of the study.  Study time as the term would imply deals only with the length of time that the subjects were a part of the study.  Thus, every subject start at study time zero and have ending points corresponding to the entire length of time that they participated in the study, in other words, until they experienced an event or were censored.

     

     

The other important concept in survival analysis is the hazard rate.  From looking at data with discrete time (time measured in large intervals such as month, years or even decades) we can get an intuitive idea of the hazard rate.  For discrete time the hazard rate is the probability that an individual will experience an event at time t while that individual is at risk for having an event. Thus, the hazard rate is really just the unobserved rate at which events occur.  If the hazard rate is constant over time and it was equal to 1.5, for example, this would mean that one would expect 1.5 events to occur in a time interval that is one unit long.  Furthermore, if a person had a hazard rate of 1.2 at time t and a second person had a hazard rate of 2.4 at time t then it would be correct to say that the second person's risk of an event would be two times greater at time t.  It is important to realize that the hazard rate is an un-observed variable yet it controls both the occurrence and the timing of the events.  It is the fundamental dependent variable in survival analysis. 

Another important aspect of the hazard function is to understand how the shape of the hazard function will influence the other variables of interest such as the survival function. The first graph below illustrates a hazard function with a 'bathtub shape'. This graph is depicting the hazard function for the survival of organ transplant patients. At time equal to zero they are having the transplant and since this is a very dangerous operation they have a very high hazard (a great chance of dying). The first 10 days after the operation are also very dangerous with a high chance of the patient dying but the danger is less than during the actual operation and hence the hazard is decrease during this period. If the patient has survived past day 10 then they are in very good shape and have a very little chance of dying in the following 6 months. After 6 months the patients begin to experience deterioration and the chances of dying increase again and therefore the hazard function starts to increase.  After one year almost all patients are dead and hence the very high hazard function which will continue to increase.

The hazard function may not seem like an exciting variable to model but other indicators of interest, such as the survival function, are derived from the hazard rate.  Once we have modeled the hazard rate we can easily obtain these other functions of interest.  To summarize, it is important to understand the concept of the hazard function and to understand the shape of the hazard function.

An example of a hazard function for heart transplant patients.

We are generally unable to generate the hazard function instead we usually look at the cumulative hazard curve.


     

The UIS data

The goal of the UIS  data, and of the smaller version called uis_small that we are using here, is to model time until return to drug use for patients enrolled in two different residential treatment programs that differed in length (treat=0 is the short program and treat=1 is the long program).  The patients were randomly assigned to two different sites (site=0 is site A and site=1 is site B).  The variable age indicates age at enrollment, herco indicates heroine or cocaine use in the past three months (herco=1 indicates heroine and cocaine use, herco=2 indicates either heroine or cocaine use and herco=3 indicates neither heroine nor cocaine use) and ndrugtx indicates the number of previous drug treatments.  The variables time contains the time until return to drug use and the censor variable indicates whether the subject returned to drug use (censor=1 indicates return to drug use and censor=0 otherwise).
 
Let's look at the first 10 observations of the UIS data set.  Note that subject 5 is censored and did not experience an event while in the study.  Also note that the coding for censor is rather counter-intuitive since the value 1 indicates an event and 0 indicates censoring.  It would perhaps be more appropriate to call this variable "event".

proc print data=uis_small (obs=10);
run;

Obs    ID    age    ndrugtx    treat    site    time    censor    herco
  1     1     39        1        1        0      188       1        3
  2     2     33        8        1        0       26       1        3
  3     3     33        3        1        0      207       1        2
  4     4     32        1        0        0      144       1        3
  5     5     24        5        1        0      551       0        2
  6     6     30        1        1        0       32       1        1
  7     7     39       34        1        0      459       1        3
  8     8     27        2        1        0       22       1        3
  9     9     40        3        1        0      210       1        2
 10    10     36        7        1        0      184       1        2

Exploring the data: Univariate Analyses

In any data analysis it is always a great idea to do some univariate analysis before proceeding to more complicated models. In survival analysis it is highly recommended to look at the Kaplan-Meier curves for all the categorical predictors. This will provide insight into the shape of the survival function for each group and give an idea of whether or not the groups are proportional (i.e. the survival functions are approximately parallel). We also consider the tests of equality across strata to explore whether or not to include the predictor in the final model. For the categorical variables we will use the log-rank test of equality across strata which is a non-parametric test.  For the continuous variables we will use a univariate Cox proportional hazard regression which is a semi-parametric model.  We will consider including the predictor if the test has a p-value of 0.2 - 0.25 or less.  We are using this elimination scheme because all the predictors in the data set are variables that could be relevant to the model.  If the predictor has a p-value greater than 0.25 in a univariate analysis it is highly unlikely that it will contribute anything to a model which includes other predictors.

The log-rank test of equality across strata has a p-value of 0.0091 for the predictor treat, thus treat will be included a potential candidate for the final model.  From the graph we see that the survival function for each group of treat are not perfectly parallel but that they are separate except at the very beginning and at the very end of the study time.  The overlap at the very end should not cause too much concern because it is determined by only a very few number of censored subjects out of a sample with 628 subjects.  In general, the log-rank test places more emphasis on the differences in the curves at larger time values. This is why we get such a small p-value even though the two survival curves appear to be very close together for time less than 100 days.

proc lifetest data=uis_small plots=(s);
  time time*censor(0);
  strata treat;
run;

<output omitted>

       Test of Equality over Strata

                                   Pr >
Test      Chi-Square      DF    Chi-Square
Log-Rank      6.7979       1      0.0091
Wilcoxon      9.4608       1      0.0021
-2Log(LR)     7.8267       1      0.0051

 

     

The log-rank test of equality across strata for the predictor site has a p-value of 0.1240, thus site will be included as a potential candidate for the final model because this p-value is still less than our cut-off of 0.2. From the graph we see that the survival curves are not really parallel and that there are two periods ( [0, 100] and [200, 300] ) where the curves are very close together.  This would explain the rather high p-value from the log-rank test.

proc lifetest data=uis_small plots=(s);
  time time*censor(0);
  strata site;
run;

<output omitted>

Test of Equality over Strata

                                   Pr >
Test      Chi-Square      DF    Chi-Square
Log-Rank      2.3658       1      0.1240
Wilcoxon      3.1073       1      0.0779
-2Log(LR)     2.0784       1      0.1494

     

The log-rank test of equality across strata for the predictor herco has a p-value of 0.1473, thus herco will be included as potential candidate for the final model. From the graph we see that the three groups are not parallel and that especially the groups herco=1 and herco=3 overlap for most of the graph.  This lack of parallelism could pose a problem when we include this predictor in the Cox proportional hazard model since one of the assumptions is proportionality of all the predictors.

proc lifetest data=uis_small plots=(s);
  time time*censor(0);
  strata herco;
run;

<output omitted>

Test      Chi-Square      DF    Chi-Square
Log-Rank      3.8300       2      0.1473
Wilcoxon      2.4629       2      0.2919
-2Log(LR)     4.4300       2      0.1092

 

     

It is not feasible to calculate a Kaplan-Meier curve for the continuous predictors since there would be a curve for each level of the predictor and a continuous predictor simply has too many different levels. Instead we consider the Cox proportional hazard model with a single continuous predictor. Unfortunately it is not possibly to produce a plot from proc phreg. Instead we consider the Chi-squared test for ndrugtx which has a p-value of less than 0.0001 and therefore ndrugtx is a potential candidate for the final model since the p-value is less than our cut-off value of 0.2.

proc phreg data=uis_small;
  model time*censor(0) = ndrugtx;
run;

<output omitted>

                            Analysis of Maximum Likelihood Estimates

                 Parameter     Standard                               Hazard
Variable   DF     Estimate        Error   Chi-Square   Pr > ChiSq      Ratio   Variable Label
ndrugtx     1      0.02937      0.00750      15.3470       <.0001      1.030   Number of Prior Drug Treatments

In univariate Chi-squared test of age the p-value is 0.0735 and therefore it is a potential candidate for the final model.

proc phreg data=uis_small;
  model time*censor(0) = age;
run;

<output omitted>

                            Analysis of Maximum Likelihood Estimates

               Parameter   Standard                          Hazard
Variable   DF   Estimate      Error Chi-Square Pr > ChiSq    Ratio    Variable Label
age         1   -0.01286    0.00719     3.2022     0.0735    0.987    Age at Enrollment

Model Building

For our model building, we will first consider the model which will include all the predictors that had a p-value of less than 0.2 - 0.25 in the univariate analyses, which in this particular analysis means that we will include every predictor in our model. The categorical predictor herco has three levels and therefore we will include this predictor using dummy variables with the group herco=1 as the reference group. Proc phreg is a very powerful procedure and it is one of the few procedures where it is possible to program data steps inside the procedure and so, we create the dummy variables inside the proc phreg.

In the model statement we have to specify which variable contains the information about time, which variable contains the information about censoring and which value of the censoring variable indicates that the observation is censored. In the UIS data set the variable time and censor contain the information about time and censoring respectively. The number in the parenthesis next to censor has to be the number which corresponds to a subject being censored.  In this model we therefore specify zero since the coding for censor is that censor = 0 indicates that the subject has been censored and censor = 1 indicates that the subject experienced an event.  We can test the dummy variables for herco collectively in the test statement.

proc phreg data=uis_small;
  model time*censor(0) = age ndrugtx treat site herco2 herco3;
  herco2 = (herco=2);
  herco3 = (herco=3);
  herco: test herco2, herco3;
run;

<output omitted>

The PHREG Procedure

               Analysis of Maximum Likelihood Estimates

                   Parameter      Standard
Variable    DF      Estimate         Error    Chi-Square    Pr > ChiSq
age          1      -0.02375       0.00756        9.8702        0.0017
ndrugtx      1       0.03475       0.00775       20.0824        <.0001
treat        1      -0.25402       0.09100        7.7910        0.0053
site         1      -0.17239       0.10210        2.8509        0.0913
herco2       1       0.24677       0.12276        4.0409        0.0444
herco3       1       0.12567       0.10307        1.4865        0.2228

    Linear Hypotheses Testing Results

                Wald
 Label    Chi-Square      DF    Pr > ChiSq
 herco        4.3607       2        0.1130

The predictor herco is clearly not significant and we will drop it from the final model. The predictor site is also not significant but from prior research we know that this is a very important variable to have in the final model and therefore we will not eliminate site from the model. So, the final model of main effects include: age, ndrugtx, treat and site.

proc phreg data=uis_small;
  model time*censor(0) = age ndrugtx  treat site;
run;

<output omitted>
                     Analysis of Maximum Likelihood Estimates

                   Parameter      Standard                                  Hazard
Variable    DF      Estimate         Error    Chi-Square    Pr > ChiSq       Ratio

age          1      -0.02213       0.00751        8.6807        0.0032       0.978
ndrugtx      1       0.03503       0.00767       20.8689        <.0001       1.036
treat        1      -0.24368       0.09054        7.2433        0.0071       0.784
site         1      -0.16833       0.10041        2.8103        0.0937       0.845

Interactions

Next we need to consider interactions. We do not have any prior knowledge of specific interactions that we must include so we will consider all the possible interactions. Since our model is rather small this is manageable but the ideal situation is when all model building, including finding interactions, is theory driven.  Note that we do not need to use a data step in order to create our interaction terms because we can create all the interactions inside the proc phreg.

The interaction ndrugtx*age is not significant and will not be included in the model.

proc phreg data=uis_small;
  model time*censor(0) = age ndrugtx treat site drugage;
  drugage = ndrugtx*age;
run;

<output omitted>
                     Analysis of Maximum Likelihood Estimates

                   Parameter      Standard                                  Hazard
Variable    DF      Estimate         Error    Chi-Square    Pr > ChiSq       Ratio

age          1      -0.01102       0.01001        1.2121        0.2709       0.989
ndrugtx      1       0.10541       0.04195        6.3135        0.0120       1.111
treat        1      -0.23528       0.09064        6.7373        0.0094       0.790
site         1      -0.17462       0.10045        3.0219        0.0821       0.840
drugage      1      -0.00210       0.00125        2.8274        0.0927       0.998

The interaction ndrugtx*treat is not significant and will not be included in the model.

proc phreg data=uis_small;
  model time*censor(0) = age ndrugtx treat site drugtreat;
  drugtreat = ndrugtx*treat;
run;

<output omitted>

               Analysis of Maximum Likelihood Estimates

                    Parameter      Standard
Variable     DF      Estimate         Error    Chi-Square    Pr > ChiSq
age           1      -0.02202       0.00750        8.6113        0.0033
ndrugtx       1       0.04050       0.01106       13.3959        0.0003
treat         1      -0.19488       0.11667        2.7899        0.0949
site          1      -0.17084       0.10046        2.8919        0.0890
drugtreat     1      -0.00992       0.01494        0.4412        0.5066

The interaction ndrugtx*site is not significant and will not be included in the model.

proc phreg data=uis_small;
  model time*censor(0) = age ndrugtx treat site drugsite;
  drugsite = ndrugtx*site;
run;

<output omitted>
                     Analysis of Maximum Likelihood Estimates

                    Parameter      Standard                                  Hazard
Variable     DF      Estimate         Error    Chi-Square    Pr > ChiSq       Ratio

age           1      -0.02202       0.00750        8.6113        0.0033       0.978
ndrugtx       1       0.04050       0.01106       13.3959        0.0003       1.041
treat         1      -0.19488       0.11667        2.7899        0.0949       0.823
site          1      -0.17084       0.10046        2.8919        0.0890       0.843
drugtreat     1      -0.00992       0.01494        0.4412        0.5066       0.990

The interaction age*treat is not significant and will not be included in the model.

proc phreg data=uis_small;
  model time*censor(0) = age ndrugtx treat site agetreat;
  agetreat = age*treat;
run;

<output omitted>
                     Analysis of Maximum Likelihood Estimates

                   Parameter      Standard                                  Hazard
Variable    DF      Estimate         Error    Chi-Square    Pr > ChiSq       Ratio

age          1      -0.01146       0.01040        1.2149        0.2704       0.989
ndrugtx      1       0.03577       0.00772       21.4917        <.0001       1.036
treat        1       0.44833       0.48092        0.8691        0.3512       1.566
site         1      -0.14927       0.10108        2.1809        0.1397       0.861
agetreat     1      -0.02147       0.01466        2.1450        0.1430       0.979

The interaction age*site is significant and will be included in the model.

proc phreg data=uis_small;
  model time*censor(0) = age ndrugtx treat site agesite;
  agesite = age*site;
run;

<output omitted>
                     Analysis of Maximum Likelihood Estimates

                   Parameter      Standard                                  Hazard
Variable    DF      Estimate         Error    Chi-Square    Pr > ChiSq       Ratio

age          1      -0.03369       0.00929       13.1512        0.0003       0.967
ndrugtx      1       0.03646       0.00770       22.4092        <.0001       1.037
treat        1      -0.26741       0.09123        8.5921        0.0034       0.765
site         1      -1.24593       0.50873        5.9979        0.0143       0.288
agesite      1       0.03377       0.01551        4.7423        0.0294       1.034

The interaction treat*site is not significant and will not be included in the model.

proc phreg data=uis_small;
  model time*censor(0) = age ndrugtx treat site treatsite;
  treatsite = treat*site;
run;

<output omitted>
                     Analysis of Maximum Likelihood Estimates

                    Parameter      Standard                                  Hazard
Variable     DF      Estimate         Error    Chi-Square    Pr > ChiSq       Ratio

age           1      -0.02386       0.00764        9.7584        0.0018       0.976
ndrugtx       1       0.03615       0.00775       21.7849        <.0001       1.037
treat         1      -0.34041       0.10768        9.9934        0.0016       0.711
site          1      -0.32385       0.13942        5.3959        0.0202       0.723
treatsite     1       0.33351       0.20093        2.7550        0.0969       1.396

The final model including interaction.

proc phreg data=uis_small;
  model time*censor(0) = age ndrugtx treat site agesite;
  agesite = age*site;
run;

<output omitted>

                     Analysis of Maximum Likelihood Estimates

                   Parameter      Standard                                  Hazard
Variable    DF      Estimate         Error    Chi-Square    Pr > ChiSq       Ratio
age          1      -0.03369       0.00929       13.1512        0.0003       0.967
ndrugtx      1       0.03646       0.00770       22.4092        <.0001       1.037
treat        1      -0.26741       0.09123        8.5921        0.0034       0.765
site         1      -1.24593       0.50873        5.9979        0.0143       0.288
agesite      1       0.03377       0.01551        4.7423        0.0294       1.034

From looking at the hazard ratios (also called relative risks) the model indicates that as the number of previous drug treatment (ndrugtx) increases by one unit, and all other variables are held constant, the rate of relapse increases by 3.7%. If the treatment length is altered from short to long, while holding all other variables constant, the rate of relapse decreases by (100% - 76.5%) = 23.5%. As treatment is moved from site A to site B and age is equal to zero, and all other variables are held constant, the rate of relapse decreases by (100% - 28.8%) = 71.2%. If age is increased by 5 years and subject is at site A (site=0) and all other variables are held constant the hazard ratio is equal to exp(-0.3369*5) = .18553718. Thus, the rate of relapse is decreased by (100% - 18.5%) = 81.5% with an increase of 5 years in age. If age is increased by 5 years and the subject is at site B, while holding all other variables constant, then the hazard ratio is equal to exp(-0.3369*5 + 0.03377*5) = .21966536.  Thus, the rate of relapse decreases by (100% - 21.97%) =  78.03% with an increase of 5 years of age for subjects at site B.

Proportionality Assumption

One of the main assumptions of the Cox proportional hazard model is proportionality. There are several methods for verifying that a model satisfies the assumption of proportionality and for more information on this topic please refer to our FAQ Tests of proportionality in SAS, STATA, SPLUS and R.  We will check proportionality by including time-dependent covariates in the model because in proc phreg it is very easy and convenient to include data step programming inside the procedure. Time dependent covariates are interactions of the predictors with time. In this analysis we choose to use the interactions with log(time) because this is the most common function of time used in time-dependent covariates but any function of time could be used. If a time-dependent covariate is significant this indicates a violation of the proportionality assumption for that specific predictor. We use a test statement to test all the time-dependent covariates together in one collective test. 

The conclusion is that all of the time-dependent variables are not significant either collectively or individually thus supporting the assumption of proportional hazard.  Our faith in these results are bolstered by the Kaplan-Meier curves we created during our univariate analyses.  The curves for all the variables in the model were indeed separate and approximately parallel.  Looking at the Kaplan-Meier curves is not enough to be certain of proportionality since they are univariate analysis and do not show whether a predictor will still be proportional when included in a model with many other predictors but they support our argument for proportionality. 

proc phreg data=uis_small;
model time*censor(0) = age ndrugtx treat site agesite aget drugt treatt sitet;
  agesite = age*site;
  aget = age*log(time);
  drugt = ndrugtx*log(time);
  treatt = treat*log(time);
  sitet = site*log(time);
  test_proportionality: test aget, drugt, treatt, sitet;
run;

<output omitted>
                     Analysis of Maximum Likelihood Estimates

                   Parameter      Standard                                  Hazard
Variable    DF      Estimate         Error    Chi-Square    Pr > ChiSq       Ratio

age          1      -0.03228       0.03408        0.8968        0.3436       0.968
ndrugtx      1       0.01738       0.03216        0.2920        0.5889       1.018
treat        1      -0.66710       0.41149        2.6282        0.1050       0.513
site         1      -1.63720       0.68019        5.7936        0.0161       0.195
agesite      1       0.03372       0.01555        4.7044        0.0301       1.034
aget         1    -0.0004057       0.00712        0.0032        0.9546       1.000
drugt        1       0.00428       0.00696        0.3784        0.5385       1.004
treatt       1       0.08605       0.08632        0.9937        0.3188       1.090
sitet        1       0.08435       0.09744        0.7493        0.3867       1.088

            Linear Hypotheses Testing Results

                               Wald
 Label                   Chi-Square      DF    Pr > ChiSq

 test_proportionality        2.0264       4        0.7309

The tests of all the time-dependent variables were not significant either individually or collectively, so we do not have enough evidence to reject proportionality and will assume that we have satisfied the assumption of proportionality for this model.

If one of the predictors was not proportional there are various solutions to consider. We can change from using a semi-parametric Cox regression model to using a parametric regression model. Another solution is to include the time-dependent variable for the non-proportional predictors. Finally, we can use a model where we stratify on the non-proportional predictors. The only change to the model is the addition of the strata statement. The assumption is that we are fitting separate models for each level of treat under the constraint that that the coefficients are equal but the baseline hazard functions are not equal. The following is an example of stratification on the predictor treat. Note that treat is no longer included in the model statement; instead, it is specified in the strata statement.

proc phreg data=uis_small;
  model time*censor(0) = age ndrugtx site agesite; 
  agesite = age*site;
  strata treat;
run; 

<output omitted>

         Summary of the Number of Event and Censored Values
                                                            Percent
Stratum    treat          Total       Event    Censored    Censored

      1    0                310         257          53       17.10
      2    1                300         238          62       20.67
-------------------------------------------------------------------
  Total                     610         495         115       18.85

                     Analysis of Maximum Likelihood Estimates

                   Parameter      Standard                                  Hazard
Variable    DF      Estimate         Error    Chi-Square    Pr > ChiSq       Ratio
age          1      -0.03475       0.00929       13.9940        0.0002       0.966
ndrugtx      1       0.03638       0.00770       22.3401        <.0001       1.037
site         1      -1.25130       0.50855        6.0541        0.0139       0.286
agesite      1       0.03399       0.01551        4.8041        0.0284       1.035

The parameter estimates are almost exactly the same as the parameter estimates in the model where treat was included as a proportional predictor. This leads us to believe that treat actually is proportional and that we do not need to stratify on treat. If treat truly violated the assumption of proportionality then we would expect the estimates of the stratified model to differ from the non-stratified model.

proc phreg data=uis_small;
  model time*censor(0) = age ndrugtx treat site agesite; 
  agesite = age*site;
run; 

<output omitted>

                     Analysis of Maximum Likelihood Estimates

                   Parameter      Standard                                  Hazard
Variable    DF      Estimate         Error    Chi-Square    Pr > ChiSq       Ratio
age          1      -0.03369       0.00929       13.1512        0.0003       0.967
ndrugtx      1       0.03646       0.00770       22.4092        <.0001       1.037
treat        1      -0.26741       0.09123        8.5921        0.0034       0.765
site         1      -1.24593       0.50873        5.9979        0.0143       0.288
agesite      1       0.03377       0.01551        4.7423        0.0294       1.034

Graphing Survival Functions from Proc phreg

It is useful to look at the survival function but unfortunately it is not possibly to obtain a graph through proc phreg. Instead we output a data set which includes the survival function using the baseline statement with an out option and then we will be able to produce a survival function for specific covariate patterns. Each covariate pattern will have a different survival function.  The default survival function is for the covariate pattern where each predictor is set equal to its mean. However, for many predictors the mean value is not meaningful. Consider the predictor site where the value 0 indicates site A and the value 1 indicates site B. The mean value for site is 0.292. What would it mean for a person to have the value 0.292 for site?  We are not interested in looking at the survival function for a subject with site = 0.292. It would be much more useful to specify a covariate pattern of interest and generate a survival function for subjects with this specific covariate pattern.

In the following example we want to graph the survival function for a subject who is 30 years old (age=30), has had 5 prior drug treatments (ndrugtx=5), and is currently getting the long treatment (treat=0) at site A (site=0 and agesite=30*0 = 0). We first create a covariate data set which must include all the covariates listed as predictor in the model statement of the proc phreg. The survival option indicates that we want to obtain the survival function and the covariates option indicates for which covariate pattern we want to generate the survival function.

data cov_treat1;
  age = 30;
  ndrugtx = 5;
  treat = 1;
  site = 0;
  agesite = 0;
run;
proc phreg data=uis_small noprint;
  model time*censor(0) = age ndrugtx treat site agesite; 
  agesite = age*site;
  baseline out=surv1 covariates=cov_treat1 survival=surv / nomean;
run;
goptions reset=all;
symbol1 c=red v=triangle h=.8 i=stepjll;
symbol2 c=blue v=circle h=.8 i=stepjll;
axis1 label=(a=90 'Survivorship function');
proc gplot data=surv1;
 plot surv*time / vaxis=axis1;
run;
quit;
     

Looking at the survival function for one covariate pattern is sometimes not sufficient. It is often very useful to have a graph where we can compare the survival functions of different groups. In the following example we generate a graph with the survival functions for the two treatment groups where all the subjects are 30 years old (age=30), have had 5 prior drug treatments (ndrugtx=5) and are currently being treated at site A (site=0 and agesite=30*0=0). Thus, the two covariate patterns differ only in their values for treat.

data cov_treat0;
  age = 30;
  ndrugtx = 5;
  treat = 0;
  site = 0;
  agesite = 0;
run;
proc phreg data=uis_small noprint;
  model time*censor(0) = age ndrugtx treat site agesite; 
  agesite = age*site;
  baseline out=surv0 covariates=cov_treat0 survival=surv / nomean;
run;
data combo;
  set surv1 surv0;
run;
goptions reset=all;
symbol1 c=red v=triangle h=.8 i=stepjll;
symbol2 c=blue v=circle h=.8 i=stepjll;
axis1 label=(a=90 'Survivorship function');
proc gplot data=combo;
 plot surv*time=treat / vaxis=axis1;
run;
quit;
     

Another short coming of the graphic output in SAS is that the survival function that is obtained through the baseline statement does not include the last censored observation. Both of the preceding graphs have survival functions for time < 700. But in fact as the following proc means shows we have subjects who have survived until time = 1172 when treat=1 and subjects who survived until time =805 when treat=0.  We also know this from looking at the Kaplan-Meier curves in the univariate analysis section.

proc sort data=uis_small out=sorted;
  by treat;
run;
proc means data=sorted max;
  by treat;
  var time;
run;

treat=0

The MEANS Procedure

Analysis Variable : time

     Maximum
------------
 805.0000000
------------

treat=1

Analysis Variable : time

     Maximum
------------
     1172.00
------------

Since this last observation at time = 1172 is censored the value of the survival function for this observation will be equal to the value of the survival function for the time just prior (time=659).

proc print data=combo ;
  where time > 600;
run;

Obs    age    ndrugtx    treat    site    agesite    time      surv
275     30       5         1        0        0        659    0.15060
550     30       5         0        0        0        659    0.08429

To make the graph include all the observations, even the last censored observation, all we have to do is include two extra data points, one for each treatment group, where time is equal to the maximum value of time (obtained from the proc means) and the survival function is equal to last survival function value generated by the baseline output (obtained from the proc print).

data combo1;
  set combo;
  if _n_ = 1 then do;
  treat=0;
  time = 805;
  surv = 0.08429;
  treat = 0; 
  output;
  treat=1;
  time = 1172;
  surv = 0.15060;
  output;
  end;
  output;
run;

We verify that the data step accomplished what we set out to do.

proc print data=combo1 ;
  where time > 600;
run;

Obs    treat    time      surv     age    ndrugtx    site    agesite
  1      0       805    0.08429      .       .         .        .
  2      1      1172    0.15060      .       .         .        .
277      1       659    0.15060     30       5         0        0
552      0       659    0.08429     30       5         0        0

We need to sort on the variable that will be the on the x-axis of our graph. In this case the variable is time.

proc sort data=combo1;
  by time;
run;
goptions reset=all;
symbol1 c=red v=triangle h=.8 i=stepjll;
symbol2 c=blue v=circle h=.8 i=stepjll;
axis1 label=(a=90 'Survivorship function');
proc gplot data=combo;
 plot surv*time=treat / vaxis=axis1;
run;
quit;