Supplemental notes to Applied Survival Analysis
Applied Survival Analysis
Supplementary notes
Tests of Proportionality in SAS, Stata and R
When modeling a Cox proportional hazard model a key assumption is proportional hazards. There are a number of basic concepts for testing proportionality but the implementation of these concepts differ across statistical packages. The goal of this page is to illustrate how to test for proportionality in Stata, SAS and R using an example from Applied Survival Analysis by Hosmer and Lemeshow. Data set for SAS, Stata and R (the R file is a csv file which can be read in using the read.csv function).
1. Kaplan-Meier Curves
Works best for time fixed covariates with few levels. If the predictor satisfy the proportional hazard assumption then the graph of the survival function versus the survival time should results in a graph with parallel curves, similarly the graph of the log(-log(survival)) versus log of survival time graph should result in parallel lines if the predictor is proportional. This method does not work well for continuous predictor or categorical predictors that have many levels because the graph becomes to "cluttered". Furthermore, the curves are sparse when there are fewer time points and it may be difficult to gage how close to parallel is close enough. Due to space limitations we will only show the graphs for the predictor treat.
SAS
It is very easy to create the graphs in SAS using proc lifetest.
The plot option in the model statement lets you specify both the survival
function versus time as well as the log(-log(survival) versus log(time).
proc lifetest data=uis plot=(s, lls) noprint; time time*censor(0); strata treat; run;
Stata
The sts graph command in Stata will generate the survival function
versus time graph.
sts graph, by(treat)
R
The plot function applied to a survfit object will generate a graph of the survival
function versus the survival time.
plot(survfit(formula = Surv(time, censor)~ treat, data = uis, conf.type="none"),
lty=0, xlab="Time", ylab="Survival Probability" )
2. Including Time Dependent Covariates in the Cox Model
Generate the time dependent covariates by creating interactions of the predictors and a function of survival time and include in the model. If any of the time dependent covariates are significant then those predictors are not proportional.
SAS
In SAS it is possible to create all the time dependent variable inside proc phreg
as demonstrated. Furthermore, by using the test statement is is
possibly to test all the time dependent covariates all at once.
proc phreg data=uis;
model time*censor(0) = age race treat site agesite aget racet treatt sitet;
aget = age*log(time);
racet = race*log(time);
treatt = treat*log(time);
sitet = site*log(time);
proportionality_test: test aget, racet, treatt, sitet;
run;
<output omitted>
Analysis of Maximum Likelihood Estimates
Parameter Standard
Variable DF Estimate Error Chi-Square Pr > ChiSq
age 1 -0.02932 0.03336 0.7729 0.3793
race 1 -0.99201 0.53252 3.4702 0.0625
treat 1 -0.52086 0.40742 1.6344 0.2011
site 1 -1.48805 0.67291 4.8901 0.0270
agesite 1 0.02889 0.01547 3.4881 0.0618
aget 1 0.00124 0.00693 0.0319 0.8583
racet 1 0.14554 0.10951 1.7663 0.1838
treatt 1 0.05876 0.08536 0.4739 0.4912
sitet 1 0.07357 0.09701 0.5752 0.4482
Linear Hypotheses Testing Results
Wald
Label Chi-Square DF Pr > ChiSq
proportionality_test 3.8875 4 0.4214
Stata
We use the tvc and the texp option in the stcox command.
We list the predictors that we would like to include as interaction with
log(time) in the tvc option (tvc = time varying covariates).
The texp option is where we can specify the function of time that we
would like used in the time dependent covariates. By using the lrtest commands
it is possible to tests all the time dependent covariates together by comparing
the smaller model without any time dependent covariates to the larger model that
includes all the time dependent covariates.
stcox age race treat site c.age#site , nohr nolog noshow tvc(age race treat site) ///
texp( ln(_t) )
Cox regression -- Breslow method for ties
No. of subjects = 617 Number of obs = 617
No. of failures = 500
Time at risk = 145294
LR chi2(9) = 27.48
Log likelihood = -2890.3829 Prob > chi2 = 0.0012
------------------------------------------------------------------------------
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
main |
age | -.029325 .033356 -0.88 0.379 -.0947016 .0360516
race | -.9920153 .5325233 -1.86 0.062 -2.035742 .0517113
treat | -.5208583 .4074197 -1.28 0.201 -1.319386 .2776696
site | -1.488051 .6729145 -2.21 0.027 -2.80694 -.1691632
|
site#c.age |
1 | .0288933 .0154704 1.87 0.062 -.0014281 .0592146
-------------+----------------------------------------------------------------
tvc |
age | .0012374 .0069302 0.18 0.858 -.0123454 .0148202
race | .1455428 .1095113 1.33 0.184 -.0690954 .3601809
treat | .0587615 .085358 0.69 0.491 -.1085371 .2260601
site | .0735743 .0970118 0.76 0.448 -.1165654 .2637139
------------------------------------------------------------------------------
Note: variables in tvc equation interacted with ln(_t)
lrtest, saving(0)
quietly stcox age race treat site c.age#site
lrtest, using(0)
Cox: likelihood-ratio test chi2(4) = 2.76
Prob > chi2 = 0.5988
3. Tests and Graps Based on the Schoenfeld Residuals
Testing the time dependent covariates is equivalent to testing for a non-zero
slope in a generalized linear regression of the scaled Schoenfeld residuals on
functions of time. A non-zero slope is an indication of a violation of the proportional hazard
assumption. As with any regression it is highly recommended that you look at the
graph of the regression in addition to performing the tests of non-zero slopes.
There are certain types on non-proportionality that will not be detected by the
tests of non-zero slopes alone but that might become obvious when looking at the
graphs of the residuals such as nonlinear relationship (i.e. a quadratic fit)
between the residuals and the function of time or
undue influence of outliers.
R
First we create the coxph object by using the coxph function. To
create the plots of the Schoenfeld residuals versus log(time) create a cox.zph
object by applying the cox.zph function to the cox.ph object. Then
the plot function will automatically create the Schoenfeld residual plots
for each of the predictors in the model including a lowess smoothing curve.
The order of the residuals in the time.dep.zph object corresponds to the order
in which they were entered in the coxph model. To plot one graph at a time
use the bracket notation with the number corresponding to the predictor of
interest. The abline function adds a reference line at y=0 to the
individual plots.
time.dep <- coxph( Surv(time, censor)~age+race+treat+ site+age:site,
uis, method="breslow", na.action=na.exclude)
time.dep.zph <- cox.zph(table5.11, transform = 'log')
summary(time.dep.zph)
plot(time.dep.zph)
<plots have been omitted>
rho chisq p
age 0.0245 0.283 0.595
race 0.0601 1.851 0.174
treat 0.0346 0.597 0.440
site 0.0355 0.587 0.444
age:site -0.0289 0.385 0.535
GLOBAL NA 3.069 0.689
#plots for the predictors: age and treat including the reference line at y=0 plot(time.dep.zph[1]) abline(h=0, lty=3)
plot(time.dep.zph[3]) abline(h=0, lty=3)
Stata
The tests of the non-zero slope developed by Therneau and Grambsch have been implemented in
Stata in the estat phtest command. The algorithms that Stata uses are
slightly different from the algorithms used by Therneau and Grambsch and therefore the results
may differ slightly. The estat phtest with the
detail option will perform
the tests of each predictor as well as a global test. There are different
functions of time available including the identity function, the log of survival
time and the rank of the survival times. The stphtest
command with the plot option will provide the graphs with a lowess
curve. The usual graphing options can be used to include a horizontal
reference line at y=0. The Stata graphs do not include 95% confidence intervals
for the lowess curves which makes
it more difficult to assess how much the curves may deviate from the y=0 line.
stcox age race treat site c.age#site, nolog noshow schoenfeld(sch*) scaledsch(sca*)
estat phtest, log detail
Test of proportional-hazards assumption
Time: Log(t)
----------------------------------------------------------------
| rho chi2 df Prob>chi2
------------+---------------------------------------------------
age | 0.02448 0.28 1 0.5946
race | 0.06006 1.85 1 0.1737
treat | 0.03464 0.60 1 0.4398
site | 0.03551 0.59 1 0.4437
0b.site#co~e| . . 1 .
1.site#c.age| -0.02890 0.38 1 0.5351
------------+---------------------------------------------------
global test | 3.07 5 0.6894
----------------------------------------------------------------
estat phtest, log plot(age) yline(0)
estat phtest, log plot(treat) yline(0)

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.