UCLA Academic Technology Services HomeServicesClassesContactJobs
Search

S-Plus Textbook Examples
Applied Survival Analysis

Chapter 2: Descriptive Methods for Survival Data

Sections noted by R indicate differences between R and SPLUS.

SPLUS program
R program

Delimited data sets to be used in R:
hmohiv.csv
mini.txt
minitest.txt

To input the data sets in R use the following code which must be executed before running the R program:

>hmohiv <- read.csv("d:/hmohiv.csv", header=T)
>mini <- read.table("d:/mini.txt", header=T)
>minitest <- read.table("d:/minitest.txt", header=T)

Here are the same data sets for SPLUS: hmohiv, mini and minitest.

We will start by using the mini dataset.

Table 2.2, p. 32.
Note: We create the object "mini.surv" which includes the estimate of the survival function at each time point.  We use the conf.type="none" argument to specify that we do not want to include any confidence intervals for the survival function.

mini.surv <- survfit(Surv(time, censor)~ 1, mini, conf.type="none")
summary(mini.surv)

time n.risk n.event survival std.err
   3      5      1       0.8   0.179
   5      4      1       0.6   0.219
   8      2      1       0.3   0.239
  22      1      1       0.0      NA

Fig. 2.1, p. 32.
Note: The xlab="Time" argument creates the label for the x-axis; likewise the ylab="Survival Probability" argument creates the label for the y-axis.  The lty argument specifies which line type is going to used in the graph.

plot(mini.surv, lty = 0, xlab="Time", ylab="Survival Probability")

Point and Click:

STATISTICS
    SURVIVAL
        NONPARAMETRIC SURVIVAL...

Click on the Create Formula button.  Select time in the Choose Variable menu by clicking on time and then click on the Time 1 button.  Next, select censor in the Choose Variable and click on the Censor Codes button.  Click on Add Response button and then on the OK button.  This will bring you back to the original menu.  Next, click on the Plot tab at the top of the menu.  This brings up a new menu where you can unselect the Show Confidence Intervals argument.   Next, click on the Results tab at the top which is located next to the Plot.  Select Long Output.  Now click the OK button at the bottom.  This will create both the table and the graph in one shot.  

Movie for table 2.2 and fig. 2.1 using point and click.

Table 2.3, p. 35.
Note: SPLUS does not include the first and the last intervals ( [0,1) and [60, 60] ) in the output.

hmohiv.surv <- survfit( Surv(time, censor)~ 1, hmohiv, conf.type="none")
summary(hmohiv.surv)

time n.risk n.event survival std.err
   1    100      15   0.8500  0.0357
   2     83       5   0.7988  0.0402
   3     73      10   0.6894  0.0473
   4     61       4   0.6442  0.0493
   5     56       7   0.5636  0.0517

   ...
  58      3       1   0.0389  0.0253

Fig. 2.2, p. 34.

plot (hmohiv.surv, lty=0, xlab="Time", ylab="Survival Probability" )

 

Point and Click:

STATISTICS
    SURVIVAL
        NONPARAMETRIC SURVIVAL...

Click on the Create Formula button.  Select time in the Choose Variable menu by clicking on time and then click on the Time 1 button.  Next, select censor in the Choose Variable and click on the Censor Codes button.  Click on Add Response button and then on the OK button.  This will bring you back to the original menu.  Next, click on the Plot tab at the top of the menu.  This brings up a new menu where you can unselect the Show Confidence Intervals argument.   Next, click on the Results tab at the top which is located next to the Plot.  Select Long Output.  Now click the OK button at the bottom.  This will create both the table and the graph in one shot.  

Movie for table 2.3 and fig. 2.2 using point and click.

Create timecat, the categorical variable of time in the hmohiv dataset.
Note: In order to include the lower end point in the interval you need to specify a cut off point that is slightly lower. Also the lower endpoint of the whole data set needs to be smaller than the smallest number in the whole dataset. Likewise, the greatest number has to be bigger than the largest number in the whole data set. The object timecat1 is a categorical variable with categories 1,2,...,11 but we would like the categories to represent months so we must multiply them by 6.  So, we create timecat which is equal to timecat1 times 6 and then timecat now has categories are 6,12, ..., 66.

hmohiv$timecat1 <- cut(hmohiv$time, c(-.9, 5.9, 11.9, 17.9, 23.9, 29.9, 35.9, 41.9, 47.9,  
                       53.9, 59.9, 66.9) )

hmohiv$timecat <- hmohiv$timecat1*6

Table 2.4, p. 38

timecat.surv <- survfit( Surv(timecat, censor)~ 1, hmohiv, conf.type="none")
summary(
timecat.surv)

time n.risk n.event survival std.err
   6    100      41   0.5900  0.0492
  12     49      21   0.3371  0.0503
  18     25       6   0.2562  0.0479
  24     17       1   0.2412  0.0474
  36     14       5   0.1550  0.0434
  42      9       1   0.1378  0.0418
  48      8       1   0.1206  0.0400
  54      7       1   0.1034  0.0378
  60      6       3   0.0517  0.0283

Fig. 2.3, p. 39

plot( timecat.surv , xlab="Survival Time (Months)", ylab="Proportion Surviving" )

Movie for table 2.4 and fig. 2.3 using point and click.

Fig. 2.5, p. 46
Note1: Only the point wise confidence bands using the log-log survivorship function were included in the graph.
Note2: The legend function is a feature that can be added to any existing graph.  The first two arguments in the legend function are the x and y coordinates of the upper left hand corner of the legend box.  The following argument is the vector of names for the symbols shown in the legend.  The final argument is the line type to be displayed in the legend. 

time.conf <- survfit( Surv(time, censor)~ 1, hmohiv, conf.type="log-log")
plot(time.conf, lty=1:2, xlab="Survival Time (Months)", ylab="Survival Probability" )
legend(20, 1.0, c("Kaplan-Meier Curve", "Point-wise 95% Confidence Interval"), lty=1:2 )

Movie for fig. 2.5, table 2.6 and the mean of the survival function (p. 57) using point and click.

Table 2.5, p. 50
Computing the quartiles.

R-In R these functions are not available.

kfit <-kaplanMeier(censor(time,censor)~1, data=hmohiv)
qkaplanMeier(kfit, c(.25, .5, .75) )

     [,1]
0.25   3
0.5    7
0.75  15

Table 2.6, p. 52

timeconf.surv <- survfit( Surv(time, censor)~ 1, hmohiv, conf.type="log-log")
summary(timeconf.surv)

time n.risk n.event survival std.err lower 95% CI upper 95% CI
    4     61       4   0.6442  0.0493      0.53868        0.731
    5     56       7   0.5636  0.0517      0.45636        0.658
    6     49       2   0.5406  0.0521      0.43343        0.636
    7     46       6   0.4701  0.0526      0.36440        0.569
    8     39       4   0.4219  0.0525      0.31832        0.522
    9     35       3   0.3857  0.0520      0.28454        0.486


The mean of the survivorship function, p. 57

timeconf.surv

  n events mean se(mean) median 0.95LCL 0.95UCL
100     80 14.7     1.97      7       5       9

Fig. 2.7, p. 58

timestrata.surv <- survfit( Surv(time, censor)~ strata(drug), hmohiv, conf.type="log-log")
plot(timestrata.surv, lty=c(1,3), xlab="Time", ylab="Survival Probability")
legend(40, 1.0, c("Drug - No", "Drug - Yes") , lty=c(1,3) )

Using data set minitest in table 2.8, p. 63.

Table 2.10, 64.
Note: When rho=0 then the test is the log-rank test. However, when rho=1 then the weight is similar to the weights used in the the Peto and Prentice test. Similarly, when rho=.5 the tests appears to be close to the Tarone-Ware test.

survdiff(Surv(time, censor) ~ drug, data=minitest, rho=0)

       N Observed Expected (O-E)^2/E (O-E)^2/V
drug=0 5        4     5.38     0.353      1.36
drug=1 5        4     2.62     0.724      1.36

Chisq= 1.4 on 1 degrees of freedom, p= 0.243

survdiff(Surv(time, censor) ~ drug, data=minitest, rho=.5)

       N Observed Expected (O-E)^2/E (O-E)^2/V
drug=0 5     2.69     3.77      1.36      1.11
drug=1 5    
3.35     2.27     0.511      1.11

Chisq= 1.1 on 1 degrees of freedom, p= 0.291

       N Observed Expected (O-E)^2/E (O-E)^2/V
drug=0 5     2.02      2.9     0.267     0.927
drug=1 5     2.88
     2.0     0.387     0.927

Chisq= 0.9 on 1 degrees of freedom, p= 0.336

Table 2.11, p. 65
Testing for differences between drug group using the G-rho family of tests.
Note: When rho=0 then the test is the log-rank test. However, when rho=1 then the weight is similar to the weights used in the the Peto and Prentice test. Similarly, when rho=.5 the tests appears to be close to the Tarone-Ware test.

survdiff(Surv(time, censor) ~ drug, data=hmohiv, rho=0)

        N Observed Expected (O-E)^2/E (O-E)^2/V
drug=0 51       42     54.9      3.02      11.9
drug=1 49       38     25.1      6.60      11.9

Chisq= 11.9 on 1 degrees of freedom, p= 0.000575

survdiff(Surv(time, censor) ~ drug, data=hmohiv, rho=.5)

        N Observed Expected (O-E)^2/E (O-E)^2/V
drug=0 51     27.4     38.3      3.11      12.6
drug=1 49     32.0     21.1      5.65      12.6

Chisq=  

survdiff(Surv(time, censor) ~ drug, data=hmohiv, rho=1)

        N Observed Expected (O-E)^2/E (O-E)^2/V
drug=0 51     20.0     29.3      2.94      11.7
drug=1 49     27.7     18.4      4.69      11.7

Chisq= 11.7 on 1 degrees of freedom, p= 0.000622

Creating the categorical age variable, agecat.

hmohiv$agecat <- cut(hmohiv$age, c(19.9, 29, 34, 39, 54.1) )

Table 2.12, p. 65.

age.surv <- survfit( Surv(time, censor)~ strata(agecat), hmohiv, conf.type="log-log")
print(age.surv)

                                       n events   mean se(mean) median
strata(agecat)=agecat=19.9+ thru 29.0 12      8  34.12    7.35      43
strata(agecat)=agecat=29.0+ thru 34.0 34     29  13.83    2.49       9
strata(agecat)=agecat=34.0+ thru 39.0 25     20  12.84    3.70       7
strata(agecat)=agecat=39.0+ thru 54.1 29     23   6.25    1.82       4

                                      0.95LCL 0.95UCL
strata(agecat)=agecat=19.9+ thru 29.0       5      NA
strata(agecat)=agecat=29.0+ thru 34.0       6      12
strata(agecat)=agecat=34.0+ thru 39.0       3       9
strata(agecat)=agecat=39.0+ thru 54.1       2       5

 

Fig. 2.8, p. 69

plot(age.surv, lty=c(6, 1, 4, 3), xlab="Time", ylab="Survival Probability")
legend(40, 1.0, c("Group 1", "Group 2", "Group 3", "Group 4") , lty=c(6, 1, 4, 3) )

Movie for table 2.12 and fig. 2.8 using point and click.

Table 2.14, p. 70

survdiff(Surv(time, censor) ~ agecat, data=hmohiv, rho=0)

                        N Observed Expected (O-E)^2/E (O-E)^2/V
agecat=19.9+ thru 29.0 12        8     19.9   7.10608   12.4419
agecat=29.0+ thru 34.0 34       29     29.4   0.00641    0.0117
agecat=34.0+ thru 39.0 25       20     17.8   0.26894    0.3834
agecat=39.0+ thru 54.1 29       23     12.9   7.98170   11.1799

Chisq= 19.9 on 3 degrees of freedom, p= 0.000178

survdiff(Surv(time, censor) ~ agecat, data=hmohiv, rho=.5)

                        N Observed Expected (O-E)^2/E (O-E)^2/V
agecat=19.9+ thru 29.0 12     4.53     12.0     4.671     9.584
agecat=29.0+ thru 34.0 34    20.15     22.6     0.263     0.597
agecat=34.0+ thru 39.0 25    15.32     13.9     0.147     0.263
agecat=39.0+ thru 54.1 29    19.44     10.9     6.610    10.985

Chisq= 17.7 on 3 degrees of freedom, p= 0.000514

survdiff(Surv(time, censor) ~ agecat, data=hmohiv, rho=1)

                        N Observed Expected (O-E)^2/E (O-E)^2/V
agecat=19.9+ thru 29.0 12     3.05     8.37    3.3859     7.225
agecat=29.0+ thru 34.0 34    15.11    18.21    0.5267     1.320
agecat=34.0+ thru 39.0 25    12.48    11.48    0.0872     0.172
agecat=39.0+ thru 54.1 29    17.06     9.64    5.7134    10.355

Chisq= 15.4 on 3 degrees of freedom, p= 0.00154

 

Fig. 2.9 and table 2.16 was not reproduced.
 

 


How to cite this page

Report an error on this page

UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California