|
|
|
||||
|
|
|||||
S-Plus Textbook Examples
Applied Survival Analysis
Chapter 2: Descriptive
Methods for Survival Data
Sections noted by R indicate differences between R and SPLUS.
SPLUS program
R program
Delimited data sets to be used in R:
hmohiv.csv
mini.txt
minitest.txt
To input the data sets in R use the following code which must be executed before running the R program:
>hmohiv <- read.csv("d:/hmohiv.csv", header=T)
>mini <- read.table("d:/mini.txt", header=T)
>minitest <- read.table("d:/minitest.txt", header=T)
Here are the same data sets for SPLUS: hmohiv, mini and minitest.
We will start by using the mini dataset.
Table 2.2, p. 32.
Note: We create the object "mini.surv" which includes the estimate of the survival function at each time point. We use the conf.type="none" argument to specify that we do not want to include any confidence intervals for the survival function.
mini.surv
<- survfit(Surv(time, censor)~ 1, mini,
conf.type="none")
summary(mini.surv)
time n.risk n.event survival std.err
3
5 1
0.8 0.179
5
4 1
0.6 0.219
8
2 1
0.3 0.239
22
1 1
0.0 NA
Fig. 2.1, p. 32.
Note: The xlab="Time" argument creates the label for the x-axis; likewise the ylab="Survival Probability" argument creates the label for the y-axis. The lty argument specifies which line type is going to used in the graph.
plot(mini.surv, lty = 0, xlab="Time", ylab="Survival Probability")
Point and Click:
STATISTICS
SURVIVAL
NONPARAMETRIC SURVIVAL...
Click on the Create Formula button. Select time in the Choose Variable menu by clicking on time and then click on the Time 1 button. Next, select censor in the Choose Variable and click on the Censor Codes button. Click on Add Response button and then on the OK button. This will bring you back to the original menu. Next, click on the Plot tab at the top of the menu. This brings up a new menu where you can unselect the Show Confidence Intervals argument. Next, click on the Results tab at the top which is located next to the Plot. Select Long Output. Now click the OK button at the bottom. This will create both the table and the graph in one shot.
Movie for table 2.2 and fig. 2.1 using point and click.
Table 2.3, p. 35.
Note: SPLUS does not include the first and the last intervals ( [0,1) and [60, 60] ) in the output.
hmohiv.surv <- survfit( Surv(time, censor)~ 1, hmohiv, conf.type="none")
summary(hmohiv.surv)
time n.risk n.event survival std.err
1
100 15
0.8500 0.0357
2
83 5
0.7988 0.0402
3
73 10
0.6894 0.0473
4
61 4
0.6442 0.0493
5
56 7
0.5636 0.0517
...
58 3
1
0.0389 0.0253
Fig. 2.2, p. 34.
plot (hmohiv.surv, lty=0, xlab="Time", ylab="Survival Probability" )
Point and Click:
STATISTICS
SURVIVAL
NONPARAMETRIC SURVIVAL...
Click on the Create Formula button. Select time in the Choose Variable menu by clicking on time and then click on the Time 1 button. Next, select censor in the Choose Variable and click on the Censor Codes button. Click on Add Response button and then on the OK button. This will bring you back to the original menu. Next, click on the Plot tab at the top of the menu. This brings up a new menu where you can unselect the Show Confidence Intervals argument. Next, click on the Results tab at the top which is located next to the Plot. Select Long Output. Now click the OK button at the bottom. This will create both the table and the graph in one shot.
Movie for table 2.3 and fig. 2.2 using point and click.
Create timecat, the categorical variable of time in the hmohiv dataset.
Note: In order to include the lower end point in the interval you need to specify a cut off point that is slightly lower. Also the lower endpoint of the whole data set needs to be smaller than the smallest number in the whole dataset. Likewise, the greatest number has to be bigger than the largest number in the whole data set. The object timecat1 is a categorical variable with categories 1,2,...,11 but we would like the categories to represent months so we must multiply them by 6. So, we create timecat which is equal to timecat1 times 6 and then timecat now has categories are 6,12, ..., 66.
hmohiv$timecat1 <- cut(hmohiv$time, c(-.9,
5.9, 11.9,
17.9, 23.9,
29.9, 35.9,
41.9, 47.9,
53.9, 59.9,
66.9) )
hmohiv$timecat <- hmohiv$timecat1*6
Table 2.4, p. 38
timecat.surv
<- survfit( Surv(timecat, censor)~ 1, hmohiv,
conf.type="none")
summary(timecat.surv)
time n.risk n.event survival std.err
6
100 41
0.5900 0.0492
12
49 21
0.3371 0.0503
18
25 6
0.2562 0.0479
24
17 1
0.2412 0.0474
36
14 5
0.1550 0.0434
42
9 1
0.1378 0.0418
48
8 1
0.1206 0.0400
54
7 1
0.1034 0.0378
60
6 3
0.0517 0.0283
Fig. 2.3, p. 39
plot( timecat.surv , xlab="Survival Time (Months)", ylab="Proportion Surviving" )
Movie for table 2.4 and fig. 2.3 using point and click.
Fig. 2.5, p. 46
Note1: Only the point wise confidence bands using the log-log
survivorship function were included in the graph.
Note2: The legend function is a feature that can be added
to any existing graph. The first two arguments in the legend function
are the x and y coordinates of the upper left hand corner of the legend box.
The following argument is the vector of names for the symbols shown in the
legend. The final argument is the line type to be displayed in the
legend.
time.conf <- survfit( Surv(time, censor)~ 1, hmohiv, conf.type="log-log")
plot(time.conf, lty=1:2, xlab="Survival Time
(Months)", ylab="Survival Probability" )
legend(20, 1.0, c("Kaplan-Meier Curve",
"Point-wise 95% Confidence Interval"), lty=1:2 )
Movie for fig. 2.5, table 2.6 and the mean of the survival function (p. 57) using point and click.
Table 2.5, p. 50
Computing the quartiles.
R-In R these functions are not available.
kfit <-kaplanMeier(censor(time,censor)~1, data=hmohiv)
qkaplanMeier(kfit, c(.25, .5, .75) )
[,1]
0.25 3
0.5 7
0.75 15
Table 2.6, p. 52
timeconf.surv <- survfit( Surv(time, censor)~
1, hmohiv, conf.type="log-log")
summary(timeconf.surv)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
4 61 4 0.6442 0.0493 0.53868 0.731
5 56 7 0.5636 0.0517 0.45636 0.658
6 49 2 0.5406 0.0521 0.43343 0.636
7 46 6 0.4701 0.0526 0.36440 0.569
8 39 4 0.4219 0.0525 0.31832 0.522
9 35 3 0.3857 0.0520 0.28454 0.486
The mean of the survivorship function, p. 57
timeconf.surv
n events mean se(mean) median 0.95LCL 0.95UCL
100 80 14.7
1.97 7
5
9
Fig. 2.7, p. 58
timestrata.surv <- survfit( Surv(time,
censor)~ strata(drug), hmohiv, conf.type="log-log")
plot(timestrata.surv, lty=c(1,3), xlab="Time",
ylab="Survival Probability")
legend(40, 1.0, c("Drug - No", "Drug - Yes") ,
lty=c(1,3) )
Using data set minitest in table 2.8, p. 63.
Table 2.10, 64.
Note: When rho=0 then the test is the log-rank test. However, when rho=1 then
the weight is similar to the
weights used in the the Peto and Prentice test.
Similarly, when rho=.5 the tests
appears to be close to the Tarone-Ware test.
survdiff(Surv(time, censor) ~ drug, data=minitest, rho=0)
N Observed Expected (O-E)^2/E (O-E)^2/V
drug=0 5 4
5.38
0.353 1.36
drug=1 5 4
2.62
0.724 1.36
Chisq= 1.4 on 1 degrees of freedom, p= 0.243
survdiff(Surv(time, censor) ~ drug, data=minitest, rho=.5)
N Observed Expected (O-E)^2/E (O-E)^2/V
drug=0 5 2.69
3.77 1.36
1.11
drug=1 5
3.35
2.27
0.511
1.11
Chisq= 1.1 on 1 degrees of freedom, p= 0.291
N Observed Expected (O-E)^2/E (O-E)^2/V
drug=0 5 2.02
2.9
0.267 0.927
drug=1 5 2.88
2.0
0.387
0.927
Chisq= 0.9 on 1 degrees of freedom, p= 0.336
Table 2.11, p. 65
Testing for differences between drug group using the G-rho family of tests.
Note: When rho=0 then the test is the log-rank test. However, when rho=1 then
the weight is similar to the
weights used in the the Peto and Prentice test.
Similarly, when rho=.5 the tests
appears to be close to the Tarone-Ware test.
survdiff(Surv(time, censor) ~ drug, data=hmohiv, rho=0)
N Observed Expected (O-E)^2/E (O-E)^2/V
drug=0 51 42 54.9
3.02
11.9
drug=1 49 38 25.1
6.60 11.9
Chisq= 11.9 on 1 degrees of freedom, p= 0.000575
survdiff(Surv(time, censor) ~ drug, data=hmohiv, rho=.5)
N Observed Expected (O-E)^2/E (O-E)^2/V
drug=0 51 27.4
38.3 3.11
12.6
drug=1 49 32.0
21.1 5.65
12.6
Chisq=
survdiff(Surv(time, censor) ~ drug, data=hmohiv, rho=1)
N Observed Expected (O-E)^2/E (O-E)^2/V
drug=0 51
20.0 29.3
2.94 11.7
drug=1 49 27.7
18.4 4.69
11.7
Chisq= 11.7 on 1 degrees of freedom, p= 0.000622
Creating the categorical age variable, agecat.
hmohiv$agecat <- cut(hmohiv$age, c(19.9, 29, 34, 39, 54.1) )
Table 2.12, p. 65.
age.surv <- survfit( Surv(time, censor)~
strata(agecat), hmohiv, conf.type="log-log")
print(age.surv)
n events mean se(mean) median
strata(agecat)=agecat=19.9+ thru 29.0 12
8 34.12
7.35 43
strata(agecat)=agecat=29.0+ thru 34.0 34
29 13.83
2.49 9
strata(agecat)=agecat=34.0+ thru 39.0 25
20 12.84
3.70 7
strata(agecat)=agecat=39.0+ thru 54.1 29
23 6.25
1.82 4
0.95LCL 0.95UCL
strata(agecat)=agecat=19.9+ thru 29.0
5 NA
strata(agecat)=agecat=29.0+ thru 34.0
6 12
strata(agecat)=agecat=34.0+ thru 39.0
3 9
strata(agecat)=agecat=39.0+ thru 54.1
2 5
Fig. 2.8, p. 69
plot(age.surv, lty=c(6, 1, 4, 3), xlab="Time",
ylab="Survival Probability")
legend(40, 1.0, c("Group 1", "Group 2", "Group
3", "Group 4") , lty=c(6, 1, 4, 3) )
Movie for table 2.12 and fig. 2.8 using point and click.
Table 2.14, p. 70
survdiff(Surv(time, censor) ~ agecat, data=hmohiv, rho=0)
N Observed Expected (O-E)^2/E (O-E)^2/V
agecat=19.9+ thru 29.0 12
8 19.9
7.10608 12.4419
agecat=29.0+ thru 34.0 34
29 29.4
0.00641 0.0117
agecat=34.0+ thru 39.0 25
20 17.8
0.26894 0.3834
agecat=39.0+ thru 54.1 29
23 12.9
7.98170 11.1799
Chisq= 19.9 on 3 degrees of freedom, p= 0.000178
survdiff(Surv(time, censor) ~ agecat, data=hmohiv, rho=.5)
N Observed Expected (O-E)^2/E (O-E)^2/V
agecat=19.9+ thru 29.0 12 4.53
12.0
4.671 9.584
agecat=29.0+ thru 34.0 34 20.15
22.6
0.263 0.597
agecat=34.0+ thru 39.0 25 15.32
13.9
0.147 0.263
agecat=39.0+ thru 54.1 29 19.44
10.9
6.610 10.985
Chisq= 17.7 on 3 degrees of freedom, p= 0.000514
survdiff(Surv(time, censor) ~ agecat, data=hmohiv, rho=1)
N Observed Expected (O-E)^2/E (O-E)^2/V
agecat=19.9+ thru 29.0 12 3.05
8.37
3.3859 7.225
agecat=29.0+ thru 34.0 34 15.11
18.21
0.5267 1.320
agecat=34.0+ thru 39.0 25 12.48
11.48
0.0872 0.172
agecat=39.0+ thru 54.1 29 17.06
9.64
5.7134 10.355
Chisq= 15.4 on 3 degrees of freedom, p= 0.00154
Fig. 2.9 and table 2.16 was not reproduced.
UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services