Stata has the convenient feature of having a bootstrap prefix command which can be seamlessly incorporated with estimation commands (e.g., logistic regression or OLS regression) and non-estimation commands (e.g., summarize). The bootstrap command automates the bootstrap process for the statistic of interest and computes relevant summary measures (i.e., bias and confidence intervals). As convenient as this command is, however, there are instances when the statistic you want to bootstrap does not work within the command. For such instances, you need to write your own bootstrap program.
This Stata FAQ shows how to write your own bootstrap program. For the first example, we match results from the bootstrap command with results from writing a bootstrap program. Ideally, this should reveal how simple it is to write your own bootstrap program. This is followed by an example in which the statistic you want to bootstrap does not work within the bootstrap command, and therefore, requires you to write your own bootstrap program.
This example we use the bootstrap command and replicate the results by writing our own bootstrap program. We use the High School and Beyond dataset from which we are going to regress gender (female), math score (math), writing score (write) and socio-economic status (ses) on reading score (read) and bootstrap the root mean squared error (rmse). For the bootstrap we do 100 replications and specify the seed so that we can replicate the results.
use http://www.ats.ucla.edu/stat/stata/notes3/hsb2, clear
(highschool and beyond (200 cases))
regress read female math write ses
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 4, 195) = 52.58
Model | 10854.9318 4 2713.73294 Prob > F = 0.0000
Residual | 10064.4882 195 51.6127602 R-squared = 0.5189
-------------+------------------------------ Adj R-squared = 0.5090
Total | 20919.42 199 105.122714 Root MSE = 7.1842
------------------------------------------------------------------------------
read | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | -2.450171 1.101524 -2.22 0.027 -4.622602 -.2777409
math | .4565641 .0721114 6.33 0.000 .3143457 .5987825
write | .3793564 .0732728 5.18 0.000 .2348475 .5238653
ses | 1.301982 .7400719 1.76 0.080 -.1575905 2.761555
_cons | 6.833418 3.279371 2.08 0.038 .3658287 13.30101
------------------------------------------------------------------------------
bootstrap rmse=e(rmse), reps(100) seed(12345): regress read female math write ses
(running regress on estimation sample)
Bootstrap replications (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................................. 100
Linear regression Number of obs = 200
Replications = 100
command: regress read female math write ses
rmse: e(rmse)
------------------------------------------------------------------------------
| Observed Bootstrap Normal-based
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rmse | 7.184202 .2594069 27.69 0.000 6.675774 7.69263
------------------------------------------------------------------------------
estat bootstrap, all
Linear regression Number of obs = 200
Replications = 100
command: regress read female math write ses
rmse: e(rmse)
------------------------------------------------------------------------------
| Observed Bootstrap
| Coef. Bias Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
rmse | 7.1842021 -.1006956 .25940687 6.675774 7.69263 (N)
| 6.559784 7.636096 (P)
| 6.778425 7.741319 (BC)
------------------------------------------------------------------------------
(N) normal confidence interval
(P) percentile confidence interval
(BC) bias-corrected confidence interval
Writing our own bootstrap program requires four steps.
use http://www.ats.ucla.edu/stat/stata/notes3/hsb2, clear
(highschool and beyond (200 cases))
*Step 1
quietly regress read female math write ses
matrix observe = e(rmse)
*Step 2
capture program drop myboot
program define myboot, rclass
preserve
bsample
regress read female math write ses
return scalar rmse = e(rmse)
restore
end
*Step 3
simulate rmse=r(rmse), reps(100) seed(12345): myboot
command: myboot
rmse: r(rmse)
Simulations (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................................. 100
*Step 4
bstat, stat(observe) n(200)
Bootstrap results Number of obs = 200
Replications = 100
------------------------------------------------------------------------------
| Observed Bootstrap Normal-based
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rmse | 7.184202 .2594069 27.69 0.000 6.675774 7.69263
------------------------------------------------------------------------------
estat bootstrap, all
Bootstrap results Number of obs = 200
Replications = 100
------------------------------------------------------------------------------
| Observed Bootstrap
| Coef. Bias Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
rmse | 7.1842021 -.1006956 .25940687 6.675774 7.69263 (N)
| 6.559784 7.636096 (P)
| 6.778425 7.741319 (BC)
------------------------------------------------------------------------------
(N) normal confidence interval
(P) percentile confidence interval
(BC) bias-corrected confidence interval
The results from Step 4 match the results from the bootstrap command in the example above.
In this example we write a bootstrap program where the usual bootstrap
command does not accommodate the statistic we want to bootstrap. The reason why
the bootstrap command does not accommodate all situations is because the
bootstrap command requires a statistic that falls directly out of the "analysis" command.
To see what statistics are accommodated, use either the ereturn list or
return list command following the "analysis" command. The distinction
between ereturn list or
return list depends whether the "analysis" command is an estimation
command or not.
Suppose we want to bootstrap the variance inflation factor (vif),
which requires us to run regress and then estat vif.
In such a situation, the statistic to bootstrap falls out from a post
estimation command, which is not obtainable from regress and therefore not accommodated by
the bootstrap command. Hence, we must
write our own bootstrap program to get a bootstrap estimate of the vif.
use http://www.ats.ucla.edu/stat/stata/notes3/hsb2, clear
(highschool and beyond (200 cases))
*Step 1
quietly regress read female math write ses
estat vif
Variable | VIF 1/VIF
-------------+----------------------
write | 1.86 0.537690
math | 1.76 0.568278
female | 1.17 0.857692
ses | 1.11 0.902671
-------------+----------------------
Mean VIF | 1.47
return list
scalars:
r(vif_4) = 1.107823014259338
r(vif_3) = 1.165920257568359
r(vif_2) = 1.759701371192932
r(vif_1) = 1.859809398651123
macros:
r(name_4) : "ses"
r(name_3) : "female"
r(name_2) : "math"
r(name_1) : "write"
matrix vif = ( r(vif_4), r(vif_3), r(vif_2), r(vif_1))
matrix list vif
vif[1,4]
c1 c2 c3 c4
r1 1.107823 1.1659203 1.7597014 1.8598094
*Step 2
capture program drop myboot2
program define myboot2, rclass
preserve
bsample
regress read female math write ses
estat vif
return scalar vif_4 = r(vif_4)
return scalar vif_3 = r(vif_3)
return scalar vif_2 = r(vif_2)
return scalar vif_1 = r(vif_1)
restore
end
*Step 3
simulate vif_4=r(vif_4) vif_3=r(vif_3) vif_2=r(vif_2) vif_1=r(vif_1), ///
reps(100) seed(12345): myboot2
command: myboot2
vif_4: r(vif_4)
vif_3: r(vif_3)
vif_2: r(vif_2)
vif_1: r(vif_1)
Simulations (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................................. 100
bstat, stat(vif) n(200)
Bootstrap results Number of obs = 200
Replications = 100
------------------------------------------------------------------------------
| Observed Bootstrap Normal-based
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
vif_4 | 1.107823 .0344814 32.13 0.000 1.040241 1.175405
vif_3 | 1.16592 .0524449 22.23 0.000 1.06313 1.26871
vif_2 | 1.759701 .1349314 13.04 0.000 1.495241 2.024162
vif_1 | 1.859809 .1467453 12.67 0.000 1.572194 2.147425
------------------------------------------------------------------------------
estat bootstrap, all
Bootstrap results Number of obs = 200
Replications = 100
------------------------------------------------------------------------------
| Observed Bootstrap
| Coef. Bias Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
vif_4 | 1.107823 .0127056 .03448139 1.040241 1.175405 (N)
| 1.061917 1.197667 (P)
| 1.058617 1.172653 (BC)
vif_3 | 1.1659203 .0285308 .0524449 1.06313 1.26871 (N)
| 1.10246 1.30328 (P)
| 1.08448 1.255424 (BC)
vif_2 | 1.7597014 .0305828 .13493139 1.495241 2.024162 (N)
| 1.552449 2.081403 (P)
| 1.510279 2.026165 (BC)
vif_1 | 1.8598094 .0389828 .14674526 1.572194 2.147425 (N)
| 1.665374 2.196174 (P)
| 1.633619 2.160758 (BC)
------------------------------------------------------------------------------
(N) normal confidence interval
(P) percentile confidence interval
(BC) bias-corrected confidence interval
The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.