Example 1. In the 1980s there was a federal law restricting speedometer readings to no more than 85 mph. So if you wanted to try to predict a vehicle's top speed from a combination of horsepower and engine size, you would get a reading no higher than 85, regardless of how fast the vehicle was really traveling. This is a classic case of right-censoring (censoring from above) of the data. The only thing we are certain of is that those vehicles were traveling at least 85 mph. Tobit models are designed to make improved estimates when there is either left- or right-censoring.
Example 2. A research project is studying the level of lead in home drinking water as a function of the age of a house and family income. The water testing kit cannot detect lead concentrations below 5 parts per billion (ppb). The EPA considers levels above 15 ppb to be dangerous. These data are an example of left-censoring (censoring from below) and can be analyzed using tobit analysis.
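To make the two kinds of censoring concrete, here is a small Python sketch (Python is used only for illustration; the analysis on this page is done in Stata). The helper censor() is hypothetical, and the speeds and lead levels are made-up values. It simply mimics a device that records any value beyond its limit as the limit itself.

```python
def censor(value, lower=None, upper=None):
    """Return what a limited measuring device would record (hypothetical helper).

    Values below `lower` are recorded as `lower` (left-censoring) and
    values above `upper` are recorded as `upper` (right-censoring).
    """
    if lower is not None and value < lower:
        return lower
    if upper is not None and value > upper:
        return upper
    return value


# Example 1: speedometer that cannot read above 85 mph (made-up true speeds).
true_speeds = [70, 84, 92, 110]
recorded_speeds = [censor(s, upper=85) for s in true_speeds]
print(recorded_speeds)  # [70, 84, 85, 85]

# Example 2: test kit that cannot detect lead below 5 ppb (made-up true levels).
true_lead = [2.0, 4.9, 7.5, 16.0]
recorded_lead = [censor(x, lower=5) for x in true_lead]
print(recorded_lead)  # [5, 5, 7.5, 16.0]
```

In both cases the recorded values pile up at the limit, and the data alone cannot tell us how far beyond the limit the true values were.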
Example 3. Consider the situation in which we have a measure of academic aptitude (scaled 200 to 800) which we want to model using reading and math test scores and whether the student is enrolled in a public or private school. The problem here is that students who answer all questions on the academic aptitude test correctly receive a score of 800, even though it is likely that these students are not "truly" equal in aptitude. In other words, the aptitude scores are right-censored at 800.
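How tobit uses such data can be sketched through its log likelihood. Assuming, as the tobit model does, that the latent outcome y* is normal with mean xb and standard deviation sigma, an uncensored observation contributes its normal density, while an observation at the ceiling (such as an aptitude score of 800) contributes only the probability that y* is at or above the ceiling. The Python function below is a hypothetical illustration of this likelihood for right-censoring, not Stata's implementation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def tobit_loglik(y, xb, sigma, upper):
    """Log likelihood of a tobit model that is right-censored at `upper`.

    y: observed outcomes; values >= upper are treated as censored
    xb: linear predictions (x times beta), one per observation
    sigma: standard deviation of the latent error term
    """
    total = 0.0
    for yi, mu in zip(y, xb):
        if yi >= upper:
            # Censored: contributes P(y* >= upper) = Phi((mu - upper) / sigma).
            total += math.log(STD_NORMAL.cdf((mu - upper) / sigma))
        else:
            # Uncensored: contributes the normal density of yi with mean mu, sd sigma.
            total += math.log(STD_NORMAL.pdf((yi - mu) / sigma) / sigma)
    return total


# Hypothetical check: one uncensored score and one score at the 800 ceiling.
print(tobit_loglik([650, 800], [650, 800], sigma=73.63, upper=800))
```

Maximizing this quantity over the coefficients and sigma is what distinguishes tobit from OLS, which would treat the 800s as if they were exact scores.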
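We have a hypothetical data file, tobitex.dta, with 200 observations.
The academic aptitude variable is apt, the reading and math test scores are read and math,
and public is coded 1 for students in public school and 0 for students in private school.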
Let's look at the data.
The output looks very much like the output from an OLS regression. At the top, the number of
observations used in the analysis is given along with a likelihood-ratio chi-squared, which
tests the full model (with predictors) against the constant-only model. Below that is the
p-value for the chi-squared with three degrees of freedom (one for each predictor). The
p-value is displayed as 0.0000, so the model as a whole is statistically significant.
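The likelihood-ratio chi-squared is twice the difference between the two log likelihoods. The Python check below uses the full-model log likelihood from the tobit output and the constant-only log likelihood reported by fitstat. The chi2_sf_df3() helper is our own; it uses the closed-form upper tail of a chi-squared distribution with 3 degrees of freedom.

```python
import math

ll_full = -1072.2469  # log likelihood of the full model (tobit output)
ll_null = -1146.761   # log likelihood of the constant-only model (fitstat output)

# Likelihood-ratio statistic: twice the gain in log likelihood.
lr_chi2 = 2 * (ll_full - ll_null)
print(round(lr_chi2, 2))  # 149.03


def chi2_sf_df3(x):
    """P(X > x) for X ~ chi-squared with 3 degrees of freedom (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_df3(lr_chi2)
print(p_value < 0.0001)  # True; Stata displays this as 0.0000
```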
Next come the log-likelihood for the full model and the pseudo-R2 for
the model.
The pseudo-R2 in the output is computed as 1 - LL(full model)/LL(constant-only model),
which in this case is .065. This is McFadden's pseudo-R2, and it may not be the best measure
of fit. A more interpretable alternative is the squared correlation between the predicted and
observed values.
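McFadden's pseudo-R2 uses the same two log likelihoods. The short Python sketch below, with values copied from the tobit and fitstat output, shows why it comes out so small: the full-model log likelihood is only about 6.5% closer to zero than the constant-only one.

```python
ll_full = -1072.2469  # log likelihood of the full model (tobit output)
ll_null = -1146.761   # log likelihood of the constant-only model (fitstat output)

# McFadden's pseudo-R2: proportional improvement in the log likelihood.
mcfadden_r2 = 1 - ll_full / ll_null
print(round(mcfadden_r2, 3))  # 0.065
```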
The ancillary statistic /sigma is analogous to the standard error of estimate (root MSE) in OLS
regression: it is the estimated standard deviation of the error term for the latent, uncensored
outcome. Its value of 73.63 can be compared with the standard deviation of academic aptitude,
101.44, which shows a substantial reduction. The output also contains an estimate of the standard
error of /sigma as well as a 95% confidence interval.
Finally, the output provides a summary of the number of left-censored, uncensored and right-censored
values.
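The observation summary is a simple count against the censoring limits. The hypothetical Python helper below mirrors it, treating values at or below the lower limit as left-censored and values at or above the upper limit as right-censored (matching the "apt>=800" wording in the tobit output). The five scores are made up; with the real data, Stata reports 0, 185 and 15.

```python
def censoring_summary(y, lower=None, upper=None):
    """Count left-censored, uncensored and right-censored values (hypothetical helper).

    Values <= lower count as left-censored and values >= upper as right-censored.
    """
    left = sum(1 for v in y if lower is not None and v <= lower)
    right = sum(1 for v in y if upper is not None and v >= upper)
    return {"left-censored": left, "uncensored": len(y) - left - right, "right-censored": right}


made_up_apt = [640, 800, 720, 800, 455]
print(censoring_summary(made_up_apt, upper=800))
# {'left-censored': 0, 'uncensored': 3, 'right-censored': 2}
```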
use http://www.ats.ucla.edu/stat/stata/dae/tobitex, clear
summarize
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
id | 200 100.5 57.87918 1 200
apt | 200 651.06 101.4404 420 800
read | 200 52.23 10.25294 28 76
math | 200 52.645 9.368448 33 75
public | 200 .545 .4992205 0 1
histogram apt, normal bin(10) xline(800)
tabulate public
public | Freq. Percent Cum.
------------+-----------------------------------
0 | 91 45.50 45.50
1 | 109 54.50 100.00
------------+-----------------------------------
Total | 200 100.00
correlate read math public apt
(obs=200)
| read math public apt
-------------+------------------------------------
read | 1.0000
math | 0.6623 1.0000
public | -0.0531 -0.0293 1.0000
apt | 0.5971 0.6171 0.2567 1.0000
graph matrix read math apt, half jitter(2)

Some Strategies You Might Be Tempted To Try
Before we show how you can analyze this with a tobit analysis, let's
consider some other methods that you might use.
Stata Tobit Analysis
tobit apt read math public, ul(800)
Tobit regression Number of obs = 200
LR chi2(3) = 149.03
Prob > chi2 = 0.0000
Log likelihood = -1072.2469 Pseudo R2 = 0.0650
------------------------------------------------------------------------------
apt | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | 3.681712 .6873387 5.36 0.000 2.326225 5.037198
math | 4.557839 .7538493 6.05 0.000 3.071189 6.044489
public | 62.1633 10.57346 5.88 0.000 41.31159 83.01501
_cons | 188.3943 32.74961 5.75 0.000 123.8095 252.9791
-------------+----------------------------------------------------------------
/sigma | 73.63244 3.873908 65.99279 81.27209
------------------------------------------------------------------------------
Obs. summary: 0 left-censored observations
185 uncensored observations
15 right-censored observations at apt>=800
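The t statistics and confidence limits in the coefficient table follow from the coefficients and standard errors alone. In the Python sketch below, the critical value 1.97208 is an assumption back-calculated from the printed intervals (for example, (5.037198 - 3.681712)/.6873387); Stata itself takes the critical value from the t distribution. The wald() helper name is hypothetical.

```python
# Coefficients and standard errors copied from the tobit output above.
coefficients = {
    "read": (3.681712, 0.6873387),
    "math": (4.557839, 0.7538493),
    "public": (62.1633, 10.57346),
    "_cons": (188.3943, 32.74961),
}
T_CRIT = 1.97208  # assumption: back-calculated from the printed 95% intervals


def wald(b, se, t_crit=T_CRIT):
    """Return the t statistic and the confidence limits b -/+ t_crit * se."""
    return b / se, b - t_crit * se, b + t_crit * se


for name, (b, se) in coefficients.items():
    t, lower, upper = wald(b, se)
    print(f"{name:>7}  t = {t:.2f}  95% CI = [{lower:.3f}, {upper:.3f}]")
```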
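The ul() option in the tobit command indicates the value at which right-censoring
begins. There is also an ll() option to indicate the value at which left-censoring begins,
which was not needed in this example. To compute the squared correlation between the predicted
and observed values, we save the predictions in p and correlate them with apt.
predict p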
quietly correlate p apt
(obs=200)
display r(rho)^2
.52608156
The calculated value of .53 is probably closer to the R2 you would find in an OLS regression than McFadden's value of .065.
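What correlate and display compute here is the square of the Pearson correlation between the observed outcome and p (the linear prediction, which is what predict returns by default after tobit). A minimal Python version, shown with hypothetical inputs, is:

```python
from statistics import fmean


def squared_correlation(observed, predicted):
    """Square of the Pearson correlation between two equal-length sequences."""
    mean_obs, mean_pred = fmean(observed), fmean(predicted)
    cross = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cross ** 2 / (ss_obs * ss_pred)


# Hypothetical values; with the real data, observed would be apt and predicted p.
print(squared_correlation([1, 2, 3], [1, 3, 2]))  # 0.25
```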
You can also make use of the Long and Freese utility command fitstat (findit spostado),
which provides a number of pseudo-R2s in addition to other measures of fit.
fitstat
Measures of Fit for tobit of apt
Log-Lik Intercept Only: -1146.761 Log-Lik Full Model: -1072.247
D(195): 2144.494 LR(3): 149.028
Prob > LR: 0.000
McFadden's R2: 0.065 McFadden's Adj R2: 0.061
ML (Cox-Snell) R2: 0.525 Cragg-Uhler(Nagelkerke) R2: 0.525
McKelvey & Zavoina's R2: 0.531
Variance of y*: 11565.884 Variance of error: 5421.737
AIC: 10.772 AIC*n: 2154.494
BIC: 1111.322 BIC': -133.133
BIC used by Stata: 2170.985 AIC used by Stata: 2154.494
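Most of these measures are simple functions of quantities already on the page. The Python sketch below reproduces several of them to three decimals from the two log likelihoods, the sample size, and the variances fitstat prints. Treating the number of parameters as k = 5 (three slopes, the constant, and sigma) is our assumption, consistent with the D(195) line, since 200 - 5 = 195.

```python
import math

# Inputs copied from the tobit and fitstat output above.
ll_null = -1146.761    # intercept-only log likelihood
ll_full = -1072.2469   # full-model log likelihood
n = 200                # number of observations
k = 5                  # assumed parameter count: 3 slopes, constant, sigma
df_model = 3           # degrees of freedom of the LR test
var_ystar = 11565.884  # variance of y* (fitstat)
var_error = 5421.737   # variance of error, i.e. sigma squared (fitstat)

deviance = -2 * ll_full        # D(195) in the fitstat output
lr = 2 * (ll_full - ll_null)   # LR(3) in the fitstat output

measures = {
    "McFadden's Adj R2": 1 - (ll_full - k) / ll_null,
    "ML (Cox-Snell) R2": 1 - math.exp(-lr / n),
    "McKelvey & Zavoina's R2": (var_ystar - var_error) / var_ystar,
    "AIC": (deviance + 2 * k) / n,
    "BIC": deviance - (n - k) * math.log(n),
    "BIC'": -lr + df_model * math.log(n),
    "BIC used by Stata": deviance + k * math.log(n),
}
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```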
In the main body of the table we have the tobit regression coefficients, the standard errors
of the coefficients, a Wald t-test (coefficient divided by its standard error), and the p-value
associated with each t-test. By default, we also get a 95% confidence interval for the
coefficients; the level() option requests a different confidence level.
Sample Write-Up of the Analysis
Before we begin the sample write-up, we need to get the output into a form more suitable for
publication. The estout command by Ben Jann of ETH Zurich (findit estout) gets us closer to
what we want.
estout, cells(b(star fmt(%8.2f)) se(par fmt(%8.2f))) stats(ll chi2 r2_p, fmt(%8.2f))
b/se
model
read 3.68***
(0.69)
math 4.56***
(0.75)
public 62.16***
(10.57)
_cons 188.39***
(32.75)
sigma
_cons 73.63***
(3.87)
ll -1072.25
chi2 149.03
r2_p 0.06
With a little manual editing, and replacing McFadden's pseudo-R2 with the squared correlation
between the observed and predicted values (.53), we can produce an acceptable table of the output.
model
read 3.68***
(0.69)
math 4.56***
(0.75)
public 62.16***
(10.57)
constant 188.39***
(32.75)
standard error of estimate 73.63
log-likelihood -1072.25
chi-squared 149.03
pseudo R-squared 0.53
legend: coefficient/(standard error) *** p<0.001
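The cell layout that estout produces with cells(b(star fmt(%8.2f)) se(par fmt(%8.2f))) can be mimicked in a few lines of Python. Only the *** cutoff (p < 0.001) is confirmed by the legend above; the ** (0.01) and * (0.05) cutoffs are assumed to be estout's defaults, and format_cell() is a hypothetical helper. Because the output shows the p-values only as 0.000, the p-value passed in below is a stand-in.

```python
def format_cell(b, se, p):
    """Format a coefficient and its standard error like the table above.

    Star cutoffs: *** p < 0.001 (from the legend); ** p < 0.01 and * p < 0.05
    are assumed estout defaults.
    """
    if p < 0.001:
        stars = "***"
    elif p < 0.01:
        stars = "**"
    elif p < 0.05:
        stars = "*"
    else:
        stars = ""
    return f"{b:.2f}{stars}", f"({se:.2f})"


# The printed p-value is 0.000, so 0.0001 stands in for it here.
print(format_cell(3.681712, 0.6873387, 0.0001))  # ('3.68***', '(0.69)')
```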
The tobit regression model predicting academic aptitude from reading, math and public school
attendance was statistically significant (chi-squared = 149.03, df = 3, p < .001). Each of the
predictor variables in the model was also statistically significant at the .001 level. The squared
correlation between the observed and predicted academic aptitude values was 0.53, indicating that
these three predictors accounted for over 50% of the variability in the outcome variable. A one-unit
increase in read or math led to an increase of 3.68 or 4.56 points, respectively, in the predicted
(latent) aptitude. Attending a public school increased the predicted aptitude by 62.16 points
compared with attending a private school.
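Because tobit coefficients describe the latent (uncensored) aptitude, it can help to turn them into predictions. The Python sketch below is a hypothetical illustration, not part of the original analysis: it computes the linear prediction for a student with read = 52 and math = 52 (close to the sample means) in private and public school, and the expected observed score once the 800 ceiling is taken into account. For the latter it assumes the standard result for a normal variable capped at 800, E[y] = Phi(z)*xb - sigma*phi(z) + 800*(1 - Phi(z)) with z = (800 - xb)/sigma, whose slope with respect to xb is Phi(z).

```python
from statistics import NormalDist

# Tobit estimates copied from the output above.
B_CONS, B_READ, B_MATH, B_PUBLIC = 188.3943, 3.681712, 4.557839, 62.1633
SIGMA = 73.63244
CEILING = 800  # the ul(800) censoring point
STD_NORMAL = NormalDist()


def latent_prediction(read, math_score, public):
    """Linear prediction of latent aptitude (what the coefficients describe)."""
    return B_CONS + B_READ * read + B_MATH * math_score + B_PUBLIC * public


def expected_observed(xb, sigma=SIGMA, ceiling=CEILING):
    """E[min(y*, ceiling)] for y* ~ Normal(xb, sigma): expected score after the ceiling."""
    z = (ceiling - xb) / sigma
    return (STD_NORMAL.cdf(z) * xb - sigma * STD_NORMAL.pdf(z)
            + ceiling * (1 - STD_NORMAL.cdf(z)))


def observed_slope(beta, xb, sigma=SIGMA, ceiling=CEILING):
    """Effect of a predictor on the expected observed score: beta * Phi(z)."""
    return beta * STD_NORMAL.cdf((ceiling - xb) / sigma)


for public in (0, 1):  # hypothetical student with read = 52 and math = 52
    xb = latent_prediction(52, 52, public)
    print(public, round(xb, 2), round(expected_observed(xb), 2),
          round(observed_slope(B_READ, xb), 2))
```

The latent predictions differ by exactly the public coefficient, while the effect on the expected observed score shrinks as the prediction approaches the 800 ceiling.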
These pages are Copyrighted (c) by UCLA Academic Technology Services