Example 1. In the 1980s there was a federal law restricting speedometer readings to no more than 85 mph. So if you wanted to predict a vehicle's top speed from a combination of horsepower and engine size, you would get a reading no higher than 85, regardless of how fast the vehicle was really traveling. This is a classic case of right-censoring (censoring from above) of the data. The only thing we are certain of is that those vehicles were traveling at least 85 mph. Tobit models are designed to make improved estimates when there is either left- or right-censoring.
Example 2. A research project is studying the level of lead in home drinking water as a function of the age of a house and family income. The water testing kit cannot detect lead concentrations below 5 parts per billion (ppb). The EPA considers levels above 15 ppb to be dangerous. These data are an example of left-censoring (censoring from below) and can be analyzed using tobit analysis.
Example 3. Consider the situation in which we have a measure of academic aptitude (scaled 200-800) which we want to model using reading and math test scores and whether the student is enrolled in a public or private school. The problem here is that students who answer all questions on the academic aptitude test correctly receive a score of 800, even though it is likely that these students are not "truly" equal in aptitude.
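We have a hypothetical data file, tobitex.sas7bdat, with 200 observations. The academic aptitude variable is apt, the reading and math test scores are read and math, and public indicates whether the student is enrolled in a public school.
Let's look at the data.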
In the proc qlim output shown under SAS Tobit Analysis, the table at the top
summarizes the number of left- and right-censored values. In the table entitled
Parameter Estimates, we have the tobit regression coefficients, their standard
errors, the t values and the associated p-values. The ancillary statistic _Sigma
is analogous to the standard error of estimate in OLS regression. Its value of
73.63 can be compared to the standard deviation of academic aptitude, which was
101.44. This shows a substantial reduction. The output also contains an estimate
of the standard error of _Sigma, as well as its t value and p-value. That _Sigma
is statistically significant means that 73.63 is statistically significantly
different from 0. The validity of this test of _Sigma is a matter of
debate among statisticians, and some programs will produce the estimate and
standard error, but not the test of statistical significance.
To get an idea about model fit, you can use the squared multiple correlation
between the outcome variable (apt) and the predicted value. The predicted
values are obtained by creating an output data set (which we called temp1)
on the output statement with the predicted option. The proc corr and data
step that follow the proc qlim output are then used to get the desired value.
proc means data = tobitex;
run;
The MEANS Procedure
Variable     N           Mean        Std Dev        Minimum        Maximum
--------------------------------------------------------------------------
ID         200    100.5000000     57.8791845      1.0000000    200.0000000
APT        200    651.0600000    101.4403625    420.0000000    800.0000000
READ       200     52.2300000     10.2529368     28.0000000     76.0000000
MATH       200     52.6450000      9.3684478     33.0000000     75.0000000
PUBLIC     200      0.5450000      0.4992205              0      1.0000000
--------------------------------------------------------------------------
proc univariate data = tobitex plot;
var apt;
histogram / normal;
run;
The UNIVARIATE Procedure
Variable:  APT
                           Moments
N                         200    Sum Weights                200
Mean                   651.06    Sum Observations        130212
Std Deviation      101.440362    Variance            10290.1471
Skewness           -0.4439105    Kurtosis            -0.7559199
Uncorrected SS       86823564    Corrected SS        2047739.28
Coeff Variation    15.5808009    Std Error Mean      7.17291682
                 Basic Statistical Measures
    Location                      Variability
Mean       651.0600    Std Deviation            101.44036
Median     663.0000    Variance                     10290
Mode       717.0000    Range                    380.00000
                       Interquartile Range      153.00000
< some output omitted to save space >
Extreme Observations
----Lowest---- ----Highest---
Value Obs Value Obs
420 153 800 92
420 133 800 93
420 126 800 137
420 16 800 160
441 128 800 192
Stem Leaf                        #  Boxplot
  80 000000000000000            15     |
  78 00000000                    8     |
  76                                   |
  74 7777777777777777779999     22     |
  72 66668888                    8  +-----+
  70 7777777777777777777777777  25  |     |
  68 666666666666               12  |     |
  66 33333333333333333222       20  *-----*
  64 2222222222222221           16  |  +  |
  62 11                          2  |     |
  60 99999999999                11  |     |
  58 88                          2  |     |
  56 7999999999                 10  +-----+
  54 6888888888888              13     |
  52 555555555577               12     |
  50 44444666                    8     |
  48 3335                        4     |
  46 2244                        4     |
  44 1111                        4     |
  42 0000                        4     |
     ----+----+----+----+----+
Multiply Stem.Leaf by 10**+1
Normal Probability Plot
< normal probability plot for APT omitted >
proc freq data = tobitex;
tables public;
run;
The FREQ Procedure
                                 Cumulative    Cumulative
PUBLIC    Frequency    Percent    Frequency       Percent
---------------------------------------------------------
     0           91      45.50           91         45.50
     1          109      54.50          200        100.00
proc corr data = tobitex nosimple;
var read math public apt;
run;
The CORR Procedure
4 Variables: READ MATH PUBLIC APT
Pearson Correlation Coefficients, N = 200
Prob > |r| under H0: Rho=0
                READ          MATH        PUBLIC           APT
READ         1.00000       0.66228      -0.05308       0.59713
                            <.0001        0.4553        <.0001
MATH         0.66228       1.00000      -0.02934       0.61705
              <.0001                      0.6801        <.0001
PUBLIC      -0.05308      -0.02934       1.00000       0.25665
              0.4553        0.6801                      0.0002
APT          0.59713       0.61705       0.25665       1.00000
              <.0001        <.0001        0.0002
proc insight data = tobitex;
scatter read math apt * read math apt;
run;

Some Strategies You Might Be Tempted To Try
Before we show how you can analyze this with a tobit analysis, let's
consider some other methods that you might use.
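One method a reader might be tempted to try is ordinary least squares, which treats the 15 scores of 800 as actual values rather than as an upper limit. The following is a minimal sketch, assuming OLS is among the methods meant here, that fits the same predictors so the coefficients can be set beside the tobit estimates:

```sas
/* OLS fit for comparison only: proc reg has no notion of censoring,
   so apt = 800 is treated as an observed value, not a ceiling. */
proc reg data = tobitex;
  model apt = read math public;
run;
quit;
```

Because the censored scores are taken at face value, the OLS coefficients are not consistent estimates of the effects on the underlying (uncensored) aptitude.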
SAS Tobit Analysis
proc qlim data = tobitex ;
model apt = read math public;
endogenous apt ~ censored (ub=800);
output out = temp1 predicted;
run;
The QLIM Procedure
Summary Statistics of Continuous Responses
                                                            N Obs   N Obs
                      Standard             Lower   Upper    Lower   Upper
Variable      Mean       Error  Type       Bound   Bound    Bound   Bound
apt         651.06  101.440362  Censored             800               15
Model Fit Summary
Number of Endogenous Variables 1
Endogenous Variable apt
Number of Observations 200
Log Likelihood -1072
Maximum Absolute Gradient 1.43701E-7
Number of Iterations 17
AIC 2154
Schwarz Criterion 2171
Algorithm converged.
Parameter Estimates
                              Standard                Approx
Parameter       Estimate         Error    t Value    Pr > |t|
Intercept     188.394294     32.751473       5.75      <.0001
READ            3.681712      0.687378       5.36      <.0001
MATH            4.557839      0.753892       6.05      <.0001
PUBLIC         62.163301     10.574065       5.88      <.0001
_Sigma         73.632442      3.874453      19.00      <.0001
The ub = option on the endogenous statement indicates the value at which the right-censoring
begins. There is also an lb = option to indicate the value at which left-censoring begins, which
was not needed in this example.
proc qlim data=tobitex ;
model apt = read math public;
endogenous apt ~ censored (ub=800);
output out = temp1 predicted;
run;
ods output PearsonCorr=tobit_corr;
proc corr data = temp1 nosimple;
var apt p_apt;
run;
data _null_;
set tobit_corr;
if variable = "APT";
file print;
a = round((p_apt)**2, .0001);
put "The squared multiple correlation between apt and the predicted value is " a;
run;
The squared multiple correlation between apt and the predicted value is 0.527
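The lb = option mentioned above can be sketched for a left-censored outcome such as the lead measurements in Example 2, where the kit cannot detect concentrations below 5 ppb. The data set name leadwater and the variables lead, house_age and income are hypothetical and appear nowhere in tobitex; this is an illustration of the syntax, not an analysis from this page:

```sas
/* Hypothetical data set and variable names, for illustration only. */
proc qlim data = leadwater;
  model lead = house_age income;
  /* lb = 5 treats readings at the 5 ppb detection limit as left-censored */
  endogenous lead ~ censored (lb = 5);
run;
```

If an outcome is limited on both sides, both bounds can be given together, for example censored (lb = 200 ub = 800) for a score scaled 200-800.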
Sample Write-Up of the Analysis
In the tobit regression model predicting academic aptitude from reading, math
and public school, each of the predictor variables
in the model was statistically significant at the .001 level. The squared correlation between
the observed and predicted academic aptitude values was 0.53, indicating that these three predictors
accounted for over 50% of the variability in the outcome variable. Holding the other predictors
constant, a one-unit increase in read and in math
led to a 3.68- and 4.56-point increase in predicted aptitude, respectively. Attending a public school
increased the predicted aptitude by 62.16 points as compared with private school attendance.
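The coefficient interpretation in the write-up can be checked by hand. A minimal sketch, assuming a hypothetical public school student with read = 50 and math = 50 (values chosen only for illustration), computes the linear prediction from the reported estimates:

```sas
data _null_;
  /* Hypothetical student; these values are illustrative, not from tobitex */
  read = 50; math = 50; public = 1;
  /* Coefficients copied from the Parameter Estimates table */
  xb = 188.394294 + 3.681712*read + 4.557839*math + 62.163301*public;
  file print;
  put "Linear prediction of aptitude: " xb 8.2;
run;
```

This works out to about 662.5, and setting public = 0 lowers it by 62.16 points. The linear prediction refers to the underlying uncensored aptitude, so for high-scoring students it can exceed the observed ceiling of 800.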
These pages are Copyrighted (c) by UCLA Academic Technology Services