UCLA Academic Technology Services

Stata Data Analysis Examples
Tobit Analysis

Examples of Tobit Analysis

Example 1. In the 1980s a federal law restricted speedometer readings to no more than 85 mph. If you wanted to predict a vehicle's top speed from a combination of horsepower and engine size, you would get a reading no higher than 85, regardless of how fast the vehicle was really traveling. This is a classic case of right-censoring (censoring from above) of the data. All we know for certain about a vehicle with a reading of 85 is that it was traveling at least 85 mph. Tobit models are designed to give improved estimates when there is left- or right-censoring.

Example 2. A research project is studying the level of lead in home drinking water as a function of the age of a house and family income. The water testing kit cannot detect lead concentrations below 5 parts per billion (ppb), so all we know about such a sample is that its concentration is under 5 ppb. The EPA considers levels above 15 ppb to be dangerous. These data are an example of left-censoring (censoring from below) and can be analyzed using tobit analysis.

Example 3. Consider the situation in which we have a measure of academic aptitude (scaled 200-800) which we want to model using reading and math test scores and whether the student is enrolled in a public or private school. The problem here is that students who answer all questions on the academic aptitude test correctly receive a score of 800, even though it is likely that these students are not "truly" equal in aptitude. In other words, the aptitude score is right-censored at 800.
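The three examples share one structure, which can be written down compactly. The notation below is a sketch of the standard tobit setup and is not taken from this page: the symbols y*, x, beta, sigma, U and L are assumptions introduced for illustration.

```latex
\begin{aligned}
y_i^{*} &= \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \\
\text{right-censoring at } U:\quad y_i &= \min(y_i^{*},\, U) \\
\text{left-censoring at } L:\quad y_i &= \max(y_i^{*},\, L)
\end{aligned}
```

Here y* is the unobserved "true" value and y is what gets recorded. In Example 1, U = 85 mph; in Example 2, L = 5 ppb; in Example 3, U = 800.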

Description of the Data

Let's pursue Example 3 from above.

We have a hypothetical data file, tobitex.dta, with 200 observations. The academic aptitude variable is apt, and the reading and math test scores are read and math, respectively. The variable public is a zero-one variable, with the ones indicating a public school student.

Let's look at the data.
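The commands for this step are not shown here, so the following is only a minimal sketch of one way to inspect the data. It assumes tobitex.dta is in the current working directory and that the predictors are named read, math and public, as in the write-up later on this page.

```stata
* Assumes tobitex.dta is in the current working directory
use tobitex, clear

* Means, standard deviations, and ranges of the outcome and continuous predictors
summarize apt read math

* Distribution of the zero-one school-type indicator
tabulate public

* Histogram of apt with a normal overlay; look for a pile-up at 800
histogram apt, normal

* Count the observations sitting exactly at the upper limit
count if apt == 800
```

A cluster of cases at exactly 800 in the histogram and a nonzero count would be the visible signs of right-censoring.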

Some Strategies You Might Be Tempted To Try

Before we show how you can analyze this with a tobit analysis, let's consider some other methods that you might use.
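The list of alternative methods is not reproduced here. Two that are often considered for a censored outcome are ordinary least squares and truncated regression; treating these as the intended alternatives is an assumption. A sketch, using the variable names from the write-up below:

```stata
* OLS treats a recorded 800 as the true value rather than as an upper limit
regress apt read math public

* Truncated regression excludes observations at or above the limit.
* It is meant for truncated samples, which is a different problem
* from censoring, where the limit cases remain in the data.
truncreg apt read math public, ul(800)
```

Neither command uses the information that a score of 800 means "at least 800", which is what the tobit model adds.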

Stata Tobit Analysis

The ul() option in the tobit command indicates the value at which the right-censoring begins. There is also an ll() option to indicate the value at which left-censoring begins, which was not needed in this example.
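The command itself is missing from the text above. Based on the description, it would presumably look like the sketch below. The predictor names are taken from the write-up later on this page, and the commented-out second command uses hypothetical variable names (lead, age, income) purely to illustrate ll().

```stata
* Tobit model for apt, right-censored at 800
tobit apt read math public, ul(800)

* For a left-censored outcome such as Example 2, ll() would be used instead.
* The variable names here are hypothetical:
* tobit lead age income, ll(5)
```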

The output looks very much like the output from an OLS regression. At the top, the number of observations used in the analysis is given along with a likelihood-ratio chi-squared. The likelihood-ratio chi-squared tests the difference between the full model (with predictors) and the constant-only model. Below that is the p-value for the chi-squared with three degrees of freedom. This model, as a whole, is statistically significant. Next come the log-likelihood for the full model and the pseudo-R2 for the model.

The pseudo-R2 in the output is obtained by computing 1 - LL(full model)/LL(constant-only model), which in this case is .065. This is McFadden's pseudo-R2, and it may not be the best measure of fit. An alternative is the squared correlation between the predicted and observed values.
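One way to compute the squared correlation between predicted and observed values is sketched here. It assumes the tobit model has just been fit; yhat is a variable name chosen for illustration.

```stata
* Run immediately after the tobit command
* predict with no option returns the linear prediction (xb)
predict yhat

* Correlation between observed and predicted aptitude
correlate apt yhat

* correlate leaves the coefficient in r(rho); square it
display r(rho)^2
```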

The calculated value of .53 is probably closer to what you would find in an OLS regression. You can also use the Long and Freese utility command fitstat (findit spostado), which provides a number of pseudo-R2s in addition to other measures of fit.

In the main body of the table we have the tobit regression coefficients, the standard errors of the coefficients, a Wald t-test (coefficient/se) and the p-value associated with each t-test. By default, we also get a 95% confidence interval for the coefficients. With the level() option you can request a different confidence level.
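As a brief illustration of the two options just mentioned, the sketch below refits the model with 99% intervals and then calls fitstat. It assumes the SPost package that provides fitstat has already been installed; the choice of 99 is arbitrary.

```stata
* Same model, reporting 99% instead of 95% confidence intervals
tobit apt read math public, ul(800) level(99)

* Additional fit measures; requires Long and Freese's SPost package
fitstat
```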

The ancillary statistic /sigma is analogous to the standard error of the estimate (root MSE) in OLS regression. Its value of 73.63 is substantially smaller than the standard deviation of academic aptitude, which was 101.44. The output also contains an estimate of the standard error of /sigma as well as a 95% confidence interval.

Finally, the output provides a summary of the number of left-censored, uncensored and right-censored values.

Sample Write-Up of the Analysis

Before we begin the sample write-up we need to get the output into a form more acceptable for publication. The estout command (findit estout), by Ben Jann of ETH Zurich, will get us closer to what we want. With a little manual editing, and remembering to switch to a better pseudo-R2, we can produce an acceptable table of the output.

The tobit regression model predicting academic aptitude from reading, math and public school was statistically significant (chi-squared = 149.03, df = 3). Each of the predictor variables in the model was also statistically significant at the .001 level. The squared correlation between the observed and predicted academic aptitude values was 0.53, indicating that these three predictors accounted for over 50% of the variability in the outcome variable. A one-unit increase in read and in math led to increases of 3.68 and 4.56 points, respectively, in predicted aptitude. Attending a public school increased the predicted aptitude by 62.16 points compared with private school attendance.
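The table-preparation step mentioned before the write-up could be sketched as follows. This is only an illustration, not the exact commands used to produce the published table: it assumes estout is installed and the tobit model has just been fit, and tobit1 is a hypothetical name for the stored estimates.

```stata
* Run after the tobit command; tobit1 is a hypothetical name
estimates store tobit1

* Coefficients with significance stars, standard errors in parentheses,
* followed by N, the log-likelihood, and the LR chi-squared
estout tobit1, cells(b(star fmt(2)) se(par fmt(2))) stats(N ll chi2, fmt(0 2 2))
```

The squared correlation between observed and predicted values would still need to be added to the table by hand.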

Additional Example

See Also


These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.