UCLA Academic Technology Services

Stata Data Analysis Examples
Truncated Regression

Examples of Truncated Regression Analysis

Example 1. A researcher has data for a sample of employed persons and wishes to model wages as predicted by years of schooling and gender. Since the sample excludes individuals who are not employed, the data can be considered to be truncated at zero, i.e., wages need to be greater than zero to be included in the sample.

Example 2. A study of students in a special GATE (gifted and talented education) program wishes to model achievement as a function of gender, language skills and math skills. A major concern is that students need a minimum achievement score of 40 to enter the program, so only students scoring 40 or above appear in the sample. Thus, the sample is truncated at an achievement score of 39.

Description of the Data

Let's pursue Example 2 from above.

We have a hypothetical data file, truncreg2.dta, with 178 observations. The achievement variable is achiv, and the language and math test scores are langscore and mathscore, respectively. The variable female is a 0/1 indicator coded 1 for female students.

Let's look at the data.

Some Strategies You Might Be Tempted To Try

Before we show how you can analyze this with a truncated regression analysis, let's consider some other methods that you might use.
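To see why ordinary least squares (OLS) on a truncated sample is tempting but risky, here is a small simulation sketch in Python (standard library only). The data-generating values (intercept 50, slope 5, error standard deviation 8, one standard-normal predictor) and the function names are hypothetical and are not taken from truncreg2.dta; the sample is truncated by keeping only cases with an outcome above 39, mirroring the GATE example.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple least-squares regression of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20000, intercept=50.0, slope=5.0, sigma=8.0, lower=39.0, seed=1):
    """Return (full-sample OLS slope, truncated-sample OLS slope).

    All parameter values are hypothetical. The truncated sample keeps only
    cases with y > lower, like a program that admits scores of 40 or more.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0.0, sigma) for x in xs]
    kept = [(x, y) for x, y in zip(xs, ys) if y > lower]
    tx, ty = zip(*kept)
    return ols_slope(xs, ys), ols_slope(tx, ty)


full, truncated = simulate()
print(f"OLS slope, full sample:      {full:.2f}")
print(f"OLS slope, truncated sample: {truncated:.2f}")
```

With these settings the truncated-sample slope comes out noticeably smaller than the true value of 5. Truncation drops low-outcome cases, which are concentrated at low predictor values, and the survivors there tend to have large positive errors, flattening the fitted line. Correcting this attenuation is what truncated regression is designed to do.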

Stata Truncated Regression Analysis

The ll() option of truncreg gives the value at which left (lower) truncation takes place, here ll(39). There is also a ul() option for right (upper) truncation, which was not needed in this example.
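The truncation limits enter the model through the likelihood. A standard way to write a truncated normal regression is to divide each observation's normal density by the probability that an outcome with that linear prediction falls inside the truncation window. The Python sketch below (standard library only) computes that log-likelihood; we assume this textbook form is what truncreg maximizes, but it is an illustration, not Stata's code, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def truncated_loglik_obs(y, xb, sigma, ll=None, ul=None):
    """Log-likelihood contribution of one observation in a truncated
    normal regression (hypothetical helper, not Stata's implementation).

    y      -- observed outcome (must lie inside the truncation window)
    xb     -- linear prediction x'b for this observation
    sigma  -- standard deviation of the untruncated error term
    ll, ul -- lower/upper truncation points; None means no truncation
    """
    if (ll is not None and y <= ll) or (ul is not None and y >= ul):
        raise ValueError("y lies outside the truncation window")
    z = (y - xb) / sigma
    # Log of the ordinary normal density of y given xb and sigma.
    log_density = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - math.log(sigma)
    # Probability that an outcome with this xb falls inside (ll, ul).
    upper = 1.0 if ul is None else STD_NORMAL.cdf((ul - xb) / sigma)
    lower = 0.0 if ll is None else STD_NORMAL.cdf((ll - xb) / sigma)
    # Truncation rescales the density by that probability.
    return log_density - math.log(upper - lower)


def truncated_loglik(ys, xbs, sigma, ll=None, ul=None):
    """Total log likelihood: the sum of per-observation contributions."""
    return sum(truncated_loglik_obs(y, xb, sigma, ll, ul) for y, xb in zip(ys, xbs))
```

For the GATE example the calls would use ll=39 and ul=None. Without truncation the contribution reduces to the ordinary normal log density; truncating at the mean adds log 2, because only half of the distribution remains.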

Next, to be on the safe side, we rerun the truncreg command with the robust option to obtain robust (Huber/White sandwich) standard errors for the truncated regression coefficients.

The output looks much like that of an OLS regression. It begins with a note that zero observations were truncated, because no one in our sample has an achievement score below 40. The note is followed by the iteration log, which gives the log pseudolikelihood at each iteration, starting with a model that has no predictors. The last value in the log is the final log pseudolikelihood, which is repeated in the header.

Next comes the header. On the left-hand side are the lower and upper truncation limits and the final log pseudolikelihood. On the right-hand side are the number of observations used (178) and the Wald chi-squared with three degrees of freedom. This Wald chi-squared is what you would get if you used the test command after estimation to test that the coefficients on all three predictors are jointly zero. You do not get a likelihood-ratio chi-squared because, with robust standard errors, the estimation maximizes a pseudolikelihood, for which likelihood-ratio tests are not valid. Finally, the header gives the p-value for the chi-squared test; with p < .001, the model as a whole is statistically significant.
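The Wald chi-squared in the header has the form W = b′V⁻¹b, where b holds the three predictor coefficients and V is their (here robust) covariance matrix; with three restrictions, W is referred to a chi-squared distribution with 3 degrees of freedom. The Python sketch below computes both pieces from scratch with hypothetical helper names; any coefficient and covariance values you pass in are your own, since none are copied from the Stata output. The closed-form tail probability is specific to 3 degrees of freedom.

```python
import math
from statistics import NormalDist


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def wald_chi2(b, v):
    """Wald statistic b' V^{-1} b for H0: every coefficient in b is zero."""
    return sum(bi * wi for bi, wi in zip(b, solve(v, b)))


def chi2_sf_df3(w):
    """Upper-tail probability P(X > w) for X ~ chi-squared with 3 df (closed form)."""
    r = math.sqrt(w)
    tail = 2.0 * (1.0 - NormalDist().cdf(r))
    return tail + math.sqrt(2.0 * w / math.pi) * math.exp(-w / 2.0)
```

For instance, chi2_sf_df3(117.07) is far below .001, consistent with the p-value reported for this model. With a diagonal V, the statistic reduces to the sum of the squared z-statistics.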

truncreg reports neither an R² nor a pseudo-R². You can get a rough measure of association by correlating achiv with the predicted values and squaring the result. The resulting value, .42, is a rough estimate of the R² you would find in an OLS regression.
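The rough R² described above is the squared Pearson correlation between the outcome and the predicted values. In Stata this is typically obtained by running predict after truncreg (which gives the linear prediction by default) and then correlate; the Python sketch below shows only the arithmetic, using a hypothetical function name.

```python
import math


def squared_correlation(observed, predicted):
    """Square of the Pearson correlation between observed and predicted values."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    sop = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    soo = sum((o - mo) ** 2 for o in observed)
    spp = sum((p - mp) ** 2 for p in predicted)
    r = sop / math.sqrt(soo * spp)
    return r * r


print(squared_correlation([1, 2, 3, 4], [1, 3, 2, 4]))  # r = 0.8, so about 0.64
```

Because it is computed within the truncated sample, this figure describes how well the linear prediction tracks the observed scores among admitted students only.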

In the main body of the table are the truncated regression coefficients, their standard errors, a Wald z-test (coefficient/standard error) and the p-value for each z-test. By default, we also get 95% confidence intervals for the coefficients; the level() option requests a different confidence level.
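Each row of the coefficient table can be reproduced from the coefficient and its standard error alone: z = b/se, a two-sided normal p-value, and b ± z_crit × se for the interval, where z_crit depends on the confidence level (the role played by level()). The Python sketch below uses a hypothetical helper and the standard library's NormalDist; it illustrates the arithmetic and does not read Stata output.

```python
from statistics import NormalDist


def coef_summary(b, se, level=95.0):
    """Return (z, two-sided p-value, CI lower bound, CI upper bound) for one
    coefficient, using the normal approximation behind the table's z-tests.
    level is a percentage, as in Stata's level() option."""
    norm = NormalDist()
    z = b / se
    p = 2.0 * (1.0 - norm.cdf(abs(z)))
    crit = norm.inv_cdf(0.5 + level / 200.0)  # about 1.96 when level=95
    return z, p, b - crit * se, b + crit * se
```

A coefficient exactly 1.96 standard errors from zero gives a p-value of about .05 and a 95% interval whose lower bound sits just above zero.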

The ancillary statistic /sigma is the estimated standard deviation of the untruncated error term, analogous to the standard error of estimate (root MSE) in OLS regression. Its value of 7.74 can be compared with the standard deviation of achievement, 8.96, which shows a modest reduction. The output also gives the standard error of /sigma and a 95% confidence interval for it.

Sample Write-Up of the Analysis

Before writing up the results, we need to put the output into a form more suitable for publication. The user-written estout command by Ben Jann of ETH Zurich (type findit estout to locate it) gets us close to what we want. With a little manual editing, and remembering to include the rough R², we can produce an acceptable table of the output.

The truncated regression model predicting achievement from language scores, math scores and gender was statistically significant (Wald chi-squared = 117.07, df = 3, p < .001). Language and math scores were each statistically significant at the .001 level, while gender was not significant at the .05 level. The squared correlation between observed and predicted achievement was 0.42, suggesting that these three predictors account for roughly 42% of the variability in the outcome. A one-unit increase in language and math scores was associated with increases of 5.06 and 5.00 points in predicted achievement, respectively. The gender effect, although not significant, implied that predicted achievement for females was 2.29 points lower than for males, holding language and math scores constant.

Cautions, Flies in the Ointment

  • Stata's truncreg command is designed for samples in which the truncation is on the response variable. Samples can also be truncated on one or more predictors. For example, a model of college GPA as a function of high school GPA (HSGPA) and SAT scores uses a sample that is truncated on the predictors, because only students with higher HSGPA and SAT scores are admitted to the college.
  • Be careful about which value you use as the truncation point, because it affects the estimates of both the coefficients and the standard errors. In the example above, using ll(40) instead of ll(39) would have produced slightly different results. ll(39) is correct because any score of 40 or higher would have been included in our sample; it does not matter that the sample contains no scores of 40.
    These pages are Copyrighted (c) by UCLA Academic Technology Services


    The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.