Stata Data Analysis Examples
Truncated Regression

Version info: Code for this page was tested in Stata 12.

Truncated regression is used to model dependent variables for which some of the observations are not included in the analysis because of the value of the dependent variable. 

Please note: The purpose of this page is to show how to use various data analysis commands.  It does not cover all aspects of the research process that researchers are expected to carry out.  In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics or potential follow-up analyses.

Examples of truncated regression

Example 1. A study of students in a special GATE (gifted and talented education) program wishes to model achievement as a function of language skills and the type of program in which the student is currently enrolled.  A major concern is that students are required to have a minimum achievement score of 40 to enter the special program.  Thus, the sample is truncated at an achievement score of 40.

Example 2. A researcher has data for a sample of Americans whose income is above the poverty line.  Hence, the lower part of the distribution of income is truncated.  If the researcher had a sample of Americans whose income was at or below the poverty line, then the upper part of the income distribution would be truncated.  In other words, truncation is a result of sampling only part of the distribution of the outcome variable.

Description of the data

Let's pursue Example 1 from above.

We have a hypothetical data file, truncreg.dta, with 178 observations.  The outcome variable is called achiv, and the language test score variable is called langscore.  The variable prog is a categorical predictor variable with three levels indicating the type of program in which the students were enrolled. 

Let's look at the data.  It is always a good idea to start with descriptive statistics.
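
A first look at the data might be sketched as follows.  This assumes truncreg.dta has been saved in the current working directory (the file location is an assumption; adjust the path as needed).  The particular statistics and histogram options are illustrative choices, not necessarily the ones used to produce the results on this page.

```stata
* Assumes truncreg.dta is in the current working directory
use truncreg, clear

* Overall summary of the outcome and the continuous predictor
summarize achiv langscore

* Number of cases, mean and standard deviation of achiv by program type
tabstat achiv, by(prog) statistics(n mean sd)

* Distribution of the outcome, with a normal curve overlaid;
* no values below the truncation point of 40 should appear
histogram achiv, frequency normal

* Frequency of each program type
tabulate prog
```

The histogram is a quick way to confirm that the sample really does stop at the truncation point described in Example 1.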

Analysis methods you might consider

Below is a list of some analysis methods you may have encountered.  Some of the methods listed are quite reasonable, while others have either fallen out of favor or have limitations.    

Truncated regression

Below we use the truncreg command to estimate a truncated regression model.  The i. before prog indicates that it is a factor variable (i.e., categorical variable), and that it should be included in the model as a series of indicator variables.  The ll() option in the truncreg command indicates the value at which the left truncation takes place.  There is also a ul() option to indicate the value of the right truncation, which was not needed in this example.
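
Based on the description above, the command would look something like the following sketch.  The variable names and the truncation point of 40 come from Example 1, and the data are assumed to be already loaded.

```stata
* Truncated regression of achiv on langscore and program type,
* with left truncation at an achievement score of 40
truncreg achiv langscore i.prog, ll(40)
```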

The two degree-of-freedom chi-square test indicates that prog is a statistically significant predictor of achiv.
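
One way to obtain a joint test of this kind is with the test command after fitting the model.  The sketch below assumes prog is coded 1, 2 and 3 with level 1 as the base category, as the margins output on this page suggests.

```stata
* Refit quietly so that the truncreg estimates are in memory
quietly truncreg achiv langscore i.prog, ll(40)

* Joint (two degree-of-freedom) Wald test that both prog indicators are zero
test 2.prog 3.prog

* Equivalent shorthand that tests all indicators of prog at once
testparm i.prog
```

Both commands report a chi-square statistic with two degrees of freedom, one for each non-base level of prog.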

We can use the margins command to obtain the expected cell means.  Note that these are different from the means we obtained with the tabstat command above.

margins prog

Predictive margins                                Number of obs   =        178
Model VCE    : OIM

Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        prog |
          1  |   49.78871   1.897166    26.24   0.000     46.07034    53.50709
          2  |   53.85393   1.150041    46.83   0.000     51.59989    56.10797
          3  |   48.65285   2.140489    22.73   0.000     44.45757    52.84813
------------------------------------------------------------------------------

In the table above, we can see that the expected mean of achiv for the first level of prog is approximately 49.79; the expected mean for level 2 of prog is 53.85; the expected mean for the third level of prog is 48.65.

marginsplot

If you would like to compare truncated regression models, you can issue the estat ic command to get the log likelihood, AIC and BIC values.

estat ic

-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
           . |    178           .   -591.3098      5      1192.62    1208.529
-----------------------------------------------------------------------------
               Note:  N=Obs used in calculating BIC; see [R] BIC note

The truncreg output includes neither an R2 nor a pseudo-R2.  You can compute a rough estimate of the degree of association by correlating achiv with the predicted value and squaring the result.
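
A minimal sketch of that calculation is shown below.  It assumes the truncreg model is the most recent estimation command, and achiv_hat is a hypothetical name for the new variable holding the predicted values.

```stata
* Linear prediction (xb) from the most recent truncreg fit;
* achiv_hat is a hypothetical variable name
predict achiv_hat, xb

* Correlate the observed and predicted values
correlate achiv achiv_hat

* Square the correlation that correlate leaves behind in r(rho)
display r(rho)^2
```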

The calculated value of .31 is a rough estimate of the R2 you would find in an OLS regression.  In other words, the squared correlation between the observed and predicted achievement values is about 0.31, indicating that these predictors account for just over 30% of the variability in the outcome variable.

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.