Example 1. A researcher has data for a sample of employed persons and wishes to model wages as predicted by years of schooling and gender. Since the sample excludes individuals who are not employed, the data can be considered to be truncated at zero, i.e., wages need to be greater than zero to be included in the sample.
Example 2. A study of students in a special GATE (gifted and talented education) program wishes to model achievement as a function of gender, language skills and math skills. A major concern is that students require a minimum achievement score of 40 to enter the special program. Thus, the sample is truncated from below: students who score under 40 never appear in it.
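Why does truncation matter? A small simulation makes the problem concrete. The sketch below is purely illustrative and is not based on truncreg2.dta. It invents a population with these settings:
- the score equals 25 + 5 × skill + noise;
- skill is normal with mean 5 and SD 1;
- the noise is normal with SD 8.

It then keeps only cases scoring at least 40, as in the GATE example. Ordinary least squares on the truncated sample gives a slope noticeably smaller than the true value of 5. This happens because low-skill students are much more likely to fall below the cutoff and drop out of the sample.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (simple regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical population: score = 25 + 5 * skill + noise (noise SD 8).
rng = random.Random(12345)
skills = [rng.gauss(5.0, 1.0) for _ in range(20000)]
scores = [25.0 + 5.0 * s + rng.gauss(0.0, 8.0) for s in skills]

# Truncation from below: only cases scoring at least 40 are ever observed.
kept = [(s, y) for s, y in zip(skills, scores) if y >= 40.0]
kept_skills = [s for s, _ in kept]
kept_scores = [y for _, y in kept]

full_slope = ols_slope(skills, scores)
truncated_slope = ols_slope(kept_skills, kept_scores)
print(f"kept {len(kept)} of {len(scores)} cases")
print(f"OLS slope, full population:  {full_slope:.2f}")
print(f"OLS slope, truncated sample: {truncated_slope:.2f}")
```

Truncated regression corrects this bias by modelling the cutoff explicitly, instead of treating the observed cases as if they were the whole population.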
We have a hypothetical data file, truncreg2.dta, with 178 observations.
The achievement variable is achiv; the predictors are female, langscore (language score) and mathscore (math score).
Let's look at the data.
use http://www.ats.ucla.edu/stat/stata/dae/truncreg2, clear
summarize

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          id |       178    103.6236    57.08957          3        200
       achiv |       178    54.23596     8.96323         41         76
      female |       178    .5505618    .4988401          0          1
   langscore |       178    5.401124    .8944896        3.1        6.7
   mathscore |       178    5.302809    .9483515        3.1        7.4

histogram achiv, bin(6) start(40)
tabulate female

     female |      Freq.     Percent        Cum.
------------+-----------------------------------
       male |         80       44.94       44.94
     female |         98       55.06      100.00
------------+-----------------------------------
      Total |        178      100.00

correlate langscore mathscore female achiv
(obs=178)

             | langsc~e mathsc~e   female    achiv
-------------+------------------------------------
   langscore |   1.0000
   mathscore |   0.5052   1.0000
      female |   0.2455  -0.1932   1.0000
       achiv |   0.5265   0.5873  -0.0937   1.0000

graph matrix langscore mathscore achiv, half jitter(2)

Some Strategies You Might Be Tempted To Try
Before we show how you can analyze this with a truncated regression analysis, let's
consider some other methods that you might use.

Stata Truncated Regression Analysis
truncreg achiv female langscore mathscore, ll(40)
(note: 0 obs. truncated)

Fitting full model:

Iteration 0:   log likelihood = -580.98553
Iteration 1:   log likelihood = -574.83026
Iteration 2:   log likelihood = -574.53094
Iteration 3:   log likelihood = -574.53056
Iteration 4:   log likelihood = -574.53056

Truncated regression
Limit:   lower =         40                     Number of obs =        178
         upper =       +inf                     Wald chi2(3)  =      89.85
Log likelihood = -574.53056                     Prob > chi2   =     0.0000

------------------------------------------------------------------------------
       achiv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -2.290933   1.490333    -1.54   0.124    -5.211932    .6300663
   langscore |   5.064698   1.037769     4.88   0.000     3.030709    7.098688
   mathscore |   5.004054   .9555717     5.24   0.000     3.131168     6.87694
       _cons |  -.2940047   6.204858    -0.05   0.962     -12.4553    11.86729
-------------+----------------------------------------------------------------
      /sigma |   7.739053   .5476443    14.13   0.000      6.66569    8.812416
------------------------------------------------------------------------------

The ll() option in the truncreg command indicates the value at which the left truncation
takes place. There is also a ul() option to indicate the value of the right truncation, which
was not needed in this example.
Now, just to be on the safe side, let's rerun the truncreg command with the robust
option in order to obtain robust standard errors for the truncated regression coefficients.
truncreg achiv female langscore mathscore, ll(40) robust
(note: 0 obs. truncated)

Fitting full model:

Iteration 0:   log pseudolikelihood = -580.98553
Iteration 1:   log pseudolikelihood = -574.83026
Iteration 2:   log pseudolikelihood = -574.53094
Iteration 3:   log pseudolikelihood = -574.53056
Iteration 4:   log pseudolikelihood = -574.53056

Truncated regression
Limit:   lower =         40                     Number of obs =        178
         upper =       +inf                     Wald chi2(3)  =     117.07
Log pseudolikelihood = -574.53056               Prob > chi2   =     0.0000

------------------------------------------------------------------------------
             |               Robust
       achiv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -2.290933   1.371207    -1.67   0.095     -4.97845    .3965837
   langscore |   5.064698   1.061057     4.77   0.000     2.985065    7.144332
   mathscore |   5.004054   .9422474     5.31   0.000     3.157283    6.850825
       _cons |  -.2940047   5.603371    -0.05   0.958    -11.27641     10.6884
-------------+----------------------------------------------------------------
      /sigma |   7.739053   .4840165    15.99   0.000     6.790398    8.687708
------------------------------------------------------------------------------

The output looks very much like the output from an OLS regression. It begins with a note
indicating that zero observations were truncated, because our sample contained no
achievement values below 40. The note is followed by the iteration log, which shows the
log pseudolikelihood at each iteration as the full model is fitted. The last value in the
log is the final log pseudolikelihood, and it is repeated in the header. The coefficients
are identical to those from the first run; only the standard errors, z statistics,
p-values and confidence intervals change.
Next comes the header information. On the left-hand side are the lower and upper limits of the
truncation and a repeat of the final log pseudolikelihood. On the right-hand side are the number
of observations used (178) and the Wald chi-squared with three degrees of freedom. The Wald
chi-squared is what you would get if you used the test command, after estimating the model,
to test that all three predictor coefficients are zero. A likelihood ratio chi-squared is not
reported: with the robust option the estimation is based on a log pseudolikelihood, for which
likelihood ratio tests are not appropriate. Note that truncreg reports a Wald test even without
the robust option (Wald chi2(3) = 89.85 in the first run). Finally, there is the p-value for the
chi-squared test, which shows that the model as a whole is statistically significant.
test female langscore mathscore

 ( 1)  [eq1]female = 0
 ( 2)  [eq1]langscore = 0
 ( 3)  [eq1]mathscore = 0

           chi2(  3) =  117.07
         Prob > chi2 =    0.0000

The test result matches the Wald chi-squared in the header.
In the main body of the table are the truncated regression coefficients, their standard errors,
a Wald z-test (coefficient/standard error) and the p-value associated with each z-test. By
default, we also get a 95% confidence interval for the coefficients; the level() option
requests a different confidence level.
The ancillary statistic /sigma is analogous to the standard error of estimate (root MSE)
in OLS regression. Its value of 7.74 can be compared with the standard deviation of
achievement, which was 8.96, a modest reduction. The output also contains an estimate of the
standard error of /sigma as well as a 95% confidence interval.
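truncreg estimates its coefficients and /sigma by maximum likelihood, assuming normally distributed errors. This is the standard textbook form (see, e.g., Greene 2003), not Stata's own code. With truncation from below at a, each observation contributes

ln φ((y − xβ)/σ) − ln σ − ln[1 − Φ((a − xβ)/σ)]

The last term rescales the normal density so that it integrates to one over the observable range y > a. This rescaling is what separates truncated regression from OLS. The function names and toy numbers in the sketch below are hypothetical.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_sf(z):
    """Upper-tail probability P(Z > z) of the standard normal, via erfc."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def truncated_loglik(ys, xbs, sigma, lower):
    """Log-likelihood of a normal linear model left-truncated at `lower`.

    ys    -- observed outcomes (all assumed to lie above `lower`)
    xbs   -- linear predictions x*beta for the same observations
    sigma -- error standard deviation (the /sigma ancillary parameter)
    """
    total = 0.0
    for y, xb in zip(ys, xbs):
        # Ordinary normal log density of the residual ...
        total += norm_logpdf((y - xb) / sigma) - math.log(sigma)
        # ... renormalised by the probability of lying above the limit.
        total -= math.log(norm_sf((lower - xb) / sigma))
    return total


# Toy numbers (hypothetical, not from truncreg2.dta).
ys = [45.0, 52.0, 60.0]
xbs = [47.0, 50.0, 58.0]
print(truncated_loglik(ys, xbs, sigma=7.74, lower=40.0))
print(truncated_loglik(ys, xbs, sigma=7.74, lower=-1e9))  # effectively no truncation
```

When the limit is far below the data, the correction term vanishes and the function reduces to the ordinary normal log-likelihood. An estimation routine would maximize this quantity over β and σ jointly.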
truncreg produces neither an R-squared nor a pseudo-R-squared. You can compute
a rough estimate of the degree of association by correlating achiv with the predicted
values and squaring the result.
predict p
correlate p achiv
(obs=178)
             |        p    achiv
-------------+------------------
           p |   1.0000
       achiv |   0.6517   1.0000
display r(rho)^2
.42480305
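Outside Stata, the same rough fit measure is just the squared Pearson correlation between observed and predicted values. Below is a minimal Python sketch using toy numbers, not the achievement data. The function name squared_correlation is our own.

```python
import math


def squared_correlation(observed, predicted):
    """Square of the Pearson correlation between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)
    return r * r


# Toy values (hypothetical): observed scores and model predictions.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 3.0, 2.0]
print(squared_correlation(observed, predicted))  # → 0.25
```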
The calculated value of .42 is a rough estimate of the R-squared you would find in an OLS
regression.

Sample Write-Up of the Analysis
Before we begin the sample write-up, we need to get the output into a form more acceptable for
publication. The estout command by Ben Jann of ETH Zurich (type findit estout to locate it)
will get us closer to what we want.
estout, cells(b(star fmt(%8.2f)) se(par fmt(%8.2f))) stats(ll chi2, fmt(%8.2f))
----------------------------
                        b/se
----------------------------
eq1
female                 -2.29
                      (1.37)
langscore            5.06***
                      (1.06)
mathscore            5.00***
                      (0.94)
_cons                  -0.29
                      (5.60)
----------------------------
sigma
_cons                7.74***
                      (0.48)
----------------------------
ll                   -574.53
chi2                  117.07
----------------------------
With a little manual editing, and remembering to include the rough R-squared,
we can produce an acceptable table of the output.
                     model
female               -2.29
                    (1.37)
lang_scr           5.06***
                    (1.06)
math_scr           5.00***
                    (0.94)
constant             -0.29
                    (5.60)
standard error
  of estimate         7.74
log pseudo-
  likelihood       -574.53
chi-squared         117.07
estimated
  R-squared           0.42
legend: coefficient/(standard error) *** p<0.001
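The stars in the table rest on the Wald z-tests from the robust output. These work as follows:
- each z statistic is the coefficient divided by its standard error;
- the two-sided p-value comes from the standard normal distribution;
- the 95% interval is the coefficient plus or minus about 1.96 standard errors.

The Python sketch below recomputes these quantities for the female row. The star thresholds are an assumption based on common convention; the table's legend shows only ***.

```python
import math
from statistics import NormalDist


def wald_summary(coef, se, level=0.95):
    """Wald z statistic, two-sided p-value and confidence interval."""
    z = coef / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    return z, p, (coef - crit * se, coef + crit * se)


def stars(p):
    """Significance stars; thresholds are an assumed convention."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Coefficient and robust standard error for female, from the robust output.
z, p, (lo, hi) = wald_summary(-2.290933, 1.371207)
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI = [{lo:.5f}, {hi:.5f}] {stars(p)}")
```

Up to rounding, this reproduces the robust table's z = -1.67, p = 0.095 and interval of -4.97845 to .3965837 for female. With p above .05, that row gets no star.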
The truncated regression model predicting achievement from language scores, math scores and gender
was statistically significant (chi-squared = 117.07, df = 3, p<.001). The predictors
language and math were each statistically significant at the .001 level. The effect of gender
was not significant at the .05 level.
The squared correlation between
the observed and predicted achievement values was 0.42, indicating that these three predictors
accounted for roughly 42% of the variability in the outcome variable. A one-unit increase in
the language score or the math score was associated with an increase of 5.06 or 5.00 points,
respectively, in predicted achievement. Although not significant, the gender coefficient implies
predicted achievement scores for females that were 2.29 points lower than those for males.
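The coefficient interpretations in the write-up can be checked by plugging values into the fitted linear predictor. The sketch below uses the point estimates from the robust truncreg output; the helper name linear_prediction and the example score values are hypothetical.

This linear prediction, xβ, describes the mean achievement of the untruncated population. For students who actually clear the cutoff of 40, the expected score is higher than xβ, and it responds less than one-for-one to the predictors.

```python
# Point estimates from the robust truncreg output.
COEF = {
    "_cons": -0.2940047,
    "female": -2.290933,
    "langscore": 5.064698,
    "mathscore": 5.004054,
}


def linear_prediction(female, langscore, mathscore):
    """Linear predictor x*beta (hypothetical helper, not a Stata command)."""
    return (COEF["_cons"]
            + COEF["female"] * female
            + COEF["langscore"] * langscore
            + COEF["mathscore"] * mathscore)


base = linear_prediction(female=0, langscore=5.0, mathscore=5.0)
print(linear_prediction(0, 6.0, 5.0) - base)  # one more point of langscore
print(linear_prediction(1, 5.0, 5.0) - base)  # female vs. male, same scores
```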
Cautions, Flies in the Ointment
See Also
Greene, W. H. 2003. Econometric Analysis, Fifth Edition. Upper Saddle River, NJ: Prentice
Hall.
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables.
Thousand Oaks, CA: Sage Publications.
These pages are Copyrighted (c) by UCLA Academic Technology Services