Example 1. A researcher has data for a sample of employed persons and wishes to model wages as predicted by years of schooling and gender. Since the sample excludes individuals who are not employed, the data can be considered to be truncated at zero, i.e., wages need to be greater than zero to be included in the sample.
Example 2. A study of students in a special GATE (gifted and talented education) program wishes to model achievement as a function of gender, language skills and math skills. A major concern is that students require a minimum achievement score of 40 to enter the special program. Thus, the sample is truncated from below at an achievement score of 40; students scoring 39 or lower are not observed.
We have a hypothetical data file, truncreg2.sas7bdat, with 178 observations.
The achievement variable is achiv. Let's look at the data.
The output looks very much like the output
from an OLS regression. In the upper right of the output we can see that
zero observations were truncated. This is because our sample contained no
data with values less than 40 for achievement. In the Parameter Estimates
table we have the truncated regression coefficients, the standard error of the
coefficients, a t-Value and the associated p-value. The ancillary
statistic _sigma is equivalent to the standard error of estimate in OLS
regression. The value of 7.74 can be compared to the standard deviation
of achievement, which was 8.96. This shows a modest reduction. The
output also contains an estimate of the standard error of _sigma, as well as the
t-Value and p-value. That _sigma is statistically significant means that
7.74 is statistically significantly different from 0. The validity of this
test of _sigma is a matter of debate among statisticians, and some programs will
produce the estimate and standard error, but not the test of statistical
significance. The calculated value of .42 is a rough estimate of the R-squared you would find in an OLS
regression.
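For reference, the role of _sigma can be seen in the usual textbook form of the truncated regression model. The notation below is ours rather than part of the SAS output: a is the lower truncation point (40 in this example), and phi and Phi are the standard normal density and distribution function.

```latex
\begin{align*}
y_i^{*} &= x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^{2}),
  \qquad y_i = y_i^{*} \text{ is observed only if } y_i^{*} \ge a \\[4pt]
f(y_i \mid y_i \ge a) &=
  \frac{\dfrac{1}{\sigma}\,\phi\!\left(\dfrac{y_i - x_i'\beta}{\sigma}\right)}
       {1 - \Phi\!\left(\dfrac{a - x_i'\beta}{\sigma}\right)} \\[4pt]
\ln L(\beta, \sigma) &= \sum_{i=1}^{n} \left[
  -\ln \sigma
  + \ln \phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right)
  - \ln\!\left(1 - \Phi\!\left(\frac{a - x_i'\beta}{\sigma}\right)\right)
\right]
\end{align*}
```

Under this formulation, _sigma is the estimate of sigma, the standard deviation of the error term in the population before truncation, which is why it plays the same role as the standard error of estimate in OLS.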
The predictors language and math were each statistically significant at the .001 level. The effect of gender
was not significant at the .05 level. The squared correlation between
the observed and predicted achievement values was 0.42, indicating that these three predictors
accounted for over 40% of the variability in the outcome variable. Holding the other predictors constant,
a one-unit increase in language score leads to a 5.06-point increase in predicted achievement, and a
one-unit increase in math score leads to a 5.00-point increase. The effect for gender, although not
statistically significant, resulted in predicted achievement scores
for females that were 2.29 points lower than for males.
proc means data = truncreg2;
run;
The MEANS Procedure
Variable       N          Mean       Std Dev       Minimum       Maximum
------------------------------------------------------------------------
ID           178   103.6235955    57.0895709     3.0000000   200.0000000
ACHIV        178    54.2359551     8.9632299    41.0000000    76.0000000
FEMALE       178     0.5505618     0.4988401             0     1.0000000
LANGSCORE    178     5.4011236     0.8944896     3.0999999     6.6999998
MATHSCORE    178     5.3028090     0.9483515     3.0999999     7.4000001
------------------------------------------------------------------------
proc univariate data = truncreg2 plot;
var achiv;
histogram / normal;
run;
The UNIVARIATE Procedure
Variable: ACHIV
Moments
N 178 Sum Weights 178
Mean 54.2359551 Sum Observations 9654
Std Deviation 8.96322994 Variance 80.3394909
Skewness 0.4629984 Kurtosis -0.7907047
Uncorrected SS 537814 Corrected SS 14220.0899
Coeff Variation 16.5263614 Std Error Mean 0.67182249
Basic Statistical Measures
Location Variability
Mean 54.23596 Std Deviation 8.96323
Median 52.00000 Variance 80.33949
Mode 47.00000 Range 35.00000
Interquartile Range 16.00000
< some output omitted to save space >
Extreme Observations
----Lowest---- ----Highest---
Value Obs Value Obs
41 172 73 65
41 105 73 101
42 175 73 103
42 170 76 29
42 153 76 119
Stem Leaf # Boxplot
76 00 2 |
74 |
72 00000 5 |
70 00 2 |
68 00000000000 11 |
66 0 1 |
64 000000000 9 |
62 0000000000000000 16 +-----+
60 0000000000 10 | |
58 | |
56 00000000000000 14 | |
54 00000000000000 14 | + |
52 000000000000000 15 *-----*
50 000000000000000000 18 | |
48 0 1 | |
46 0000000000000000000000000000 28 +-----+
44 000000000000000 15 |
42 000000000000000 15 |
40 00 2 |
----+----+----+----+----+---
Normal Probability Plot
< plot omitted >
proc freq data = truncreg2;
tables female;
run;
The FREQ Procedure
                                   Cumulative    Cumulative
FEMALE    Frequency     Percent     Frequency      Percent
-----------------------------------------------------------
     0          80       44.94            80        44.94
     1          98       55.06           178       100.00
proc corr data = truncreg2 nosimple;
var langscore mathscore female achiv;
run;
The CORR Procedure
4 Variables: LANGSCORE MATHSCORE FEMALE ACHIV
Pearson Correlation Coefficients, N = 178
Prob > |r| under H0: Rho=0
               LANGSCORE   MATHSCORE      FEMALE       ACHIV
LANGSCORE        1.00000     0.50517     0.24551     0.52650
                              <.0001      0.0010      <.0001
MATHSCORE        0.50517     1.00000    -0.19317     0.58727
                  <.0001                  0.0098      <.0001
FEMALE           0.24551    -0.19317     1.00000    -0.09366
                  0.0010      0.0098                  0.2137
ACHIV            0.52650     0.58727    -0.09366     1.00000
                  <.0001      <.0001      0.2137
proc insight data = truncreg2;
scatter langscore mathscore achiv * langscore mathscore achiv;
run;
quit;
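PROC INSIGHT may not be available in every SAS installation. Where it is missing, a comparable scatterplot matrix can be sketched with PROC SGSCATTER, assuming a release that includes the ODS statistical graphics procedures. This is our substitution, not part of the original analysis.

```sas
/* Scatterplot matrix of the two test scores and achievement. */
proc sgscatter data = truncreg2;
  matrix langscore mathscore achiv;
run;
```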

Some Strategies You Might Be Tempted To Try
Before we show how you can analyze this with a truncated regression analysis, let's
consider some other methods that you might use.
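As an illustration, and assuming ordinary least squares is one of the tempting approaches, a naive OLS fit on the truncated sample might look like the sketch below. It is shown only for comparison: OLS does not adjust for the fact that scores below 40 never enter the sample.

```sas
/* Naive OLS fit on the truncated sample, for comparison only.  */
/* It ignores the truncation at 40, so the coefficients are not */
/* adjusted for the missing lower tail of achievement scores.   */
proc reg data = truncreg2;
  model achiv = female langscore mathscore;
run;
quit;
```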
SAS Truncated Regression Analysis
proc qlim data=truncreg2;
model achiv = female langscore mathscore;
endogenous achiv ~ truncated (lb=40);
run;
The QLIM Procedure
Summary Statistics of Continuous Responses
                                                           N Obs    N Obs
                       Standard               Lower  Upper  Lower    Upper
Variable       Mean       Error  Type         Bound  Bound  Bound    Bound
achiv      54.23596    8.963230  Truncated       40
Model Fit Summary
Number of Endogenous Variables 1
Endogenous Variable achiv
Number of Observations 178
Log Likelihood -574.53056
Maximum Absolute Gradient 2.72145E-6
Number of Iterations 12
AIC 1159
Schwarz Criterion 1175
Algorithm converged.
Parameter Estimates
                             Standard                 Approx
Parameter      Estimate         Error    t Value    Pr > |t|
Intercept     -0.293996      6.204858      -0.05      0.9622
FEMALE        -2.290930      1.490333      -1.54      0.1242
LANGSCORE      5.064697      1.037769       4.88      <.0001
MATHSCORE      5.004053      0.955571       5.24      <.0001
_Sigma         7.739052      0.547644      14.13      <.0001
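To make the estimates concrete, the following sketch plugs them into the fitted equation for one hypothetical student (a male with langscore = 5.4 and mathscore = 5.3; these values are chosen for illustration and are not a row in the data). It computes the linear predictor, which is the mean of achievement before truncation, and then the mean given that the score is above 40, using the standard truncated-normal formula xb + sigma * phi(alpha) / (1 - Phi(alpha)) with alpha = (40 - xb) / sigma.

```sas
/* Hypothetical student: male, langscore = 5.4, mathscore = 5.3.   */
/* Coefficients and _Sigma are copied from the Parameter Estimates */
/* table of the PROC QLIM output.                                  */
data _null_;
  female = 0; langscore = 5.4; mathscore = 5.3;
  lb     = 40;          /* lower truncation point */
  sigma  = 7.739052;    /* _Sigma                 */

  /* Linear predictor: mean of achievement before truncation. */
  xb = -0.293996 - 2.290930*female + 5.064697*langscore + 5.004053*mathscore;

  /* Mean of achievement given that the score exceeds lb. */
  alpha     = (lb - xb) / sigma;
  lambda    = pdf('normal', alpha) / (1 - cdf('normal', alpha));
  cond_mean = xb + sigma * lambda;

  put "Linear predictor (untruncated mean): " xb 8.3;
  put "Mean given achiv above 40:           " cond_mean 8.3;
run;
```

The two numbers differ because the coefficients describe the untruncated population, while every observed score has already cleared the cutoff.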
The lb= option on the endogenous statement indicates the value at which the left truncation
takes place. There is also a ub= option to indicate the value of the right truncation, which
was not needed in this example.
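If a sample were truncated on both sides, the two options could be combined as in the sketch below. The upper bound of 100 is purely hypothetical; nothing in the GATE example implies an upper cutoff.

```sas
/* Sketch only: ub=100 is a hypothetical upper truncation point, */
/* not something the GATE example requires.                      */
proc qlim data = truncreg2;
  model achiv = female langscore mathscore;
  endogenous achiv ~ truncated (lb=40 ub=100);
run;
```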
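/* Refit the model and save the predicted values (p_achiv) to temp_trunc. */
proc qlim data=truncreg2;
model achiv = female langscore mathscore;
endogenous achiv ~ truncated (lb=40);
output out = temp_trunc predicted;
run;
/* Capture the correlation table from PROC CORR in the data set corr. */
ods output PearsonCorr=corr;
proc corr data = temp_trunc nosimple;
var achiv p_achiv;
run;
/* Keep the ACHIV row, square its correlation with p_achiv, and print it. */
data _null_;
set corr;
if variable = "ACHIV";
file print;
a = round((p_achiv)**2, .0001);
put "The squared multiple correlation between achieve and the predicted value is " a;
run;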
The squared multiple correlation between achieve and the predicted value is 0.4247
Sample Write-Up of the Analysis
Cautions, Flies in the Ointment
See Also
Greene, W. H. 2003. Econometric Analysis, Fifth Edition. Upper Saddle River, NJ: Prentice Hall.
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications.
These pages are Copyrighted (c) by UCLA Academic Technology Services