UCLA Academic Technology Services

SAS Data Analysis Examples
Interval Regression

Examples of Interval Regression

Example 1. We wish to model annual income using years of education and marital status. However, we do not have access to the precise values for income; we only have data on the income ranges: <$15,000, $15,000-$25,000, $25,000-$50,000, $50,000-$75,000, $75,000-$100,000, and >$100,000. Note that the categories at the two ends of the range are censored on one side only: the lowest category (<$15,000) is left-censored and the highest (>$100,000) is right-censored. The other categories are interval-censored, that is, each interval is both left- and right-censored. Analyses of this type require a generalization of censored regression known as interval regression.

Example 2. We wish to predict GPA from teacher ratings of effort and from reading and writing test scores. The measure of GPA is a self-report response to the following item:

Select the category that best represents your overall GPA.
  less than 2.0
  2.0 to 2.5
  2.5 to 3.0
  3.0 to 3.4
  3.4 to 3.8
  3.8 to 3.9
  4.0 or greater
Again, we have a situation with both interval censoring and left- and right-censoring. We do not know the exact value of GPA for each student; we only know the interval in which their GPA falls.

Example 3. We wish to predict GPA from teacher ratings of effort and from reading and writing test scores. The measure of GPA is a self-report response to the following item:

Select the category that best represents your overall GPA.
  0.0 to 2.0
  2.0 to 2.5
  2.5 to 3.0
  3.0 to 3.4
  3.4 to 3.8
  3.8 to 4.0
This is a slight variation of Example 2, in which there is only interval censoring.

Description of the Data

Let's pursue Example 3 from above.

We have a hypothetical data file, intregex.sas7bdat, with 30 observations.  The GPA score is represented by two values, the lower interval score (lgpa) and the upper interval score (ugpa).  The reading test score, the writing test score and the teacher rating are stored in the variables read, write and rating, respectively.
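To make the two-variable coding concrete, here is a minimal sketch (in Python rather than SAS, purely for illustration) of how the survey categories could be turned into lgpa and ugpa. The category codes 1 through 7 and the helper name to_bounds are assumptions made for this sketch; only the bounds come from the item wording. For the Example 2 variant, None stands in for a missing bound, in the same way that PROC LIFEREG reads a missing lower bound as left-censoring and a missing upper bound as right-censoring.

```python
# Example 3: every category has both bounds (interval censoring only).
# The integer codes are hypothetical; the bounds follow the item wording.
EXAMPLE3_BOUNDS = {
    1: (0.0, 2.0),
    2: (2.0, 2.5),
    3: (2.5, 3.0),
    4: (3.0, 3.4),
    5: (3.4, 3.8),
    6: (3.8, 4.0),
}

# Example 2: the end categories are open-ended, so one bound is missing.
EXAMPLE2_BOUNDS = {
    1: (None, 2.0),  # "less than 2.0": left-censored
    2: (2.0, 2.5),
    3: (2.5, 3.0),
    4: (3.0, 3.4),
    5: (3.4, 3.8),
    6: (3.8, 3.9),
    7: (4.0, None),  # "4.0 or greater": right-censored
}


def to_bounds(category, table):
    """Return the lower and upper GPA bound for one survey response."""
    lgpa, ugpa = table[category]
    return {"lgpa": lgpa, "ugpa": ugpa}


# Three made-up responses to the Example 3 item.
responses = [2, 5, 6]
rows = [to_bounds(c, EXAMPLE3_BOUNDS) for c in responses]
```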

Let's look at the data.

Note that there are two GPA responses for each observation, lgpa for the lower end of the interval and ugpa for the upper end. Graphing these data can be rather tricky.  So just to get an idea of what the distribution of GPA is, we will do separate histograms for lgpa and ugpa.

Some Strategies You Might Be Tempted To Try

Before we show how you can analyze this with an interval regression analysis, let's consider some other methods that you might use.

SAS Interval Regression

At the top of the output, we see information about the model and the data set, including a summary of the number of left-censored, uncensored, right-censored and interval-censored values.  After that, the output looks very much like the output from an OLS regression.  In the table entitled "Type III Analysis of Effects," we see each variable in the model along with its degrees of freedom, Wald chi-square and p-value.  In this example, both write and read are statistically significant.  In the table entitled "Analysis of Parameter Estimates," we have the interval regression coefficients, their standard errors, their 95% confidence intervals, a chi-square test and the associated p-value.

In this example, the information in the last two tables is redundant, because every predictor has a single degree of freedom.  If we had used a categorical variable with more than two levels, the two tables would not be redundant: the Type III Analysis of Effects would give the multi-degree-of-freedom test of whether the variable as a whole is statistically significant, while the Analysis of Parameter Estimates table would give the coefficient for each dummy variable.
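The censoring summary at the top of the output can be understood as a simple classification of each (lower, upper) pair. The following is a minimal sketch in Python, not the procedure's own code; the function name censoring_type, the use of None for a missing bound and the made-up rows are all assumptions for illustration, and raising an error on unusable rows is a choice of this sketch.

```python
from collections import Counter


def censoring_type(lower, upper):
    """Classify one observation from its lower and upper bound."""
    if lower is None and upper is None:
        raise ValueError("at least one bound is required")
    if lower is None:
        return "left-censored"      # only know the value is below upper
    if upper is None:
        return "right-censored"     # only know the value is above lower
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    if lower == upper:
        return "uncensored"         # the exact value is known
    return "interval-censored"      # the value lies between the bounds


# Made-up rows covering all four cases.
rows = [(2.0, 2.5), (3.4, 3.8), (None, 2.0), (4.0, None), (3.0, 3.0)]
counts = Counter(censoring_type(lo, hi) for lo, hi in rows)
```

With data coded as in Example 3, every row would fall in the interval-censored group.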

The ancillary statistic, scale, is equivalent to the standard error of the estimate in OLS regression.  Its value of 0.34 can be compared to the standard deviations of lgpa and ugpa, 0.78 and 0.57, respectively, which shows a substantial reduction.  The output also contains the standard error of the scale estimate as well as its 95% confidence interval.
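To show where the coefficients and the scale enter the model, here is a minimal Python sketch of the log-likelihood of an interval regression with normal errors. It is an illustration under stated assumptions, not the lifereg implementation: the function names, the use of None for a missing bound, and the rows and coefficient values are all made up for the example. Each observation contributes the probability that a normal variable with mean x'beta and standard deviation sigma (the scale) falls in the observed range.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def loglik_obs(lower, upper, mu, sigma):
    """Log-likelihood contribution of one observation."""
    if lower is None and upper is None:
        raise ValueError("at least one bound is required")
    if lower is None:                       # left-censored
        return math.log(norm_cdf((upper - mu) / sigma))
    if upper is None:                       # right-censored
        return math.log(1.0 - norm_cdf((lower - mu) / sigma))
    if lower == upper:                      # uncensored
        return math.log(norm_pdf((lower - mu) / sigma) / sigma)
    # interval-censored: probability mass between the two bounds
    return math.log(norm_cdf((upper - mu) / sigma)
                    - norm_cdf((lower - mu) / sigma))


def loglik(rows, beta, sigma):
    """Sum the contributions; each row is (lower, upper, predictors)."""
    total = 0.0
    for lower, upper, x in rows:
        mu = sum(b * xi for b, xi in zip(beta, x))
        total += loglik_obs(lower, upper, mu, sigma)
    return total


# Made-up rows: (lgpa, ugpa, [intercept term, one test score]).
rows = [
    (2.5, 3.0, [1.0, 35.0]),
    (3.0, 3.4, [1.0, 60.0]),
    (3.4, 3.8, [1.0, 75.0]),
]
beta = [2.0, 0.02]   # hypothetical intercept and slope
value = loglik(rows, beta, 0.34)
```

Maximizing this function over beta and sigma is what yields the coefficient estimates and the scale; the sketch only evaluates it at one point.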

The lifereg procedure does not compute an R2 or pseudo-R2. You can compute a rough-and-ready measure of fit by calculating the squared correlation (R2) between the predicted values and the observed values, that is, the interval endpoints lgpa and ugpa.
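The rough measure of fit described above amounts to squaring a Pearson correlation. A minimal Python sketch follows; the helper names and the four data points are made up for illustration and are not taken from intregex.sas7bdat. Because the outcome is observed only as an interval, the sketch computes one value against each endpoint.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(predicted, observed):
    """Squared correlation between predicted and observed values."""
    return pearson_r(predicted, observed) ** 2


# Made-up illustrative values.
predicted = [2.3, 2.8, 3.1, 3.6]
lgpa = [2.0, 2.5, 3.0, 3.4]
ugpa = [2.5, 3.0, 3.4, 3.8]

r2_lower = r_squared(predicted, lgpa)   # fit against the lower endpoints
r2_upper = r_squared(predicted, ugpa)   # fit against the upper endpoints
```

The two values will generally differ, which is one reason to treat this as a descriptive summary rather than a formal statistic.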

Sample Write-Up of the Analysis

Both the reading and writing test scores were statistically significant, at the 0.001 and 0.05 levels, respectively.  The teacher ratings were not significant (p = 0.09).  A one-unit increase in the reading score and in the writing score leads to an increase in predicted GPA of 0.0053 and 0.0023, respectively, holding the other predictors constant.

These pages are Copyrighted (c) by UCLA Academic Technology Services

