Stata Data Analysis Examples
Interval Regression

Examples of Interval Regression

Example 1. We wish to model annual income using years of education and marital status. However, we do not have access to the precise values for income; we only have data on the income ranges: <$15,000, $15,000-$25,000, $25,000-$50,000, $50,000-$75,000, $75,000-$100,000, and >$100,000. Note that the categories at either end of the range are left-censored (<$15,000) or right-censored (>$100,000). The other categories are interval-censored, that is, each interval is both left- and right-censored. Analyses of this type require a generalization of censored regression known as interval regression.
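
As a rough sketch of how such data could be set up for Stata's intreg command, the income categories can be converted into a lower-bound and an upper-bound variable. intreg treats a missing lower bound as left-censored and a missing upper bound as right-censored. The variable names inccat, loinc, hiinc, educ and married are hypothetical and are not part of any data file used on this page.

```stata
* Assumption: inccat is coded 1-6 for the six income categories above
* Lower bound: missing for the left-censored category (<$15,000)
recode inccat (1=.) (2=15000) (3=25000) (4=50000) (5=75000) (6=100000), generate(loinc)

* Upper bound: missing for the right-censored category (>$100,000)
recode inccat (1=15000) (2=25000) (3=50000) (4=75000) (5=100000) (6=.), generate(hiinc)

* Interval regression of income on education and marital status
intreg loinc hiinc educ married
```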

Example 2. We wish to predict GPA from teacher ratings of effort and from reading and writing test scores. The measure of GPA is a self-report response to the following item:

Select the category that best represents your overall gpa.
  less than 2.0
  2.0 to 2.5
  2.5 to 3.0
  3.0 to 3.4
  3.4 to 3.8
  3.8 to 3.9
  4.0 or greater
Again, we have a situation with both interval censoring and left- and right-censoring. We do not know the exact value of GPA for each student; we only know the interval in which each student's GPA falls.

Example 3. We wish to predict GPA from teacher ratings of effort and from reading and writing test scores. The measure of GPA is a self-report response to the following item:

Select the category that best represents your overall gpa.
  0.0 to 2.0
  2.0 to 2.5
  2.5 to 3.0
  3.0 to 3.4
  3.4 to 3.8
  3.8 to 4.0
This is a slight variation of Example 2, in which there is only interval censoring.

Description of the Data

Let's pursue Example 3 from above.

We have a hypothetical data file, intregex.dta, with 30 observations. The GPA score is represented by two values, the lower interval score (lgpa) and the upper interval score (ugpa). The reading test score, the writing test score and the teacher rating are stored in read, write and rating, respectively.

Let's look at the data.
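
A minimal sketch of how the data might be inspected, assuming intregex.dta has been saved in the current working directory (the file location is an assumption; the variable names are those given above):

```stata
* Load the hypothetical data file from the working directory
use intregex, clear

* Show variable names, types and labels
describe

* List the interval bounds and the predictors for every observation
list lgpa ugpa write rating read, clean

* Summary statistics for the interval bounds and the predictors
summarize lgpa ugpa write rating read
```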

Note that there are two GPA responses for each observation, lgpa for the lower end of the interval and ugpa for the upper end. Graphing these data can be rather tricky. So just to get an idea of what the distribution of GPA is we will do separate histograms for lgpa and ugpa.
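
The two histograms could be drawn along these lines. The graph names h_lgpa and h_ugpa are arbitrary labels chosen for this sketch.

```stata
* Frequency histogram of the lower end of each GPA interval
histogram lgpa, frequency name(h_lgpa, replace)

* Frequency histogram of the upper end of each GPA interval
histogram ugpa, frequency name(h_ugpa, replace)

* Optional: show both histograms side by side
graph combine h_lgpa h_ugpa
```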

Some Strategies You Might Be Tempted To Try

Before we show how you can analyze these data with an interval regression analysis, let's consider some other methods that you might use.

Stata Interval Regression

Just to be a bit more conservative we will run the model again using the robust option to obtain the Huber-White robust standard errors. The output looks very much like the output from an OLS regression. At the top, the number of observations used in the analysis is given along with a likelihood-ratio chi-squared. The likelihood-ratio chi-squared tests the difference between the full model (with predictors) and the constant-only model. Below that is the p-value for the chi-squared with three degrees of freedom. This model, as a whole, is statistically significant. Next comes the log-likelihood for the full model.
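
The two model runs described here would look roughly like the following, assuming the predictors are the three variables named in the data description. The first two variables after intreg are the lower and upper bounds of the outcome.

```stata
* Interval regression of GPA on writing score, teacher rating and reading score
intreg lgpa ugpa write rating read

* Same model with Huber-White robust standard errors
* (vce(robust) is the current spelling of the robust option)
intreg lgpa ugpa write rating read, vce(robust)
```

Note that when robust standard errors are requested, Stata generally reports a Wald chi-squared in the header in place of the likelihood-ratio chi-squared.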

The intreg command does not compute an R2 or pseudo-R2. You can compute a rough-and-ready measure of fit by calculating the R2 between the predicted and observed values.
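
One way to sketch this rough-and-ready calculation is to save the linear prediction and square its correlation with each interval bound. The variable name yhat is an arbitrary choice for this illustration.

```stata
* Fit the model, then save the linear prediction (xb)
quietly intreg lgpa ugpa write rating read
predict double yhat, xb

* Squared correlation between the prediction and the lower bound
quietly correlate lgpa yhat
display "R2 (lgpa, yhat) = " r(rho)^2

* Squared correlation between the prediction and the upper bound
quietly correlate ugpa yhat
display "R2 (ugpa, yhat) = " r(rho)^2
```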

The calculated values of approximately .60 are probably close to what you would find in an OLS regression if you had actual GPA scores. You can also make use of the Long and Freese utility command fitstat (findit spostado), which provides a number of pseudo-R2's in addition to other measures of fit. The McKelvey-Zavoina pseudo-R2, which computes the R2 using the variances of the latent variable and the latent predicted variable [ Var(predicted-y*)/Var(y*) ], is close to our rough-and-ready estimates above.

In the main body of the table we have the interval regression coefficients, the standard errors of the coefficients, a Wald t-test (coefficient/se) and the p-value associated with each t-test. By default, we also get a 95% confidence interval for the coefficients. With the level() option you can request a different confidence level.
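
The fit statistics and the level() option mentioned above might be requested as follows. fitstat is a user-written command and must be installed before it can be run; the 99% level is only an example value.

```stata
* Refit the model so that its results are the active estimates
quietly intreg lgpa ugpa write rating read

* User-written post-estimation command (Long and Freese); install it first
fitstat

* Refit and display 99% confidence intervals in place of the default 95%
intreg lgpa ugpa write rating read, level(99)
```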

The ancillary statistic /sigma is equivalent to the standard error of estimate in OLS regression. The value of 0.34 can be compared to the standard deviations for lgpa and ugpa of 0.78 and 0.57. This shows a substantial reduction. The output also contains an estimate of the standard error of sigma as well as a 95% confidence interval. Stata does not compute sigma directly but actually computes the log of sigma (/lnsigma in the output).

Finally, the output provides a summary of the number of left-censored, uncensored, right-censored and interval-censored values.

Sample Write-Up of the Analysis

Before we begin the sample write-up we need to get the output into a form more acceptable for publication. The estout command (findit estout, by Ben Jann of ETH Zurich) will get us closer to what we want. With a little bit of manual editing, and remembering to add the McKelvey-Zavoina pseudo-R2, we can produce an acceptable table of the output.

The interval regression model predicting GPA from reading, writing and teacher ratings was statistically significant (chi-squared = 54.85, df = 3, p < 0.001). Both the reading and writing test scores were statistically significant, at the 0.001 and 0.05 levels, respectively. The teacher ratings were not significant (p = 0.12). The McKelvey and Zavoina pseudo-R2 was 0.68, indicating that these three predictors accounted for approximately 68% of the variability in the latent outcome variable. A unit change in reading and in writing leads to a 0.0053 and a 0.0023 increase in the predicted GPA, respectively.
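
A possible starting point for the publication table, assuming estout has been installed. The stored-estimate name gpa_model and the particular cells and statistics chosen here are illustrative only, and the sketch uses the model without robust standard errors, which may differ from the model behind the figures reported above.

```stata
* Fit the model and store the results under a name
quietly intreg lgpa ugpa write rating read
estimates store gpa_model

* Coefficients with significance stars, standard errors in parentheses,
* plus the number of observations, log-likelihood and model chi-squared
estout gpa_model, cells(b(star fmt(%9.4f)) se(par fmt(%9.4f))) ///
    stats(N ll chi2, fmt(%9.0f %9.3f %9.2f))
```

The McKelvey-Zavoina pseudo-R2 is not stored by intreg, so it would still need to be added to the table by hand.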


These pages are Copyrighted (c) by UCLA Academic Technology Services

