UCLA Academic Technology Services

SAS FAQ:
How can I use PROC TRANSREG to find where to split a piecewise regression?

It is not uncommon to believe a variable x predicts a variable y differently over certain ranges of x.  In such instances, you may wish to fit a piecewise regression model. Proc transreg can fit such a model and show where its pieces begin and end.

On this page, we will walk through an example using proc transreg to help with piecewise regression.  For more information on the options available, see the SAS Online Documentation.

We can begin by creating a dataset with an outcome Y and a predictor X. The code that generates this example data comes from the SAS examples for proc transreg.


data a;
  x=-0.000001;
  do i=0 to 199;
    /* every 50 observations, reset the level of y so the curve jumps */
    if mod(i,50)=0 then do;
      c=((x/2)-5)**2;
      if i=150 then c=c+5;
      y=c;
    end;
    x=x+0.1;
    y=y-sin(x-c);
    output;
  end;
run;

proc print data = a (obs = 5); run;

Obs       X       I       C          Y

  1    0.10000    0    25.0000    24.7694
  2    0.20000    1    25.0000    24.4427
  3    0.30000    2    25.0000    24.0234
  4    0.40000    3    25.0000    23.5155
  5    0.50000    4    25.0000    22.9241

proc gplot data = a;
  plot y*x;
run;
 

We might look at this plot and believe that there is a downward trend in y as x increases up to a certain point in x.  After that point, there is an upward trend in y.  We will want to fit two slopes, but we are unsure of the optimal location to switch from one to the other (such a location is referred to as a "knot" when discussing splines).  When we refer to a certain location being "optimal", we are saying that choosing that location yields a better model fit than any other available location. 
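
Written out, a two-slope model with a single knot can be expressed as follows. This is a standard textbook formulation rather than something stated in the SAS example, and the symbols (the betas, k and epsilon) are introduced here purely for illustration.

```latex
\[
  y = \beta_0 + \beta_1 x + \beta_2\,(x - k)_+ + \varepsilon,
  \qquad (x - k)_+ = \max(x - k,\ 0)
\]
```

Here k is the knot. The slope is beta_1 for x at or below k and beta_1 + beta_2 above it, and the two lines meet at x = k because the extra term is zero there.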

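By indicating the number of knots and the degree of the functions we wish to fit, we can have proc transreg place the knot and build the piecewise fit for us.  This is done with the nknots and degree options in the model statement.  Note that, according to the SAS documentation, nknots places knots at evenly spaced percentiles of x (a single knot falls at the median) rather than searching for the best-fitting location; specific locations can be supplied with the knots option instead.  We have opted for pspline because it adds the piecewise polynomial terms to the output dataset as separate variables, which makes the knot location easy to read off.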

proc transreg data=A;
   model identity(Y) = pspline(X / nknots = 1 degree = 1);
   output out = A2 predicted;
run;

The TRANSREG Procedure

     TRANSREG Univariate Algorithm Iteration History for Identity(Y)
Iteration    Average    Maximum                Criterion
   Number     Change     Change    R-Square       Change    Note
-------------------------------------------------------------------------
        1    0.00000    0.00000     0.47545                 Converged

We can see in the output above that the model converged and has an R-squared value of 0.47545.  We can plot the predicted values to see how closely they match the original data.

legend1 label=none value=('y' 'predicted y') position=(bottom left inside) mode=share down = 2;
proc gplot data = a2;
  plot (y py)*x / overlay legend = legend1;
run;

We can see that our fitted values consist of one linear function of x for values of x less than the knot and one linear function of x for values of x greater than the knot.  The knot appears to fall around 10, but it is hard to tell from this plot its exact location.  We can find it by examining the dataset output by proc transreg.

proc print data = a2 (obs = 5); run;

Obs _TYPE_ _NAME_    Y       TY      PY   Intercept   X_1   X_2 TIntercept   TX_1  TX_2    X

  1 SCORE   ROW1  24.7694 24.7694 16.6302     1     0.10000  0       1     0.10000   0  0.10000
  2 SCORE   ROW2  24.4427 24.4427 16.4870     1     0.20000  0       1     0.20000   0  0.20000
  3 SCORE   ROW3  24.0234 24.0234 16.3437     1     0.30000  0       1     0.30000   0  0.30000
  4 SCORE   ROW4  23.5155 23.5155 16.2004     1     0.40000  0       1     0.40000   0  0.40000
  5 SCORE   ROW5  22.9241 22.9241 16.0571     1     0.50000  0       1     0.50000   0  0.50000

In addition to the predicted values, py, the dataset contains a new variable, ty, holding the "transformed" value of y (since our transformation was the identity, these values are the same as y).  Two transformations of x, x_1 and x_2, have also been added.  For all values of x at or below the knot, x_2 is equal to zero, so only x_1 contributes to the linear fit.  To find the location of the knot, we sort our dataset by x and find the first value at which x_2 is not zero.

proc sort data = a2;
  by x;
run;

proc print data = a2 (obs = 1);
  where x_2 > 0; 
run;

Obs _TYPE_ _NAME_    Y       TY      PY   Intercept   X_1   X_2 TIntercept   TX_1  TX_2    X
102 SCORE  ROW102 1.32494 1.32494 2.54915     1     10.2000 0.1      1     10.2000  0.1 10.2000

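From this, we can see that x_2 first becomes nonzero at x = 10.2, where it equals 0.1.  Since x_2 appears to measure how far x lies past the knot, the knot itself sits at about x = 10.2 - 0.1 = 10.1, and the second line takes over from there.  Thus, we have identified the location at which our two functions meet.  Keep in mind that this is where nknots = 1 placed the knot (the SAS documentation describes this as a percentile of x), not necessarily the location that gives the best fit.  The same process could be used with more than one knot, and a very similar process could be used with a degree higher than one.  For more details on using splines in SAS, see the online documentation.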
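
To check where the knot sits and to compare it against other candidate locations, something like the following could be used.  This is a minimal sketch rather than part of the SAS example.  It assumes x_2 equals x minus the knot on rows past the knot (as the printed row suggests), and it swaps in proc reg on a hand-built hinge term in place of proc transreg.  The dataset names knot, cand and fits, the variables k and x2, and the candidate grid from 1 to 19 by 0.5 are all made up for illustration.

```sas
/* Recover the knot proc transreg used: on rows past the knot,      */
/* x_2 = x - knot (an assumption about the pspline basis).          */
data knot;
  set a2;
  where x_2 > 0;
  knot = x - x_2;
  keep x x_2 knot;
run;

/* min and max should agree if the assumption holds */
proc means data = knot min max;
  var knot;
run;

/* Build one copy of the data per candidate knot k (illustrative grid) */
data cand;
  set a;
  do k = 1 to 19 by 0.5;
    x2 = max(x - k, 0);   /* hinge term: zero up to the knot, x - k beyond it */
    output;
  end;
run;

proc sort data = cand;
  by k;
run;

/* One two-slope regression per candidate knot;                     */
/* the RSQUARE option adds _RSQ_ to the OUTEST= dataset.            */
proc reg data = cand outest = fits rsquare noprint;
  by k;
  model y = x x2;
run;
quit;

/* Candidates with the highest R-squared come first */
proc sort data = fits;
  by descending _rsq_;
run;

proc print data = fits (obs = 5);
  var k _rsq_;
run;
```

The first row of the final listing is the best-fitting candidate on this grid only; a finer grid may find a slightly different location.  That value could then be passed back to proc transreg through the knots option in place of nknots.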

These pages are Copyrighted (c) by UCLA Academic Technology Services

