UCLA Academic Technology Services

SAS Frequently Asked Questions 

How can I find where to split a piecewise regression?

A predictor x may relate to an outcome y differently over different ranges of x.  In such cases, you may wish to fit a piecewise regression model. The simplest scenario is two adjoining lines: one line describes the relationship between y and x for x <= c, and the other describes it for x > c.  For this scenario, we can use proc nlin to find the value of c that yields the best-fitting model.

We can begin by creating a dataset with an outcome Y and a predictor X. We have borrowed this example data from SAS examples. 


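data a;
  x = -0.000001;
  do i = 0 to 199;
    /* every 50 observations, reset the level of y;
       note that c here is a data-generating constant, not the cut point */
    if mod(i,50) = 0 then do;
      c = ((x/2)-5)**2;
      if i = 150 then c = c + 5;
      y = c;
    end;
    x = x + 0.1;
    y = y - sin(x - c);
    output;
  end;
run;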

proc print data = a (obs = 5); run;

Obs       X       I       C          Y

  1    0.10000    0    25.0000    24.7694
  2    0.20000    1    25.0000    24.4427
  3    0.30000    2    25.0000    24.0234
  4    0.40000    3    25.0000    23.5155
  5    0.50000    4    25.0000    22.9241

proc gplot data = a;
  plot y*x;
run;
 

Looking at this plot, we might believe that y trends downward as x increases up to a certain point and trends upward after that point.  Let's consider the set of parameters we will need to fit.  Our first line involves an intercept and a slope (a1 and b1).  Our second line also involves a slope (b2), but its "intercept" is not a free parameter: because the two lines must meet at the cut point (c), it is determined by the first intercept, the first slope, and c.  We therefore want to estimate four parameters in total:  two slopes, an intercept, and a cut point.  We can declare these parameters in proc nlin and provide a starting value for each based on the plot above.
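
The requirement that the two lines meet at c can be written out explicitly. The symbol a2, standing for the intercept of the second line, is introduced here only for illustration and does not appear in the code:

```latex
\begin{aligned}
a_1 + b_1 c &= a_2 + b_2 c \\
a_2 &= a_1 + c\,(b_1 - b_2) \\
y &=
\begin{cases}
a_1 + b_1 x, & x \le c \\
a_1 + c\,(b_1 - b_2) + b_2 x, & x > c
\end{cases}
\end{aligned}
```

The second case is the expression assigned to ypart inside the if block of the proc nlin step that follows.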

proc nlin data = a;
  parms a1=25 b1=-2 c=10 b2=2;
  ypart = a1 + b1*x;
  if (x > c) then do;
    ypart = a1 + c*(b1-b2) + b2*x;
  end;
  model y = ypart;
run;

The NLIN Procedure

                                  Sum of        Mean               Approx
Source                    DF     Squares      Square    F Value    Pr > F
Model                      3      8770.6      2923.5      69.90    <.0001
Error                    196      8197.3     41.8231
Corrected Total          199     16967.9

                              Approx
Parameter      Estimate    Std Error    Approximate 95% Confidence Limits

a1              18.5311       1.3827     15.8043     21.2579
b1              -1.9205       0.2668     -2.4467     -1.3942
c                8.9876       0.4400      8.1199      9.8554
b2               2.2676       0.1916      1.8898      2.6454
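
Because a nonlinear fit can depend on where the search starts, one option is to give proc nlin a grid of starting values for c instead of a single guess. The sketch below assumes a range of 5 to 15 in steps of 1, chosen by eye and not taken from the analysis above. With a grid, proc nlin evaluates the residual sum of squares at each grid point and begins iterating from the best one.

```sas
/* Same model as above, but with a grid of starting values for c.
   The range 5 to 15 by 1 is an assumption based on the plot. */
proc nlin data = a;
  parms a1=25 b1=-2 c=5 to 15 by 1 b2=2;
  ypart = a1 + b1*x;
  if (x > c) then do;
    ypart = a1 + c*(b1-b2) + b2*x;
  end;
  model y = ypart;
run;
```

If several starting points lead to clearly different estimates of c, that is a sign the cut point is not well determined by the data.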

The proc nlin output gives estimates of all four parameters.  We can use the estimated cut point c to generate a new variable, x2, that equals 0 when x is below the cut point and x - c otherwise.  An ordinary least squares regression of y on x and x2 then fits the same piecewise function.


data a2; set a;
  x2 = x - 8.9876;
  if x < 8.9876 then x2 = 0;
run;
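
As a variation on the data step above, the cut point could be passed along programmatically instead of being typed in by hand. This is a minimal sketch, assuming the proc nlin ODS table is named ParameterEstimates and has columns Parameter and Estimate; the names pe, cut, and a2_auto are hypothetical and chosen only for this illustration.

```sas
/* Refit the model, saving the parameter estimates table through ODS.
   The data set name pe is hypothetical. */
ods output ParameterEstimates = pe;
proc nlin data = a;
  parms a1=25 b1=-2 c=10 b2=2;
  ypart = a1 + b1*x;
  if (x > c) then do;
    ypart = a1 + c*(b1-b2) + b2*x;
  end;
  model y = ypart;
run;

/* Store the estimate of c in the macro variable &cut. */
data _null_;
  set pe;
  if Parameter = 'c' then call symputx('cut', Estimate);
run;

/* Build x2 from the stored estimate instead of a typed constant. */
data a2_auto;
  set a;
  x2 = max(x - &cut, 0);
run;
```

Because the stored estimate carries more digits than the rounded 8.9876, results from a2_auto may differ slightly in the last decimal places from those shown on this page.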

proc reg data = a2;
  model y = x x2;
  output out = a3 p = predicted;
run;
quit;

The REG Procedure

                             Analysis of Variance
                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     2     8770.59800     4385.29900     105.39    <.0001
Error                   197     8197.31882       41.61076
Corrected Total         199          16968


Root MSE              6.45064    R-Square     0.5169
Dependent Mean       12.04335    Adj R-Sq     0.5120
Coeff Var            53.56182


                        Parameter Estimates
                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1       18.53113        1.27575      14.53      <.0001
x             1       -1.92047        0.20230      -9.49      <.0001
x2            1        4.18808        0.32144      13.03      <.0001

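In the proc reg output, we can see the same sums of squares we saw in the proc nlin output. We also see that our intercept is unchanged, the coefficient for x matches the first slope from proc nlin, and the coefficient for x2 is equal to (b2 - b1). Note that proc reg treats the cut point as known rather than estimated, so it has one more error degree of freedom (197 versus 196) and reports somewhat smaller standard errors than proc nlin.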
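
To see why the x2 coefficient equals (b2 - b1), the piecewise function can be rewritten as a single equation in x and x2, where x2 = max(x - c, 0) is the variable created in the data step. This is a short algebraic sketch using the estimates printed above:

```latex
\begin{aligned}
y &= a_1 + b_1 x + (b_2 - b_1)\,\max(x - c,\, 0) \\
x \le c:&\quad y = a_1 + b_1 x \\
x > c:&\quad y = a_1 + b_1 x + (b_2 - b_1)(x - c) = a_1 + c\,(b_1 - b_2) + b_2 x \\
b_2 - b_1 &= 2.2676 - (-1.9205) = 4.1881 \approx 4.18808
\end{aligned}
```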

We can plot the predicted values from the regression above.

proc gplot data = a3;
  plot (y predicted)*x / overlay;
run;

We have found the optimal point to split our piecewise function in this scenario.  The same process can be used to fit quadratic or cubic terms, as long as the function and its parameters are carefully described in proc nlin.  For more details, see the online documentation.
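
As one illustration of such an extension, the sketch below fits a quadratic segment to the left of the cut point and a straight line to the right, with the two pieces constrained to meet at c. The parameter name q1 and its starting value of 0 are assumptions made for this example, and this model has not been fitted to the data on this page.

```sas
/* Quadratic segment for x <= c, linear segment for x > c.
   q1 (the quadratic coefficient) and its starting value are hypothetical. */
proc nlin data = a;
  parms a1=25 b1=-2 q1=0 c=10 b2=2;
  /* left segment: quadratic in x */
  ypart = a1 + b1*x + q1*x**2;
  if (x > c) then do;
    /* right segment: a line starting from the left segment's value at x = c */
    ypart = (a1 + b1*c + q1*c**2) + b2*(x - c);
  end;
  model y = ypart;
run;
```

Setting q1 to 0 reduces this to the two-line model fitted earlier, which gives a quick check that the function has been written down correctly.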


These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.