It is not uncommon to believe a variable x predicts a variable y differently over certain ranges of x. In such instances, you may wish to fit a piecewise regression model. The simplest scenario would be fitting two adjoined lines: one line defines the relationship of y and x for x <= c and the other line defines the relationship for x > c. For this scenario, we can use proc nlin to find the value of c that yields the best fitting model.
We can begin by creating a dataset with an outcome Y and a predictor X. We have borrowed this example data from SAS examples.
data a;
  x = -0.000001;
  do i = 0 to 199;
    if mod(i,50) = 0 then do;
      c = ((x/2)-5)**2;
      if i = 150 then c = c + 5;
      y = c;
    end;
    x = x + 0.1;
    y = y - sin(x-c);
    output;
  end;
run;

proc print data = a (obs = 5);
run;

Obs       X       I       C          Y
  1    0.10000    0    25.0000    24.7694
  2    0.20000    1    25.0000    24.4427
  3    0.30000    2    25.0000    24.0234
  4    0.40000    3    25.0000    23.5155
  5    0.50000    4    25.0000    22.9241

proc gplot data = a;
  plot y*x;
run;
We might look at this plot and believe that y trends downward as x increases up to a certain point, and trends upward after that point. Let's consider the set of parameters we will need to fit. Our first line will involve an intercept and a slope (a1 and b1). Our second line will also involve a slope (b2), but its "intercept" is not a free parameter: it is determined by the first intercept, the first slope, and the point at which the lines meet (c). We therefore want to estimate four parameters in total: two slopes, an intercept, and a cut point. We can indicate these parameters in proc nlin and provide starting values for each parameter based on the plot above.
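Written out, the model described above looks like the following. This is only a restatement of the parameterization in the text; the symbol a_2 is a helper name for the second line's intercept, introduced here for illustration, and it is eliminated by requiring the two lines to agree at x = c.

```latex
\[
E[y \mid x] =
\begin{cases}
a_1 + b_1 x, & x \le c,\\
a_2 + b_2 x, & x > c,
\end{cases}
\qquad
a_1 + b_1 c = a_2 + b_2 c
\;\Longrightarrow\;
a_2 = a_1 + c\,(b_1 - b_2).
\]
```

The expression a1 + c*(b1-b2) + b2*x in the proc nlin step is the second branch with this value of a_2 substituted in.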
proc nlin data = a;
parms a1=25 b1=-2 c=10 b2=2;
ypart = a1 + b1*x;
if (x > c) then do;
ypart = a1 + c*(b1-b2) + b2*x;
end;
model y = ypart;
run;
The NLIN Procedure

                                Sum of        Mean               Approx
Source                  DF     Squares      Square    F Value    Pr > F
Model                    3      8770.6      2923.5      69.90    <.0001
Error                  196      8197.3     41.8231
Corrected Total        199     16967.9

                                Approx
Parameter      Estimate      Std Error    Approximate 95% Confidence Limits
a1              18.5311         1.3827         15.8043         21.2579
b1              -1.9205         0.2668         -2.4467         -1.3942
c                8.9876         0.4400          8.1199          9.8554
b2               2.2676         0.1916          1.8898          2.6454
From the proc nlin output, we can see estimates of all four parameters. We can use the estimate for the cutpoint c to generate a new variable, x2, that will allow us to run an ordinary least squares regression of y on x and x2 that effectively fits a piecewise function.
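To see why a single extra regressor is enough, the two branches of the model can be rewritten with a common form. The short derivation below uses the same parameter names as proc nlin and takes x2 = max(0, x - c), which is the variable described above.

```latex
\[
\begin{aligned}
x \le c:&\quad y = a_1 + b_1 x = a_1 + b_1 x + (b_2 - b_1)\cdot 0,\\
x > c:&\quad y = a_1 + c\,(b_1 - b_2) + b_2 x = a_1 + b_1 x + (b_2 - b_1)(x - c),\\
\text{so}&\quad y = a_1 + b_1 x + (b_2 - b_1)\,x_2, \qquad x_2 = \max(0,\; x - c).
\end{aligned}
\]
```

With c held fixed at its estimate, this expression is linear in the remaining parameters, so ordinary least squares applies.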
data a2; set a;
x2 = x - 8.9876;
if x < 8.9876 then x2 = 0;
run;
proc reg data = a2;
model y = x x2;
output out = a3 p = predicted;
run;
quit;
The REG Procedure

Analysis of Variance
                                Sum of            Mean
Source                  DF     Squares          Square    F Value    Pr > F
Model                    2  8770.59800      4385.29900     105.39    <.0001
Error                  197  8197.31882        41.61076
Corrected Total        199       16968

Root MSE            6.45064    R-Square    0.5169
Dependent Mean     12.04335    Adj R-Sq    0.5120
Coeff Var          53.56182

Parameter Estimates
                       Parameter      Standard
Variable       DF       Estimate         Error    t Value    Pr > |t|
Intercept       1       18.53113       1.27575      14.53      <.0001
x               1       -1.92047       0.20230      -9.49      <.0001
x2              1        4.18808       0.32144      13.03      <.0001
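In the proc reg output, we can see the same model and error sums of squares that we saw in the proc nlin output. We also see that our intercept is unchanged, the coefficient for x matches the first slope from proc nlin, and the coefficient for x2 is equal to (b2 - b1): 2.2676 - (-1.9205) = 4.1881. The standard errors for the intercept and the first slope are smaller in proc reg than in proc nlin, because proc reg treats the cut point as a fixed, known value rather than as an estimated parameter.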
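The data step above types the rounded estimate 8.9876 by hand. As a hedged alternative, the estimate can be passed along programmatically. The sketch below assumes that proc nlin exposes its estimates in an ODS table named ParameterEstimates with columns Parameter and Estimate (worth confirming with ods trace on in your SAS release). The dataset name pe and the macro variable name cut are hypothetical names chosen for this example.

```sas
/* Sketch: reuse the NLIN estimate of c without retyping it.            */
/* Assumes the ODS table is named ParameterEstimates and has columns    */
/* Parameter and Estimate; pe and cut are names chosen for this sketch. */
ods output ParameterEstimates = pe;
proc nlin data = a;
  parms a1=25 b1=-2 c=10 b2=2;
  ypart = a1 + b1*x;
  if (x > c) then ypart = a1 + c*(b1-b2) + b2*x;
  model y = ypart;
run;

/* Copy the estimate of c into the macro variable &cut. */
data _null_;
  set pe;
  if upcase(Parameter) = 'C' then call symputx('cut', Estimate);
run;

/* x2 is 0 up to the cut point and x - c beyond it. */
data a2;
  set a;
  x2 = max(0, x - (&cut));
run;
```

Because this uses the unrounded estimate, later digits of the proc reg results may differ slightly from those shown above.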
We can plot the predicted values from the regression above.
proc gplot data = a3;
plot (y predicted)*x / overlay;
run;
We have found the optimal point to split our piecewise function in this scenario. The same process could be used if we wished to fit quadratic or cubic terms, as long as we carefully described the function and its parameters in proc nlin. For more details, see the online documentation.
These pages are Copyrighted (c) by UCLA Academic Technology Services