It is not uncommon to believe a variable x predicts a variable y differently over certain ranges of x. In such instances, you may wish to fit a piecewise regression model. Proc transreg can determine the optimal locations for the pieces to begin and end.
On this page, we will walk through a few examples using proc transreg to help with piecewise regression. For more information on the options available, see the SAS Online Documentation.
We can begin by creating a dataset with an outcome Y and a predictor X. The code that generates this example data comes from the SAS examples for proc transreg.
data a;
  x=-0.000001;
  do i=0 to 199;
    if mod(i,50)=0 then do;
      c=((x/2)-5)**2;
      if i=150 then c=c+5;
      y=c;
    end;
    x=x+0.1;
    y=y-sin(x-c);
    output;
  end;
run;
proc print data = a (obs = 5);
run;
Obs    X          I    C          Y
  1    0.10000    0    25.0000    24.7694
  2    0.20000    1    25.0000    24.4427
  3    0.30000    2    25.0000    24.0234
  4    0.40000    3    25.0000    23.5155
  5    0.50000    4    25.0000    22.9241
proc gplot data = a;
  plot y*x;
run;
We might look at this plot and believe that there is a downward trend in y as x increases up to a certain point in x. After that point, there is an upward trend in y. We will want to fit two slopes, but we are unsure of the optimal location to switch from one to the other (such a location is referred to as a "knot" when discussing splines). When we refer to a certain location being "optimal", we are saying that choosing that location yields a better model fit than any other available location.
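One common way to write down a two-slope model with a single knot is sketched below. This is an illustration, not notation taken from the SAS documentation: the symbols $\beta_0$, $\beta_1$, $\beta_2$, $k$ (the knot) and $\varepsilon$ are assumed names, and the form assumes the two lines are required to meet at the knot.

```latex
y = \beta_0 + \beta_1 x + \beta_2 (x - k)_{+} + \varepsilon,
\qquad
(x - k)_{+} = \max(0,\; x - k)
```

Under this form the slope is $\beta_1$ for $x \le k$ and $\beta_1 + \beta_2$ for $x > k$, so $\beta_2$ measures how much the trend changes at the knot.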
By indicating the number of knots and the degrees of the functions we wish to fit, we can use proc transreg to find the optimal knot location. This can be done using the nknots and degree options in the model statement. We have opted for pspline because it provides the most comprehensive and easy-to-understand transformation variables.
proc transreg data=A;
model identity(Y) = pspline(X / nknots = 1 degree = 1);
output out = A2 predicted;
run;
The TRANSREG Procedure
TRANSREG Univariate Algorithm Iteration History for Identity(Y)
Iteration    Average    Maximum                Criterion
   Number     Change     Change    R-Square       Change    Note
-------------------------------------------------------------------------
        1    0.00000    0.00000     0.47545                 Converged
We can see in the output above that the model converged and has an R-squared value of 0.47545. We can plot the predicted values to see how closely they match the original data.
legend label=none value=('y' 'predicted y') position=(bottom left inside) mode=share down = 2;
proc gplot data = a2;
plot (y py)*x / overlay legend = legend;
run;

We can see that our fitted values consist of one linear function of x for values of x less than the knot and one linear function of x for values of x greater than the knot. The knot appears to fall around 10, but it is hard to tell from this plot its exact location. We can find it by examining the dataset output by proc transreg.
proc print data = a2 (obs = 5);
run;
Obs  _TYPE_  _NAME_  Y        TY       PY       Intercept  X_1      X_2  TIntercept  TX_1     TX_2  X
  1  SCORE   ROW1    24.7694  24.7694  16.6302  1          0.10000  0    1           0.10000  0     0.10000
  2  SCORE   ROW2    24.4427  24.4427  16.4870  1          0.20000  0    1           0.20000  0     0.20000
  3  SCORE   ROW3    24.0234  24.0234  16.3437  1          0.30000  0    1           0.30000  0     0.30000
  4  SCORE   ROW4    23.5155  23.5155  16.2004  1          0.40000  0    1           0.40000  0     0.40000
  5  SCORE   ROW5    22.9241  22.9241  16.0571  1          0.50000  0    1           0.50000  0     0.50000
In addition to adding the predicted values, py, to the dataset, we can see that a new variable, ty, has been added for the "transformed" value of y (since our transformation was the identity, these values are the same as y); also, two transformations of x, x_1 and x_2, have been added to the data. For all values of x that fall before the knot, x_2 is equal to zero so that only x_1 contributes to the linear fit. To find the location of the knot, we sort our dataset by x and find the first value at which x_2 is not zero.
proc sort data = a2;
  by x;
run;
proc print data = a2 (obs = 1);
  where x_2 > 0;
run;
Obs  _TYPE_  _NAME_  Y        TY       PY       Intercept  X_1      X_2  TIntercept  TX_1     TX_2  X
102  SCORE   ROW102  1.32494  1.32494  2.54915  1          10.2000  0.1  1           10.2000  0.1   10.2000
From this, we can see that x = 10.2 is the first observation on the second line segment. Because x_2 is 0.1 at that point, and x_2 appears to measure the distance past the knot, the knot itself sits at about x = 10.1. Thus, we have identified the location where our two functions meet. The same process could be used with more than one knot, and a very similar process could be used with a degree greater than one. For more details on using splines in SAS, see the online documentation.
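As a rough cross-check, the knot and the two slopes can also be recovered directly from the output dataset. The sketch below assumes that a2 is the dataset written by proc transreg above and that x_2 equals max(0, x - knot), which is what its printed values suggest. The dataset name knot_check and the variable knot are hypothetical names introduced only for this illustration.

```sas
/* Assumption: x_2 = max(0, x - knot), so x - x_2 recovers the knot */
/* on every row that lies past it.                                  */
data knot_check;
  set a2;
  where x_2 > 0;
  knot = x - x_2;
run;

/* If the assumption holds, min and max agree up to rounding. */
proc means data = knot_check n min max;
  var knot;
run;

/* Ordinary least squares on the two spline columns. */
proc reg data = a2;
  model y = x_1 x_2;
run;
quit;
```

In the proc reg output, the coefficient on x_1 would be the slope before the knot, and the sum of the x_1 and x_2 coefficients would be the slope after it.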
These pages are Copyrighted (c) by UCLA Academic Technology Services