Proc transreg performs transformation regression in which both the outcome and predictor(s) can be transformed and splines can be fit. Psplines are piecewise polynomials that can be used to estimate relationships that are difficult to fit with a single function.
On this page, we will walk through an example of proc transreg with the pspline option and explore its defaults. The bspline, spline, and pspline options, when similarly specified, yield the same results. Their differences lie in the number and type of transformed variables generated for estimation. For more information on the other options available, see the SAS Online Documentation.
We can begin by creating a dataset with an outcome Y and a predictor X. These example data are taken from the SAS examples for proc transreg.
data a;
  x=-0.000001;
  do i=0 to 199;
    if mod(i,50)=0 then do;
      c=((x/2)-5)**2;
      if i=150 then c=c+5;
      y=c;
    end;
    x=x+0.1;
    y=y-sin(x-c);
    output;
  end;
run;
proc gplot data = a;
  plot y*x;
run;
Clearly, there is not a single, continuous function relating Y to X. The relationship does not appear random, but it does appear to change with X. Thus it makes sense to try to fit these data with splines. Before running proc transreg, we can see that our data contain four variables:
proc print data = a (obs = 5); run;

Obs       X       I       C          Y
  1    0.10000    0    25.0000    24.7694
  2    0.20000    1    25.0000    24.4427
  3    0.30000    2    25.0000    24.0234
  4    0.40000    3    25.0000    23.5155
  5    0.50000    4    25.0000    22.9241
In the proc transreg step, identity(y) on the model line indicates that we wish to predict the variable y without transformation. If we wished to model a transformed version of y (the log or rank of y, for example), we would indicate the transformation here. To predict y, we indicate that we wish to use piecewise polynomial functions of x with pspline(x). We also request an output dataset, a2, containing the predicted values from the model.
proc transreg data=a;
model identity(y) = pspline(x);
output out = a2 predicted;
run;
The TRANSREG Procedure

TRANSREG Univariate Algorithm Iteration History for Identity(Y)

Iteration    Average    Maximum                Criterion
   Number     Change     Change    R-Square       Change    Note
-------------------------------------------------------------------------
        1    0.00000    0.00000     0.46884                 Converged
We can see in the output above that the model converged and has an R-squared value of 0.47. Let's look at the dataset output by proc transreg.
proc print data = a2 (obs = 5); run;

Obs  _TYPE_  _NAME_      Y         TY        PY      Intercept    X_1       X_2
  1  SCORE   ROW1     24.7694   24.7694   24.1144       1       0.10000   0.01000
  2  SCORE   ROW2     24.4427   24.4427   23.4722       1       0.20000   0.04000
  3  SCORE   ROW3     24.0234   24.0234   22.8424       1       0.30000   0.09000
  4  SCORE   ROW4     23.5155   23.5155   22.2249       1       0.40000   0.16000
  5  SCORE   ROW5     22.9241   22.9241   21.6195       1       0.50000   0.25000

Obs    X_3     TIntercept    TX_1      TX_2      TX_3        X
  1  0.00100       1       0.10000   0.01000   0.00100   0.10000
  2  0.00800       1       0.20000   0.04000   0.00800   0.20000
  3  0.02700       1       0.30000   0.09000   0.02700   0.30000
  4  0.06400       1       0.40000   0.16000   0.06400   0.40000
  5  0.12500       1       0.50000   0.25000   0.12500   0.50000
In addition to the predicted values, py, the output dataset contains a new variable, ty, holding the "transformed" value of y (since our transformation was the identity, these values are the same as y), and three variables (x_1, x_2, x_3) holding the first, second, and third powers of x. Transformed versions of these three variables and of the intercept are also included and are indicated with a 'T' prefix. From this we can see that, by default, SAS fits a single third-degree polynomial in x to y. Note that although splines are often used to fit piecewise functions, the default setting for pspline in proc transreg is to estimate just one function (zero knots).
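The degree of the polynomial can also be stated explicitly rather than left to the default. The sketch below assumes the degree= option of pspline and that its default is 3, which is consistent with the x_1 through x_3 variables in the output dataset; the dataset name a2_deg is made up for illustration.

```sas
/* Sketch: request the polynomial degree explicitly.              */
/* degree=3 is assumed to reproduce the default fit shown above.  */
proc transreg data=a;
   model identity(y) = pspline(x / degree=3);
   output out = a2_deg predicted;   /* a2_deg is a hypothetical name */
run;
```

Changing degree=3 to degree=2 should request a single quadratic in x instead, in which case only x_1 and x_2 would be expected in the output dataset.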
We can plot the predicted values to see how closely they match the original data.
legend label=none value=('y' 'predicted y')
       position=(bottom left inside) mode=share down = 2;
proc gplot data = a2;
  plot (y py)*x / overlay legend = legend;
run;
For this simple example, we could achieve the same result by running an ordinary least squares regression after transforming x in the same manner as proc transreg.
data a3;
  set a;
  x2 = x*x;
  x3 = x*x*x;
run;
proc reg data = a3;
  model y = x x2 x3;
run;

The REG Procedure
Model: MODEL1
Dependent Variable: Y

Number of Observations Read         200
Number of Observations Used         200

                        Analysis of Variance

                               Sum of          Mean
Source             DF         Squares        Square   F Value   Pr > F
Model               3      7955.26078    2651.75359     57.67   <.0001
Error             196      9012.65604      45.98294
Corrected Total   199           16968

Root MSE          6.78107    R-Square   0.4688
Dependent Mean   12.04335    Adj R-Sq   0.4607
Coeff Var        56.30551

                        Parameter Estimates

                  Parameter    Standard
Variable   DF      Estimate       Error   t Value   Pr > |t|
Intercept   1      24.76908     1.95451     12.67     <.0001
X           1      -6.60903     0.84002     -7.87     <.0001
x2          1       0.62721     0.09698      6.47     <.0001
x3          1      -0.01513     0.00317     -4.77     <.0001
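As a quick arithmetic check, the rounded parameter estimates from proc reg can be plugged back in by hand and compared with py from the a2 dataset created by proc transreg. This is only a sketch: the dataset name check and the variables py_check and diff are made up for illustration, and because the estimates are rounded to five decimals, small differences that grow with x are to be expected.

```sas
/* Sketch: rebuild the predicted values from the rounded proc reg estimates */
data check;                      /* check is a hypothetical dataset name */
   set a2;
   py_check = 24.76908 - 6.60903*x + 0.62721*x**2 - 0.01513*x**3;
   diff = py - py_check;         /* nonzero only because of rounding */
run;

proc print data = check (obs = 5);
   var x py py_check diff;
run;
```

For the first observation (x = 0.1), the hand calculation gives 24.76908 - 0.66090 + 0.00627 - 0.00002, or about 24.1144, which agrees with the value of py printed earlier.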
In this example, using proc transreg only saves us the step of generating variables. However, we may wish to fit more than one function in a piecewise regression or use more complicated transformations of x. Doing so with data and proc reg steps quickly becomes unmanageable or impossible, while doing so with proc transreg is effective and efficient.
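A piecewise fit of this kind might be requested as in the sketch below. It assumes the knots= option of pspline; the knot locations 5, 10, and 15 are chosen here only because the data step changes c whenever mod(i,50)=0, that is, near x = 5, 10, and 15. The output dataset name a4 is made up for illustration.

```sas
/* Sketch: cubic piecewise polynomial with knots where the data step changes c */
proc transreg data=a;
   model identity(y) = pspline(x / knots=5 10 15);
   output out = a4 predicted;    /* a4 is a hypothetical dataset name */
run;

/* Alternative (assumption): let SAS choose three knot locations itself  */
/*    model identity(y) = pspline(x / nknots=3);                         */
```

The output dataset from such a model would be expected to contain additional constructed variables beyond x_1, x_2, and x_3, one for each knot, which is exactly the bookkeeping that becomes tedious to do by hand with data and proc reg steps.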
Note that the default settings for bspline, spline, and pspline yield identical fitted values.
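One way to check this would be to refit the model with spline and bspline and compare the predicted values against those in a2. The sketch below assumes that each run writes its predicted values to a variable named py, as the pspline run above did; the dataset names p_spline and p_bspline and the tolerance of 1e-6 are arbitrary choices for illustration.

```sas
/* Sketch: fit the same model with spline and bspline, default settings */
proc transreg data=a;
   model identity(y) = spline(x);
   output out = p_spline predicted;    /* hypothetical dataset name */
run;

proc transreg data=a;
   model identity(y) = bspline(x);
   output out = p_bspline predicted;   /* hypothetical dataset name */
run;

/* Compare py from each run with py from the pspline run (a2) */
proc compare base=a2 compare=p_spline method=absolute criterion=1e-6;
   var py;
run;

proc compare base=a2 compare=p_bspline method=absolute criterion=1e-6;
   var py;
run;
```

If the statement above holds, proc compare should report no values of py differing by more than the tolerance, even though the three output datasets contain different sets of constructed variables.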
These pages are Copyrighted (c) by UCLA Academic Technology Services