UCLA Academic Technology Services

SAS Frequently Asked Questions 

SAS FAQ:
When using PROC TRANSREG, what are the defaults with bspline?

Proc transreg performs transformation regression, in which both the outcome and the predictor(s) can be transformed and splines can be fit. Splines are piecewise polynomials joined at points called knots. They can estimate relationships that are difficult to fit with a single function.
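To make "piecewise polynomial" concrete, a cubic spline with interior knots $\kappa_1, \dots, \kappa_m$ can be written in the textbook truncated-power form below. Each knot adds one term that lets the cubic change shape beyond that point, while the function and its first two derivatives stay continuous. This is a general illustration and not necessarily the exact parameterization proc transreg uses internally.

$$
f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \sum_{j=1}^{m} \gamma_j \,(x - \kappa_j)_+^3,
\qquad (u)_+ = \max(u, 0)
$$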

On this page, we walk through an example of proc transreg with the bspline option and explore its defaults. When the bspline, spline, and pspline options are specified with the same degree and knots, they yield the same fitted values. They differ in the number and type of transformed variables they generate for estimation. For more information on the other available options, see the SAS Online Documentation.

We begin by creating a dataset with an outcome Y and a predictor X. These data are generated in the same way as in the SAS documentation examples for proc transreg.


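data a;
  x=-0.000001;                /* x starts just below 0 and rises by 0.1 per step */
  do i=0 to 199;
    /* every 50 observations, reset y to a new baseline value c */
    if mod(i,50)=0 then do;
      c=((x/2)-5)**2;
      if i=150 then c=c+5;
      y=c;
      end;
    x=x+0.1;
    y=y-sin(x-c);             /* accumulate a sine term away from the baseline */
    output;
    end;
run;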

proc gplot data = a;
  plot y*x;
run;
 

Clearly, no single simple function relates Y to X. The relationship does not appear random, but it does appear to change with X, so it makes sense to try fitting it with splines. Before running proc transreg, we can see that our data contain four variables:

proc print data = a (obs = 5); run;

Obs       X       I       C          Y
  1    0.10000    0    25.0000    24.7694
  2    0.20000    1    25.0000    24.4427
  3    0.30000    2    25.0000    24.0234
  4    0.40000    3    25.0000    23.5155
  5    0.50000    4    25.0000    22.9241

On the model statement of proc transreg, identity(y) indicates that we want to predict y without transforming it. If we wanted to model a transformed version of y (for example, the log or rank of y), we would specify that transformation here instead. On the right-hand side, bspline(x) indicates that x should be expanded into a B-spline basis. We also chose to write an output dataset, a2, containing the predicted values from the model.

proc transreg data=a;
   model identity(y) = bspline(x);
   output out = a2 predicted;
run;

The TRANSREG Procedure

     TRANSREG Univariate Algorithm Iteration History for Identity(Y)
Iteration    Average    Maximum                Criterion
   Number     Change     Change    R-Square       Change    Note
-------------------------------------------------------------------------
        1    0.00000    0.00000     0.46884                 Converged

The output above shows that the model converged in one iteration, with an R-squared of 0.47. Let's look at the dataset that proc transreg wrote out.

proc print data = a2 (obs = 5); run;

Obs _TYPE_ _NAME_    Y       TY      PY   Intercept   X_0      X_1          X_2
  1 SCORE   ROW1  24.7694 24.7694 24.1144     .     1.00000 0.000000 7.5759E-27
  2 SCORE   ROW2  24.4427 24.4427 23.4722     .     0.98500 0.014924 .000075375
  3 SCORE   ROW3  24.0234 24.0234 22.8424     .     0.97015 0.029548 .000299977
  4 SCORE   ROW4  23.5155 23.5155 22.2249     .     0.95545 0.043873 .000671523
  5 SCORE   ROW5  22.9241 22.9241 21.6195     .     0.94090 0.057902 .001187727

Obs        X_3  TIntercept    TX_0     TX_1          TX_2        TX_3     X
  1  1.269E-40       .      1.00000  0.000000           0           0  0.10000
  2 .000000127       .      0.98500  0.014924  .000075375  .000000127  0.20000
  3 .000001015       .      0.97015  0.029548  .000299977  .000001015  0.30000
  4 .000003426       .      0.95545  0.043873  .000671523  .000003426  0.40000
  5 .000008121       .      0.94090  0.057902  .001187727  .000008121  0.50000

Besides the predicted values, py, the dataset contains a new variable, ty, which holds the "transformed" value of y. Because we used the identity transformation, these values equal y. Our predictor x has been expanded into four variables (x_0, x_1, x_2, x_3) that form the basis, and within each observation these four values sum to one. The number of basis variables depends on the polynomial degree and the number of knots. SAS generates degree + number of knots + 1 variables. By default, bspline assumes degree 3 and no knots, which gives 3 + 0 + 1 = 4 variables here.
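To see the counting rule in action, the sketch below requests a quadratic basis with three knots. By the rule above, this should produce 2 + 3 + 1 = 6 basis variables instead of four. DEGREE= and NKNOTS= are proc transreg transformation options. The dataset name a_deg2 is only illustrative, and we have not run this code for this page, so check the printed variable names yourself.

```sas
/* Quadratic B-spline basis with three knots placed by proc transreg */
proc transreg data=a;
   model identity(y) = bspline(x / degree=2 nknots=3);
   output out=a_deg2 predicted;
run;

/* Expect six basis columns (2 + 3 + 1) instead of four */
proc print data=a_deg2 (obs=5); run;
```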

The dataset also contains transformed versions of the intercept and basis variables, whose names begin with a T (TIntercept and TX_0 through TX_3). We can plot the predicted values to see how closely they match the original data.

/* legend for the overlay of observed and predicted y */
legend1 label=none value=('y' 'predicted y') position=(bottom left inside) mode=share down=2;
proc gplot data = a2;
   plot (y py)*x / overlay legend = legend1;
run;
quit;
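If SAS/GRAPH is not available, PROC SGPLOT (ODS Graphics, SAS 9.2 and later) can draw a similar overlay. This sketch is an alternative and was not used to produce the plot on this page. It relies on a2 already being ordered by x, which holds here because the data step generated x in increasing order.

```sas
proc sgplot data=a2;
   scatter x=x y=y  / legendlabel="y";            /* observed values */
   series  x=x y=py / legendlabel="predicted y";  /* fitted spline curve */
run;
```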

An ordinary least squares regression of y on the basis variables generated by proc transreg gives the same R-squared as the proc transreg output. Because the four basis variables always sum to one, all four together with an intercept would be perfectly collinear. We therefore include any three of them and let the intercept absorb the fourth.


proc reg data = a2;
  model y = x_1 x_2 x_3;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: Y

Number of Observations Read         200
Number of Observations Used         200

                             Analysis of Variance
                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     3     7955.26078     2651.75359      57.67    <.0001
Error                   196     9012.65604       45.98294
Corrected Total         199          16968


Root MSE              6.78107    R-Square     0.4688
Dependent Mean       12.04335    Adj R-Sq     0.4607
Coeff Var            56.30551

                            Parameter Estimates
                               Parameter      Standard
Variable    Label       DF      Estimate         Error   t Value   Pr > |t|
Intercept   Intercept    1      24.11444       1.88257     12.81     <.0001
X_1         X 1          1     -43.01079       5.44815     -7.89     <.0001
X_2         X 2          1      -3.82721       3.47800     -1.10     0.2725
X_3         X 3          1      -1.67330       2.96550     -0.56     0.5732
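Because x_0 + x_1 + x_2 + x_3 = 1, an equivalent fit drops the intercept and keeps all four basis variables. The sketch below should give the same fitted values as py. The names a3, a3check, p_ols, and absdiff are illustrative. There are two caveats. First, with NOINT, PROC REG reports an uncorrected R-square, so it will not equal 0.4688. Second, the x_0 coefficient takes the place of the intercept, and each remaining coefficient equals the intercept plus the corresponding estimate above. For example, the x_1 coefficient should be 24.11444 - 43.01079 = -18.89635.

```sas
/* All four basis columns, no separate intercept */
proc reg data=a2;
   model y = x_0 x_1 x_2 x_3 / noint;
   output out=a3 p=p_ols;   /* a3 keeps every variable from a2, including py */
run;
quit;

/* Gap between the OLS and TRANSREG predictions; expect roughly zero */
data a3check;
   set a3;
   absdiff = abs(p_ols - py);
run;

proc means data=a3check max;
   var absdiff;
run;
```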

Note that the basis variables x_1, x_2, and x_3 are not simply powers of x, which is what the pspline option would produce.
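With the defaults (degree 3, no interior knots), the printed values are consistent with the cubic Bernstein polynomials of x after rescaling x to [0, 1] over its observed range, which here runs from about 0.1 to 20. For example, at x = 0.2 we have t = 0.1/19.9, or about 0.005025, and (1 - t)^3 is about 0.98500, matching X_0 in the second row. We infer this form from the output rather than from the SAS documentation, so treat it as an assumption:

$$
t = \frac{x - x_{\min}}{x_{\max} - x_{\min}},\quad
X_0 = (1-t)^3,\quad X_1 = 3t(1-t)^2,\quad X_2 = 3t^2(1-t),\quad X_3 = t^3,
$$

$$
X_0 + X_1 + X_2 + X_3 = \bigl((1-t) + t\bigr)^3 = 1 .
$$

The data step below recomputes these polynomials and compares them with the columns in a2. It assumes that the minimum and maximum of x serve as the end points of the basis. The names xrange, bcheck, and maxdiff are illustrative.

```sas
/* Observed range of x (assumed to be the end points of the basis) */
proc means data=a noprint;
   var x;
   output out=xrange(drop=_type_ _freq_) min=xmin max=xmax;
run;

data bcheck;
   if _n_ = 1 then set xrange;   /* xmin and xmax stay available on every row */
   set a2;
   t  = (x - xmin) / (xmax - xmin);
   b0 = (1 - t)**3;
   b1 = 3 * t * (1 - t)**2;
   b2 = 3 * t**2 * (1 - t);
   b3 = t**3;
   maxdiff = max(abs(b0 - x_0), abs(b1 - x_1), abs(b2 - x_2), abs(b3 - x_3));
run;

/* If the assumption holds, the largest difference should be near zero */
proc means data=bcheck max;
   var maxdiff;
run;
```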


Note that the default settings for bspline, spline, and pspline yield identical fitted values. 
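To check this on the example data, the sketch below refits the model with spline(x) and pspline(x) and compares all three sets of predicted values. The dataset names a_s, a_p, and compare are illustrative. Printing a_p also shows which variables pspline creates, so you can check the powers-of-x remark directly.

```sas
/* Same model with the SPLINE and PSPLINE expansions */
proc transreg data=a;
   model identity(y) = spline(x);
   output out=a_s predicted;
run;

proc transreg data=a;
   model identity(y) = pspline(x);
   output out=a_p predicted;
run;

/* Look at the variables PSPLINE generates */
proc print data=a_p (obs=5); run;

/* Rows are in the same order in all three datasets, so a one-to-one merge lines them up */
data compare;
   merge a2  (keep=x py rename=(py=py_bspline))
         a_s (keep=py   rename=(py=py_spline))
         a_p (keep=py   rename=(py=py_pspline));
   maxdiff = max(abs(py_bspline - py_spline), abs(py_bspline - py_pspline));
run;

/* Expect a maximum difference of essentially zero */
proc means data=compare max;
   var maxdiff;
run;
```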


These pages are Copyrighted (c) by UCLA Academic Technology Services