Help the Stat Consulting Group by giving a gift

How can I run a piecewise regression in Stata?

Looking at this you are not happy with the nonlinearity that you see in the data, so try to add a quadratic fit.use http://www.ats.ucla.edu/stat/stata/faq/talk, clear twoway (scatter talk age) (lfit talk age)

Thinking about this more, you decide that you think that the amount of time that kids talk on the phone changes dramatically at age 14, and that the slope might change at that age as well. You think that a piecewise regression might make more sense, where before age 14 there is an intercept and linear slope, and after age 14, there is a different intercept and different linear slope, kind of like pictured below with just freehand drawing of what the two regression lines might look like.twoway (scatter talk age) (lfit talk age) (qfit talk age)

Note how the slopes do seem quite different for the two groups. However, the intercepts don't make much sense, since they are the predicted time talking on the phone when one is 0 years old.* Before age 14 regress talk age if age < 14Source | SS df MS Number of obs = 62 -------------+------------------------------ F( 1, 60) = 3.19 Model | 175.387138 1 175.387138 Prob > F = 0.0791 Residual | 3297.59673 60 54.9599456 R-squared = 0.0505 -------------+------------------------------ Adj R-squared = 0.0347 Total | 3472.98387 61 56.9341618 Root MSE = 7.4135 ------------------------------------------------------------------------------ talk | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- age | .6820981 .3818309 1.79 0.079 -.0816775 1.445874 _cons | 8.074878 3.980529 2.03 0.047 .1126352 16.03712 ------------------------------------------------------------------------------* At age 14 and after regress talk age if age >= 14Source | SS df MS Number of obs = 138 -------------+------------------------------ F( 1, 136) = 144.88 Model | 11570.8699 1 11570.8699 Prob > F = 0.0000 Residual | 10861.5142 136 79.8640747 R-squared = 0.5158 ------------+------------------------------ Adj R-squared = 0.5123 Total | 22432.3841 137 163.74003 Root MSE = 8.9367 ------------------------------------------------------------------------------ talk | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- age | 3.629046 .3014985 12.04 0.000 3.032814 4.225277 _cons | -24.97267 5.709467 -4.37 0.000 -36.26348 -13.68185 ------------------------------------------------------------------------------

Note how the slopes for the two groups stayed the same, but now the intercepts (generate age14 = age - 14 regress talk age14 if age < 14Source | SS df MS Number of obs = 62 -------------+------------------------------ F( 1, 60) = 3.19 Model | 175.387138 1 175.387138 Prob > F = 0.0791 Residual | 3297.59673 60 54.9599456 R-squared = 0.0505 -------------+------------------------------ Adj R-squared = 0.0347 Total | 3472.98387 61 56.9341618 Root MSE = 7.4135 ------------------------------------------------------------------------------ talk | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- age14 | .6820981 .3818309 1.79 0.079 -.0816775 1.445874 _cons | 17.62425 1.752455 10.06 0.000 14.11882 21.12968 ------------------------------------------------------------------------------regress talk age14 if age >= 14Source | SS df MS Number of obs = 138 -------------+------------------------------ F( 1, 136) = 144.88 Model | 11570.8699 1 11570.8699 Prob > F = 0.0000 Residual | 10861.5142 136 79.8640747 R-squared = 0.5158 -------------+------------------------------ Adj R-squared = 0.5123 Total | 22432.3841 137 163.74003 Root MSE = 8.9367 ------------------------------------------------------------------------------ talk | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- age14 | 3.629046 .3014985 12.04 0.000 3.032814 4.225277 _cons | 25.83397 1.626457 15.88 0.000 22.61755 29.05039 ------------------------------------------------------------------------------

**age1**is the age centered around age 14 but converted to 0s after age 14 (representing the effect of age for before 14 year olds).**age2**is the age centered around age 14 but converted to 0s before 14 (representing the effect of age for after 14 year olds).**int1**is 1 before age 14 (representing the intercept for before 14 year olds).**int2**is 1 after age 14 (representing the intercept for after 14 year olds).

That might have been confusing, so let us show what these variables look like in a table below. Note that we have a strange person who is 13.9999 years old (very very close to being 14, but not quite). This person will be helpful for seeing the effect of the jump from going from being under 14 to being 14.generate age1 = (age - 14) replace age1 = 0 if age >= 14 generate age2 = (age - 14) replace age2 = 0 if age < 14 generate int1 = 1 replace int1 = 0 if age >= 14 generate int2 = 1 replace int2 = 0 if age < 14

Now we are ready to run our combined regression. We use the* Check the coding tablist age int1 int2 age1 age2, sort(v)+--------------------------------------------------+ | age int1 int2 age1 age2 Freq | |--------------------------------------------------| | 5 1 0 -9 0 4 | | 6 1 0 -8 0 4 | | 7 1 0 -7 0 2 | | 8 1 0 -6 0 5 | | 9 1 0 -5 0 6 | | 10 1 0 -4 0 13 | | 11 1 0 -3 0 3 | | 12 1 0 -2 0 13 | | 13 1 0 -1 0 11 | | 13.99999 1 0 -9.54e-06 0 1 | |--------------------------------------------------| | 14 0 1 0 0 11 | | 15 0 1 0 1 2 | | 16 0 1 0 2 15 | | 17 0 1 0 3 20 | | 18 0 1 0 4 12 | | 19 0 1 0 5 25 | | 20 0 1 0 6 8 | | 21 0 1 0 7 22 | | 22 0 1 0 8 16 | | 23 0 1 0 9 7 | +--------------------------------------------------+

Now let's obtain the predicted values (shown in the table below) and relate those to the meaning of the coefficients above.* Run the regresion, compare to try 2 regress talk int1 int2 age1 age2, hasconsSource | SS df MS Number of obs = 200 -------------+------------------------------ F( 3, 196) = 210.66 Model | 45655.2691 3 15218.423 Prob > F = 0.0000 Residual | 14159.1109 196 72.2403617 R-squared = 0.7633 -------------+------------------------------ Adj R-squared = 0.7597 Total | 59814.38 199 300.574774 Root MSE = 8.4994 ------------------------------------------------------------------------------ talk | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- int1 | 17.62425 2.009156 8.77 0.000 13.66191 21.58659 int2 | 25.83397 1.54688 16.70 0.000 22.7833 28.88464 age1 | .6820981 .4377618 1.56 0.121 -.1812301 1.545426 age2 | 3.629046 .2867473 12.66 0.000 3.063539 4.194552 ------------------------------------------------------------------------------

**age1**is the slope when age is less than 14. For example, as you go from being age 5 to 6, phone talking goes from 11.48 to 12.17, and that equals .68 (with rounding).**age2**is the slope when age is 14 or higher. For example, as you go from being age 15 to 16, phone talking goes from 29.46 to 33.09, and that equals 3.62 (with rounding).**int1**is the predicted mean for someone who is just infinitely close to being 14 years old (but not quite 14). Note that when someone is 13.9999 their predicted mean is the same as**int1**.**int2**is the predicted mean for someone who just turned 14 years old, and note that 25.83 is the value for**int2**and is the value for the predicted value at exactly age 14.

Here we make a graph of the results.predict yhat tablist age yhat, sort(v) // get this via findit tablist+----------------------------+ | age yhat Freq | |----------------------------| | 5 11.48537 4 | | 6 12.16747 4 | | 7 12.84956 2 | | 8 13.53166 5 | | 9 14.21376 6 | | 10 14.89586 13 | | 11 15.57796 3 | | 12 16.26005 13 | | 13 16.94215 11 | | 13.99999 17.62424 1 | |----------------------------| | 14 25.83397 11 | | 15 29.46302 2 | | 16 33.09206 15 | | 17 36.72111 20 | | 18 40.35015 12 | | 19 43.9792 25 | | 20 47.60825 8 | | 21 51.23729 22 | | 22 54.86634 16 | | 23 58.49538 7 | +----------------------------+

You might want to test whether the difference in the intercepts is 0, so we can do this below. Indeed, as you turn 14 years old, you have a "jump" in the time you talk on the phone, by 8.2 minutes.twoway (scatter talk age) /// (line yhat age if age <14, sort) (line yhat age if age >=14, sort), xline(14)

You can also test whether the slopes are different. The slope after 14 is greater by 2.94, and that difference (2.94) is significantly different from 0.lincom int2 - int1( 1) - int1 + int2 = 0 ------------------------------------------------------------------------------ talk | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- (1) | 8.20972 2.535655 3.24 0.001 3.209051 13.21039 ------------------------------------------------------------------------------

lincom age2 - age1( 1) - age1 + age2 = 0 ------------------------------------------------------------------------------ talk | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- (1) | 2.946947 .5233158 5.63 0.000 1.914895 3.979 ------------------------------------------------------------------------------

Using this coding scheme, here is the meaning of the coefficients.regress talk age14 age2 int2Source | SS df MS Number of obs = 200 -------------+------------------------------ F( 3, 196) = 210.66 Model | 45655.2691 3 15218.423 Prob > F = 0.0000 Residual | 14159.1109 196 72.2403617 R-squared = 0.7633 -------------+------------------------------ Adj R-squared = 0.7597 Total | 59814.38 199 300.574774 Root MSE = 8.4994 ------------------------------------------------------------------------------ talk | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- age14 | .6820981 .4377618 1.56 0.121 -.1812301 1.545426 age2 | 2.946947 .5233158 5.63 0.000 1.914895 3.979 int2 | 8.20972 2.535655 3.24 0.001 3.209051 13.21039 _cons | 17.62425 2.009156 8.77 0.000 13.66191 21.58659 ------------------------------------------------------------------------------

**age14**is the slope when age is less than 14.**age2**is the**change**in the slope as a result of becoming age 14 or higher (as compared to being less than 14). Note how this value of 2.94 corresponds to the**lincom**command above comparing the slope for after 14 to the slope before 14.**_cons**is the predicted mean for someone who is just infinitely close to being 14 years old (but not quite 14).**int2**is the predicted mean for someone who just turned 14 years old minus the predicted mean for someone who is infinitely close to being 14 years old (the jump that occurs at age 14). Note how this corresponds to the result from the**lincom**command above that tested the difference in the intercepts.

Below we compute the predicted values calling them **yhat2**. Note how the predicted values are the same for this model and the prior model, because the
models are essentially the same, they are just parameterized differently.

predict yhat2(option xb assumed; fitted values)tablist age yhat yhat2, sort(v)+---------------------------------------+ | age yhat yhat2 Freq | |---------------------------------------| | 5 11.48537 11.48537 4 | | 6 12.16747 12.16747 4 | | 7 12.84956 12.84956 2 | | 8 13.53166 13.53166 5 | | 9 14.21376 14.21376 6 | |---------------------------------------| | 10 14.89586 14.89586 13 | | 11 15.57796 15.57796 3 | | 12 16.26005 16.26005 13 | | 13 16.94215 16.94215 11 | | 13.99999 17.62424 17.62424 1 | |---------------------------------------| | 14 25.83397 25.83397 11 | | 15 29.46302 29.46302 2 | | 16 33.09206 33.09206 15 | | 17 36.72111 36.72111 20 | | 18 40.35015 40.35015 12 | |---------------------------------------| | 19 43.9792 43.9792 25 | | 20 47.60825 47.60825 8 | | 21 51.23729 51.23729 22 | | 22 54.86634 54.86634 16 | | 23 58.49538 58.49538 7 | +---------------------------------------+

We then run the regression below. Note that the effect formkspline xage1 14 xage2 = age tablist age xage1 xage2, sort(v)+------------------------------------+ | age xage1 xage2 Freq | |------------------------------------| | 5 5 0 4 | | 6 6 0 4 | | 7 7 0 2 | | 8 8 0 5 | | 9 9 0 6 | |------------------------------------| | 10 10 0 13 | | 11 11 0 3 | | 12 12 0 13 | | 13 13 0 11 | | 13.99999 13.99999 0 1 | |------------------------------------| | 14 14 0 11 | | 15 14 1 2 | | 16 14 2 15 | | 17 14 3 20 | | 18 14 4 12 | |------------------------------------| | 19 14 5 25 | | 20 14 6 8 | | 21 14 7 22 | | 22 14 8 16 | | 23 14 9 7 | +------------------------------------+

regress talk xage1 xage2 int2Source | SS df MS Number of obs = 200 -------------+------------------------------ F( 3, 196) = 210.66 Model | 45655.2691 3 15218.423 Prob > F = 0.0000 Residual | 14159.1109 196 72.2403617 R-squared = 0.7633 -------------+------------------------------ Adj R-squared = 0.7597 Total | 59814.38 199 300.574774 Root MSE = 8.4994 ------------------------------------------------------------------------------ talk | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- xage1 | .6820981 .4377618 1.56 0.121 -.1812301 1.545426 xage2 | 3.629046 .2867473 12.66 0.000 3.063539 4.194552 int2 | 8.20972 2.535655 3.24 0.001 3.209051 13.21039 _cons | 8.074878 4.5636 1.77 0.078 -.9251856 17.07494 ------------------------------------------------------------------------------

Note that all of the coefficients are the same as the last model, except formkspline yage1 14 yage2 = age, marginal tablist age yage1 yage2, sort(v)+------------------------------------+ | age yage1 yage2 Freq | |------------------------------------| | 5 5 0 4 | | 6 6 0 4 | | 7 7 0 2 | | 8 8 0 5 | | 9 9 0 6 | |------------------------------------| | 10 10 0 13 | | 11 11 0 3 | | 12 12 0 13 | | 13 13 0 11 | | 13.99999 13.99999 0 1 | |------------------------------------| | 14 14 0 11 | | 15 15 1 2 | | 16 16 2 15 | | 17 17 3 20 | | 18 18 4 12 | |------------------------------------| | 19 19 5 25 | | 20 20 6 8 | | 21 21 7 22 | | 22 22 8 16 | | 23 23 9 7 | +------------------------------------+regress talk yage1 yage2 int2Source | SS df MS Number of obs = 200 -------------+------------------------------ F( 3, 196) = 210.66 Model | 45655.2691 3 15218.423 Prob > F = 0.0000 Residual | 14159.1109 196 72.2403617 R-squared = 0.7633 -------------+------------------------------ Adj R-squared = 0.7597 Total | 59814.38 199 300.574774 Root MSE = 8.4994 ------------------------------------------------------------------------------ talk | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- yage1 | .6820981 .4377618 1.56 0.121 -.1812301 1.545426 yage2 | 2.946947 .5233158 5.63 0.000 1.914895 3.979 int2 | 8.20972 2.535655 3.24 0.001 3.209051 13.21039 _cons | 8.074878 4.5636 1.77 0.078 -.9251856 17.07494 ------------------------------------------------------------------------------

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.