Statistical Computing Seminars
What's new in Stata 11:  Factor variables, margins and interactions

Factor variables

Factor variables are extensions of existing variables that are used to code categorical variables. Factor variables create indicator (dummy) variables from categorical variables, interactions of categorical variables, interactions of categorical and continuous variables, and interactions of continuous variables (polynomials). They are allowed with most estimation and postestimation commands, along with a few other commands.

Currently, only indicator (dummy) coding is performed; future extensions of factor variables may allow other coding systems, such as effect coding or Helmert coding.
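To make the indicator coding concrete, the sketch below compares hand-made dummy variables with the i. operator. It is only an illustration: it assumes the auto dataset that ships with Stata (not the seminar data), and the rep_ prefix is a made-up name for the generated indicators.

```stata
* Assumption: auto.dta (shipped with Stata) stands in for the seminar data
sysuse auto, clear

* Create indicators by hand: rep_1 ... rep_5, one per level of rep78
tabulate rep78, generate(rep_)

* Manual dummies with the first level left out as the reference category
regress price rep_2 rep_3 rep_4 rep_5

* Factor-variable notation: no new variables are created,
* and rep78==1 is the base level by default
regress price i.rep78
```

Both regressions should report the same coefficients; the factor-variable version simply leaves the dataset uncluttered.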

The table below provides a list of factor variable operators.

   Factor Variable Operators

    i.    factor variable
    b.    base level

    #     interaction
    ##    factorial interaction

    c.    continuous variable

    o.    omitted variable or levels
Running the commands below will demonstrate how factor variables work in Stata 11.
use, clear
egen grp=group(prog female)        /* create large categorical variable */

list prog i.prog,  nolabel     /* prog==1 is the base level     */
list prog bn.prog, nolabel     /* no omitted base level         */
list prog i3.prog, nolabel     /* prog==3 only                  */

summarize i.prog               /* prog==1 is the base level     */
summarize bn.prog              /* no omitted base level         */
summarize i3.prog              /* prog==3 only                  */

regress write i.prog           /* prog==1 is the base level     */
regress write bn.prog, nocons  /* no omitted base level         */
regress write i3.prog          /* prog==3 only                  */

regress write i.prog i.female  /* main effects - no interactions */
regress write i.(prog female)  /* same as above                  */

*     show baselevels

summarize i.prog, baselevels

regress write i.prog, baselevels

summarize b(freq).prog, baselevel 

regress write bn.prog, nocons
The table below shows variations on setting the base or reference category.
Use the b. operator to choose the base level for your factor variables.

       Specification  Description
       b#.            use # as base, #=value of variable
       b(##).         use the #th ordered value as base
       b(first).      use smallest value as base (the default)
       b(last).       use largest value as base
       b(freq).       use most frequent value as base
       bn.            no base level
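
As a brief illustration of the specifications in the table, the sketch below applies several of them to one model. It assumes the auto dataset shipped with Stata rather than the seminar data, and writes the operators in their ib form.

```stata
* Assumption: auto.dta (shipped with Stata) stands in for the seminar data
sysuse auto, clear

regress price ib3.rep78                    /* rep78==3 is the base level        */
regress price ib(last).rep78               /* largest value is the base level   */
regress price ib(freq).rep78, baselevels   /* most frequent value is the base   */
regress price ibn.rep78, noconstant        /* no base level; one term per level */
```

The fitted values are the same in every case; only the reference category, and therefore the interpretation of each coefficient, changes.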
We will now continue with factor variable examples.
*     Use fvset to declare the base level for a factor variable.
*         This information will be stored with the dataset.

fvset base 3 prog

fvset report

summarize i.prog, baselevel

fvset base 1 prog             /* return baselevel to prog==1 */

*     You can select a range of levels by using the i(numlist). operator.

list grp i(3/5).grp

summarize i(1(2)5).grp, baselevel

regress write 3.grp 

*     Interactions

*     Use the ## operator to specify factorial interactions between variables.

*      A##B is shorthand for i.A i.B A#B.

summarize i.grp i.ses grp#ses, sep(0)
summarize grp##ses, sep(0)

regress write i.prog i.female prog#female
regress write prog##female

regress write prog##female, baselevels

regress write prog##female, allbaselevels

*    Use the c. operator with continuous variables in anova or interactions.

anova write prog c.socst                      /* continuous covariate in anova   */
regress write i.prog socst                    /* continuous covariate in regress */

regress write female##c.socst, allbaselevels  /* categorical by continuous       */

list                          /* continuous by continuous        */

regress read c.socst##c.socst                 /* second degree polynomial term   */
regress read c.socst##c.socst##c.socst        /* third  degree polynomial term   */

The margins command

The real power of factor variables becomes evident when they are used together with the margins command.

The margins command can compute estimated marginal means, least-squares means, average and conditional marginal and partial effects (which may be reported as derivatives or as elasticities), average and conditional adjusted predictions, and predictive margins. The margins command is a postestimation command which estimates margins of responses for specified values of covariates and presents the results as a table.
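
To show what a predictive margin is, the sketch below reproduces one by hand. It is a minimal illustration that assumes the auto dataset shipped with Stata rather than the seminar data; xb1 is a made-up variable name.

```stata
* Assumption: auto.dta (shipped with Stata) stands in for the seminar data
sysuse auto, clear

regress price i.foreign mpg
margins foreign                /* predictive margins for foreign = 0 and 1 */

* The margin for foreign==1, computed by hand:
* treat every car as foreign, predict, then average the predictions
preserve
replace foreign = 1
predict double xb1, xb
summarize xb1                  /* mean should match the 1.foreign margin */
restore
```

The margins command performs the same calculation for every requested level and also supplies delta-method standard errors.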

We will begin with a few examples and systematically cover categorical by categorical, categorical by continuous and continuous by continuous interactions for both continuous and binary response variables.

regress write i.female i.prog

margins female prog                   /* predictive margins */

logit honors i.female i.prog

margins female prog                   /* predicted probability */

regress write i.female read

margins, dydx(read)

regress write i.female##c.read

margins, dydx(read)                   /* average marginal effect */

margins, dydx(read) at(female=(0 1))  /* averaged across values of read */

*     continuous response variable

*     categorical by categorical interaction

anova write female##prog

margins prog               /* cell means */
margins prog, asbalanced   /* estimated marginal means -- lsmeans */

*     categorical by categorical interaction with covariate

anova write female##prog

margins prog               /* cell means */
margins prog, asbalanced   /* estimated marginal means -- lsmeans -- adjusted cell means */

*     another categorical by categorical example  -- 2x4 factorial design

use, clear

anova y a b a*b                        /* old syntax -- does not work */
version 10: anova y a b a*b            /* works with version control */

anova y a##b

anovaplot b a, scatter(msymbol(i))     /* user written command */

margins a#b, post                      /* cell means */

test 1.a#1.b=2.a#1.b                   /* test of simple main effect at b=1 */
test 1.a#2.b=2.a#2.b                   /* test of simple main effect at b=2 */
test 1.a#3.b=2.a#3.b                   /* test of simple main effect at b=3 */
test 1.a#4.b=2.a#4.b                   /* test of simple main effect at b=4 */

*     categorical by continuous interaction

use, clear

twoway (lfit write socst if ~female)(lfit write socst if female), legend(off) scheme(lean1)

summarize socst

anova write female##c.socst         /* can run as anova or regress */
regress write i.female##c.socst     /* can run as anova or regress */

margins female, at(socst=(30(10)70))
margins female, at(socst=(30(10)70)) vsquish

margins, dydx(female) at(socst=(30(10)70)) vsquish

*    continuous by continuous interaction

use, clear

/* show centered at 50 */

regress read c.math##c.socst

margins, dydx(math)                                      /* average marginal effect */

margins, dydx(math) at(socst=50)                         /* simple slope at socst=50 */ 

margins, dydx(math) at(socst=(30(5)70)) vsquish          /* simple slopes */ 

matrix s=r(b)                                            /* save simple slopes */

margins, at(math=0 socst=(30(5)70)) vsquish              /* intercepts for simple slopes */

mat i=r(b)                                               /* save intercepts */

*     graph simple slopes
*     socst=(30(5)70) yields 9 values, so s and i each have 9 columns

twoway (function y = i[1,1]  + s[1,1]*x,  range(30 75))  ///
       (function y = i[1,2]  + s[1,2]*x,  range(30 75))  ///
       (function y = i[1,3]  + s[1,3]*x,  range(30 75))  ///
       (function y = i[1,4]  + s[1,4]*x,  range(30 75))  ///
       (function y = i[1,5]  + s[1,5]*x,  range(30 75))  ///
       (function y = i[1,6]  + s[1,6]*x,  range(30 75))  ///
       (function y = i[1,7]  + s[1,7]*x,  range(30 75))  ///
       (function y = i[1,8]  + s[1,8]*x,  range(30 75))  ///
       (function y = i[1,9]  + s[1,9]*x,  range(30 75))  ///
       (scatter read math, msym(oh) jitter(3)),           ///
       legend(off) ytitle(read) xtitle(math) scheme(lean1)
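
In a model with a continuous by continuous interaction, the simple slope of one variable is its own coefficient plus the interaction coefficient times the chosen value of the other variable. The sketch below checks this against margins. It assumes the auto dataset shipped with Stata rather than the seminar data, and the value 3000 is an arbitrary illustration.

```stata
* Assumption: auto.dta (shipped with Stata) stands in for the seminar data
sysuse auto, clear

regress price c.mpg##c.weight

* Simple slope of mpg when weight is held at 3000
margins, dydx(mpg) at(weight=3000)

* Same quantity from the coefficients: b[mpg] + 3000*b[mpg#weight]
lincom mpg + 3000*c.mpg#c.weight
```

The two commands should agree on both the estimate and its standard error, because the model is linear in the coefficients.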

*     polynomial regression -- 2nd degree polynomial -- squared term

twoway (qfit math write)(lfit math write)(scatter math write, ///
       jitter(3) msym(oh)), scheme(lean1) legend(off)

regress math c.write##c.write

predict pquad
twoway line pquad write, sort

margins, dydx(write) at(write=(30(5)70)) vsquish

*     binary response variable

use, clear

logit honors i.prog read                         /* model with no interaction */

predict pprob                                    /* predicted probability for graphing */
twoway (line pprob read if prog==1, sort)  ///
       (line pprob read if prog==2, sort)  /// 
       (line pprob read if prog==3, sort), ///
       legend(order(1 "prog1" 2 "prog2" 3 "prog3"))

margins prog                                     /* averaged across read       */

margins prog, atmeans                            /* read held constant at mean */

margins prog, at(read=(40 50 60))                /* predicted probabilities    */

margins, dydx(prog) at(read=(40 50 60))          /* differences in probability */
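
For a factor variable in a logit model, dydx() reports the difference between predicted probabilities, not a derivative. The sketch below illustrates this under stated assumptions: it uses the auto dataset shipped with Stata rather than the seminar data, and highprice is a made-up binary outcome created only for the example.

```stata
* Assumption: auto.dta (shipped with Stata) stands in for the seminar data
sysuse auto, clear
generate byte highprice = price > 6000     /* hypothetical binary outcome */

logit highprice i.foreign mpg

margins foreign                /* average predicted probability at each level */
margins, dydx(foreign)         /* discrete change from the base level         */

* The discrete change is the difference between the two margins.
* Note: post replaces the logit results in memory with the margins.
margins foreign, post
lincom 1.foreign - 0.foreign
```

The lincom estimate should equal the dydx(foreign) result, which is why the seminar labels these margins as differences in probability.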

logit honors female##prog read                   /* model with interaction & covariate */

margins prog, at(female=(0 1) read=60)           /* predicted probabilities            */

*     categorical by categorical interaction -- 2x2 design

use, clear

tab1 f h

logit y f##h cv1                                     /* categorical by categorical with covar */

margins h, at(f=(0 1) cv1=(30(10)70)) vsquish        /* predicted probabilities */

margins, dydx(h) at(f=(0 1) cv1=(30(10)70)) vsquish  /* differences in probabilities */

*     categorical by continuous interaction

use, clear

logit y i.f##c.s, nolog                              /* model with interaction       */ 

margins f, at(s=(30(10)70)) vsquish                  /* predicted probabilities      */

margins, dydx(f) at(s=(30(10)70)) vsquish            /* differences in probabilities */

logit y i.f c.s, nolog                               /* model with no interaction    */   

margins f, at(s=(30(10)70)) vsquish        /* why we need to look at model without interaction */

margins, dydx(f) at(s=(30(10)70)) vsquish

*     continuous by continuous interaction

use, clear

logit y c.r##c.m, nolog                          /* continuous by continuous interaction */

margins, dydx(r) at(m=(30(10)70)) vsquish        /* marginal effect                      */
