Please Note: Stata graph commands changed with version 8. This page was developed before version 8 was released and uses Stata 7 graph commands. Please see "How do I use version 7 graph commands in Stata version 8?" for information on how to either run these Stata 7 graph commands in Stata version 8 or convert them to Stata 8 syntax.
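As a rough illustration of the two routes mentioned in the note above, the sketch below fits the first model used on this page and draws one scatterplot both ways. It assumes Stata 8 or later and that hsb2.dta is in the working directory. The twoway line is one plausible translation of the Stata 7 command, not the only one.

```stata
* Sketch only: assumes Stata 8 or later and hsb2.dta in the working directory.
use hsb2, clear

* Fitted values from OLS and predicted probabilities from the logit model.
quietly regress female read write science socst
predict yhatols
quietly logit female read write science socst
predict yhatlog

* Route 1: run the Stata 7 command unchanged through graph7.
graph7 yhatols yhatlog, yline(0 1) ylab(0 .2 to 1)

* Route 2: an approximate Stata 8+ equivalent using twoway.
twoway scatter yhatols yhatlog, yline(0 1) ylabel(0(.2)1)
```

The same pattern should carry over to the other graph commands on this page, with connected lines (the c(l) option) becoming twoway line plots.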
* Is it important to use logistic regression when analyzing a 0/1
* outcome variable? Some have pointed out that there may be
* little practical difference between the two methods and,
* if so, then OLS would be simpler to apply and interpret.
* This page will compare OLS and logistic regression using
* a few examples to explore this issue. We start by
* using the "hsb2" data file and we will predict "female"
* (0=male, 1=female) from a set of standardized scores,
* "read", "write", "science", and "socst". We know that
* predicting "female" is kind of a nutty thing to do, but
* this is just for illustration purposes.

* We first use the "hsb2" data file.
* use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear
use hsb2, clear

***** Check #1. Compare results of OLS and Logistic Models

* We run the model using OLS regression.
regress female read write science socst

* We now run the model using logistic regression.
logit female read write science socst

* You can compare the two models and see that the "p values"
* associated with the two methods are quite similar, except
* for the constant, which is a boring difference.

* To help compare the two models, we will re-run the analyses
* and use the "outreg" command (see "findit outreg"), which
* helps us see the results side by side.
quietly regress female read write science socst
outreg using olslog, nolabel replace ctitle(OLS) pvalue
quietly logit female read write science socst
outreg using olslog, append nolabel ctitle(Logit) pvalue

* We have to add some tabs to this table to make it line up.
type olslog.out

***** Check #2. Show graph of predicted probability of being female
***** by each predictor comparing OLS and Logistic Models

* Let's look at the relationship between one of the predictors, "read",
* and the predicted probability of being female, while holding
* all other predictors constant. We will do this comparing
* OLS and logistic regression.
* ("postgr" is a user-written command; see "findit postgr".)
quietly regress female read write science socst
postgr read, generate(yhatols1)
quietly logit female read write science socst
postgr read, generate(yhatlog1)
graph yhatols1 yhatlog1 read, c(ll) ylabel(0 .2 to 1) sort

* We can compute the "observed probability" of being female
* at each level of "read" using the "egen" command and
* include that in the graph as well.
egen probfem_byread = mean(female), by(read)
graph yhatols1 yhatlog1 probfem_byread read, c(ll.) ylabel(0 .2 to 1) sort

* This does not work very well because we have small numbers
* in each level of "read", but this will work better in our
* second example. For now, it suffices that the predicted
* probabilities by read are nearly identical for OLS and logistic.

* We can do the same for the other predictors in our model, just
* to assure ourselves that this is not a fluke. We will create all of the
* graphs and show them as one graph.
quietly regress female read write science socst
postgr write  , generate(yhatols2)
postgr science, generate(yhatols3)
postgr socst  , generate(yhatols4)
quietly logit female read write science socst
postgr write  , generate(yhatlog2)
postgr science, generate(yhatlog3)
postgr socst  , generate(yhatlog4)
graph yhatols1 yhatlog1 read   , c(ll) sort ylab(0 .2 to 1) saving(compare1a, replace)
graph yhatols2 yhatlog2 write  , c(ll) sort ylab(0 .2 to 1) saving(compare1b, replace)
graph yhatols3 yhatlog3 science, c(ll) sort ylab(0 .2 to 1) saving(compare1c, replace)
graph yhatols4 yhatlog4 socst  , c(ll) sort ylab(0 .2 to 1) saving(compare1d, replace)
graph using compare1a compare1b compare1c compare1d

***** Check #3. Look at correlation between predicted probability
***** of being female for OLS and Logistic Models.

* Let's now look at the predicted probabilities for the two models
* and compare them.
quietly regress female read write science socst
predict yhatols
quietly logit female read write science socst
predict yhatlog

* The correlation between the predicted values is very high!
corr yhatols yhatlog

***** Check #4. Graph predicted probability of being female
***** for OLS by predicted probability for Logistic Model.

* Let's graph the predicted values against each other. Even
* though the correlation between the two predicted values is
* nearly 1, we can see some non-linearity between them.
graph yhatols yhatlog, yline(0 1) ylab(0 .2 to 1)

***** Check #5. Graph observed probability of being female
***** by predicted probability of being female for OLS model
***** and Logistic model.

* Now let's compare the observed probability with the predicted
* probability using OLS. We could simply graph "female" by
* the predicted probability as shown below.
graph female yhatols, ylab(0 .2 to 1)

* From the graph above, it is hard to see how well the
* predicted data fit the observed data. Instead, we can
* sort the students into groups of 20 and then for each
* group of 20 compute the observed probability of being
* female by dividing the number of females by 20.

* For the OLS case, we will sort the data on the predicted
* probability, create the groups of 20, and then compute
* the observed probability. Using (_n-1) puts observations
* 1-20 in group 0, 21-40 in group 1, and so on.
sort yhatols
generate olsgroup = int((_n-1) / 20)
egen obsprobols = mean(female), by(olsgroup)
graph obsprobols yhatols, ylab(0 .2 to 1) saving(obsprob1, replace) title(obs vs fitted, OLS)

* We can now do the same for the logistic analysis.
sort yhatlog
generate loggroup = int((_n-1) / 20)
egen obsproblog = mean(female), by(loggroup)
graph obsproblog yhatlog, ylab(0 .2 to 1) saving(obsprob2, replace) title(obs vs fitted, Logistic)
graph using obsprob1 obsprob2

* This example would appear to suggest that the results from
* OLS and Logistic regression are quite similar.
* When you look at the relationship between the predictors and
* predicted probability, the two are much the same. The predicted
* probabilities are very highly correlated, and the relationship between
* the observed probability and predicted probability looks much the same
* for the two techniques. But, let's consider another example.

* Example 2.
* This example uses the api data file
* use http://www.ats.ucla.edu/stat/stata/webbooks/logistic/apilog, clear
use apilog, clear

***** Check #1. Compare results of OLS and Logistic Models
regress hiqual full avg_ed
outreg using api1, nolabel ctitle(OLS) pvalue replace
logit hiqual full avg_ed
outreg using api1, nolabel ctitle(Logit) pvalue append
type api1.out

***** Check #2. Show graph of predicted probability of being hiqual
***** by each predictor comparing OLS and Logistic Models

* get observed probabilities by ivs
egen ofull   = mean(hiqual), by(full)
egen oavg_ed = mean(hiqual), by(avg_ed)

* run logistic and get predicted values
logistic hiqual full avg_ed
predict yhatlog

* generate predicted values by each iv
postgr full  , gen(fullhatlog)
postgr avg_ed, gen(avg_edhatlog)

* run OLS and get predicted values
regress hiqual full avg_ed
predict yhatols

* generate predicted values by each iv
postgr full  , gen(fullhatols)
postgr avg_ed, gen(avg_edhatols)

* make graph of each iv by observed and predicted probability
* (only the two graphs saved in this example are combined)
graph ofull fullhatlog fullhatols full , c(.ll) s(o..) sort saving(grfull, replace)
graph oavg_ed avg_edhatlog avg_edhatols avg_ed, c(.ll) s(o..) sort saving(gravg_ed, replace)
graph using grfull gravg_ed

***** Check #3. Look at correlation between predicted probability
***** of being hiqual for OLS and Logistic Models.

* #3, compare correlations
corr yhatlog yhatols

***** Check #4. Graph predicted probability of being hiqual
***** for OLS by predicted probability for Logistic Model.
graph yhatols yhatlog, yline(0 1) ylab(0 .2 to 1)

***** Check #5. Graph observed probability of being hiqual
***** by predicted probability of being hiqual for OLS model
***** and Logistic model.
* break data up into 40 bins and get observed prob
sort yhatlog
generate n = int((_n-1) / 30)
egen mhiquall = mean(hiqual), by(n)
graph mhiquall yhatlog, saving(g1, replace)
sort yhatols
generate n2 = int((_n-1) / 30)
egen mhiqualo = mean(hiqual), by(n2)
graph mhiqualo yhatols, saving(g2, replace)
graph using g1 g2

***** Check #6. Compare residuals between OLS and logistic model.
* (residual = predicted probability minus the binned observed probability)
gen resols = yhatols - mhiqualo
gen reslog = yhatlog - mhiquall
mdensity reslog resols, xlab c(ll) s(..) xline(0)

* Example 3.
* This example uses the api data file
* use http://www.ats.ucla.edu/stat/stata/webbooks/logistic/apilog, clear
use apilog, clear

***** Check #1. Compare results of OLS and Logistic Models
regress hiqual full avg_ed yr_rnd meals
outreg using api2, nolabel ctitle(OLS) pvalue replace
logit hiqual full avg_ed yr_rnd meals
outreg using api2, nolabel ctitle(Logit) pvalue append
type api2.out

***** Check #2. Show graph of predicted probability and
***** observed probability of being hiqual
***** by each predictor comparing OLS and Logistic Models

* get observed probabilities by ivs
egen ofull   = mean(hiqual), by(full)
egen oavg_ed = mean(hiqual), by(avg_ed)
egen oyr_rnd = mean(hiqual), by(yr_rnd)
egen omeals  = mean(hiqual), by(meals)

* run logistic and get predicted values
logistic hiqual full avg_ed yr_rnd meals
predict yhatlog

* generate predicted values by each iv
postgr full  , gen(fullhatlog)
postgr avg_ed, gen(avg_edhatlog)
postgr yr_rnd, gen(yr_rndhatlog)
postgr meals , gen(mealshatlog)

* run OLS and get predicted values
regress hiqual full avg_ed yr_rnd meals
predict yhatols

* generate predicted values by each iv
postgr full  , gen(fullhatols)
postgr avg_ed, gen(avg_edhatols)
postgr yr_rnd, gen(yr_rndhatols)
postgr meals , gen(mealshatols)

* make graph of each iv by observed and predicted probability
graph ofull fullhatlog fullhatols full , c(.ll) s(o..) sort saving(grfull, replace)
graph oavg_ed avg_edhatlog avg_edhatols avg_ed, c(.ll) s(o..) sort saving(gravg_ed, replace)
graph omeals mealshatlog mealshatols meals , c(.ll) s(o..) sort saving(grmeals, replace)
graph oyr_rnd yr_rndhatlog yr_rndhatols yr_rnd, c(.ll) s(o..) sort saving(gryr_rnd, replace)
graph using grfull gravg_ed grmeals gryr_rnd

***** Check #3. Look at correlation between predicted probability
***** of being hiqual for OLS and Logistic Models.

* #3, compare correlations
corr yhatlog yhatols

***** Check #4. Graph predicted probability of being hiqual
***** for OLS by predicted probability for Logistic Model.
graph yhatols yhatlog, yline(0 1) ylab(0 .2 to 1)

***** Check #5. Graph observed probability of being hiqual
***** by predicted probability of being hiqual for OLS model
***** and Logistic model.

* break data up into 40 bins and get observed prob
sort yhatlog
generate n = int((_n-1) / 30)
egen mhiquall = mean(hiqual), by(n)
graph mhiquall yhatlog, saving(g1, replace)
sort yhatols
generate n2 = int((_n-1) / 30)
egen mhiqualo = mean(hiqual), by(n2)
graph mhiqualo yhatols, saving(g2, replace)
graph using g1 g2

***** Check #6. Compare residuals between OLS and logistic model.
* (residual = predicted probability minus the binned observed probability)
gen resols = yhatols - mhiqualo
gen reslog = yhatlog - mhiquall
mdensity reslog resols, xlab c(ll) s(..) xline(0)
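Check #4 draws reference lines at 0 and 1 because OLS fitted values, unlike logit probabilities, are not constrained to that interval. A numeric complement to the graph is sketched below for the Example 3 model. It assumes apilog.dta is in the working directory, and it is offered as one possible extra check rather than part of the original analysis.

```stata
* Sketch only: assumes apilog.dta is in the working directory.
use apilog, clear

* Fitted values from OLS and predicted probabilities from the logit model.
quietly regress hiqual full avg_ed yr_rnd meals
predict yhatols
quietly logit hiqual full avg_ed yr_rnd meals
predict yhatlog

* Count OLS fitted values outside [0,1]. Missing values sort above every
* number in Stata, so "yhatols < ." keeps them out of the upper count.
count if yhatols < 0
count if yhatols > 1 & yhatols < .

* Compare the ranges: logit probabilities stay between 0 and 1,
* while the OLS fitted values may not.
summarize yhatols yhatlog
```

Nonzero counts would indicate observations for which the OLS fitted value cannot be read as a probability, which is one concrete way the two methods can differ even when their predictions are highly correlated.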
These pages are Copyrighted (c) by UCLA Academic Technology Services