UCLA Academic Technology Services HomeServicesClassesContactJobs
Search

Annotated Stata Output
Logistic Regression Analysis

This page shows an example logistic regression analysis with footnotes explaining the output. The analysis uses a data file about crime. For our example, we create a dichotomous response variable called hicrime with the command: generate hicrime = (crimerat >= 110). The model predicts hicrime from maleteen, south, educ and police59 using the following Stata commands (both logistic and logit are used, to obtain output with odds ratios and with regular coefficients, respectively).

logistic hicrime maleteen south educ police59
logit

lfit
lfit, group(10)

The output of these commands is shown below, followed by explanations of the output. The letter attached to each item in the output identifies the corresponding footnote at the bottom of the page.

Output

logistic hicrime maleteen south educ police59
Logit estimates                                   Number of obs[a] =         47
                                                  LR chi2(4)[b]    =      13.93
                                                  Prob > chi2[c]   =     0.0075
Log likelihood[d] = -18.606959                    Pseudo R2[e]     =     0.2724

--------------------------------------------------------------------------------
hicrime[f] | Odds Ratio[g] Std. Err.[h]    z[i]  P>|z|[j]   [95% Conf. Interval][k]
-----------+--------------------------------------------------------------------
  maleteen |   1.086959   .0478646      1.894   0.058       .9970804    1.184939
     south |   .3272305   .4449077     -0.822   0.411       .0227796     4.70068
      educ |   1.023187   .5723757      0.041   0.967       .3418133    3.062818
  police59 |   1.059909   .0222633      2.770   0.006        1.01716    1.104455
--------------------------------------------------------------------------------

logit

Logit estimates                                   Number of obs[a] =         47
                                                  LR chi2(4)[b]    =      13.93
                                                  Prob > chi2[c]   =     0.0075
Log likelihood[d] = -18.606959                    Pseudo R2[e]     =     0.2724

--------------------------------------------------------------------------------
hicrime[f] |      Coef.[l] Std. Err.[m]    z[i]  P>|z|[j]   [95% Conf. Interval][n]
-----------+--------------------------------------------------------------------
  maleteen |   .0833837   .0440353      1.894   0.058      -.0029239    .1696914
     south |  -1.117091   1.359616     -0.822   0.411      -3.781888    1.547707
      educ |   .0229224   .5594047      0.041   0.967      -1.073491    1.119335
  police59 |   .0581834   .0210049      2.770   0.006       .0170147    .0993522
     _cons |  -17.70177   9.495993     -1.864   0.062      -36.31357    .9100364
--------------------------------------------------------------------------------

lfit

Logistic model for hicrime, goodness-of-fit test[o]

      number of observations[p] =        47
number of covariate patterns[q] =        47
            Pearson chi2(42)[r] =        38.72
                 Prob > chi2[s] =         0.6158


lfit, group(10)

Logistic model for hicrime, goodness-of-fit test[o]
(Table collapsed on quantiles of estimated probabilities)

 number of observations[p] =        47
       number of groups[t] =        10
Hosmer-Lemeshow chi2(8)[u] =        13.45
            Prob > chi2[v] =         0.0972


Footnotes

a. This is the number of observations being analyzed.

b. This is the likelihood-ratio chi-square with 4 degrees of freedom. One degree of freedom is used for each predictor variable in the logistic regression model. The likelihood-ratio chi-square is defined as 2(L1 - L0), where L0 is the log likelihood for the "constant-only" model and L1 is the log likelihood for the full model with constant and predictors. In this example, L0 = -25.573407 (which does not appear in the output) and L1 = -18.606959 (which is item d). Thus, the likelihood-ratio chi-square = 2*(-18.606959 - (-25.573407)) = 13.93.
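
The arithmetic in footnote b can be reproduced in a few lines. This is a minimal Python sketch rather than Stata output; the variable names are our own, and the two log likelihoods are the values quoted in the footnote.

```python
# Log likelihoods quoted in footnote b.
ll_constant_only = -25.573407   # L0: constant-only model (not shown in the output)
ll_full = -18.606959            # L1: full model (item d in the output)

# Likelihood-ratio chi-square: 2 * (L1 - L0)
lr_chi2 = 2 * (ll_full - ll_constant_only)

# One degree of freedom per predictor: maleteen, south, educ, police59
df = 4

print(f"LR chi2({df}) = {lr_chi2:.2f}")
```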

c. This is the p-value associated with the chi-square with 4 degrees of freedom. The value of .0075 indicates that the model as a whole is statistically significant at the .05 level.
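
As a rough check on the p-value in footnote c, the upper tail of a chi-square distribution with 4 degrees of freedom has a simple closed form, so no statistics library is needed. The sketch below is in Python with variable names of our own choosing; it starts from the rounded statistic 13.93, so it agrees with Stata only approximately.

```python
import math

lr_chi2 = 13.93  # likelihood-ratio chi-square from the output, rounded

# For exactly 4 degrees of freedom the chi-square upper tail is
# P(X > x) = exp(-x/2) * (1 + x/2).
p_value = math.exp(-lr_chi2 / 2) * (1 + lr_chi2 / 2)

print(f"Prob > chi2 = {p_value:.4f}")
```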

d. This is the value of the log likelihood for the model including the constant and all of the predictors, as computed by maximum-likelihood estimation of the logit model.

e. Technically, R2 cannot be computed the same way in logistic regression as it is in OLS regression. The pseudo-R2, in logistic regression, is defined as 1 - L1/L0, where L0 represents the log likelihood for the "constant-only" model and L1 is the log likelihood for the full model with constant and predictors. In this example, 1 - (-18.606959)/(-25.573407) = .2724.
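
The reported pseudo-R2 of .2724 is reproduced by one minus the ratio of the two log likelihoods. A minimal Python sketch, using the L0 and L1 values quoted in footnote b and variable names of our own:

```python
ll_constant_only = -25.573407   # L0: constant-only model
ll_full = -18.606959            # L1: full model

# Pseudo-R2 as one minus the ratio of the log likelihoods.
pseudo_r2 = 1 - ll_full / ll_constant_only

print(f"Pseudo R2 = {pseudo_r2:.4f}")
```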

f. This column starts with the name of the response variable (hicrime) and then lists the names of the predictor variables (maleteen south educ police59).

g. The odds ratio column gives the factor by which the odds of hicrime = 1 are expected to change when there is a one unit increase in the predictor variable with all of the other variables in the model held constant. An odds ratio close to 1.0 suggests that there is no change due to the predictor variable.

In this example, the odds ratio for police59 is 1.059909. Thus, you would predict that the odds of hicrime would be multiplied by 1.059909 (an increase of about 6%) for every one unit increase in police59 when maleteen, south and educ are held constant.

For a more detailed explanation of odds ratios see the Stata FAQ: How do I interpret odds ratios in logistic regression?
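
Because an odds ratio acts multiplicatively on the odds, a change of several units compounds. The Python sketch below illustrates this for police59; the helper name odds_multiplier and the 10-unit change are our own illustrative choices, not part of the Stata output.

```python
odds_ratio_police59 = 1.059909  # from the logistic output


def odds_multiplier(odds_ratio, delta):
    """Factor by which the odds change when the predictor rises by delta units."""
    return odds_ratio ** delta


one_unit = odds_multiplier(odds_ratio_police59, 1)
ten_units = odds_multiplier(odds_ratio_police59, 10)  # illustrative 10-unit change

# Percent change in the odds for a one-unit increase.
percent_change_per_unit = (one_unit - 1) * 100

print(f"1 unit: x{one_unit:.4f}, 10 units: x{ten_units:.4f}")
```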

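h. The standard error for the odds ratio is obtained from the logistic regression coefficient and its standard error using the formula: se(odds ratio) = exp(coef.) * se(coef.). For this example, se(police59) = exp(.0581834) * .0210049 = .0222633.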
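
The standard errors in the odds ratio table are consistent with multiplying each odds ratio by the standard error of its coefficient. The Python sketch below checks this against the values in the two output tables; the dictionary names are our own.

```python
import math

# (coefficient, standard error) from the logit output
logit_output = {
    "maleteen": (0.0833837, 0.0440353),
    "south": (-1.117091, 1.359616),
    "educ": (0.0229224, 0.5594047),
    "police59": (0.0581834, 0.0210049),
}

# se(odds ratio) = exp(coef) * se(coef)
se_odds_ratio = {
    name: math.exp(coef) * se for name, (coef, se) in logit_output.items()
}

for name, value in se_odds_ratio.items():
    print(f"{name:>8}: {value:.7f}")
```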

i. This column contains the z-statistic testing the logistic coefficient.

In the case of the logit command, z = (coef.)/(Std. Err). For this example, z(police59) = .0581834/.0210049 = 2.770.

Stata uses the same z-test value computed for the logistic coefficient as the test of the odds ratio.

j. This column contains the two-tail p-value for the z-test. Stata uses the same p-value computed testing the hypothesis, H0: b = 0, for both the logistic coefficients and for the odds ratios.
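
The z-statistic and its two-tailed p-value can be checked by hand. The following Python sketch uses the standard normal distribution through math.erfc; the function name two_tailed_p is our own.

```python
import math


def two_tailed_p(z):
    """Two-tailed p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


coef, se = 0.0581834, 0.0210049   # police59, from the logit output
z = coef / se
p = two_tailed_p(z)

print(f"z = {z:.3f}, P>|z| = {p:.3f}")
```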

k. This column contains the 95% confidence intervals for the odds ratios. Significant effects are suggested when confidence intervals do not contain 1.0. In this example, the only interval that would be considered significant at the .05 level is the one for police59. All of the other confidence intervals contain the value 1.0.
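
The confidence limits for the odds ratios are the exponentiated limits of the corresponding coefficient intervals. The Python sketch below takes the limits from the logit output and flags which intervals exclude 1.0; the variable names are our own.

```python
import math

# 95% confidence limits for the coefficients, from the logit output
coef_ci = {
    "maleteen": (-0.0029239, 0.1696914),
    "south": (-3.781888, 1.547707),
    "educ": (-1.073491, 1.119335),
    "police59": (0.0170147, 0.0993522),
}

# Exponentiate each limit to move from the log-odds scale to the odds ratio scale.
or_ci = {name: (math.exp(lo), math.exp(hi)) for name, (lo, hi) in coef_ci.items()}

# An interval that excludes 1.0 suggests a significant effect at the .05 level.
excludes_one = [name for name, (lo, hi) in or_ci.items() if not (lo <= 1.0 <= hi)]

print(or_ci["police59"], excludes_one)
```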

l. The coefficient column gives the values for the logistic regression coefficients. These coefficients indicate the amount of change expected in the log odds when there is a one unit change in the predictor variable with all of the other variables in the model held constant. A coefficient close to 0 suggests that there is no change due to the predictor variable.

There is a relationship between the logistic coefficients and the odds ratios:
odds ratio = exp(coefficient). In this example the logistic coefficient for police59 is .0581834, and exp(.0581834) = 1.0599094, which matches the reported odds ratio for police59 (1.059909) up to rounding.
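
The same relationship holds for every predictor in the model. A short Python sketch, with dictionary names of our own, exponentiates each coefficient from the logit output for comparison with the odds ratio table:

```python
import math

coefficients = {
    "maleteen": 0.0833837,
    "south": -1.117091,
    "educ": 0.0229224,
    "police59": 0.0581834,
}

# odds ratio = exp(coefficient)
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, value in odds_ratios.items():
    print(f"{name:>8}: {value:.7f}")
```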

Also in this example, the logistic coefficient for police59 is .0581834. Thus, you would predict that the log odds for hicrime would change by .0581834 for every one unit change in police59 when maleteen, south and educ are held constant.

The logistic coefficients can be used in a manner very similar to regression coefficients to generate predicted values on the log-odds scale. In this example,

predicted = -17.70177 + .0833837*maleteen - 1.117091*south + .0229224*educ + .0581834*police59

You would get the same results if you used the predict command with the xb option.
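
The prediction equation can be sketched as a small Python function. The covariate values passed in below are hypothetical, chosen only to illustrate the calculation and not taken from the crime data; the conversion from log odds to a probability uses the usual inverse logit.

```python
import math

# Coefficients from the logit output
COEF = {
    "_cons": -17.70177,
    "maleteen": 0.0833837,
    "south": -1.117091,
    "educ": 0.0229224,
    "police59": 0.0581834,
}


def linear_predictor(maleteen, south, educ, police59):
    """Predicted log odds of hicrime = 1 (the quantity the xb option returns)."""
    return (COEF["_cons"]
            + COEF["maleteen"] * maleteen
            + COEF["south"] * south
            + COEF["educ"] * educ
            + COEF["police59"] * police59)


def probability(xb):
    """Inverse logit: convert log odds to a predicted probability."""
    return 1 / (1 + math.exp(-xb))


# Hypothetical covariate values, for illustration only.
xb = linear_predictor(maleteen=140, south=0, educ=10, police59=100)
p = probability(xb)

print(f"xb = {xb:.6f}, Pr(hicrime = 1) = {p:.4f}")
```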

m. This column contains the standard error for the logistic regression coefficient which is used to compute the z-test for the coefficient.

n. This column contains the 95% confidence intervals for the logistic regression coefficients. Significant effects are suggested when confidence intervals do not contain 0. In this example, the only interval that would be considered significant at the .05 level is the one for police59. All of the other confidence intervals contain the value 0.
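
The interval limits are the coefficient plus or minus the normal critical value times the standard error. The Python sketch below reproduces the police59 limits to within rounding of the displayed inputs; the variable names are our own.

```python
from statistics import NormalDist

# Two-sided 95% critical value of the standard normal (about 1.96)
z_crit = NormalDist().inv_cdf(0.975)

coef, se = 0.0581834, 0.0210049   # police59, from the logit output

lower = coef - z_crit * se
upper = coef + z_crit * se

# The interval suggests a significant effect when it does not contain 0.
contains_zero = lower <= 0 <= upper

print(f"[{lower:.7f}, {upper:.7f}] contains 0: {contains_zero}")
```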

o. The lfit command, typed without options, displays the Pearson goodness-of-fit test for the estimated model. With the group option lfit produces Hosmer-Lemeshow's goodness-of-fit test.

p. The number of observations in the model.

q. The Pearson chi-square goodness-of-fit test compares the observed against the expected number of responses using cells defined by the covariate patterns. When the number of covariate patterns approaches the number of observations, the Pearson chi-square test is not advised. In those situations, the Hosmer and Lemeshow goodness-of-fit test for grouped data is preferred.

r. The value of the Pearson chi-square with 42 degrees of freedom is given.

s. The p-value for the goodness-of-fit test suggests that the model fits reasonably well. However, since the number of covariate patterns is equal to the number of observations the Pearson test is not appropriate for these data.

t. The group option requested that the data be formed into 10 nearly equal-size groups for the Hosmer and Lemeshow test of goodness-of-fit.

u. The value of the Hosmer and Lemeshow chi-square with 8 degrees of freedom is given.

v. The p-value for the Hosmer and Lemeshow goodness-of-fit test is .0972. Although this is not significant at the .05 level, a p-value this small could suggest problems concerning the fit of our model.
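
The degrees of freedom and p-values of both goodness-of-fit tests can be checked approximately without a statistics library, because both tests have an even number of degrees of freedom and the chi-square upper tail then has a finite closed form. In the Python sketch below, the helper chi2_sf_even_df is our own, and the inputs are the rounded statistics from the output, so agreement with Stata is only to about three decimal places.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with even df.

    For df = 2m the tail equals exp(-x/2) * sum_{k=0}^{m-1} (x/2)**k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)**k / k! incrementally
        total += term
    return math.exp(-half) * total


# Pearson test: covariate patterns minus estimated parameters (4 predictors + constant)
pearson_df = 47 - (4 + 1)
# Hosmer-Lemeshow test: number of groups minus 2
hl_df = 10 - 2

pearson_p = chi2_sf_even_df(38.72, pearson_df)
hl_p = chi2_sf_even_df(13.45, hl_df)

print(f"Pearson: df={pearson_df}, p={pearson_p:.4f}")
print(f"Hosmer-Lemeshow: df={hl_df}, p={hl_p:.4f}")
```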

These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.