### Statistical Computing Seminars: Statistical Writing

Research, from hypothesis development through finished manuscript, is a process.  Hence, the results section of the manuscript is the product of all of the earlier stages of the research.  The better the quality of these earlier stages, the better the quality of the results section.

#### The main points

• Tell a cohesive, concise story.
• Clearly state your hypotheses and how they relate to your analyses.
• Provide enough detail such that your audience can understand what you did and why you did it.

#### Other tips

• Planning is important (power analysis, classes, stat packages).
• The more you understand about a statistical technique, the easier it is to describe it to others.
• The "distance" between getting your results and being able to write about them increases with the complexity of the analysis (i.e., the type of statistical technique that you are using) and the complexity of the model.
• Try the "Grandma" technique.
• There is no relationship between the amount of time it took you to do something and the amount of space on the page its write-up gets.
• Remember that there is a careful balance between enough detail to replicate the experiment and space limitations imposed by the journal.

#### Where to start

The results section usually contains two parts:  the descriptive statistics and the analyses.  These two parts should be closely related.  For example, you probably don't want to describe variables that won't be used in the analyses.  This can confuse your audience and waste valuable space.

The descriptive statistics are important because this is often the vehicle by which your variables are introduced to your audience.  You can think of this part as introducing one friend to another.  (Introducing Sally to John example)  Of course, different types of descriptive statistics are used for different types of variables.

Continuous variables

• ordinal or continuous
• number of valid (non-missing) values
• mean and standard deviation
• perhaps median
• range
• perhaps correlations with other continuous variables
• perhaps histograms

Categorical variables

• frequency of each level (including missing)
• perhaps crosstabs with other categorical variables
• perhaps bar charts

Also consider whether each categorical variable is nominal or ordinal, and whether any numeric variables are counts; each calls for somewhat different descriptive statistics.

The above points are merely suggestions.  If you have nested data, you will want to describe the variables at each level of nesting.  If you have weighted data, then medians, correlations and histograms may not be part of the description of your variables.

In the analysis part of the results section, you will want to describe your specific hypothesis, the statistical technique that you will be using, and the model (e.g., outcome and predictor variables).  This is especially important when your hypothesis involves an interaction.  Clearly stating the relationship between your hypothesis and the statistical technique and model is important for two reasons.  First, it helps guide your audience through this part of the results section.  Second, this connection will make the substantive interpretation of the results easier.  For commonly used techniques, such as ordinary least squares regression, your description may be as short as a single sentence.  For more complicated techniques or when using a technique that is likely unfamiliar to your audience, more description (and explanation) may be required.  Describing the model building process is also important.  If there are categorical variables in your model, clearly state how they were handled (e.g., reference category, coding scheme, specific hypothesis).  Most models make assumptions, and you usually want to mention that the assumptions were assessed, but the result of each diagnostic test is usually not included.  If one or more assumptions are grossly violated, further discussion may be warranted.  It is not uncommon to mention which statistical package (and which version of the package) was used to conduct the analysis.

Usually, the analyses are ordered from most to least important, except when this will disrupt the flow of your story.  If there are more than a few analyses, indicate whether an alpha control procedure was used, and if so, which one.  Almost all studies have at least some missing data.  You will want to indicate how the missing data were handled (e.g., complete cases analysis, maximum likelihood techniques, multiple imputation).  Many journals also require or encourage researchers to include measures of effect sizes.  You need to be very specific about which measure you have used, because there are dozens of them.  If you conducted an a priori power analysis, you will want to describe it.

Ideally, there will be at least a few days between the time that you finish writing and the time the article (or poster) is due.  Rereading your article after setting it aside for a while is a great way to catch errors and to check for consistency.  It may also be helpful to have a colleague read it over.

#### Examples

After I gave this seminar last time, I found that what most people in the audience wanted was specifics, especially what to say and what not to say in the results section.  In fact, many people said they wanted to be shown an output, say of a regression analysis, and then an example of how to write it up.  Unfortunately, this is nearly impossible to do, and I will show you why in just a moment.  Besides, this "cookie-cutter" approach is usually a very bad way to go.  I don't like to see people doing statistics this way, and this approach is even worse when you are writing results.  The best way to write a clear, concise results section is to thoroughly understand the statistical techniques that you used to analyze your data.  Another good strategy is to look at articles in your field that report similar analyses for ideas about the exact terminology to use.  This is a particularly good idea because the write-ups of similar analyses can be very different in different fields.  Also, some journals require much more precise language than other journals, so you might want to look at some articles in the journal in which you want to publish.  You can also find examples in our Data Analysis Example pages, our annotated output pages, and Regression Models for Categorical Dependent Variables Using Stata, Second Edition by Long and Freese (2006).  Even if you are not analyzing your data with Stata, this is a great resource.

Let's start off with a couple of examples of why you can't just look at a piece of output and write about it.  After that, we will look at some examples of some common pitfalls encountered when writing up the results of seemingly simple analyses.

So, here is a regression table.  The variable gender is dichotomous, and the variable read is continuous.  What could be difficult about interpreting this?

The difficulty has to do with the way the dichotomous variable gender is coded.  If gender is coded 0/1, then the intercept is the mean for the group coded 0 when the reading score equals 0.  If gender is coded 1/2, then the intercept is the mean for the group coded 1 minus the coefficient for gender (the B, 5.487) when reading equals 0.
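To make this concrete, here is a small simulated sketch (hypothetical data, assuming NumPy is available, not the seminar's actual data set): recoding a dichotomous predictor from 0/1 to 1/2 leaves the slopes untouched and shifts the intercept by exactly the gender coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
gender01 = rng.integers(0, 2, n).astype(float)   # 0 = male, 1 = female
read = rng.normal(50, 10, n)
write = 20 + 5 * gender01 + 0.6 * read + rng.normal(0, 5, n)

def fit(g):
    """OLS fit of write on an intercept, gender coding g, and read."""
    X = np.column_stack([np.ones(n), g, read])
    coef, *_ = np.linalg.lstsq(X, write, rcond=None)
    return coef  # [intercept, b_gender, b_read]

a01 = fit(gender01)        # 0/1 coding
a12 = fit(gender01 + 1)    # 1/2 coding of the same variable

# Slopes are identical; the intercept shifts by the gender coefficient:
# intercept(1/2 coding) = intercept(0/1 coding) - b_gender
print(np.allclose(a12[0], a01[0] - a01[1]))  # True
print(np.allclose(a12[1:], a01[1:]))         # True
```

This is just the algebra of reparameterization: substituting g' = g + 1 into the model moves b_gender out of the slope term and into the constant.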

Now, let's take this example one step further.  Let's say that we create a variable called female, which is 1 for females and 0 otherwise (i.e., 0 for males).  Let's replace gender with female, and let's also include the interaction between female and read.

How would you interpret these results?  Well, the interaction, fr, is not statistically significant, so there isn't much we can say about that.  Can we then go on to interpret female and read?  The answer is no:  we can't interpret any of the other (lower-order) effects, because the dichotomous variable is not independent of the interaction term.  Hence, it doesn't matter whether the interaction term is statistically significant, because either way it is still not independent of the lower-order terms.  (If you had two dichotomous predictor variables, both coded -1/1, and their interaction in the model, then you could interpret the lower-order terms; in our example, because we have a continuous predictor, we can't.)  Now, although we can't draw any conclusions from the tests of statistical significance, we can look at the coefficients, as they have been calculated correctly.  So, the mean of the writing scores for males (the variable female at 0) is 16.524 when the variable read, and hence the interaction term fr, are held at 0.  The mean for the females is 16.524 + 12.491 under the same conditions.  The slope for the variable read is 0.636 for males, and 0.636 + (-0.134) for females.

The important point here is that how you code your variables affects how you interpret their coefficients in the output.  Therefore, you want to use methods of coding that yield the kind of interpretation you would like to make.  While our example illustrated coding of a dichotomous variable, you also have options with regard to the coding of continuous variables.  For example, if you want the constant to have a different meaning, you can center the continuous predictor variable.
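Centering can be sketched the same way (hypothetical data, assuming NumPy): after centering read, the slope is unchanged, but the intercept becomes the predicted score at the average reading score rather than at a reading score of 0.

```python
import numpy as np

rng = np.random.default_rng(1)
read = rng.normal(50, 10, 200)
write = 20 + 0.6 * read + rng.normal(0, 5, 200)

def fit(x):
    """OLS fit of write on an intercept and x."""
    X = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(X, write, rcond=None)
    return b  # [intercept, slope]

raw = fit(read)
centered = fit(read - read.mean())

# The slope is identical; the new intercept is the fitted value at mean(read)
print(np.isclose(centered[1], raw[1]))                         # True
print(np.isclose(centered[0], raw[0] + raw[1] * read.mean()))  # True
```

A reading score of 0 may be impossible in the data, so the centered intercept is often the more meaningful quantity to report.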

Another common error when working with regression models is to refer to the model above as a multivariate regression instead of a multiple regression.  A multivariate regression is a regression model with more than one outcome variable; a multiple regression is a regression with more than one predictor variable.

The point here is that simply looking at the output is often not enough when trying to do interpretation and writing.  Rather, you need to know lots of things, and seemingly small details can greatly affect the meaning.  This is why the "cookie-cutter" approach to interpretation doesn't work well.  Now let's go on to some other examples of places where people often have difficulty in writing about results.

Example:  Categorical predictor variables

Now let's look at a model that includes a categorical variable that has more than two levels.  In this example, we have included the variable race, which has four levels.  Because race has four levels, we have included three dummy variables (i.e., 0/1 variables) in the regression.  The dummy variable for the second level of race is statistically significant, while none of the other dummy variables are.  What can we say about this?

regress write read math female i.race

Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  6,   193) =   37.46
Model |  9619.24508     6  1603.20751           Prob > F      =  0.0000
Residual |  8259.62992   193    42.79601           R-squared     =  0.5380
Total |   17878.875   199   89.843593           Root MSE      =  6.5419

------------------------------------------------------------------------------
write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
read |    .320763   .0612872     5.23   0.000     .1998843    .4416416
math |   .3652081    .067842     5.38   0.000     .2314011    .4990151
female |   5.287456    .937736     5.64   0.000      3.43793    7.136983
|
race |
2  |   4.838573    2.45403     1.97   0.050    -.0015891    9.678734
3  |   .9289412   1.989441     0.47   0.641    -2.994896    4.852778
4  |   2.490295   1.493206     1.67   0.097    -.4548022    5.435392
|
_cons |   11.74903   2.984052     3.94   0.000     5.863487    17.63457
------------------------------------------------------------------------------

What we can say about this depends on your hypothesis and your training.  If the hypothesis is about the variable race as a whole, then we can't say anything about comparisons among its levels until we know whether race as a whole is statistically significant.  The three-degree-of-freedom test below indicates that it is not, so we can't say anything about the difference between level 2 and level 1 of race.  On the other hand, if you had an a priori hypothesis regarding the contrast between Hispanic (the reference group) and Asian (2.race), you could interpret the result above and ignore the three-degree-of-freedom test below.

testparm i.race

( 1)  2.race = 0
( 2)  3.race = 0
( 3)  4.race = 0

F(  3,   193) =    1.67
Prob > F =    0.1757

Now let's change the model a little bit (replace math with socst) and see what happens.

regress write read socst female i.race

Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  6,   193) =   38.06
Model |  9689.26202     6    1614.877           Prob > F      =  0.0000
Residual |  8189.61298   193  42.4332279           R-squared     =  0.5419
Total |   17878.875   199   89.843593           Root MSE      =  6.5141

------------------------------------------------------------------------------
write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
read |   .3307708   .0592551     5.58   0.000     .2139001    .4476414
socst |   .3074725   .0553338     5.56   0.000      .198336    .4166091
female |   4.690728   .9393554     4.99   0.000     2.838008    6.543449
|
race |
2  |    7.55963   2.399498     3.15   0.002     2.827024    12.29224
3  |   .2886157   1.981522     0.15   0.884    -3.619603    4.196834
4  |   3.043909    1.47917     2.06   0.041     .1264957    5.961323
|
_cons |   14.17782   2.780192     5.10   0.000     8.694361    19.66128
------------------------------------------------------------------------------
testparm i.race

( 1)  2.race = 0
( 2)  3.race = 0
( 3)  4.race = 0

F(  3,   193) =    4.26
Prob > F =    0.0061

Now the overall test of race is statistically significant, and you can consider the results in the regression table above.

When writing about the dummy variables, you will want to make clear what type of coding system was used (e.g., dummy coding, effect coding, orthogonal polynomial coding), as well as what the reference group is.  Both of these affect the interpretation of the dummy variables.  Also, you don't want to leave out dummy variables that are not statistically significant; for example, you would not want to rerun the above model without the dummy for the third level of race.  If you did, your reference group would become a combination of the first and third levels of race, and that is not likely to make substantive sense.
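A small simulation (invented group means, assuming NumPy) illustrates the reference-group problem.  In a means-only model with dummies for levels 2, 3 and 4, the intercept is the mean of level 1; drop the dummy for level 3 and the "reference" silently becomes levels 1 and 3 pooled together.

```python
import numpy as np

rng = np.random.default_rng(1)
race = rng.integers(1, 5, 400)                      # four levels, coded 1-4
y = np.array([0, 10, 15, 12, 18])[race] + rng.normal(0, 3, 400)

def ols(cols, y):
    """OLS of y on an intercept plus the given dummy columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

# All three dummies: intercept = mean of level 1 (the reference group)
full = ols([(race == k).astype(float) for k in (2, 3, 4)], y)
print(np.isclose(full[0], y[race == 1].mean()))                  # True

# Dummy for level 3 dropped: intercept = pooled mean of levels 1 and 3
dropped = ols([(race == k).astype(float) for k in (2, 4)], y)
print(np.isclose(dropped[0], y[(race == 1) | (race == 3)].mean()))  # True
```

The second model still runs without complaint, which is exactly why this mistake is easy to make and hard to catch from the output alone.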

Example:  Logistic regression

If you have conducted a logistic regression, you can describe your results in several different ways.  You could discuss the logits (log odds), odds ratios or the predicted probabilities.  Which metric you choose is a matter of personal preference and convention in your field.  Most of the information in this section is quoted from Regression Models for Categorical Dependent Variables Using Stata, Second Edition by Long and Freese (2006), pages 177-181.  If you are running a logistic regression model, an ordered logit model, a multinomial logit model, a poisson model or a negative binomial model, I strongly suggest that you borrow or buy a copy of this book and read up on the particular type of model that you are running.  Most people find this book very helpful, even if they are using a statistics package other than Stata.

When interpreting the output in the logit metric, "... for a unit change in xk, we expect the logit to change by βk, holding all other variables constant."  "This interpretation does not depend on the level of the other variables in the model."

When interpreting the output in the metric of odds ratios, "For a unit change in xk, the odds are expected to change by a factor of exp(βk), holding all other variables constant."  "When interpreting the odds ratios, remember that they are multiplicative.  This means that positive effects are greater than one and negative effects are between zero and one.  Magnitudes of positive and negative effects should be compared by taking the inverse of the negative effect (or vice versa)."  For exp(βk) > 1, you could say that the odds are "exp(βk) times larger"; for exp(βk) < 1, you could say that the odds are "exp(βk) times smaller."

Now if you are having difficulty understanding what a unit change in the log odds really means, and odds ratios aren't as clear as you thought, you might want to consider describing your results in the metric of predicted probabilities.  Many audiences, and indeed many researchers, find this to be a more intuitive metric in which to understand the results of a logistic regression.  While the relationship between the outcome variable and the predictor variables is linear in the logit metric, the relationship is not linear in the probability metric.  Remember that "... a constant factor change in the odds does not correspond to a constant change or a constant factor change in the probability."  This nonlinearity means that you will have to be very precise about the values at which the other variables in the model are held.
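A quick numeric sketch (hypothetical logit coefficients, plain Python) shows the contrast:  the odds ratio for a one-unit change is the same everywhere on the curve, but the change in predicted probability depends on where you start.

```python
import math

b0, b1 = -2.0, 0.7          # hypothetical logit coefficients

def p(x):
    """Predicted probability from the logit model."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

def odds(x):
    return p(x) / (1 - p(x))

# The odds ratio for a one-unit change is exp(b1) at every x
for x in (0, 2, 4):
    print(round(odds(x + 1) / odds(x), 4))   # always exp(0.7) ~ 2.0138

# But the change in probability is not constant
print(round(p(1) - p(0), 3))   # 0.095
print(round(p(4) - p(3), 3))   # 0.165
```

The same one-unit change in x moves the probability by about .095 near x = 0 but by about .165 near x = 3, which is why you must state the values at which the other variables are held.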

I hope that this example makes clear why I say that in order to write a clear and coherent results section, you really need to understand the statistical tests that you are running.

Our next example concerns confidence intervals, so let's jump ahead a little bit and talk about confidence intervals in logistic regression output.  "If you report the odds ratios instead of the untransformed coefficients, the 95% confidence interval of the odds ratio is typically reported instead of the standard error.  The reason is that the odds ratio is a nonlinear transformation of the logit coefficient, so the confidence interval is asymmetric."

Example:  Confidence intervals

Many journals are pushing for confidence intervals to be included in the results section.  But what does the confidence interval tell you?  Problematic interpretations include:  "We are 95% confident that the true parameter for reading score lies between .209 and .456."  "There is a 95% chance that the true parameter lies between .209 and .456."  Rather, the confidence interval gives a range of values such that if the experiment was run many times (e.g., 10,000 times), the range would contain the true parameter 95% of the time.  Most of the time, there is little reason to comment on the confidence interval:  it is what it is.  One situation in which you might want to comment on the confidence interval is when you are conducting a study in order to get a precise estimate of a particular parameter, e.g., the mean age of people in a particular population.
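The repeated-experiment interpretation can be checked with a small simulation (hypothetical normal data, assuming NumPy):  across many replications, roughly 95% of the intervals contain the true mean.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, n, reps = 50.0, 10.0, 30, 10_000

covered = 0
for _ in range(reps):
    x = rng.normal(mu, sigma, n)
    se = x.std(ddof=1) / np.sqrt(n)
    # 1.96 is the normal-approximation critical value; a t critical
    # value would be slightly wider for n = 30
    lo, hi = x.mean() - 1.96 * se, x.mean() + 1.96 * se
    covered += lo <= mu <= hi

print(covered / reps)   # close to 0.95
```

Note that the probability statement is about the procedure over many replications, not about any single interval, which is exactly the distinction drawn above.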

Example:  Interaction terms

Many researchers have difficulty interpreting and understanding the meaning of interaction terms in statistical models, so this is often one of the most challenging parts of the results section to write.  If you are going to include an interaction term in your model, be sure that it is testing a hypothesis of interest to you; don't include interactions "just because".  Also, plan on spending extra time exploring and graphing the interaction.  This is one term in your model that you are going to have to understand really, really well before you will be able to write about it clearly.  Also, some statistical software packages are better than others for creating the graphs of interactions, so you may need to switch packages to make the graph.  Graphs are often a necessary part of understanding the interaction, even if the graph won't be included in the final manuscript.

The simplest form of interaction to interpret is the interaction of two dichotomous variables.  It is fairly easy to get the cell means, see how the coefficients are calculated, and obtain a graph.  The situation becomes more complicated when you have a dichotomous by continuous interaction.  In this situation, graphs are usually very helpful in understanding what is happening.  When you have a continuous by continuous interaction, the graph is three dimensional, and you are looking at the warping of a plane.  The situation becomes even more complex if you have more than one interaction in the model or three-way (or higher) interactions.  Please remember that if you have interaction terms in your model, you almost always need to have the lower-order effects in the model as well.  For example, if you have a three-way interaction of xyz, you will need to include in the model the three two-way interactions, xy, yz and xz, as well as x, y and z.  If all of the lower-order terms are not included in the model, the three-way interaction will likely be uninterpretable.
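For the dichotomous-by-continuous case, the two fitted lines implied by the female-by-read coefficients quoted earlier can be computed directly (a sketch, assuming NumPy); plotting read against these two sets of fitted values is the interaction graph.

```python
import numpy as np

# Coefficients from the female-by-read example earlier in this seminar
b0, b_female, b_read, b_fr = 16.524, 12.491, 0.636, -0.134

read = np.linspace(30, 70, 41)
fitted_males = b0 + b_read * read                          # female = 0
fitted_females = (b0 + b_female) + (b_read + b_fr) * read  # female = 1

# The interaction is the difference in slopes: 0.636 vs. 0.502
print(b_read, b_read + b_fr)
```

Seeing the two lines converge (or cross) over the observed range of read is usually far more informative than staring at the coefficient for fr.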

For more information regarding the use and interpretation of interactions in regression, please see the last few chapters of our OLS Regression with SAS, Stata and SPSS web books.  For more information on interactions in logistic regression, please see our seminar Visualizing Main Effects and Interactions for Binary Logit Models in Stata with movies.

Example:  Bivariate tests

For our last example, let's talk about the clarity of specifying which statistical test was conducted.  Looking at the output above, a researcher might write,  "We did a bivariate analysis between the variables, and the result was significant (p = .01)."  However, this is problematic for a couple of reasons.  First of all, a "bivariate" analysis can refer to any analysis that involves only two variables.  Examples of bivariate analyses include chi-square, correlation, simple OLS regression, simple logistic regression, t-test, one-way ANOVA, etc.  Second, the write up should be specific about which variables are used in each analysis.  Perhaps a better way to write this would be:  "We conducted a chi-square test with gender and favorite flavor of ice cream, and the result was statistically significant (χ2(2) = 9.269, p < .05)."  Depending on the rest of the paragraph, you might also want to include the number of cases used in this analysis, the number of cases in each cell, and/or that the assumption that each expected count was five or greater was met.
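A minimal version of such a chi-square test can be computed by hand (invented counts, assuming NumPy; df = (rows - 1)(cols - 1) = 2, and 5.991 is the χ²(2) critical value at α = .05):

```python
import numpy as np

# Hypothetical 2x3 table: gender (rows) by favorite flavor (columns)
observed = np.array([[30, 10, 20],
                     [15, 25, 20]])

# Expected counts under independence: row total * column total / grand total
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
chi2 = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print(dof)                    # 2
print(round(chi2, 3))         # 11.429
print(chi2 > 5.991)           # True: significant at alpha = .05
print((expected >= 5).all())  # True: expected-count assumption met
```

The last line checks the assumption mentioned above, that every expected cell count is five or greater, which is worth verifying before reporting the test.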

#### Words of caution

While I can't tell you exactly what words to use in your results section, we have come up with a partial list of words that you want to be very careful when using.  One of the problems with many of these words is that they have at least two meanings:  a meaning in common parlance and a specific statistical meaning (and sometimes more than one statistical meaning).

- proved/proven
- chance
- odds
- risk
- probability
- significance (statistical or clinical, parameter or model)
- likelihood
- beta (standardized or unstandardized regression coefficient)
- standardized (variable, coefficient, test scores)
- normal
- controlling for (this is an idea that is in the analyst's head, not the program analyzing the data)
- covariates
- robust (regression, standard errors, findings)
- nested (models, data)
- hierarchical (models (multilevel modeling, blocked regression), data)
- random (variables, intercepts, slopes, effects)
- datum is; data are
- strata

#### Tables and graphs

Returning to the point about space issues, tables and graphs are two ways to convey a lot of information in a relatively small amount of space.  However, creating useful tables and graphs is often more difficult than it seems.  Almost everyone has had the experience of reading a journal article and being mystified about what exactly is in a particular table or how some values were calculated.  It is often tempting and easy to pack too much information into a single table; the old adage "Less is more" is often true.

Tables and graphs can be included in either the descriptive part of the results section, the analysis part or both.  Of course, you want to use these methods of conveying information very judiciously.  (In other words, you probably can't have more than a few tables and/or graphs in your manuscript.)

Here are a few general tips for creating tables.  (quoted from Lang and Secic, How to Report Statistics in Medicine:  Annotated Guidelines for Authors, Editors, and Reviewers, Second Edition, 2006, chapter 20)

• "Tables are for communication, not data storage." (Howard Wainer)
• "Tables should have a purpose; they should contribute to and be integrated with the rest of the text."
• "Tables should be organized and formatted to assist readers in finding, seeing, understanding, and remembering the information."
• "Organize the table visually as well as functionally."
• "Data presented in tables should not be duplicated elsewhere in the text."

Here are a few general tips for creating graphs.  (Nicol and Pexman, Displaying Your Findings:  A Practical Guide for Creating Figures, Posters, and Presentations, 2003)

• include only essential information
• black and white only; no color (journal v. web)
• understandable on its own; all information needed to understand each element (such as abbreviations and definitions) should be included in the caption
• in most cases, three dimensional graphs should be avoided
• when possible, legends should be in the figure image
• when possible, axis labels should be parallel to the axis
• the dependent variable is usually presented on the y-axis
• should follow the rules of capitalization that are specific to captions
• the y-axis should be 2/3 to 3/4 the length of the x-axis
• the highest values on the axes should be higher than the highest data values
• grid lines, if used, should not be too close together
• symbols marking data points (plot symbols) should be about the same size as the smallest lowercase letters in the graph
• axis scales (labels) should not be misleading (e.g., a small difference should look small)
• font sizes should not vary by more than four points
• figures are referenced in the text using the figure number

Remember that there are a wide variety of graphs, including line graphs, bar charts, histograms and scatterplots.  If you have a very large data set, graphing anything can be a challenge.  You may want to look at Graphics of Large Data Sets:  Visualizing a Million by Unwin, Theus and Hofmann (2006).  They offer some useful tips on making graphs with a large number of data points more readable.  Other types of figures, such as relief maps, schematics of the research design or drawings that were used as stimuli in the experiment, are sometimes presented in research publications.  The texts listed above have some tips for making these as useful as possible to your audience.

#### Important issues

1.)  Missing data:  Missing data issues and the possible ways of handling them can take a lot of time.  You not only have to learn about the pros and cons of the various techniques, but you then have to decide which one is most appropriate for your situation.  Hard-and-fast rules are rare in this area, and there is much disagreement among "experts".  Once you have decided on a technique, you will have to determine whether the package with which you are familiar supports it, or whether you will have to find and learn one that does.  For example, suppose you were doing a multiple linear regression in SPSS.  That was fine until you decided to use multiple imputation to handle your missing data:  only the Missing Data module in SPSS version 17 can create and analyze multiply imputed data sets.
2.)  Small sample sizes:  For most applied research, small sample sizes are problematic for many reasons.  For one, many common statistical procedures are not appropriate for small samples.  Even if the researcher decides to use a given modeling technique anyway, the model may not run for numerical reasons:  the likelihood may not converge, a matrix may not be positive definite, etc.  Even if the model does run successfully, the assumptions of the test may not be met.  Any of these problems can force the researcher either to modify the model until it does run or to "fall back" to a simpler statistical technique.  This can really complicate things, because now you have to ask a modified form of your research question, the flow of the research is disrupted, etc.  In other words, your hypotheses are necessarily tied to your statistical analyses, and you usually cannot modify one without modifying the other.  Issues of fair and accurate reporting of what you have done also become pertinent.
3.)  Survey data:  Many researchers who have never used survey data before believe that analyzing survey data is just like analyzing data from experiments.  This isn't true.  The sampling weights need to be used to adjust the estimates for the sampling plan, and the standard errors need to be adjusted to account for the non-independence of the observations (i.e., PSUs and/or strata or replicate weights need to be used).  For some researchers, this simply means using different commands in the stat package that they are already using (such as Stata).  For others, it means learning a new stat package.
4.)  Correlated data:  Now, technically, most survey data are correlated data.  However, there are many types of correlated data that are not survey data:  for example, patients or doctors in hospitals, people in neighborhoods, partners in couples, etc.  There are several ways to analyze correlated data, and it is often a judgment call on the part of the analyst as to which technique to use.  Again, if you are not familiar with the various ways to analyze correlated data, you will have to stop and learn at least enough about the various methods to select the one you feel is most appropriate.  When writing about the analysis, you will have to justify why you selected this technique over others.  Also, you may end up having to analyze the data using more than one technique so that you can have confidence in your results.

#### The use of the web

The final topic that I want to discuss today has to do with possible future trends in research and how they might affect you.  Some researchers have started making their data sets, codebooks and syntax available on their web sites.  In a similar vein, some journals are asking for copies of data sets and making them available on their web sites so that other researchers can use them as secondary data sets or to confirm published results.  Either way, this trend means that there may be much closer scrutiny of data sets and their analysis.  We always suggest that researchers use syntax (as opposed to point-and-click) to run their analyses, for at least two good reasons.  First, syntax files can be very useful if you get a revise and resubmit ("R&R") or if you post your analysis on a web site.  Second, they document your data transformations, analyses and thought process.  Even if you are not planning on making your data set publicly available, you should keep careful notes about each step in your research and data analysis, including how and why each step was done.

I hope that these tips will help make the writing of the results sections of papers easier.  If you are interested in viewing the resources mentioned in this presentation, the links are:

ATS Statistical Computing

• Data Analysis Examples
• Annotated output
• Web books
• Online Seminars
• Statistics Books for Loan
• Applied Statistics Courses Offered at UCLA