Help the Stat Consulting Group by giving a gift

How can I estimate relative risk in SAS using proc genmod for common outcomes in cohort studies?

Binary outcomes in cohort studies are commonly analyzed by applying a
logistic regression model to the data to obtain odds ratios for comparing
groups with different sets of characteristics. Although this is often
appropriate, there may be situations in which it is more desirable to
estimate a relative risk or risk ratio (RR) instead of an odds ratio (OR).
Several articles in recent medical and public health literature point out
that when the outcome event is common (incidence of 10% or more), it is often
more desirable to estimate an RR since there is an increasing differential
between the RR and OR with increasing incidence rates, and there is a
tendency for some to interpret ORs as if they are RRs ([1]-[3]). There are
some who hold the opinion that the OR should be used even when the outcome is
common, however ([4]). Here the purpose is to demonstrate methods for
calculating the RR, assuming that it is the appropriate thing to do. There
are several options for how to estimate RRs directly in SAS, which have been
demonstrated to be reliable in simulated and real data sets of various sizes
and outcome incidence rates ([1],[2]). Two of these methods will be
demonstrated here using hypothetical data created for this purpose. Both
methods use **proc genmod**. One estimates the RR with a log-binomial
regression model, and the other uses a Poisson regression model with a robust
error variance.

A hypothetical data set was created to illustrate two methods of estimating
relative risks using SAS. The outcome generated is called lenses, to indicate if
the hypothetical study participants require corrective lenses by the time they
are 30 years old. Assume all participants do not need them at a baseline
assessment when they are 10 years old. Assume none of them have had serious head
injuries or had brain tumors or other major health problems during the 20 years
between assessments. Suppose we wanted to know if requiring corrective lenses is
associated with having a gene which causes one to have a lifelong love and
craving for carrots (assume not having this gene results in the opposite), and
that we screened everyone for this carrot gene at baseline (carrot = 1 if they
have it, = 0 if not). We also noted their gender (= 1 if female, = 2 if male),
and what latitude of the continental US they lived on the longest (24 to 48
degrees north). All values (N=100) were assigned using a random number
generator. The data are in eyestudy.sas7bdat.

Here’s a quick description of the variables:

proc means data = eyestudy maxdec = 2; var carrot gender latitude lenses; run;The MEANS Procedure Variable N Mean Std Dev Minimum Maximum ------------------------------------------------------------------------------- carrot 100 0.51 0.50 0.00 1.00 gender 100 1.48 0.50 1.00 2.00 latitude 100 35.97 7.51 24.00 48.00 lenses 100 0.53 0.50 0.00 1.00 -------------------------------------------------------------------------------

We have an overall outcome rate of 53%. So if we want to talk about whether the carrot-loving gene, gender, or latitude is associated with the risk of requiring corrective lenses by the age of 30, then relative risk is a more appropriate measure than the odds ratio. Here is a simple crosstab of carrot and lenses, which will allow us to calculate the unadjusted OR and RR by hand.

proc freq data = eyestudy; tables carrot*lenses/nopercent nocol; run;Table of carrot by lenses carrot lenses Frequency| Row Pct | 0| 1| Total ---------+--------+--------+ 0 | 17 | 32 | 49 | 34.69 | 65.31 | ---------+--------+--------+ 1 | 30 | 21 | 51 | 58.82 | 41.18 | ---------+--------+--------+ Total 47 53 100

It is interesting that fewer people with the carrot-loving gene needed corrective lenses (especially since these are fake data!). The OR and RR for those without the carrot gene versus those with it are:

OR = (32/17)/(21/30) = 2.69

RR = (32/49)/(21/51) = 1.59

We could use either **proc logistic** or **proc genmod** to calculate
the OR. Since **proc genmod** will be used to calculate the RR, it will
also be used to calculate the OR for comparison purposes (and it gives the
same results as **proc logistic**). Here is the logistic regression with just
**carrot** as the predictor:

proc genmod data = eyestudy descending; class carrot; model lenses = carrot/ dist = binomial link = logit; estimate 'Beta' carrot 1 -1/ exp; run;The GENMOD Procedure Model Information Data Set EYESTUDY Distribution Binomial Link Function Logit Dependent Variable lenses Observations Used 100 Class Level Information Class Levels Values carrot 2 0 1 Response Profile Ordered Total Value lenses Frequency 1 1 53 2 0 47 PROC GENMOD is modeling the probability that lenses='1'. Parameter Information Parameter Effect carrot Prm1 Intercept Prm2 carrot 0 Prm3 carrot 1 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 98 132.3665 1.3507 Scaled Deviance 98 132.3665 1.3507 Pearson Chi-Square 98 100.0000 1.0204 Scaled Pearson X2 98 100.0000 1.0204 Log Likelihood -66.1832 Algorithm converged. Analysis Of Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept 1 -0.3567 0.2845 -0.9143 0.2010 1.57 0.2100 carrot 0 1 0.9892 0.4136 0.1786 1.7997 5.72 0.0168 carrot 1 0 0.0000 0.0000 0.0000 0.0000 . . Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed. Contrast Estimate Results Standard Chi- Label Estimate Error Alpha Confidence Limits Square Pr > ChiSq Beta 0.9892 0.4136 0.05 0.1786 1.7997 5.72 0.0168Exp(Beta) 2.6891 1.1121 0.05 1.1956 6.0481

The **estimate** statement with the **exp** option gives us the same OR
we calculated by hand above for those without the carrot gene versus those with it.
Now this can be contrasted with the two methods of calculating the RR
described below.

With a very minor modification of the statements used above for the logistic regression, a log-binomial model can be run to get the RR instead of the OR. All that needs to be changed is the link function between the covariate(s) and outcome. Here it is specified as log instead of logit:

proc genmod data = eyestudy descending; class carrot; model lenses = carrot/ dist = binomial link = log; estimate 'Beta' carrot 1 -1/ exp; run;The GENMOD Procedure Model Information Data Set EYESTUDY Distribution Binomial Link Function Log Dependent Variable lenses Observations Used 100 Class Level Information Class Levels Values carrot 2 0 1 Response Profile Ordered Total Value lenses Frequency 1 1 53 2 0 47 PROC GENMOD is modeling the probability that lenses='1'. Parameter Information Parameter Effect carrot Prm1 Intercept Prm2 carrot 0 Prm3 carrot 1 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 98 132.3665 1.3507 Scaled Deviance 98 132.3665 1.3507 Pearson Chi-Square 98 100.0000 1.0204 Scaled Pearson X2 98 100.0000 1.0204 Log Likelihood -66.1832 Algorithm converged. Analysis Of Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept 1 -0.8873 0.1674 -1.2153 -0.5593 28.11 <.0001 carrot 0 1 0.4612 0.1971 0.0749 0.8476 5.48 0.0193 carrot 1 0 0.0000 0.0000 0.0000 0.0000 . . Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed. Contrast Estimate Results Standard Chi- Label Estimate Error Alpha Confidence Limits Square Pr > ChiSq Beta 0.4612 0.1971 0.05 0.0749 0.8476 5.48 0.0193Exp(Beta) 1.5860 0.3126 0.05 1.0778 2.3339

Now the **exp** option on the **estimate** statement gives us the estimated RR
instead of the OR, and it also matches what was calculated by hand above for
the RR. Notice that the standard error (SE) for the beta estimate calculated
here is much smaller than that calculated in the logistic regression above (SE
= 0.414), but so is the estimate itself (logistic regression beta estimate =
0.989), so the significance level is very similar (logistic regression p =
0.017) in this case. One of the criticisms of using the log-binomial model for
the RR is that it produces confidence intervals that are narrower than they
should be, and another is that there can be convergence problems (1,2). This
is why the second approach is also presented here.

Zou ([2]) suggests using a “modified Poisson” approach to estimate the
relative risk and confidence intervals by using robust error variances. Using
a Poisson model without robust error variances will result in a confidence
interval that is too wide. The robust error variances can be estimated by
using the **repeated** statement and the subject identifier (here **id**), even if
there is only one observation per subject, as Zou cleverly points out. Here is
how it is done:

proc genmod data = eyestudy; class carrot id; model lenses = carrot/ dist = poisson link = log; repeated subject = id/ type = unstr; estimate 'Beta' carrot 1 -1/ exp; run;

Notice that **id**, the individual subject identifier, has been added to the
**class** statement and is also on the **repeated** statement (with an unstructured
correlation matrix), telling **proc genmod** to calculate the robust
errors. Also notice that the distribution has been changed to Poisson, but
the link function remains log.

The GENMOD Procedure Model Information Data Set EYESTUDY Distribution Poisson Link Function Log Dependent Variable lenses Observations Used 100 Class Level Information Class Levels Values carrot 2 0 1 id 100 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 ... Parameter Information Parameter Effect carrot Prm1 Intercept Prm2 carrot 0 Prm3 carrot 1 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 98 64.5361 0.6585 Scaled Deviance 98 64.5361 0.6585 Pearson Chi-Square 98 47.0000 0.4796 Scaled Pearson X2 98 47.0000 0.4796 Log Likelihood -85.2681 Algorithm converged. Analysis Of Initial Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept 1 -0.8873 0.2182 -1.3150 -0.4596 16.53 <.0001 carrot 0 1 0.46120.2808-0.0892 1.0116 2.70 0.1005 carrot 1 0 0.0000 0.0000 0.0000 0.0000 . . Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed. GEE Model Information Correlation Structure Unstructured Subject Effect id (100 levels) Number of Clusters 100 Correlation Matrix Dimension 1 Maximum Cluster Size 1 Minimum Cluster Size 1 Algorithm converged. Analysis Of GEE Parameter Estimates Empirical Standard Error Estimates Standard 95% Confidence Parameter Estimate Error Limits Z Pr > |Z| Intercept -0.8873 0.1674 -1.2153 -0.5593 -5.30 <.0001 carrot 0 0.4612 0.1971 0.0749 0.8476 2.34 0.0193 carrot 1 0.0000 0.0000 0.0000 0.0000 . . Contrast Estimate Results Standard Chi- Label Estimate Error Alpha Confidence Limits Square Pr > ChiSq Beta 0.4612 0.1971 0.05 0.0749 0.8476 5.48 0.0193Exp(Beta) 1.5860 0.3126 0.05 1.0778 2.3339

Again, the **exp** option on the **estimate** statement gives us the estimated RR,
and it matches exactly what was calculated by the log-binomial method. In this
case, the SE for the beta estimate and the p-value are also exactly the same
as in the log-binomial model. This may not always be the case, but they should
be similar. The SE calculated without the **repeated** statement (i.e., not using
robust error variances) is 0.281, and the p-value is 0.101, so the robust
method is quite different.

Adjusting the RR for other predictors or potential confounders is simply
done by adding them to the **model** statement as you would in any other
procedure. Here gender and latitude will be added to the model:

proc genmod data = eyestudy; class carrot gender id; model lenses = carrot gender latitude/ dist = poisson link = log; repeated subject = id/ type = unstr; estimate 'Beta Carrot' carrot 1 -1/ exp; estimate 'Beta Gender' gender 1 -1/ exp; estimate 'Beta Latitude' latitude 1 -1/ exp; run;The GENMOD Procedure Model Information Data Set EYESTUDY Distribution Poisson Link Function Log Dependent Variable lenses Observations Used 100 Class Level Information Class Levels Values carrot 2 0 1 gender 2 1 2 id 100 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 ... Parameter Information Parameter Effect carrot gender Prm1 Intercept Prm2 carrot 0 Prm3 carrot 1 Prm4 gender 1 Prm5 gender 2 Prm6 latitude Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 96 63.7618 0.6642 Scaled Deviance 96 63.7618 0.6642 Pearson Chi-Square 96 46.7434 0.4869 Scaled Pearson X2 96 46.7434 0.4869 Log Likelihood -84.8809 Algorithm converged. Analysis Of Initial Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept 1 -0.6521 0.6982 -2.0206 0.7163 0.87 0.3503 carrot 0 1 0.4832 0.2831 -0.0716 1.0381 2.91 0.0878 carrot 1 0 0.0000 0.0000 0.0000 0.0000 . . gender 1 1 0.2052 0.2781 -0.3398 0.7502 0.54 0.4605 gender 2 0 0.0000 0.0000 0.0000 0.0000 . . latitude 1 -0.0100 0.0190 -0.0472 0.0272 0.28 0.5980 Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed. GEE Model Information Correlation Structure Unstructured Subject Effect id (100 levels) Number of Clusters 100 Correlation Matrix Dimension 1 Maximum Cluster Size 1 Minimum Cluster Size 1 Algorithm converged. Analysis Of GEE Parameter Estimates Empirical Standard Error Estimates Standard 95% Confidence Parameter Estimate Error Limits Z Pr > |Z| Intercept -0.6521 0.4904 -1.6134 0.3091 -1.33 0.1836 carrot 0 0.4832 0.1954 0.1003 0.8662 2.47 0.0134 carrot 1 0.0000 0.0000 0.0000 0.0000 . . gender 1 0.2052 0.1848 -0.1570 0.5674 1.11 0.2669 gender 2 0.0000 0.0000 0.0000 0.0000 . . latitude -0.0100 0.0127 -0.0350 0.0150 -0.79 0.4324 Contrast Estimate Results Standard Chi- Label Estimate Error Alpha Confidence Limits Square Pr > ChiSq Beta Carrot 0.4832 0.1954 0.05 0.1003 0.8662 6.12 0.0134Exp(Beta Carrot) 1.6213 0.3168 0.05 1.1055 2.3777Beta Gender 0.2052 0.1848 0.05 -0.1570 0.5674 1.23 0.2669Exp(Beta Gender) 1.2278 0.2269 0.05 0.8547 1.7637Beta Latitude -0.0100 0.0127 0.05 -0.0350 0.0150 0.62 0.4324 Exp(Beta Latitude) 0.9900 0.0126 0.05 0.9656 1.0151

We have also requested the RRs for gender and latitude in the **estimate**
statement. In this case, adjusting for them does not reduce the association
between having the carrot-loving gene and risk of needing corrective lenses by
age 30.

One should always pay attention to goodness of fit statistics and perform
other diagnostic tests. Refer to Categorical Data Analysis Using the SAS
System, by M. Stokes, C. Davis and G. Kock for standard methods of checking
whichever type of model you use.

1. McNutt LA, Wu C, Xue X, Hafner JP.
Estimating
the Relative Risk in Cohort Studies and Clinical Trials of Common Outcomes.
Am J Epidemiol 2003; 157(10):940-3.

2. Zou G. A
Modified Poisson Regression Approach to Prospective Studies with Binary Data.
Am J Epidemiol 2004; 159(7):702-6.

3. Sander Greenland ,
Model-based
Estimation of Relative Risks and Other Epidemiologic Measures in Studies of
Common Outcomes and in Case-Control Studies,

American Journal of Epidemiology 2004;160:301-305

4. Cook TD. Up with
odds ratios! A case for odds ratios when outcomes are common. Acad Emerg Med
2002; 9:1430-4.

5. Spiegelman, D. und Hertzmark,
Easy SAS
Calculations for Risk or Prevalence Ratios and Differences, E American
Journal of Epidemiology, 2005, 162, 199-205.

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.