Confounders & conditioning of analyses


Idea: Statistical associations between any two variables generally vary depending on the values taken by other "confounding" variables. We need to take this dependency (or conditionality) into account when using our analyses to make predictions or hypothesize about causes, but how do we decide which variables are relevant and real confounders?

Initial notes on the Cases

From PT:
Before reading this week's articles, read Gordis on confounding variables (chapter 14 in older editions or chapter 15 in the new edition) and chapter 3 or 4 on age adjustment (standardization).
When reading the articles, make notes on how the readings address the topic of adjusting for confounding variables (which includes age-standardization) and identify controversies or discordant views about how to do this.

Cases:
Immunization levels (Egede): Note the conclusion about racial/ethnic inequality even after adjusting for other variables thought to correlate with race/ethnicity. Do you agree with the three implications (p. 326ff) drawn from the results?

SES gradients in disease (Krieger): The abstract states that "for virtually all outcomes, risk increased with CT [census tract] poverty, and when we adjusted for CT poverty, racial/ethnic disparities were substantially reduced." Where can the result of adjustment be seen in the paper? (This paper also fits in week 7 on inequalities.)

Hormone replacement therapy (Prentice vs. Petitti): Notice the adjustments used by the first paper that bring the clinical component of the WHI hormone replacement trial into line with the observational component. Do Petitti and Freedman acknowledge and rebut this in concluding that it was wrong to think that hormone therapy prevents CV disease?

Birth weight and blood pressure (Huxley vs. Davies): Along with Huxley et al's general argument that the birthweight-adult blood pressure association may well be an artifact of selective publication of studies with small sample size, they criticise the adjustment of the association for adult weight. (In other words, the association holds for people in the same stratum or slice of weight.) Try to form an opinion about whether you agree or disagree with such an adjustment. Davies et al. provide counter-evidence to Huxley et al. -- how does their study differ in methods, results, and interpretation?

Control at work and mortality (Davey-Smith 1997): This simple study shows that "control at work" is not the cause of SES gradients in health outcomes. What method(s) do they use to undermine previous claims about control at work?

Mendelian randomization to analyze environmental exposures (Davey-Smith & Ebrahim 2007): The approach introduced in this paper is cutting edge "epidemiology in the age of genomics" and has led to funding of a major new Research Center under Davey-Smith at Bristol. I suggest that you summarize for yourself the logic of this approach so you can explain it to someone who's never heard of it.

Plan for Week #6.
From JQ
To break up the workload a bit, I'm alphabetically assigning one article to each student and asking you to present a few sentences about the article: variables, methods, outcome, possible confounders, answer the questions Peter posted on your article, say something about a table in the article, or whatever else seems important to you. There are 7 articles and we have (not counting Jan b/c I don't know if she'll be able to take part in the class), 7 students besides me so it works out fine. Your alphabetical assignments are as follows:
Susan: Egede et al., 2003
Elizabeth: Krieger et al., 2005
Judy: Prentice et al., 2005
Louisa: Petitti and Freedman, 2005
Connie: Huxley et al., 2002
Kaori: Davies et al., 2006
Jill: Davey-Smith et al., 1997
If someone would like to volunteer to present their summarization of the Davey-Smith and Ebrahim article on Mendelian Randomization, please let me know. Otherwise, Peter will present on this topic.
Any questions, please let me know. Will post my substantive statement on Sunday.



Substantive statement


diagram about confounding

Confounding Factors and Age Standardization
As we all know, the goal of scientific investigation is to achieve, and thus present results from, tightly controlled experiments. However, realization of this “ideal” is not always possible. Animal researchers are better able to achieve an ideal situation because they can ensure that their subjects are identical or as similar to each other as is possible: they are bred in controlled conditions, live in identical housing situations, and are fed and cared for in the same way. The goal of this method is to conduct research on groups that differ on just one factor. The animals are then randomly assigned to specific groups for testing. Researchers observe the animals, with the observer blind as to which ones are in each group. A resulting difference between the groups leads the researcher to conclude that such difference is due to either the exposure variable or to chance.

In human studies, however, it is impossible to institute such tight controls. Ethical, logistic, and scientific issues pose problems for scientists. Most epidemiological research does not involve a simple assessment of whether or not a relationship exists between two variables. Instead, the exposure-disease association is generally mixed up with other effects on the relationship between the two variables; these effects are produced by an extraneous variable or variables.

Figure 1: Example of confounding

Z - - - > X - - - > Y
|- - - - - - - - - - - - ^



In Figure 1 above, X and Y are confounded by Z. This occurs when variable Z influences both X and Y. Note that Z is not an intermediate link in the causal pathway between exposure (X) and disease (Y). This phenomenon, known as confounding, is a central concept in epidemiology and one of the most important problems in epidemiological studies. It concerns how several variables jointly relate to the risk for disease. Knowing when confounding may occur, how it can result in bias, how to assess its presence, and how to adjust for it is crucial to the study of exposure and disease.

The first step is to list extraneous variables (such as age, race, gender, and socioeconomic status) that may be potential confounders. Such factors would be associated with the predictor variable of interest and would also be a cause of the outcome being studied. As we saw when examining the tables in the Lawlor et al. article last week, the associations between antioxidants and disease in the observational studies were confounded by social and behavioral factors of subjects over the life course. Socioeconomic disadvantage over the life course and adult behaviors such as smoking and being obese were associated with vitamin status.

The problem with confounding is that it can distort the resulting “crude” estimate; this can produce an overestimation or underestimation of an effect, depending on the direction of the confounder's associations with the exposure and the disease. Unlike bias, confounders can be dealt with in several ways. One way is through mathematical strategies: statistical analyses can adjust or control for confounding factors and remove or minimize their effect. To determine whether confounding is present, it is necessary to compare the crude effect measure with an “adjusted” effect measure, i.e., the estimate after confounding is removed. The extent of the confounding is inferred from any discrepancy between the crude estimate and the adjusted estimate. Of course, it is only possible to adjust for those factors that we know about; residual confounding may persist if some other variable or variables confound the adjusted result.
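The direction of the distortion can be illustrated with a little arithmetic. The sketch below is only an illustration under stated assumptions: Z, X, and Y are binary, every probability is invented, and the function name is hypothetical. Y depends on Z alone, so the adjusted (within-Z) risk ratio is 1 by construction, and any departure of the crude risk ratio from 1 is due to confounding.

```python
def crude_risk_ratio(p_z, p_x_given_z, p_y_given_z):
    """Crude risk ratio for Y (X = 1 versus X = 0) when X has no effect on Y.

    p_z            -- P(Z = 1)
    p_x_given_z[z] -- P(X = 1 | Z = z): how the confounder relates to exposure
    p_y_given_z[z] -- P(Y = 1 | Z = z): how the confounder relates to disease
    All inputs are hypothetical values chosen for illustration.
    """
    def risk(x):
        # P(Y = 1 | X = x): average the Z-specific risks over P(Z | X = x).
        numerator = 0.0
        denominator = 0.0
        for z in (0, 1):
            weight_z = p_z if z == 1 else 1 - p_z
            weight_x = p_x_given_z[z] if x == 1 else 1 - p_x_given_z[z]
            numerator += weight_z * weight_x * p_y_given_z[z]
            denominator += weight_z * weight_x
        return numerator / denominator

    return risk(1) / risk(0)


# Z makes both exposure and disease more likely: crude estimate pushed above 1.
upward = crude_risk_ratio(0.5, {0: 0.2, 1: 0.8}, {0: 0.1, 1: 0.3})
# Z makes exposure more likely but disease less likely: pushed below 1.
downward = crude_risk_ratio(0.5, {0: 0.2, 1: 0.8}, {0: 0.3, 1: 0.1})

print(f"confounder raises disease risk: crude RR = {upward:.2f}")
print(f"confounder lowers disease risk: crude RR = {downward:.2f}")
```

With these invented numbers the crude risk ratio is about 1.86 in the first case and about 0.54 in the second, even though the exposure has no effect in either.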

The following example shows the occurrence of confounding factors in a particular situation and possible methods to use to eliminate or reduce the confounders. Say that women who use oral contraceptives (OCs) are more prone to myocardial infarction (MI), i.e., heart attack, and also that there really is no causal association between the two, i.e., the null hypothesis is true. The first study design might make a comparison of two groups of women in a cohort, one group that uses OCs and one that does not. A second design would compare women with MI to those not affected, a case-control study. We see that there appears to be a relationship between OC use and MI, but before stating that OC use “causes” heart attack, we ask the following questions: Is this association influenced by some other differences between the groups? Is there an alternative explanation for what we’ve observed? If so, then we say that the association is confounded by another factor. To be a confounder, the factor must be a risk factor for the outcome (MI) and also related to the exposure (OC use).

One possible factor to consider is smoking: people who smoke are known to be at higher risk for MI, and there is also evidence that women who use OCs tend to smoke more than non-users. In the first design, OC users smoke more, so they will be more prone to MI. In the second design, smoking will be more common among women who experience MI because smoking is a causal factor; and since smoking is associated with OC use, OC use will also be more common among those women with MI. Smoking is a confounder in this case because it is related to the study outcome and independently related to the exposure. The figure below illustrates smoking as a confounding factor.

Example of Confounding

[Diagram not reproduced: smoking is linked to both oral contraceptive use and myocardial infarction.]
smoking and oral contraceptive use (+): OC users smoke more heavily than non-users
smoking and myocardial infarction (+): smokers, irrespective of OC use, have a higher risk of MI than non-smokers

Source: Critical Appraisal of Epidemiological Studies and Clinical Trials, 1998.
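The OC example can be made concrete with invented numbers. In the sketch below every count is hypothetical and chosen so that OC use has no effect on MI within either smoking group; the names are for illustration only. The risk ratio corresponds to the cohort design and the odds ratio to what a case-control design would estimate.

```python
# Hypothetical cohort: (number of women, number with MI) in each cell.
# Counts are invented so that, within each smoking group, OC users and
# non-users have exactly the same MI risk (the null hypothesis is true).
cohort = {
    "smokers":     {"oc_users": (600, 60), "non_users": (200, 20)},
    "non_smokers": {"oc_users": (400, 8),  "non_users": (800, 16)},
}


def risk_ratio(exposed, unexposed):
    """Risk of MI in the exposed divided by risk in the unexposed."""
    (n1, d1), (n0, d0) = exposed, unexposed
    return (d1 / n1) / (d0 / n0)


def odds_ratio(exposed, unexposed):
    """Odds ratio relating OC use to MI."""
    (n1, d1), (n0, d0) = exposed, unexposed
    return (d1 * (n0 - d0)) / (d0 * (n1 - d1))


def collapse(group):
    """Ignore smoking by adding the two smoking groups together."""
    total_women = sum(cohort[s][group][0] for s in cohort)
    total_mi = sum(cohort[s][group][1] for s in cohort)
    return (total_women, total_mi)


crude_rr = risk_ratio(collapse("oc_users"), collapse("non_users"))
crude_or = odds_ratio(collapse("oc_users"), collapse("non_users"))
stratum_rr = {
    smoking: risk_ratio(cells["oc_users"], cells["non_users"])
    for smoking, cells in cohort.items()
}

print(f"crude risk ratio: {crude_rr:.2f}")
print(f"crude odds ratio: {crude_or:.2f}")
for smoking, rr in stratum_rr.items():
    print(f"risk ratio among {smoking}: {rr:.2f}")
```

With these counts the crude risk ratio is about 1.89 and the crude odds ratio about 1.95, while the risk ratio is 1.00 among smokers and among non-smokers: the apparent OC effect comes entirely from the fact that OC users smoke more.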

How can we control for the confounding factor of smoking? In addition to statistical strategies, two other techniques that have been effective in controlling confounders are restriction and stratification. Restriction involves imposing a constraint on who is included in the study. Possible restrictions and their effects on the above example include the following:

Restriction #1: include only women who have never smoked in the study
Effect: can then only generalize study results to non-smokers

Restriction #2: include only smokers and compare OC users to non-users
Effect: differences in amount and/or duration of smoking may still confound results, because not all women smoke the same amount and smoking may have a dose-response relationship with MI

Restriction #3: include only women who smoke 10-20 cigarettes per day for 5-10 years
Effect: would further reduce the possibility of confounding

Another strategy to control for confounders is stratification: forming strata, or layers, of data based on the suspected confounding factor. By evaluating the effect of an exposure within specific strata of the confounding variable, it is possible to reduce or eliminate confounding. This is the most widely used method of controlling for confounders. In the above example, a single study would take the place of the three restricted studies. It would include OC users and non-users without limiting smoking, but the researcher would record the level of smoking (amount and duration) for each subject. The investigator could then compare OC users and non-users at each level of smoking.
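Once the data are stratified, the stratum-specific comparisons are often combined into a single adjusted estimate. The sketch below uses the Mantel-Haenszel pooled odds ratio, a standard summary that the text above does not name, so treat it as one possible choice rather than the method the readings use. The three smoking levels and all counts are invented, and each 2x2 table is written as (a, b, c, d) = (OC users with MI, OC users without MI, non-users with MI, non-users without MI).

```python
# Hypothetical 2x2 tables, one per smoking level: (a, b, c, d) as described above.
# Counts are invented so that OC use is unrelated to MI within every stratum,
# while heavier smokers are both more likely to use OCs and more likely to have MI.
strata = {
    "never": (8, 392, 16, 784),
    "light": (30, 270, 20, 180),
    "heavy": (45, 255, 15, 85),
}


def odds_ratio(a, b, c, d):
    """Odds ratio for a single 2x2 table."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    tables = list(tables)
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Crude table: add the strata together cell by cell, ignoring smoking.
crude_table = tuple(sum(cell) for cell in zip(*strata.values()))
crude_or = odds_ratio(*crude_table)
pooled_or = mantel_haenszel_or(strata.values())

for level, table in strata.items():
    print(f"{level} smokers: OR = {odds_ratio(*table):.2f}")
print(f"crude OR (smoking ignored): {crude_or:.2f}")
print(f"Mantel-Haenszel OR (adjusted for smoking): {pooled_or:.2f}")
```

Here the crude odds ratio is about 1.86, but every stratum-specific odds ratio and the pooled estimate equal 1.00. The gap between the crude and the adjusted figure is the confounding by smoking.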

Age is an ever-present confounder in the study of disease. Since the frequency of most diseases increases with age, age is a risk factor for the outcome. “Age standardization, also called age adjustment, is a method that applies observed age-specific rates to a standard age distribution.” Using age-adjusted rates eliminates differences in crude estimates that are due strictly to age. This allows more accurate comparison of groups that differ in age structure. For example, in the United States, White and Hispanic populations have different age structures. Age is associated with many health outcomes and risk factors in particular populations, and this must be taken into consideration when controlling for its confounding effects. Knowing which standard age proportions to use and applying them to the comparison populations is crucial to eliminating or minimizing potential confounding of study outcomes.
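The quoted definition (observed age-specific rates applied to a standard age distribution) can be written out in a few lines. This is a minimal sketch: the age groups, rates, population sizes, and standard weights are all invented for illustration and are not the 2000 U.S. standard population. The two populations are given identical age-specific rates so that any difference in their crude rates is due to age structure alone.

```python
# Hypothetical age-specific rates per 1,000 (identical in both populations).
rates_per_1000 = {"0-39": 1.0, "40-64": 5.0, "65+": 20.0}

# Hypothetical population sizes: A is younger, B is older.
population_a = {"0-39": 7000, "40-64": 2000, "65+": 1000}
population_b = {"0-39": 4000, "40-64": 3000, "65+": 3000}

# Hypothetical standard age distribution (proportions sum to 1).
standard_weights = {"0-39": 0.55, "40-64": 0.30, "65+": 0.15}


def crude_rate(population, rates):
    """Overall rate: age-specific rates weighted by the population's own age structure."""
    total = sum(population.values())
    return sum(population[age] * rates[age] for age in population) / total


def age_adjusted_rate(rates, weights):
    """Direct standardization: age-specific rates weighted by the standard distribution."""
    return sum(weights[age] * rates[age] for age in weights)


crude_a = crude_rate(population_a, rates_per_1000)
crude_b = crude_rate(population_b, rates_per_1000)
adjusted_a = age_adjusted_rate(rates_per_1000, standard_weights)
adjusted_b = age_adjusted_rate(rates_per_1000, standard_weights)

print(f"crude rate per 1,000: A = {crude_a:.2f}, B = {crude_b:.2f}")
print(f"age-adjusted rate per 1,000: A = {adjusted_a:.2f}, B = {adjusted_b:.2f}")
```

The crude rates are 3.70 and 7.90 per 1,000, which makes population B look much less healthy, while the age-adjusted rates are both 5.05 per 1,000. In real use each population would have its own observed age-specific rates, and the adjusted rates would differ only to the extent that those rates differ.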

References
….Age Standardization and Population Estimates. Department of Health and Human Services, Centers for Disease Control and Prevention (2007). Accessed on 10.4.07 at http://0-www.cdc.gov.mill1.sjlibrary.org/nchs/tutorials/currentnhanes/NHANESAnalyses/AgeStandardization/age_standardization_intro.htm
….Encyclopedia of Public Health/ Standardization (of Rates) (2007). Accessed on 10.07.07 at http://www.enotes.com/public-health-encyclopedia/standardization-rates
Elwood, J. M. (1998). Critical Appraisal of Epidemiological Studies and Clinical Trials. New York: Oxford University Press.
Gordis, L. (1996). Epidemiology. Philadelphia: W. B. Saunders Company.
Klein RJ, Schoenborn, CA. Age Adjustment using the 2000 projected U.S. population. Healthy People Statistical Notes, no. 20. Hyattsville, Maryland: National Center for Health Statistics. January 2001.
Kleinbaum, D. G., Kupper, L. L., and Morgenstern, H. (1982). Epidemiologic Research: Principles and Quantitative Methods. New York: Van Nostrand Reinhold Company.
Rothman, K. J. (1986). Modern Epidemiology. Boston: Little Brown and Company.


Annotated additions by students


Fiscella, K. (2005). Commentary – Anatomy of racial disparity in influenza vaccination. Health Services Research, 40 (2): 539–550.
PubMed Central version (with links to references)

This piece is a hybrid commentary/review article. Fiscella uses the study by Hebert et al in the same issue of Health Services Research as a jumping off point for his discussion and review of work being done in the area of racial disparities in health care in general. He sees influenza vaccination as a kind of case study that can enlighten the broader issues around discrimination in health care. Explanations by researchers for the disparities in influenza vaccination have generally been grouped into 5 categories: less frequent care because of access barriers; worse health status in general; patients’ knowledge of and attitude toward vaccination; ‘unconscious’ provider bias; and lower standard of provider care.

This article is valuable because it discusses not only the issues above but also the methodologies used to address them in the study by Hebert et al. Fiscella characterizes the use of secondary data by Hebert et al as “creative” in that they combined self-reported sociodemographic data and self-reported health and other survey data with Medicare claims data. The main findings were that: 1) resistant attitudes contribute to racial, but not ethnic, disparity in receiving the vaccine; 2) access to care makes only a modest contribution to disparities in vaccination; 3) there is little evidence of provider racial bias in administering the vaccine, but racial differences in patient initiative and source of care contributed to these disparities. Fiscella evaluates the validity of these findings by looking at sample size, alternative explanations such as selection bias and limited statistical power, and response bias. In general, he concludes that important implications can be drawn from the Hebert study. One is that poor communication between patients and providers needs to be addressed, and when it is (as in a community-based program using tracking, recall, and outreach) racial disparities can be significantly diminished. (JC)

Christenfeld, N.J.S., Sloan, R.P., Carroll, D., & Greenland, S. (2004). Risk factors, confounding, and the illusion of statistical control. Psychosomatic Medicine, 66: 868-875.

An instructive and enjoyable read, this article discusses the errors that can follow the use of so-called “statistical adjustment.” First, the authors note that statistical adjustments perform less well in “the epidemiological tasks to which they are put” than in the experimental studies for which they were designed. In line with my own opinion on the matter, they say that “…almost any interesting variable will be linked to some health outcome” if the study is large enough and has many measured outcomes and variables. It is important that we be able to distinguish between a marker of a disease condition and an actual causal risk factor. The authors give examples of problems such as mismeasurement and mis-specification, assuming linearity where there may be none, replication, mediators versus confounders, and (in one of the most interesting sections) constructs versus operationalizations. Using the example of assessing the effect of religiosity on health, they note that, “…we might like to control for initial health status, and we may even report that we have done so. The best we can do, however, is adjust for the effects associated with a particular measure of health, or a limited set of measures, using a particular statistical method” (emphasis mine). This is because there is usually a difference between a particular operationalization and the underlying conceptual construct. The non-specialist will be pleased that the article gives plenty of apt examples that usefully illustrate the potential errors in using statistical adjustment; I give this article an enthusiastic ‘thumbs up.’ (JC)


Eliseo Guallar, Ellen K. Silbergeld, Ana Navas-Acien, Saurabh Malhotra, Brad C. Astor, A. Richey Sharrett, and Brian S. Schwartz (2006). Confounding of the Relation between Homocysteine and Peripheral Arterial Disease by Lead, Cadmium, and Renal Function. American Journal of Epidemiology, 163 (8): 700-708.

Elevated homocysteine levels are associated with peripheral arterial disease (PAD) in observational studies, often more strongly than with other cardiovascular disease endpoints. The possibility of preventing PAD through homocysteine-lowering interventions has received substantial attention, based on the assumption that the association between homocysteine and PAD is causal. But in randomized trials, lowering homocysteine has shown no effect on cardiovascular outcomes, which suggests that previously unidentified confounders may account for the association of homocysteine with PAD. This analysis considered lead and cadmium levels, smoking, and renal function as confounders. After adjustment for age, gender, race/ethnicity, and education, there was a progressive increase in PAD prevalence across quintiles of homocysteine; the odds ratio for PAD in the highest quintile compared with the lowest was 1.92. After additional adjustment for blood lead and cadmium, the odds ratio was reduced to 1.37, and after further adjustment for smoking and estimated glomerular filtration rate it was 0.89. In this large cross-sectional study in the general population, homocysteine, lead, and cadmium were each associated with the prevalence of PAD. However, adjustment for blood lead and cadmium, estimated glomerular filtration rate, and smoking completely eliminated the association of homocysteine with PAD, while the association of lead and cadmium with PAD persisted after adjustment for homocysteine. Smoking and low renal function are established causes of PAD and of increased homocysteine levels. The importance of these confounders may differ across populations.
Previous studies have adjusted for smoking, but none have studied the confounding effects of lead and cadmium. (jg)

Ketterer, B. (1998). Dietary Isothiocyanates as Confounding Factors in the Molecular Epidemiology of Colon Cancer. Cancer Epidemiology, Biomarkers & Prevention, 7:645-646.

This is an editorial discussing a previous research article in the same journal that investigated lowering the prevalence of colon/rectal cancer. That study concluded that subjects exposed to dietary broccoli who have a GSTM1 null phenotype are less susceptible to colon cancer than those who are GSTM1 positive. The scientific importance of these results is that dietary isothiocyanates are not commonly used to investigate the epidemiology of cancer, and that confounding factors were probably present in a number of earlier studies and should be taken into account when designing future research in this area. The author suggests that the reduction of colon cancer among high consumers of broccoli may be explained by inconsistencies in the studies of GSTM1 genotype, including the transport of the enzymes through the liver, their potential to either activate or detoxify carcinogens, and the method and timing of blood transport in relation to the enzymes. (ci)

Nichol, K. L., Nordin, J. D., Nelson, D. B., Mullooly, J. P., & Hak, E. (2007). Effectiveness of Influenza Vaccine in the Community-Dwelling Elderly. New England Journal of Medicine, 357 (14): 1373-1381.
http://content.nejm.org/cgi/content/full/357/14/1373

This study makes the point that short-term studies may provide misleading pictures of the long-term benefits of vaccination, and that residual confounding may have biased past results. Accordingly, the population for this study of the long-term effectiveness of influenza vaccine in seniors consisted of 18 cohorts of community-dwelling elderly members of one U.S. health maintenance organization (HMO) for 1990–1991 through 1999–2000 and of two other HMOs for 1996–1997 through 1999–2000, which the authors define as 10 flu seasons. They found that influenza vaccination was associated with significant reductions in the risk of hospitalization for pneumonia or influenza and in the risk of death among community-dwelling elderly persons. What is interesting about this study, however, is that, lacking important data about race, income, etc. in the records from the HMOs, they also conducted ‘sensitivity analyses’ on the reasoning that if an unmeasured confounder was present such that persons with the confounder were less likely to be vaccinated but more likely to be hospitalized or die, then their analyses would have overestimated vaccine effectiveness. As it happened, unpublished observations indicated that the study populations were mostly white, so they did not model race but rather ‘functional status’ as a strong predictor of hospitalization or death. The limitations of the study were equally informative, with the authors noting that elderly ‘enrollees in HMOs may differ from elderly persons without HMO coverage in important ways, including race, income, functional status, and urban versus nonurban residence, and caution should be used in generalizing our results to other groups.’ (JC)
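The logic of such a sensitivity analysis can be sketched with a standard textbook formula for external adjustment of a risk ratio for a single unmeasured binary confounder. This is not necessarily the calculation Nichol et al. performed; the function name and every number below are hypothetical, and the formula assumes the confounder multiplies risk by the same factor in vaccinated and unvaccinated persons.

```python
def externally_adjusted_rr(observed_rr, rr_confounder_outcome,
                           prevalence_exposed, prevalence_unexposed):
    """Risk ratio corrected for one unmeasured binary confounder.

    observed_rr           -- risk ratio estimated without the confounder
    rr_confounder_outcome -- assumed risk ratio linking confounder to outcome
    prevalence_exposed    -- assumed confounder prevalence among the exposed
    prevalence_unexposed  -- assumed confounder prevalence among the unexposed
    """
    bias = ((prevalence_exposed * (rr_confounder_outcome - 1) + 1)
            / (prevalence_unexposed * (rr_confounder_outcome - 1) + 1))
    return observed_rr / bias


# Invented scenario: vaccination appears to cut risk by 30% (RR = 0.70), but poor
# functional status triples the risk of the outcome and is half as common among
# the vaccinated (10%) as among the unvaccinated (20%).
observed = 0.70
adjusted = externally_adjusted_rr(observed, rr_confounder_outcome=3.0,
                                  prevalence_exposed=0.10,
                                  prevalence_unexposed=0.20)

print(f"observed RR: {observed:.2f} (apparent effectiveness {1 - observed:.0%})")
print(f"adjusted RR: {adjusted:.2f} (effectiveness {1 - adjusted:.0%})")
```

Under these assumptions the adjusted risk ratio is about 0.82, so apparent effectiveness falls from 30% to about 18%. This is the direction the authors describe: a confounder that makes people less likely to be vaccinated and more likely to be hospitalized or die leads to an overestimate of vaccine effectiveness.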