Multivariable "structural" models of development

Idea: Just as standard regression models allow prediction of a dependent variable on the basis of independent variables, structural models can allow a sequence of predictive steps from root ("exogenous") through to highest-level variables. Although this kind of model seems to illuminate issues about factors that build up over the life course, there are strong criticisms of using such models to make claims about causes.

Initial notes on the Cases

From PT:
Cases: Kendler et al. 2002 on pathways to depression in women: Notice the high R^2 and the way the authors tease out different kinds of pathways to depression from the model they fit to their data.
Ou's 2005 synthesis of pathways from pre-school programs to later outcomes: Notice the different kinds of networks Ou reviews in the literature before presenting her own analysis.
Freedman 2005: a statistician questions whether structural models can be thought of as causal models and tries hard to make his questioning accessible (i.e., with a minimum of technical language, though not zero).

During the class, CI and PT propose that we look first at Ou's and Kendler's diagrams,
then do Q&A on the technical aspects of path analysis and SEM primed by the substantive statement below,
then work our way through Freedman's critique, guided by PT's narrated PDF, "Reading Freedman," a case study in learning as much as possible from a paper that might at first seem hard going (the PDF is available on request from PT).

Substantive statement

PT's first attempt at a non-technical introduction to path analysis and structural equation modeling (alternative expositions welcome)

Path analysis is a data analysis technique that quantifies the relative contributions of variables ("path coefficients") to the variation in a focal variable once a certain network of interrelated variables has been specified (Lynch & Walsh 1998, 823). Some of these contributions are direct and some are mediated through other variables, i.e., indirect. Although some researchers interpret "contribution" in causal terms (e.g., Pearl 2000, 135 & 344-5), others criticize such an interpretation (e.g., Freedman 2005). Here, "contribution" refers neutrally to a term of an additive model fitted to data.

The conceptual starting point for path analysis is an additive regression model that associates the focal (“dependent”) variable with several other measured (“independent” or “exogenous”) variables.

X1 ----|
X2 ----|----> Y
X3 ----|

Technically, the additive model is transformed by subtracting the mean from every term, squaring the expression and averaging over the observations (so it becomes an equation for the variance), and dividing by the variance of the focal ("dependent") variable. The result is the "equation of complete determination." Each path coefficient is the corresponding regression coefficient multiplied by the SD of its "independent" variable and divided by the SD of the focal variable.
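A minimal numerical check of the paragraph above may help. It is written in Python with NumPy (an assumption; nothing in the course uses code) and simulates hypothetical data: three correlated X's with different units and a focal Y, all numbers made up. It fits an ordinary least-squares regression, rescales each coefficient by SD(X_i)/SD(Y), and confirms the equation of complete determination. When the X's are correlated, that equation contains cross terms 2 p_i p_j r_ij as well as the squared path coefficients; the sketch writes this compactly as p'Rp plus the squared residual path.

```python
import numpy as np

# Hypothetical data: three correlated "independent" variables with different
# units (SDs) and a focal variable Y built from them plus noise.
rng = np.random.default_rng(0)
n = 10_000
corr = np.array([[1.0, 0.3, 0.2],
                 [0.3, 1.0, 0.4],
                 [0.2, 0.4, 1.0]])
X = rng.multivariate_normal(np.zeros(3), corr, size=n) * np.array([2.0, 5.0, 0.5])
y = 3.0 + X @ np.array([0.8, -0.2, 1.5]) + rng.normal(scale=1.0, size=n)

# Ordinary least squares with an intercept column.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
b = coef[1:]                   # raw regression coefficients
resid = y - design @ coef

# Path coefficient: b_i * SD(X_i) / SD(Y); residual path: SD(resid) / SD(Y).
p = b * X.std(axis=0) / y.std()
p_e = resid.std() / y.std()

# Equation of complete determination: 1 = sum_i sum_j p_i p_j r_ij + p_e^2.
# The double sum covers the squares p_i^2 (since r_ii = 1) and the cross terms.
R = np.corrcoef(X, rowvar=False)
total = p @ R @ p + p_e**2
print("path coefficients:", p.round(3))
print("equation of complete determination:", round(float(total), 6))
```

Because the regression includes an intercept and the residuals are orthogonal to the X's, `total` equals 1 up to floating-point rounding whatever values are simulated.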

The next step is to consider more than one focal, “endogenous” variable and networks of exogenous and endogenous variables that you have reason to think are associated with one another. Indeed, the focal variable of one regression may be among the variables associated with a second focal variable and so on. In the figure below X3 has a direct link with Y2 and an indirect one through Y1.

X1 ----|
X2 ----|----> Y1 -|--> Y2
X3 ----|-------------|

The software (e.g., LISREL) can solve these linked regression equations, but it is up to you to compare the results from the network you specify with plausible (theoretically justified) alternatives that link the exogenous (independent) and endogenous variables differently. Unlike some uses of multiple regression, we do not decide what belongs in each equation by adding or subtracting variables in a stepwise procedure; the network is specified in advance.
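The linked equations in the second diagram can be sketched the same way. This is a hypothetical Python/NumPy illustration, not what LISREL does internally: each endogenous variable is regressed on the variables that the specified network says point into it, using standardized variables so that the coefficients are path coefficients. The indirect effect of X3 on Y2 is taken as the product of the paths along X3 to Y1 to Y2, the usual tracing convention in path analysis. The data-generating numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical data generated from the diagram's structure:
# X1, X2, X3 -> Y1; Y1 -> Y2; X3 -> Y2 directly.
X1, X2, X3 = rng.standard_normal((3, n))
Y1 = 0.4 * X1 + 0.3 * X2 + 0.5 * X3 + rng.normal(scale=0.6, size=n)
Y2 = 0.6 * Y1 + 0.2 * X3 + rng.normal(scale=0.7, size=n)

def standardize(v):
    """Rescale to mean 0 and SD 1."""
    return (v - v.mean()) / v.std()

def path_coefficients(y, *xs):
    """OLS of standardized y on standardized xs; returns path coefficients."""
    Z = np.column_stack([standardize(x) for x in xs])
    coef, *_ = np.linalg.lstsq(Z, standardize(y), rcond=None)
    return coef

# One regression per endogenous variable, as the specified network dictates.
p_y1 = path_coefficients(Y1, X1, X2, X3)   # paths into Y1
p_y2 = path_coefficients(Y2, Y1, X3)       # paths into Y2

direct_x3 = p_y2[1]                # X3 -> Y2
indirect_x3 = p_y1[2] * p_y2[0]    # X3 -> Y1 -> Y2 (product of paths)
total_x3 = direct_x3 + indirect_x3
print(f"direct {direct_x3:.3f}, indirect {indirect_x3:.3f}, total {total_x3:.3f}")
```

Because the X's here are generated independently, `total_x3` should come out close to the simple correlation between X3 and Y2 (about 0.51 for these made-up settings); with correlated X's, the tracing rule would also pick up paths running through those correlations.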

Structural equation modeling extends path analysis to include latent (a.k.a. unmeasured) variables or "constructs." These latent variables are sometimes the presumed real underlying variable of which the measured one is an imperfect marker. For example, birth weight at full term and the neonate's Apgar score might be the measured variables, but the model might include degree of fetal under-nutrition as a latent variable. Latent variables can also be constructed by the software in the same way that they are in factor analysis, namely, as economical (dimension-reducing) linear combinations of measured variables. Calling the networks of linked variables "structural" is meant to suggest that we can give the pathways causal interpretations, but SEM and path analysis have no trick that overcomes the problems that regression and factor analysis have in exposing causes.
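To make the idea of a latent variable concrete, here is a small hypothetical Python/NumPy simulation. It assumes a single latent factor (standing in for fetal under-nutrition) and three standardized indicators with made-up loadings; the third indicator is invented, because with only two indicators (say, birth weight and Apgar score) a one-factor model is not identified without further constraints. Under the assumed model, the correlation between indicators i and j equals the product of their loadings, so each loading can be recovered from the three observed correlations. SEM software typically estimates such models by maximum likelihood instead; this closed-form identity is only an illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical latent variable: generated here but never "observed" below.
F = rng.standard_normal(n)

# Three hypothetical standardized indicators: X_i = lambda_i * F + error_i,
# with error variance 1 - lambda_i**2 so that each X_i has variance 1.
true_loadings = np.array([0.8, 0.6, 0.7])
noise_sd = np.sqrt(1 - true_loadings**2)
X = F[:, None] * true_loadings + rng.standard_normal((n, 3)) * noise_sd

# One-factor model with uncorrelated errors implies r_ij = lambda_i * lambda_j,
# so lambda_1**2 = r12 * r13 / r23, and similarly for the others.
R = np.corrcoef(X, rowvar=False)
r12, r13, r23 = R[0, 1], R[0, 2], R[1, 2]
est_loadings = np.sqrt(np.array([
    r12 * r13 / r23,   # lambda_1
    r12 * r23 / r13,   # lambda_2
    r13 * r23 / r12,   # lambda_3
]))
print("estimated loadings:", est_loadings.round(3))
```

Each indicator correlates less with the other indicators than with the latent variable itself, which is one way of seeing why a measured variable is only an imperfect marker.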

This section is not needed for understanding the papers for this week. However, looking ahead to studies of heritability (part of week 12), a field in which path analysis originated, there are no measured variables except the observed focal variable (e.g., height). Path analysis can still be used if we convert the additive model on which any given Analysis of Variance (AOV or ANOVA) is based into an additive model of constructed variables that take the values of the contributions fitted to the first model. For example, in an agricultural evaluation trial of many varieties replicated one or more times in each of many locations, the AOV model is

$$Y_{ijk} = M + V_i + L_j + VL_{ij} + E_{ijk} \qquad \text{(eqn. 1)}$$

where Yijk denotes the measured trait y for the ith variety in the jth location and kth replication;
M is a base level for the trait;
Vi is the contribution of the ith variety;
Lj is the contribution of the jth location;
VLij is an additional contribution from the i,jth variety-location combination—in statistical terms, the “variety-location-interaction” contribution; and
Eijk is a noise contribution adding to the trait measurement.

The path model equivalent to equation 1 is
$$Y_x = M + Z_{1x} + Z_{2x} + Z_{3x} + E_x \qquad \text{(eqn. 2)}$$

where
Y is the measured trait as before and x denotes the replicates
Z1x = Vi if x is a replicate of variety i, or 0 otherwise
Z2x = Lj if x is a replicate in location j, or 0 otherwise
Z3x = VLij if x is a replicate of variety i in location j, or 0 otherwise
Ex = Eijk where x is replicate k of variety i in location j

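The path coefficients are then set equal to the square root of the ratio of the variance of each contribution (Vi, etc.) to the total variance of the trait (Y). Provided the design is balanced, the fitted contributions are uncorrelated, so the equation of complete determination has no cross-product terms and becomes
$$1 = \sum_w \frac{\mathrm{var}(Z_w)}{\mathrm{var}(Y)} \qquad \text{(eqn. 3)}$$
where w indexes the different contributions in the Analysis of Variance model, including the noise term.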

For the agricultural trial this equation might be written
$$1 = \frac{\mathrm{var}(V) + \mathrm{var}(L) + \mathrm{var}(VL) + \mathrm{var}(E)}{\mathrm{var}(Y)} \qquad \text{(eqn. 4)}$$
where var(V) is the variance of the Vi terms, and so on.
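The bookkeeping in equations 1 to 4 can be checked with a short Python/NumPy sketch on a simulated, balanced variety-by-location trial; all sizes and effect sizes are hypothetical. It computes the fitted contributions from cell and marginal means, spreads them out as the constructed variables Z1x, Z2x, Z3x and Ex of equation 2 (one value per replicate x), and confirms that their variance shares sum to 1 as in equation 4. The variances here are those of the fitted contributions; they are not the unbiased variance-component estimates that an AOV table would give via expected mean squares.

```python
import numpy as np

rng = np.random.default_rng(3)
n_var, n_loc, n_rep = 4, 5, 3   # hypothetical balanced trial

# Hypothetical trait values Y[i, j, k]: variety i, location j, replicate k.
Y = (10.0
     + rng.normal(0, 1.0, size=(n_var, 1, 1))          # variety effects
     + rng.normal(0, 2.0, size=(1, n_loc, 1))          # location effects
     + rng.normal(0, 0.5, size=(n_var, n_loc, 1))      # variety-location interaction
     + rng.normal(0, 0.8, size=(n_var, n_loc, n_rep))) # noise

# Fitted contributions of eqn. 1, from marginal and cell means.
M = Y.mean()
V = Y.mean(axis=(1, 2), keepdims=True) - M
L = Y.mean(axis=(0, 2), keepdims=True) - M
cell = Y.mean(axis=2, keepdims=True)
VL = cell - M - V - L
E = Y - cell

# Constructed variables of eqn. 2: one value per replicate x.
shape = Y.shape
Z1 = np.broadcast_to(V, shape).ravel()
Z2 = np.broadcast_to(L, shape).ravel()
Z3 = np.broadcast_to(VL, shape).ravel()
Ex = E.ravel()
y = Y.ravel()

# Eqn. 4: variance shares (population variances, ddof=0) sum to 1.
shares = np.array([Z1.var(), Z2.var(), Z3.var(), Ex.var()]) / y.var()
path_coefs = np.sqrt(shares)    # path coefficient = sqrt(variance share)
print("shares (V, L, VL, E):", shares.round(3), "sum:", round(float(shares.sum()), 6))
```

The shares sum exactly to 1 only because the design is balanced, which makes the fitted contributions uncorrelated; with unbalanced data the cross terms of the equation of complete determination reappear.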

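In human studies the analogue of var(VL), the gene-environment interaction, is usually ignored, and equation 4 is expressed as
$$1 = \text{heritability} + \text{shared environmental effect} + \text{non-shared environmental effect} \qquad \text{(eqn. 5)}$$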

When the same trait is observed in two relatives, their separate path analyses can be linked in one network and the correlation between the relatives calculated (Lynch & Walsh 1998, 826)—provided it is assumed that the contributions (and path coefficients) apply to both and that the noise contributions are uncorrelated. If we have data on correlations for different kinds of relatives (e.g., identical vs. fraternal twins), we can estimate the relative size of the contributions in equations such as 4 and 5. That’s the crux of heritability studies.
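As one concrete instance of this twin-study logic, here is a sketch of Falconer's classic formulas in Python. This is a standard textbook simplification, not necessarily the estimation method used by Lynch & Walsh or by the studies discussed in class. It assumes that identical (MZ) twins share all and fraternal (DZ) twins half of the additive genetic contribution, that both kinds of twins share the "shared environment" to the same degree, and that non-shared contributions are uncorrelated. Then r_MZ = h² + c² and r_DZ = h²/2 + c², which can be solved for the three terms of equation 5. The twin correlations in the usage line are hypothetical.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict[str, float]:
    """Split trait variance into the three terms of eqn. 5 from twin correlations.

    Solves r_mz = h2 + c2 and r_dz = h2 / 2 + c2 (Falconer's formulas),
    with the non-shared term taken as whatever MZ twins do not share.
    """
    h2 = 2 * (r_mz - r_dz)   # heritability
    c2 = 2 * r_dz - r_mz     # shared environmental effect
    e2 = 1 - r_mz            # non-shared environmental effect
    return {"heritability": h2, "shared": c2, "non_shared": e2}

# Hypothetical twin correlations, for illustration only.
est = falconer_ace(r_mz=0.8, r_dz=0.5)
print(est)
```

With these made-up correlations the estimates are roughly 0.6, 0.2 and 0.2, which sum to 1 as equation 5 requires.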

References
Freedman, D. A. (2005). Linear statistical models for causation: A critical review. In B. Everitt and D. Howell (Eds.), Encyclopedia of Statistics in the Behavioral Sciences. Chichester, Wiley.
Lynch, M. and B. Walsh (1998). Genetics and Analysis of Quantitative Traits. Sunderland, MA, Sinauer.
Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge, Cambridge University Press.
http://en.wikipedia.org/wiki/Apgar_score

Response to Today's Class

When we looked at Kendler et al. (2002), we talked about how the authors came to choose their variables. They seem to have followed their 1993 study and added three more variables: childhood sexual abuse, conduct disorder (highly aggressive children have been found to have a high rate of committing crimes by age 30, perhaps related to their behaviors, feelings, etc. over the life course), and substance misuse. That earlier study would probably explain more about how they chose the variables for internalizing, externalizing, and adversity, and perhaps the hierarchical structure as well.

We also talked about twin studies. The data used in the study came from one of the largest twin studies, so I would expect the results to be fairly reliable. Dr. Taylor also mentioned a recent study using male twin samples whose results were quite similar to those in today's article.

As for the study's objective, the authors wanted to "generate a developmental model for the etiology of major depression." As Dr. Taylor said, they wanted to know what causes depression, so they chose structural equation modeling, which aims to tease out causation while taking unmeasured, confounding variables into account; a regression model, by contrast, only shows the association between a dependent variable and an independent variable (holding the other variables constant). It would be interesting to see which therapy has the most effect in bringing about positive outcomes or behaviors in depressed patients, which would probably call for another study using structural equation modeling. Putting a therapy or treatment into this model probably wouldn't work, though, because the factors in the article are risk factors, whereas a therapy or treatment would be a protective factor and probably wouldn't fit into this hierarchy, I think.

In her article, Ou (2005) used structural equation modeling to estimate causal relations among the observed variables, based on the model in Figure 1. She shows those variables in Figure 2 with bold and thin lines and reports that "among the five hypotheses, the cognitive advantage hypothesis contributed most to the indirect effect of preschool participation." She explains, "The emboldened arrows denote that the intervening factors directly mediated the effect of CPC preschool participation." So the other, thin lines are indirect effects. "Overall, 68.6% of the indirect program effect on highest grade completed was accounted for by cognitive advantage, family support, and school support." These variables act as go-betweens, passing the effect from one variable to the next in a hierarchical or sequential pattern and eventually on to highest grade completed by age 22 (one of the pathways to educational attainment).

A natural (or quasi-) experiment is a "study in which the investigator measures the impact of some naturally occurring event that is assumed to affect people's lives." The independent variable is the "event." This definition comes from the textbook Developmental Psychology (Shaffer, Wood, and Willoughby, 2002).

As for how policy makers might view and use these results, that would depend on the point at which they choose to intervene, whether at the local, state, or national level. They could act at the beginning of a pathway, in the middle, or even close to the end, perhaps depending on the policy and the budget. They also need to consider how feasible a policy is to implement. Policy makers often can't wait months or years for a study's results, and the world may change before a policy-related study is completed. In that case, the findings may still be usable at a lower level (e.g., a school principal encouraging preschool teachers).

One of our professors at UMass Boston who teaches Advanced Statistics seems to spend a whole semester on structural equation modeling, so we probably won't master this complicated model in 2.5 hours. Anyone who wants to know more could try to audit his class, if that's possible.
We can also review some of the important points made by Freedman. (km)

Annotated additions by students

Not sure whether this will help or confuse people more, but I found a website that explains structural equation modeling: www.statsoft.com/textbook/stespath (sa)

Unfortunately, this site is not working correctly, so I have emailed it as a Word document attachment. (sa)

Rini CK, Dunkel-Schetter C, Wadhwa PD, et al. Psychological Adaptation and Birth Outcomes: The Role of Personal Resources, Stress, and Sociocultural Context in Pregnancy. Health Psychology 1999;18:333-45.

This article uses structural equation modeling (SEM) to examine the relationship between low birth weight (LBW) and pre-term delivery (PTD) and three constructs: personal resources, prenatal stress, and sociocultural factors, among a sample of white and Latina women in their late second or early third trimester. Personal resources include mastery, self-esteem, and optimism, which are measured using itemized scales; stress is divided into state anxiety and pregnancy anxiety, also measured by itemized scales; and sociocultural factors include income, education, and ethnicity. Of the Latinas in the study, 77% were foreign-born Mexican women. Given the varying constructs, the number of instruments used, and the need to translate the instruments into Spanish, the authors conduct internal reliability tests for each instrument in both English and Spanish, and further apply factor analysis to some of the instruments that showed lower reliability. The authors use SEM to evaluate relationships between the variables, including interaction terms to determine whether personal resources or sociocultural factors alter the effects of stress on LBW or PTD. Among other things, the authors find that personal resources have a positive direct effect on birth weight for both low- and high-income women, and an indirect effect on length of gestation through stress reduction. Furthermore, low SES (as measured by education and income), which is prevalent in the Latina population studied, predicts lower birth weight. An unexpected finding was that higher income had a positive association with prenatal stress. This article is useful for detailing some of the aspects of SEM and explaining how they apply to studies of health outcomes and their correlates.
It also offers a slightly different perspective on some of the issues we discussed in previous weeks related to the fetal origins hypothesis and the life course approach. (lh)

Grimm KJ. Multivariate longitudinal methods for studying developmental relationships between depression and academic achievement. International Journal of Behavioral Development 2007;31(4):328-339.

The author investigates the relationship between two developmental processes: how the development of depression is related to the development of achievement. He uses three longitudinal models to illustrate this, and each model is fitted to repeated measurements of children's depression and achievement. In the literature review on achievement and depression, the author states that academic performance has been identified as both a cause and a consequence of depression in children. Most past studies have been cross-sectional, "research based on a within-time correlation from a single measurement occasion," and the longitudinal ones were designed as a "two-occasion, test-retest design." He justifies his study by the need for "multi-occasion" longitudinal studies to examine the "time-dependent" relationship between achievement and depression. Three models, all based on the latent growth curve model, were used to investigate this relationship: the bivariate latent growth curve model, the latent growth curve model with a time-varying covariate, and the bivariate latent difference score growth model. Each model is analyzed in depth, and the equations for each are presented. The author concludes that (1) the level of depression was negatively related to the level of achievement at age 8 in the bivariate latent growth model; (2) depression was unrelated to achievement when controlling for achievement change from age 8 to 14 in the time-varying covariate model; and (3) based on the bivariate dual change model, the level of achievement was negatively related to the level of depression at age 8, the level of achievement at age 8 was positively associated with the constant change factor for depression, and achievement was a negative indicator of changes in depression.
The author acknowledges that the results seem inconsistent, but he concludes that the differing results are each correct, each for a different developmental question. He uses the results of this study to advocate for more longitudinal multivariate models to investigate developmental issues. (CI)


Structural Equation Modeling of the Management Practice and Organizational Processes Questionnaire (MPOP). Kim Y, Whitman G, Whitman G, Davidson L, Wang SL.

Abstr Acad Health Serv Res Health Policy Meet. 2002; 19: 47.

Here's a different view of structural modeling. This nursing research article examined the impact of managerial practices and organizational processes on patient outcomes and the achievement of quality care. The authors used structural modeling to support the reliability and validity of a questionnaire called the Management Practice and Organizational Processes Questionnaire (MPOP). This sub-study was a secondary analysis of prospective data on process variables defined as leadership, coordination, communication, and conflict management. I won't pretend that I understood all the statistics that followed, but I thought it was interesting that I came upon this while looking up research for another class. In any case, the authors concluded that there was sufficient evidence that the MPOP is a reliable and valid measure of managerial practices and organizational processes and could be a valuable tool in outcomes research. (sa)

Stiffman, A.R., Hadley-Ives, E., Else, D., Johnson, S., & Dore, P. (1999). Impact of Environment on Adolescent Mental Health and Behavior. American Journal of Orthopsychiatry, 69(1), January.
This study combines the neighborhood effects we discussed a few weeks ago with structural equation modeling (SEM). It used SEM to examine the relationships between perceived neighborhood environments, objective neighborhood environments, environmental support, and mental health among adolescents. The authors posited a model in which objective neighborhood qualities (e.g., census indicators: the proportion of residents on public assistance or below the poverty level, the proportion of rental units, unemployment, etc.) affect adolescent mental health both directly and indirectly, the indirect effects being mediated by the way youth perceive their environment (e.g., self-reports of drugs, shootings, murders, abandoned buildings, homelessness, prostitution, and people on welfare in the neighborhood). They further posited that environmental support (e.g., peer influence, family instability, family support, family mental health) would both improve the youths' perception of their neighborhood and directly affect their mental health. Nearly 800 youth aged 14-18 were recruited from youth service settings (e.g., health, juvenile justice, child welfare, or education) in St. Louis and interviewed. The analysis had two steps: first a principal factor analysis, then SEM using the factor scores from step one to examine the fit of the model. The results showed that perceived neighborhood environment, exposure to violence, and environmental support contributed directly to mental health: youth with worse mental health problems were more likely to be exposed to violence, more likely to perceive their neighborhood as deteriorating, and less likely to have environmental support. The objective environment indicators did not have a direct impact on adolescents' mental health; their effect was mediated through their influence on the perceived neighborhood environment.
The study showed the importance of youth perceptions for understanding mental health and behavioral problems, and showed that these perceptions were most often grounded in reality: youth who perceived their neighborhood as deteriorating were more likely to have been exposed to violence, less likely to have environmental support, and more likely to live in an environment that the objective indicators showed to be deteriorating. The study also demonstrated the importance of social support, which partially offset the impact of deteriorating environments on the youths' problems. The findings point to the importance of including parents and families in mental health prevention, and of understanding the relationships between youth perceptions and their environment when developing interventions.

Chandola, T., Clarke, P. & Blane, D. (2006). Pathways between education and health: a causal modelling approach. Journal of the Royal Statistical Society: Series A (Statistics in Society), 169(2), 337-359. Retrieved on November 7, 2007 from http://www.ucl.ac.uk/epidemiology/educationhealth/working%20paper%20for%20website.pdf

This is a very accessible article that examines six pathways that have been hypothesized to link education and health. It is instructive in the way the authors explain in detail the rationale and supporting studies for each of the six pathways; describe the selection of variables from the National Child Development Study data (specifically, adult and adolescent health status, sense of control, healthy behavior, education, childhood and adult social class, and childhood cognitive ability); explain why they chose structural equation modeling; and illustrate the added insights generated by using latent (unobserved) variables. Even without the background to follow the statistical modeling completely, it is clear that an SEM approach allows more fine-grained and nuanced associations to be drawn. The study found that the direct effect of education on health was not significant in men and was negative in women. Using SEM, however, the authors also estimated a "total effect" that included the effect of education on social class, sense of control, and health behaviors. This so-called total effect of education was positive for both men and women, with the different components explaining particular parts of the association. The explanation of how certain effects might work indirectly through different variables was especially helpful, and it led the authors to suggest that policies could be aimed at specific pathways rather than directly at education. (JC)