Causal graph

In statistics, econometrics, epidemiology, genetics and related disciplines, causal graphs are probabilistic graphical models used to encode assumptions about the data-generating process. They can also be viewed as a blueprint of the algorithm by which Nature assigns values to the variables in the domain of interest.

Causal graphs can be used for communication and for inference. As communication devices, the graphs provide a formal and transparent representation of the causal assumptions that researchers may wish to convey and defend. As inference tools, the graphs enable researchers to estimate effect sizes from non-experimental data, derive testable implications of the encoded assumptions, test for external validity, and manage missing data and selection bias.

Causal graphs were first used by the geneticist Sewall Wright under the rubric "path diagrams". They were later adopted by social scientists and, to a lesser extent, by economists. These models were initially confined to linear equations with fixed parameters. Modern developments have extended graphical models to non-parametric analysis, and thus achieved a generality and flexibility that has transformed causal analysis in computer science, epidemiology, and social science.

                                     

1. Construction and terminology

The causal graph can be drawn in the following way. Each variable in the model has a corresponding vertex or node, and an arrow is drawn from a variable X to a variable Y whenever Y is judged to respond to changes in X when all other variables are held constant. Variables connected to Y through direct arrows are called parents of Y, or "direct causes of Y," and are denoted by Pa(Y).

Causal models often include "error terms" or "omitted factors", which represent all unmeasured factors that influence a variable Y when Pa(Y) are held constant. In most cases, error terms are excluded from the graph. However, if the graph author suspects that the error terms of any two variables are dependent (e.g. the two variables have an unobserved or latent common cause), then a bidirected arc is drawn between them. Thus, the presence of latent variables is taken into account through the correlations they induce between the error terms, as represented by bidirected arcs.
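The construction rules above can be sketched as a small data structure. This is only an illustration: the class name `CausalGraph`, its methods, and the variables X, Y and Z are hypothetical and do not come from any particular library.

```python
from collections import defaultdict


class CausalGraph:
    """Minimal container for a causal graph (hypothetical helper class)."""

    def __init__(self):
        self.nodes = set()
        self.parents = defaultdict(set)  # node -> set of direct causes, Pa(node)
        self.bidirected = set()          # frozenset({X, Y}) marks dependent error terms

    def add_arrow(self, cause, effect):
        """Draw an arrow cause -> effect."""
        self.nodes.update((cause, effect))
        self.parents[effect].add(cause)

    def add_bidirected(self, x, y):
        """Draw a bidirected arc x <-> y for a suspected latent common cause."""
        self.nodes.update((x, y))
        self.bidirected.add(frozenset((x, y)))

    def pa(self, node):
        """Return Pa(node), the set of direct causes of node."""
        return set(self.parents.get(node, set()))


# Hypothetical example: X -> Y, Z -> Y, and dependent error terms for X and Z.
g = CausalGraph()
g.add_arrow("X", "Y")
g.add_arrow("Z", "Y")
g.add_bidirected("X", "Z")
print(sorted(g.pa("Y")))
```

Storing bidirected arcs separately from directed arrows mirrors the convention in the text: latent variables are not drawn as nodes, only the dependence they induce between error terms is recorded.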

                                     

2. Fundamental tools

A fundamental tool in graphical analysis is d-separation, which allows researchers to determine, by inspection, whether the causal structure implies that two sets of variables are independent given a third set. In recursive models without correlated error terms (sometimes called Markovian), these conditional independences represent all of the model's testable implications.
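One way to mechanize a d-separation check for a directed acyclic graph without bidirected arcs is the moralized ancestral graph test: keep only the variables involved and their ancestors, connect parents that share a child, drop arrow directions, delete the conditioning set, and see whether the two sets are still connected. The sketch below assumes the graph is given as a dictionary mapping each node to its parents; the function names and the example graph (X → M → Y with X → W ← Y) are hypothetical.

```python
def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def d_separated(parents, xs, ys, zs):
    """True if xs and ys are d-separated given zs in the DAG described by parents."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)

    # Build the undirected moral graph on the ancestral set.
    nbrs = {v: set() for v in keep}
    for child in keep:
        pa = list(parents.get(child, ()))
        for p in pa:                      # keep each parent-child edge
            nbrs[child].add(p)
            nbrs[p].add(child)
        for i, p in enumerate(pa):        # "marry" parents that share a child
            for q in pa[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)

    # Search from xs while never stepping onto a conditioned node.
    stack = [x for x in xs if x not in zs]
    seen = set()
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        if v in ys:
            return False                  # an unblocked connection exists
        stack.extend(n for n in nbrs[v] if n not in zs and n not in seen)
    return True


# Hypothetical graph: X -> M -> Y (a chain) and X -> W <- Y (a collider at W).
example = {"M": ["X"], "Y": ["M"], "W": ["X", "Y"]}
chain_blocked = d_separated(example, {"X"}, {"Y"}, {"M"})
collider_opened = d_separated(example, {"X"}, {"Y"}, {"M", "W"})
unconditional = d_separated(example, {"X"}, {"Y"}, set())
print(chain_blocked, collider_opened, unconditional)
```

In this example, conditioning on M blocks the chain, while additionally conditioning on the collider W reopens a connection between X and Y.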

                                     

3. Example

Suppose we wish to estimate the effect of attending an elite college on future earnings. Simply regressing earnings on college rating will not give an unbiased estimate of the target effect because elite colleges are highly selective, and students attending them are likely to have qualifications for high-earning jobs prior to attending the school. Assuming that the causal relationships are linear, this background knowledge can be expressed in the following structural equation model (SEM) specification.

Model 1

$$
\begin{aligned}
Q_1 &= U_1 \\
C &= a \cdot Q_1 + U_2 \\
Q_2 &= c \cdot C + d \cdot Q_1 + U_3 \\
S &= b \cdot C + e \cdot Q_2 + U_4,
\end{aligned}
$$

where $Q_1$ represents the individual's qualifications prior to college, $Q_2$ represents qualifications after college, $C$ contains attributes representing the quality of the college attended, and $S$ is the individual's salary.

Figure 1 is a causal graph that represents this model specification. Each variable in the model has a corresponding node or vertex in the graph. Additionally, for each equation, arrows are drawn from the independent variables to the dependent variables. These arrows reflect the direction of causation. In some cases, we may label the arrow with its corresponding structural coefficient as in Figure 1.
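The selection problem described for Model 1 can be illustrated with a small simulation. This is a sketch under stated assumptions: the coefficient values, the standard normal error terms, the sample size, and the function names are all chosen for illustration and are not part of the original example.

```python
import random


def simulate_model1(n, a, b, c, d, e, seed=0):
    """Draw n samples of (C, S) from Model 1 with standard normal error terms."""
    rng = random.Random(seed)
    cs, ss = [], []
    for _ in range(n):
        q1 = rng.gauss(0, 1)                     # Q1 = U1
        col = a * q1 + rng.gauss(0, 1)           # C  = a*Q1 + U2
        q2 = c * col + d * q1 + rng.gauss(0, 1)  # Q2 = c*C + d*Q1 + U3
        sal = b * col + e * q2 + rng.gauss(0, 1) # S  = b*C + e*Q2 + U4
        cs.append(col)
        ss.append(sal)
    return cs, ss


def ols_slope(x, y):
    """Slope of the least-squares regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Assumed illustrative coefficients (not taken from the text).
a, b, c, d, e = 1.0, 0.5, 0.5, 1.0, 1.0
total_effect = b + e * c          # effect of C on S along C -> S and C -> Q2 -> S
cs, ss = simulate_model1(50_000, a, b, c, d, e)
naive = ols_slope(cs, ss)         # regression of S on C alone
print(f"total effect: {total_effect:.3f}, naive slope: {naive:.3f}")
```

With these assumed values the population slope of S on C works out to 1.5 while the total effect of C on S is 1.0; the gap comes from the path through the prior qualifications $Q_1$.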

If $Q_1$ and $Q_2$ are unobserved or latent variables, their influence on $C$ and $S$ can be attributed to their error terms. By removing them, we obtain the following model specification:

Model 2

$$
\begin{aligned}
C &= U_C \\
S &= \beta C + U_S
\end{aligned}
$$

The background information specified by Model 1 implies that the error term of $S$, $U_S$, is correlated with $C$'s error term, $U_C$. As a result, we add a bidirected arc between $S$ and $C$, as in Figure 2.
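The consequence of this correlation for a simple regression can be made explicit. Assuming finite variances, the population slope of the regression of $S$ on $C$ under Model 2 is

$$
\frac{\operatorname{cov}(C, S)}{\operatorname{var}(C)}
= \frac{\operatorname{cov}(U_C,\ \beta U_C + U_S)}{\operatorname{var}(U_C)}
= \beta + \frac{\operatorname{cov}(U_C, U_S)}{\operatorname{var}(U_C)},
$$

so the slope equals $\beta$ only when $\operatorname{cov}(U_C, U_S) = 0$, which is exactly what the bidirected arc rules out.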

Since $U_S$ is correlated with $U_C$ and, therefore, with $C$, the variable $C$ is endogenous and $\beta$ is not identified in Model 2. However, if we include the strength of an individual's college application, $A$, as shown in Figure 3, we obtain the following model:

Model 3

$$
\begin{aligned}
Q_1 &= U_1 \\
A &= a \cdot Q_1 + U_2 \\
C &= b \cdot A + U_3 \\
Q_2 &= e \cdot Q_1 + d \cdot C + U_4 \\
S &= c \cdot C + f \cdot Q_2 + U_5,
\end{aligned}
$$

By removing the latent variables from the model specification we obtain:

Model 4

$$
\begin{aligned}
A &= a \cdot Q_1 + U_A \\
C &= b \cdot A + U_C \\
S &= \beta \cdot C + U_S,
\end{aligned}
$$

with $U_A$ correlated with $U_S$.

Now, $\beta$ is identified and can be estimated using the regression of $S$ on $C$ and $A$. This can be verified using the single-door criterion, a necessary and sufficient graphical condition for the identification of a structural coefficient, like $\beta$, using regression.
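The adjustment can be checked numerically by simulating Model 3 and comparing the regression of S on C alone with the regression of S on C and A. This is a sketch under stated assumptions: the coefficient values, standard normal error terms, sample size, and function names are illustrative choices. Substituting the $Q_2$ equation of Model 3 into the $S$ equation gives $\beta = c + f \cdot d$, which is the value the adjusted coefficient is compared against.

```python
import random


def simulate_model3(n, a, b, c, d, e, f, seed=0):
    """Draw n samples of (A, C, S) from Model 3 with standard normal error terms."""
    rng = random.Random(seed)
    As, Cs, Ss = [], [], []
    for _ in range(n):
        q1 = rng.gauss(0, 1)                     # Q1 = U1
        app = a * q1 + rng.gauss(0, 1)           # A  = a*Q1 + U2
        col = b * app + rng.gauss(0, 1)          # C  = b*A + U3
        q2 = e * q1 + d * col + rng.gauss(0, 1)  # Q2 = e*Q1 + d*C + U4
        sal = c * col + f * q2 + rng.gauss(0, 1) # S  = c*C + f*Q2 + U5
        As.append(app)
        Cs.append(col)
        Ss.append(sal)
    return As, Cs, Ss


def cov(x, y):
    """Sample covariance (dividing by n, which cancels in the ratios below)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n


def coef_on_first(y, x1, x2):
    """Coefficient of x1 in the least-squares regression of y on x1 and x2."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    return (s1y * s22 - s2y * s12) / (s11 * s22 - s12 ** 2)


# Assumed illustrative coefficients (not taken from the text).
a, b, c, d, e, f = 1.0, 1.0, 0.5, 0.5, 1.0, 1.0
beta = c + f * d                           # coefficient of C in Model 4
As, Cs, Ss = simulate_model3(50_000, a, b, c, d, e, f)
naive = cov(Cs, Ss) / cov(Cs, Cs)          # regression of S on C alone
adjusted = coef_on_first(Ss, Cs, As)       # regression of S on C and A
print(f"beta: {beta:.3f}, naive: {naive:.3f}, adjusted: {adjusted:.3f}")
```

Under these assumptions the unadjusted slope overstates $\beta$, while the coefficient on $C$ after including $A$ should land close to it, up to sampling noise.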



                                     