THE IMPOSSIBILITY OF INFERRING CAUSATION FROM ASSOCIATION WITHOUT BACKGROUND KNOWLEDGE

James M. Robins and Larry Wasserman

Abstract:

Spirtes, Glymour and Scheines (SGS) and Pearl and Verma (PV) make the startling claim that it is possible to infer causal relationships between two variables X and Y from associations found in observational (non-experimental) data without substantive subject-matter-specific background knowledge. When causal relationships are represented by directed acyclic graphs (DAGs), SGS argue that their claim follows, mathematically, from two reasonable assumptions: (i) the sample size is sufficiently large and (ii) the distribution of the random variables is faithful to the causal graph. In particular, SGS have shown that under their faithfulness assumption, there exist methods for identifying causal relationships which are asymptotically (in sample size) correct.
However, we shall show that SGS's asymptotics implicitly assume that the probability of there being "no unmeasured common causes" of X and Y is positive and not small relative to sample size. We prove that, under an asymptotics for which the probability of "no unmeasured common causes" is small relative to sample size, causal relationships are non-identifiable from the data alone, even when we assume distributions are faithful to the causal graph. We argue that, in observational epidemiologic, econometric, and social scientific studies, a formal asymptotic analysis that models the probability of "no unmeasured common causes" as small relative to sample size accurately reflects the beliefs of practicing professionals. We argue that these beliefs derive both from experience and from the fact that the world contains so many potential unmeasured common causes (i.e., confounders) that it is a priori highly unlikely that not a single one actually causes both X and Y. We conclude that, in observational studies, small causal effects can never be either reliably ruled in or ruled out; furthermore, one should not make the leap from even relatively large empirical associations to causation without substantive subject-matter-specific background information.
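
The following Python sketch is an illustration added for the reader, not part of the report. It does not reproduce the asymptotic argument summarized above; it only illustrates the elementary point behind it, that an unmeasured common cause U can produce the same X-Y association as a direct effect of X on Y. The linear-Gaussian model, the function names `simulate` and `correlation`, and all coefficient values are assumptions chosen for the example.

```python
import random
import statistics


def simulate(n, confounder_strength, causal_effect, seed=0):
    """Draw n samples of (X, Y) from a hypothetical linear-Gaussian model.

    U is an unmeasured common cause of X and Y; it is never returned,
    mimicking an observational study in which U is not recorded.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)  # unmeasured confounder
        x = confounder_strength * u + rng.gauss(0, 1)
        y = causal_effect * x + confounder_strength * u + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return xs, ys


def correlation(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


# World A: X has no effect on Y; the association comes entirely from U.
confounded = correlation(*simulate(20000, confounder_strength=1.0, causal_effect=0.0))

# World B: no confounding; X causes Y with a coefficient chosen so that
# the population correlation matches World A (both equal 0.5 in theory).
causal = correlation(*simulate(20000, confounder_strength=0.0, causal_effect=3 ** -0.5))

print(f"confounded, no causal effect: r = {confounded:.3f}")
print(f"causal effect, no confounding: r = {causal:.3f}")
```

In this toy setup the two worlds yield nearly the same sample correlation, so the observed association between X and Y alone cannot distinguish them.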
