16 March 2021, online causal inference seminar

- Kun has laid out the ML/AI importance very clearly
- To add value, let me talk about the statistical viewpoint

*Estimation*: Accepting a certain causal structure, what’re the effects of various configurations of the causes?

*Discovery*: What is the causal structure of the system anyway?

- Includes e.g. policy learning as an application
- Get clear about the estimand
- Get clear about the conditions under which the estimand is or is not identified from the distribution of observables
- Get clever about the estimation and/or testing
- An immense task, but all resting on assumptions about the causal structure
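
A standard textbook illustration of the first two steps (not taken from the talk itself): take a binary treatment \(T\), outcome \(Y\) with potential outcomes \(Y(0), Y(1)\), and covariates \(X\). These symbols are assumptions introduced here for illustration. The estimand is the average treatment effect, and under ignorability \((Y(0), Y(1)) \perp T \mid X\), overlap \(0 < P(T=1 \mid X) < 1\), and consistency, it is identified from the distribution of observables:

\[
\tau = \mathbb{E}[Y(1) - Y(0)] = \mathbb{E}_X\big[\, \mathbb{E}[Y \mid T=1, X] - \mathbb{E}[Y \mid T=0, X] \,\big]
\]

The left-hand side is a causal quantity; the right-hand side involves only observables. The equality holds only because of the assumed causal structure.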

- Given the distribution of observables, what’s the causal structure?
- Or what’s the range of possible causal structures?

- Experiment is one way of answering this!
- There had better be non-experimental ways
- E.g., geology explains the causes of earthquakes without randomized controlled trials on tectonic plate boundaries
- But maybe we’re fooling ourselves and the geologists don’t really know any more than astrologers

- Why think discovery problems are solvable?

- Hume (1739) (in modern language): all we can *observe* is association (“constant conjunction”), not counterfactuals or causes (“necessary connexion”)
- Anticipated by al-Ghazali (n.d.) in 1100 (it is not “habitual” for “a corpse to sit up and write an eloquent volume in a well-ordered script”, but habits can change)

- Bertrand Russell (1954): there is a special place in Hell for philosophers who think they have refuted Hume (or al-Ghazali)
- Is Kun putting his soul in danger?

- We need to impose *some* assumptions
- These should be weaker or more plausible than the assumptions used in estimation problems
- Causal assumptions in \(\Rightarrow\) causal conclusions out
- “The goal of therapy is to turn neurotic misery into everyday unhappiness” (attrib. Freud)
- The goal of causal discovery is to turn metaphysical misery into everyday statistical unhappiness

- Causal structure is *qualitative*
- The DAG, or the non-parametric structural equations, or Rubin-style ignorability

- Picking qualitative aspects of a statistical model is **model selection**
- Model selection has issues which don’t match continuous-parameter-estimation intuition, but it’s not impossible
- E.g. post-selection inference (perhaps just by data-splitting)
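
A minimal sketch of post-selection inference by data-splitting, under assumptions made up for illustration: the "model selection" step is just picking the single predictor most correlated with the response, and the function name `split_select_infer` is hypothetical. The point it shows is that the selection and the inference use disjoint halves of the data, so the usual standard error on the second half is not contaminated by the selection.

```python
import numpy as np

def split_select_infer(X, y, rng):
    """Select one predictor on half the data, then estimate its slope on the other half."""
    n = X.shape[0]
    idx = rng.permutation(n)
    sel, inf = idx[: n // 2], idx[n // 2 :]

    # Selection step (selection half only): column with largest |correlation| with y.
    corrs = [abs(np.corrcoef(X[sel, j], y[sel])[0, 1]) for j in range(X.shape[1])]
    j_hat = int(np.argmax(corrs))

    # Inference step (inference half only): OLS of y on the chosen column.
    x = X[inf, j_hat]
    y_inf = y[inf]
    A = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(A, y_inf, rcond=None)
    resid = y_inf - A @ beta
    sigma2 = resid @ resid / (len(y_inf) - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(A.T @ A)[1, 1])
    return j_hat, beta[1], se

# Simulated example: only column 3 matters, with true slope 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = 2.0 * X[:, 3] + rng.normal(size=400)
j_hat, slope, se = split_select_infer(X, y, rng)
print(j_hat, slope, se)
```

The price of the simplicity is that each step sees only half the sample, which costs precision in both the selection and the estimate.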

- Model selection *also* rests on assumptions

- The kind Kun has just explained to us: *if* we make these assumptions about the causal structure, *and* we make those assumptions about the functional forms in the statistical model, *then* such-and-such a procedure will consistently select the right causal structure
- Constraint-based methods, noise assumptions, etc.

- Some assumptions are harder to buy than others
- Linear models are very hard for me personally to swallow

- Expanding the range of assumptions under which we know we have consistent discovery procedures (should) make this work an easier sell
- Once we hit the frontier, there will be a trade-off between harder-to-swallow assumptions and more-precise conclusions (at fixed data size)
- Nonparametric conditional independence tests (Zhang et al. 2011) have lower power than partial-correlation tests for linear-Gaussian relations
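
For concreteness, here is a sketch of the parametric side of that trade-off: a Fisher-z partial-correlation test of \(X \perp Y \mid Z\). It is only trustworthy under the linear-Gaussian assumption, and it is not the kernel-based test of Zhang et al. (2011), which is considerably more involved. The function name `partial_corr_test` and the simulated data are assumptions for illustration.

```python
import math
import numpy as np

def partial_corr_test(x, y, z):
    """Fisher-z test of X independent of Y given Z, assuming linear-Gaussian relations."""
    z = np.asarray(z, dtype=float).reshape(len(x), -1)  # make Z two-dimensional
    Z = np.column_stack([np.ones(len(x)), z])
    # Residualize x and y on Z (with intercept) by least squares.
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    # Partial correlation = correlation of the residuals.
    r = float(rx @ ry / math.sqrt((rx @ rx) * (ry @ ry)))
    k = z.shape[1]
    stat = math.sqrt(len(x) - k - 3) * math.atanh(r)
    p = math.erfc(abs(stat) / math.sqrt(2))  # two-sided normal p-value
    return r, p

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
z = x + rng.normal(size=n)
y_chain = z + rng.normal(size=n)        # X -> Z -> Y: X independent of Y given Z
y_direct = x + z + rng.normal(size=n)   # X also affects Y directly
print(partial_corr_test(x, y_chain, z))
print(partial_corr_test(x, y_direct, z))
```

When the relations really are linear-Gaussian, this test uses the data very efficiently; when they are not, a zero partial correlation no longer implies conditional independence, which is where the nonparametric tests earn their lower power.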

- This again is no different from any other statistical problem! (Manski 2003)
- Anyone willing to estimate an ATE by propensity-score matching has declared their reservation price…

- We *know* causal discovery is possible under causal and statistical assumptions
- These are comparable to or weaker than the assumptions for causal estimation problems

- We should think of this as a kind of model selection problem
- There is an immense field for statistical and econometric work here

al-Ghazali, Abu Hamid Muhammad ibn Muhammad at-Tusi. n.d. *The Incoherence of the Philosophers = Tahafut al-Falasifah: A Parallel English-Arabic Text*. Provo, Utah: Brigham Young University Press.

Hume, David. 1739. *A Treatise of Human Nature: Being an Attempt to Introduce the Experimental Method of Reasoning into Moral Subjects*. London: John Noon.

Manski, Charles F. 2003. *Partial Identification of Probability Distributions*. New York: Springer-Verlag.

Russell, Bertrand. 1954. *Nightmares of Eminent Persons*. New York: Simon and Schuster.

Zhang, Kun, Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. 2011. “Kernel-Based Conditional Independence Test and Application in Causal Discovery.” In *Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (UAI-11)*, edited by Fabio Gagliardi Cozman and Avi Pfeffer, 804–13. Corvallis, Oregon: AUAI Press. http://arxiv.org/abs/1202.3775.