Aaditya Ramdas – Sequential uncertainty quantification

A large fraction of published research in top journals in applied sciences such as medicine and psychology has been claimed to be irreproducible. In light of this 'replicability crisis', traditional methods for hypothesis testing, most notably those based on p-values, have come under intense scrutiny. One central problem is the following: if our test result is promising but inconclusive (say, p = 0.07), we cannot simply decide to gather a few more data points. While this practice is ubiquitous in science, it invalidates p-values and error guarantees and makes the results of standard meta-analyses very hard to interpret. This issue is not unique to p-values: other approaches, such as replacing testing by estimation with confidence intervals, suffer from similar optional continuation problems. Over the last few years several distinct but closely related solutions have been proposed, such as anytime-valid confidence sequences and p-values, and safe tests.
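The optional continuation problem can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not taken from any of the papers listed here: the data are standard normal with known variance, the null hypothesis (mean zero) is true, and the analyst either computes one z-test at a fixed sample size or recomputes the p-value after every new observation and stops at the first p < 0.05. The function names and default parameters are hypothetical.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(peek, n_min=10, n_max=100, alpha=0.05,
                        n_sims=2000, seed=0):
    """Fraction of null simulations in which the z-test rejects.

    peek=False: test once, at the fixed sample size n_max.
    peek=True:  test after every observation from n_min to n_max and
                stop at the first p-value below alpha.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        total = 0.0
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)  # the null (mean zero) is true
            check_now = (n >= n_min) if peek else (n == n_max)
            if check_now and two_sided_p(total / math.sqrt(n)) < alpha:
                rejections += 1
                break
    return rejections / n_sims


if __name__ == "__main__":
    print("fixed sample size:", false_positive_rate(peek=False))
    print("peeking each step:", false_positive_rate(peek=True))
```

With a fixed sample size the rejection rate should stay near the nominal 0.05, while the peeking strategy rejects a true null considerably more often, which is the sense in which gathering "a few more data points" invalidates the error guarantee.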

Remarkably, all these approaches can be understood in terms of (sequential) gambling. One formulates a gambling strategy under which one would not expect to gain any money if the null hypothesis were true. If, for the given data, one would have won a large amount of money in this game, this provides evidence against the null hypothesis. The test statistic in traditional statistics gets replaced by the gambling strategy; the p-value gets replaced by the (virtual) amount of money gained. In more mathematical terms, evidence against the null and confidence sets are derived in terms of nonnegative supermartingales. While this idea in essence goes back to Wald's sequential testing of the 1940s and its extensions by Robbins and coauthors in the 1960s and Lai in the 1970s, it never really caught on because it used to be applicable only to very simple statistical models and testing scenarios. However, recent work shows that this idea is essentially universally applicable: one can design supermartingales for large classes of nonparametric tests and many estimation problems, and one can analyze them using novel tools such as nonasymptotic versions of the law of the iterated logarithm. These directions also go some way toward uniting Bayesian and frequentist ways of thinking: prior knowledge can be used explicitly, and correct frequentist inference is often obtained using Bayesian techniques.
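As a minimal sketch of the gambling view, consider testing whether a coin is fair. The example below is a toy illustration with hypothetical function names and a fixed betting fraction chosen for simplicity; it is not the construction used in any specific paper listed here. The gambler starts with wealth 1 and bets a fixed fraction of current wealth on heads at even odds. Under the null hypothesis P(heads) = 0.5 the wealth is a nonnegative martingale, so by Ville's inequality the probability that it ever reaches 1/alpha is at most alpha, no matter when one stops or continues.

```python
def wealth_process(flips, bet=0.5):
    """Wealth path when betting a fixed fraction `bet` on heads each round.

    flips: iterable of 0/1 outcomes (1 = heads).
    Each round multiplies wealth by 1 + bet (heads) or 1 - bet (tails),
    which has expectation 1 under the null P(heads) = 0.5.
    """
    if not 0.0 <= bet < 1.0:
        raise ValueError("bet must lie in [0, 1) to keep wealth positive")
    wealth = 1.0
    path = []
    for x in flips:
        wealth *= 1.0 + bet * (2 * x - 1)
        path.append(wealth)
    return path


def first_rejection_time(flips, alpha=0.05, bet=0.5):
    """First time the wealth reaches 1/alpha, or None if it never does."""
    for t, wealth in enumerate(wealth_process(flips, bet), start=1):
        if wealth >= 1.0 / alpha:
            return t
    return None


if __name__ == "__main__":
    # A heavily biased sequence: wealth grows as 1.5 ** t.
    print(first_rejection_time([1] * 20))
    # A balanced sequence: wealth shrinks and the null is never rejected.
    print(first_rejection_time([1, 0] * 10))
```

A fixed bet on heads only has power against coins biased toward heads; more refined strategies adapt the bet to the data or mix over many bets, which is where prior knowledge and Bayesian techniques can enter.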

Anytime-valid, safe confidence intervals and p-values (package) (tutorial)

  • Admissible anytime-valid sequential inference must rely on nonnegative martingales
    A. Ramdas, J. Ruf, M. Larsson, W. Koolen       arxiv

  • Confidence sequences for sampling without replacement
    I. Waudby-Smith, A. Ramdas       (NeurIPS 2020, submitted)   arxiv   app-continuous   app-discrete  

  • Time-uniform, nonparametric, nonasymptotic confidence sequences
    S. Howard, A. Ramdas, J. Sekhon, J. McAuliffe       The Annals of Stat., 2020   arxiv   code   tutorial

  • Time-uniform Chernoff bounds via nonnegative supermartingales
    S. Howard, A. Ramdas, J. Sekhon, J. McAuliffe       Prob. Surveys, 2020   arxiv   proc   talk

  • Universal inference
    L. Wasserman, A. Ramdas, S. Balakrishnan       PNAS, 2020   arxiv   proc   talk

  • Uncertainty quantification using martingales for misspecified Gaussian processes
    W. Neiswanger, A. Ramdas       (NeurIPS, submitted)   arxiv   code   talk

  • Sequential estimation of quantiles with applications to A/B-testing and best-arm identification
    S. Howard, A. Ramdas       (Bernoulli, major revision)   arxiv   code

  • Sequential nonparametric testing with the law of the iterated logarithm
    A. Balsubramani*, A. Ramdas*       UAI, 2016   arxiv   proc   errata

Multi-armed bandits

All of the aforementioned techniques come in handy when designing new algorithms for multi-armed bandit problems, as well as for understanding what existing algorithms are doing, in considerable generality.

  • On conditional versus marginal bias in multi-armed bandits
    J. Shin, A. Ramdas, A. Rinaldo       ICML, 2020   arxiv

  • Are sample means in multi-armed bandits positively or negatively biased?
    J. Shin, A. Ramdas, A. Rinaldo       NeurIPS, 2019   arxiv   poster   proc

  • On the bias, risk and consistency of sample means in multi-armed bandits
    J. Shin, A. Ramdas, A. Rinaldo       (Sequential Analysis, submitted)   arxiv   talk

  • MAB-FDR: Multi (A)rmed/(B)andit testing with online FDR control
    F. Yang, A. Ramdas, K. Jamieson, M. Wainwright       NeurIPS, 2017   arxiv   code   30-min talk   proc   (spotlight talk)