Many areas of science make extensive use of computer simulators that implicitly encode likelihood functions for complex systems. Classical statistical methods are poorly suited for these so-called likelihood-free inference (LFI) settings outside the asymptotic and low-dimensional regimes. Although new machine learning methods, such as normalizing flows, have revolutionized the sample efficiency and capacity of LFI methods, it remains an open question whether they produce reliable measures of uncertainty. In this paper, we present a statistical framework for LFI that unifies classical statistics with modern machine learning to (1) construct frequentist confidence sets and hypothesis tests with finite-sample guarantees of nominal coverage (type I error control) and power, and (2) provide rigorous diagnostics for assessing empirical coverage over the entire parameter space. We refer to our framework as likelihood-free frequentist inference (LF2I). Any method that estimates a test statistic, such as the likelihood ratio, can be plugged into our framework to create powerful tests and confidence sets with correct coverage. In this work, we specifically study two test statistics, ACORE and BFF, which respectively maximize and integrate an odds function over the parameter space. Our theoretical and empirical results offer multifaceted perspectives on error sources and challenges in likelihood-free frequentist inference.