- Statistics for High-Dimensional Data: Methods, Theory and Applications, by P. Bühlmann and S. van de Geer, Springer, 2011.
- Statistical Learning with Sparsity: The Lasso and Generalizations, by T. Hastie, R. Tibshirani and M. Wainwright, Chapman & Hall, 2015.
- Introduction to High-Dimensional Statistics, by C. Giraud, Chapman & Hall, 2015.
- Testing Statistical Hypotheses, by E.L. Lehmann and J.P. Romano, Springer, 3rd Edition, 2005.
- Asymptotic Statistics, by A. van der Vaart, Springer, 2000.
- Concentration Inequalities: A Nonasymptotic Theory of Independence, by S. Boucheron, G. Lugosi and P. Massart, Oxford University Press, 2013.
- Rigollet, P. (2015). High-Dimensional Statistics, Lecture Notes for the MIT course 18.S997.

Parameter consistency and central limit theorems for models with increasing dimension d (but still d < n):

- Wasserman, L., Kolar, M. and Rinaldo, A. (2014). Berry-Esseen bounds for estimating undirected graphs, Electronic Journal of Statistics, 8(1), 1188-1224.
- Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters, the Annals of Statistics, 32(3), 928-961.
- Portnoy, S. (1984). Asymptotic Behavior of M-Estimators of p Regression Parameters when p^2/n is Large. I. Consistency, the Annals of Statistics, 12(4), 1298-1309.
- Portnoy, S. (1985). Asymptotic Behavior of M-Estimators of p Regression Parameters when p^2/n is Large. II. Normal Approximation, the Annals of Statistics, 13(4), 1403-1417.
- Portnoy, S. (1988). Asymptotic Behavior of Likelihood Methods for Exponential Families when the Number of Parameters Tends to Infinity, the Annals of Statistics, 16(1), 356-366.

- Chernozhukov, V., Chetverikov, D. and Kato, K. (2016). Central Limit Theorems and Bootstrap in High Dimensions, arXiv preprint.
- Bentkus, V. (2003). On the dependence of the Berry–Esseen bound on dimension, Journal of Statistical Planning and Inference, 113, 385-402.
- Portnoy, S. (1986). On the central limit theorem in $R^p$ when $p \rightarrow \infty$, Probability Theory and Related Fields, 73(4), 571-583.

- Concentration Inequalities: A Nonasymptotic Theory of Independence, by S. Boucheron, G. Lugosi and P. Massart, Oxford University Press, 2013.
- Concentration Inequalities and Model Selection, by P. Massart, Springer Lecture Notes in Mathematics, vol 1605, 2007.
- The Concentration of Measure Phenomenon, by M. Ledoux, 2005, AMS.
- Concentration of Measure for the Analysis of Randomized Algorithms, by D.P. Dubhashi and A. Panconesi, Cambridge University Press, 2012.
- Introduction to the non-asymptotic analysis of random matrices, by R. Vershynin. In: Compressed Sensing: Theory and Applications, eds. Y. Eldar and G. Kutyniok, Cambridge University Press, 2012.

- Metric Characterization of Random Variables and Random Processes, by V. V. Buldygin, AMS, 2000.
- Introduction to the non-asymptotic analysis of random matrices, by R. Vershynin, Chapter 5 of: Compressed Sensing, Theory and Applications. Edited by Y. Eldar and G. Kutyniok. Cambridge University Press, 210–268, 2012. pdf

- Check out the Wikipedia page.
- A guided tour of Chernoff bounds, by T. Hagerup and C. Rüb, Information Processing Letters, 33(6), 305-308, 1990.
- Chapter 4 of the book Probability and Computing: Randomized Algorithms and Probabilistic Analysis, by M. Mitzenmacher and E. Upfal, Cambridge University Press, 2005.
- The Probabilistic Method, 3rd Edition, by N. Alon and J. H. Spencer, Wiley, 2008, Appendix A.1.

- László Györfi, Michael Kohler, Adam Krzyżak, Harro Walk (2002). A Distribution-Free Theory of Nonparametric Regression, Springer.

- Example 2.12 in Concentration Inequalities: A Nonasymptotic Theory of Independence, by S. Boucheron, G. Lugosi and P. Massart, Oxford University Press, 2013.
- Lemma 1 in Laurent, B. and Massart, P. (2000). Adaptive estimation of a quadratic functional by model selection, Annals of Statistics, 28(5), 1302-1338.

- Rudelson, M. and Vershynin, R. (2013). Hanson-Wright inequality and sub-gaussian concentration. Electron. Commun. Probab., 18(82), 1-9.

- Tropp, J. (2012). User-friendly tail bounds for sums of random matrices, Found. Comput. Math., Vol. 12, num. 4, pp. 389-434.
- Tropp, J. (2015). An Introduction to Matrix Concentration Inequalities, Found. Trends Mach. Learning, Vol. 8, num. 1-2, pp. 1-230.

- Andreas Buja, Richard Berk, Lawrence Brown, Edward George, Emil Pitkin, Mikhail Traskin, Linda Zhao and Kai Zhang (2015). Models as Approximations — A Conspiracy of Random Regressors and Model Deviations Against Classical Inference in Regression. pdf

- Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators, The Annals of Statistics, 28(5), 1356-1378.

- Tibshirani, R. (2013). The lasso problem and uniqueness, Electronic Journal of Statistics, 7, 1456-1490.

- Homrighausen, D. and McDonald, D. (2013). The lasso, persistence, and cross-validation, Proceedings of the 30th International Conference on Machine Learning, JMLR W&CP, 28. pdf
- Homrighausen, D. and McDonald, D. (2013b). Risk consistency of cross-validation with Lasso-type procedures, arXiv:1308.0810.
- Chatterjee, S. and Jafarov, J. (2015). Prediction error of cross-validated Lasso, arXiv:1502.06291.
- Chetverikov, D. and Liao, Z. (2016). On cross-validated Lasso, arXiv:1605.02214.

- Statistics for High-Dimensional Data: Methods, Theory and Applications, by P. Bühlmann and S. van de Geer, Springer, 2011. Chapter 6 and Chapter 7.
- Belloni, A., Chernozhukov, V. and Hansen, C. (2010). Inference for High-Dimensional Sparse Econometric Models, Advances in Economics and Econometrics, ES World Congress 2010, arXiv link.
- Bickel, P. J., Y. Ritov, and A. B. Tsybakov (2009), Simultaneous analysis of Lasso and Dantzig selector, Annals of Statistics, 37(4), 1705–1732.

- Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization, Bernoulli, 10(6), 971-988.
- For an alternative proof of persistence see: Jon Wellner, Persistence: Alternative proofs of some results of Greenshtein and Ritov. pdf

- Stewart and Sun (1990). Matrix Perturbation Theory, Academic Press. (Start with the CS decomposition, then move on to principal angles, and then to the perturbation theory results.)
- Parlett, B.N. (1998). The Symmetric Eigenvalue Problem, Society for Industrial and Applied Mathematics.

- Berthet, Q. and Rigollet, P. (2013). Optimal detection of sparse principal components in high dimension, Annals of Statistics, 41(4), 1780-1815.

- Lei, J. and Vu, V. (2015). Sparsistency and Agnostic Inference in Sparse PCA. Annals of Statistics, 43(1), 299-322.

- Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition, Springer.
- Koltchinskii, V. (2011). Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems, Springer Lecture Notes in Mathematics, 2033.

- M. Anthony and J. Shawe-Taylor, "A result of Vapnik with applications," Discrete Applied Mathematics, vol. 47, pp. 207-217, 1993.
- V. N. Vapnik and A. Ya. Chervonenkis, "On the uniform convergence of relative frequencies of events to their probabilities," Theory of Probability and its Applications, vol. 16, pp. 264-280, 1971.

- Koltchinskii, V. (2011). Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems, Springer Lecture Notes in Mathematics, 2033.
- The Concentration of Measure Phenomenon, by M. Ledoux, 2005, AMS.

- Chapter 2.2 in Aad W. van der Vaart and Jon A. Wellner (1996). Weak Convergence and Empirical Processes, Springer.
- Section 11.1 in Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces, Springer.

- Peter L. Bartlett, Olivier Bousquet, and Shahar Mendelson (2005). Local Rademacher complexities, Annals of Statistics, 33(4), 1497-1537.
- Koltchinskii, V. (2011). Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems, Springer Lecture Notes in Mathematics, 2033.
- Koltchinskii, V. (2006). 2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization, Annals of Statistics, 34(6), 2593–2656.

- van de Geer, S. (2009). Empirical Processes in M-Estimation, Cambridge University Press.

- Mendelson, S. (2002). Improving the sample complexity using global data, IEEE Trans. Inform. Theory, 48, 1977-1991.

- Chapter 11 and especially Chapter 12 of van der Vaart, A. (1998). Asymptotic Statistics, Cambridge Series in Statistical and Probabilistic Mathematics.
- Chapter 5 of Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics, John Wiley and Sons.
- For an excellent and readable treatment, see: Lee, A.J. (1990). U-Statistics: Theory and Practice, CRC Press.
- This set of lecture notes by Thomas Ferguson.

- Mentch, L. and Hooker, G. (2015). Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests, arXiv:1404.6473.

- Hoeffding, W. (1963). Probability Inequalities for Sums of Bounded Random Variables, Journal of the American Statistical Association, 58(301), 13-30.
- Arcones, M.A. (1995). A Bernstein-type inequality for U-statistics and U-processes, Statistics & Probability Letters, 22(3), 239-247.
- de la Peña, V.H. and Giné, E. (1999). Decoupling: From Dependence to Independence, Springer.