36-462/662, Spring 2022
14 February 2022 (Lecture 9)
\[ \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\Risk}{r} \newcommand{\EmpRisk}{\hat{r}} \newcommand{\Loss}{\ell} \newcommand{\OptimalStrategy}{\sigma} \DeclareMathOperator*{\argmin}{argmin} \newcommand{\ModelClass}{S} \newcommand{\OptimalModel}{s^*} \DeclareMathOperator{\tr}{tr} \newcommand{\optimand}{\theta} \newcommand{\optimum}{\optimand^*} \newcommand{\OptimalParameter}{\optimand^*} \newcommand{\ERM}{\hat{\optimand}} \newcommand{\ObjFunc}{{M}} \newcommand{\Hessian}{\mathbf{k}} \]
\[ \ERM \approx \optimum - \Hessian^{-1} \nabla \EmpRisk(\optimum) \]
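The approximation above says that \(\ERM\) is roughly one Newton step away from \(\optimum\), taken along the gradient of the empirical risk. The sketch below is a toy check of that claim. It assumes something the lecture does not specify: exponentially distributed data with the loss \(\Loss(x;\theta) = -\theta + x e^{\theta}\), the negative log-likelihood of a log-rate. Under that assumption both \(\ERM\) and \(\optimum\) have closed forms, so the one-step approximation can be compared with the exact minimizer. The function name `one_step_vs_exact` is hypothetical.

```python
import math
import random


def one_step_vs_exact(n, rate=2.0, seed=1):
    """Compare the exact ERM with theta_star - k^{-1} * grad r_hat(theta_star).

    Illustrative assumption (not from the lecture): X ~ Exponential(rate) and
    the loss is l(x; theta) = -theta + x * exp(theta), so that
      r(theta)     = -theta + mu * exp(theta),   mu = E[X] = 1 / rate
      r_hat(theta) = -theta + xbar * exp(theta).
    """
    rng = random.Random(seed)
    xs = [rng.expovariate(rate) for _ in range(n)]
    xbar = sum(xs) / n
    mu = 1.0 / rate
    theta_star = -math.log(mu)                      # minimizer of the true risk
    k = mu * math.exp(theta_star)                   # r''(theta_star); equals 1 here
    grad_emp = -1.0 + xbar * math.exp(theta_star)   # gradient of r_hat at theta_star
    approx = theta_star - grad_emp / k              # one Newton step from theta_star
    exact = -math.log(xbar)                         # closed-form minimizer of r_hat
    return exact, approx
```

Writing \(\delta = \bar{x}/\mu - 1\), the exact answer is \(\optimum - \log(1+\delta)\) and the approximation is \(\optimum - \delta\). Their gap is therefore second order in \(\delta\). It shrinks roughly like \(1/n\) as `n` grows, and in this particular example it is never negative, because \(\log(1+\delta) \le \delta\).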
The difference between the empirical-risk curve \(\EmpRisk(\theta)\) and the true-risk curve \(\Risk(\theta)\) is the risk deviation \(\gamma(\theta)\).
\(\gamma(\theta)\) is really a random function (a stochastic process), so any single plotted curve of it is just one draw from its distribution (one realization of the process).
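To see that \(\gamma(\theta)\) is a random function rather than a fixed curve, we can draw several independent samples and trace \(\gamma\) over a grid of \(\theta\) values for each one. The sketch below does this under illustrative assumptions that the lecture does not make: squared-error loss \(\Loss(x;\theta) = (x-\theta)^2\) and Gaussian data, for which \(\Risk(\theta) = \sigma^2 + (\mu-\theta)^2\) in closed form. It uses the sign convention \(\EmpRisk(\theta) - \Risk(\theta)\). The opposite convention would simply flip every curve. The function name `risk_deviation_draws` is hypothetical.

```python
import random


def risk_deviation_draws(n, thetas, n_draws=5, mu=0.0, sigma=1.0, seed=2):
    """Return n_draws realizations of r_hat(theta) - r(theta) on the grid thetas.

    Illustrative assumption (not from the lecture): squared-error loss
    l(x; theta) = (x - theta)^2 with data X ~ N(mu, sigma^2).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]   # a fresh sample per draw
        curve = []
        for t in thetas:
            emp_risk = sum((x - t) ** 2 for x in xs) / n   # r_hat(theta)
            true_risk = sigma ** 2 + (mu - t) ** 2          # r(theta), closed form
            curve.append(emp_risk - true_risk)
        draws.append(curve)
    return draws
```

Each inner list is one realization of the process. For this particular loss every realization happens to be a straight line in \(\theta\), with random intercept \(\overline{x^2} - \sigma^2 - \mu^2\) and random slope \(-2(\bar{x} - \mu)\). Averaging many realizations at a fixed \(\theta\) gives something close to zero, because \(\EmpRisk(\theta)\) is an unbiased estimate of \(\Risk(\theta)\) at any fixed \(\theta\).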
\[ \Risk(\ERM) \approx \EmpRisk(\ERM) + n^{-1}\tr{\left(\mathbf{j}\Hessian^{-1}\right)} \]
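A quick Monte Carlo check of this correction is sketched below. It assumes the usual reading of the symbols, which is not restated in this excerpt: \(\mathbf{j}\) is the variance of the per-observation gradient of the loss at \(\optimum\), \(\Hessian\) is the Hessian of \(\Risk\) at \(\optimum\), and the approximation is read as holding on average over samples. As an illustrative assumption it again uses squared-error loss with Gaussian data. In that case \(\theta\) is one-dimensional, so the trace is just \(j/k\). The per-point gradient is \(-2(x-\theta)\), so \(j = 4\sigma^2\) and \(k = 2\), giving a correction of \(2\sigma^2/n\). The function name `optimism_check` and its plug-in estimate of \(\mathbf{j}\) are hypothetical illustrations, not the lecture's method.

```python
import random


def optimism_check(n, reps=4000, mu=0.0, sigma=1.0, seed=3):
    """Average r(theta_hat) - r_hat(theta_hat) over many samples and compare it
    with tr(j k^{-1}) / n.

    Illustrative assumption (not from the lecture): l(x; theta) = (x - theta)^2
    with X ~ N(mu, sigma^2), so j = 4 sigma^2, k = 2, tr(j k^{-1}) / n = 2 sigma^2 / n.
    Returns (mean observed gap, mean plug-in correction, theoretical correction).
    """
    rng = random.Random(seed)
    total_gap, total_plug = 0.0, 0.0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        theta_hat = sum(xs) / n                          # ERM = sample mean
        emp = sum((x - theta_hat) ** 2 for x in xs) / n  # r_hat(theta_hat)
        true = sigma ** 2 + (mu - theta_hat) ** 2        # r(theta_hat), closed form
        total_gap += true - emp
        # Plug-in estimate of j: variance of per-point gradients at theta_hat.
        grads = [-2.0 * (x - theta_hat) for x in xs]
        gbar = sum(grads) / n
        j_hat = sum((g - gbar) ** 2 for g in grads) / n
        total_plug += j_hat / (2.0 * n)                  # k = 2 is known exactly here
    return total_gap / reps, total_plug / reps, 2 * sigma ** 2 / n
```

With `n = 20` and \(\sigma = 1\), the theoretical correction is \(0.1\). Both the simulated average gap and the average plug-in estimate should land near that value. The plug-in estimate is slightly low on average, by a factor of \((n-1)/n\), because it is evaluated at \(\ERM\) rather than at \(\optimum\).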