Optimization — Basics from Calculus

36-462/662, Spring 2022

1 February 2022 (Lecture 5)

\[ \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \DeclareMathOperator*{\argmin}{argmin} \newcommand{\Risk}{r} \newcommand{\EmpRisk}{\hat{\Risk}} \newcommand{\Loss}{\ell} \newcommand{\OptimalStrategy}{\sigma} \newcommand{\ModelClass}{S} \newcommand{\OptimalModel}{s^*} \newcommand{\Indicator}[1]{\mathbb{I}\left\{ #1 \right\}} \newcommand{\myexp}[1]{\exp{\left( #1 \right)}} \newcommand{\eqdist}{\stackrel{d}{=}} \newcommand{\OptDomain}{\Theta} \newcommand{\OptDim}{p} \newcommand{\optimand}{\theta} \newcommand{\altoptimand}{\optimand^{\prime}} \newcommand{\ObjFunc}{{M}} \newcommand{\optimum}{\optimand^*} \newcommand{\Hessian}{\mathbf{h}} \]

Previously

Optimization: some jargon

Local vs. global minima

“The” minimum: value vs. location

Finding the optimum: calculus basics

The first-order condition

The tangent line to \(\ObjFunc\) is flat at the minimum \(\optimum\)

The first-order condition and boundary optima

The minimum on this domain is at the right-hand boundary, and the tangent line is not flat
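A minimal numerical sketch of a boundary optimum. The objective \((\optimand - 2)^2\) and the domain \([0, 1]\) are illustrative assumptions, not the function plotted on the slide; the point is only that the constrained minimum can sit at the edge of the domain with a non-zero derivative.

```python
# Hypothetical objective: its unconstrained minimum is at theta = 2,
# which lies outside the domain [0, 1] assumed below.
def objective(theta):
    return (theta - 2.0) ** 2


def derivative(theta):
    return 2.0 * (theta - 2.0)


# Search an evenly spaced grid that includes both end points of the domain.
lower, upper = 0.0, 1.0
n_grid = 1001
grid = [lower + (upper - lower) * i / (n_grid - 1) for i in range(n_grid)]

theta_star = min(grid, key=objective)
slope_at_star = derivative(theta_star)

print("constrained minimizer:", theta_star)
print("derivative at the minimizer:", slope_at_star)
```

Here the first-order condition fails at the minimizer: the derivative is negative there, so the function would keep decreasing if the domain allowed larger values.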

The second-order condition

A bit more insight into the second-order condition

Generic minima look, locally, like parabolas

\(\ObjFunc(\optimand)\) (solid) vs. \(\ObjFunc(\optimum) + \frac{1}{2}(\optimand - \optimum)^2 \frac{d^2 \ObjFunc}{d\optimand^2}(\optimum)\) (dashed) around the local minimum \(\optimum\)
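A small sketch of how good the parabolic approximation is. It assumes the illustrative objective \(\optimand - \log{\optimand}\) (not the function in the figure), which has its minimum at \(\optimum = 1\) with second derivative \(1\) there. The gap between the function and the parabola should shrink roughly like the cube of the distance from \(\optimum\).

```python
import math


# Hypothetical objective with a known interior minimum at theta = 1.
def objective(theta):
    return theta - math.log(theta)


THETA_STAR = 1.0
SECOND_DERIV_AT_STAR = 1.0  # d^2/dtheta^2 of (theta - log theta) is 1/theta^2


def quadratic_approx(theta):
    # M(theta*) + (1/2) (theta - theta*)^2 M''(theta*)
    return objective(THETA_STAR) + 0.5 * (theta - THETA_STAR) ** 2 * SECOND_DERIV_AT_STAR


def approx_error(step):
    theta = THETA_STAR + step
    return objective(theta) - quadratic_approx(theta)


errors = {step: approx_error(step) for step in (0.1, 0.01, 0.001)}
for step, err in errors.items():
    print(f"distance {step}: function minus parabola = {err:.3e}")
```

Shrinking the distance by a factor of 10 shrinks the error by a factor of roughly 1000, which is what a leftover third-order term would give.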

Break for in-class exercise (15 min.)

Suppose \[ \ObjFunc(\optimand) = -q\log{\optimand} - (1-q)\log{(1-\optimand)} \] with \(0 < q < 1\) and \(\OptDomain = [0,1]\)

  1. Write out the first-order condition for \(\optimum\) (but don’t solve for it yet)
  2. Solve for \(\optimum\) in terms of \(q\)
  3. Write out the second-order condition — how do we know that \(\optimum\) is really the optimum?
  4. Sketch the value of the optimum, \(\ObjFunc(\optimum)\), as \(q\) goes from \(0\) to \(1\)
    • Hint: \(0\log{0} = 0\)
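One way to check a by-hand answer to parts 1 and 2 is a brute-force grid search, sketched below. The helper name `grid_argmin`, the grid size, and the particular values of \(q\) are arbitrary choices for illustration. The grid leaves out the end points \(0\) and \(1\), where the logarithms blow up.

```python
import math


def objective(theta, q):
    # The exercise's M(theta) = -q log(theta) - (1 - q) log(1 - theta)
    return -q * math.log(theta) - (1.0 - q) * math.log(1.0 - theta)


def grid_argmin(q, n_grid=9999):
    # Evenly spaced points strictly inside (0, 1)
    grid = [i / (n_grid + 1) for i in range(1, n_grid + 1)]
    return min(grid, key=lambda theta: objective(theta, q))


for q in (0.2, 0.5, 0.7):
    print(f"q = {q}: grid minimizer = {grid_argmin(q):.4f}")
```

Comparing the printed minimizers with the formula from part 2, for several values of \(q\), is a quick sanity check on the algebra.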

What about more than one dimension?

No slope in any direction: the first-order condition

First-order condition or first-order conditions?

The function increases in every direction: the second-order condition

Positive-definite matrices
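A minimal sketch of one way to test whether a symmetric matrix, such as a Hessian, is positive definite: attempt a Cholesky factorization and see whether every pivot is strictly positive. The function name and the example matrices are illustrative assumptions, and the input is assumed to be symmetric (the code does not check this).

```python
import math


def is_positive_definite(matrix):
    """Return True if a symmetric matrix (list of rows) is positive definite.

    Attempts a Cholesky factorization matrix = L L^T; the attempt succeeds
    exactly when every pivot is strictly positive.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


# Hessian of t1^2 + t1*t2 + t2^2: curves upward in every direction
bowl_hessian = [[2.0, 1.0], [1.0, 2.0]]
# Hessian of t1^2 - t2^2: curves up in one direction, down in another
saddle_hessian = [[2.0, 0.0], [0.0, -2.0]]
# Only positive semi-definite: flat along the direction (1, -1)
flat_hessian = [[1.0, 1.0], [1.0, 1.0]]

print(is_positive_definite(bowl_hessian))
print(is_positive_definite(saddle_hessian))
print(is_positive_definite(flat_hessian))
```

For large matrices or ones with pivots near zero, a numerical linear algebra library would be the safer choice; this version is only meant to make the definition concrete.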

The first- and second-order conditions for minima

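For \(\optimum\) to be a local minimum in the interior of \(\OptDomain\), it is sufficient that \(\nabla\ObjFunc(\optimum) = 0\) (the first-order condition) and that \(\nabla\nabla\ObjFunc(\optimum) \succ 0\), i.e., the Hessian at \(\optimum\) is positive definite (the second-order condition).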

Near a minimum, nice functions look quadratic

Minimizing risk vs. minimizing empirical risk

Morals to remember, about minimizing smooth functions

Next time: actual algorithms

Backup: What if \(\nabla\nabla\ObjFunc \succeq 0\)?

Backup: Big-O notation

Backup: What do I mean when I say “weird, atypical”?
