Regularizing Optimization with Penalties and Constraints

36-462/662 Spring 2022

8 February 2022 (Lecture 7)

\[ \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\Risk}{r} \newcommand{\EmpRisk}{\hat{r}} \newcommand{\Loss}{\ell} \newcommand{\OptimalStrategy}{\sigma} \DeclareMathOperator*{\argmin}{argmin} \newcommand{\ModelClass}{S} \newcommand{\OptimalModel}{s^*} \newcommand{\Indicator}[1]{\mathbb{1}\left\{ #1 \right\}} \newcommand{\myexp}[1]{\exp{\left( #1 \right)}} \newcommand{\eqdist}{\stackrel{d}{=}} \newcommand{\OptDomain}{\Theta} \newcommand{\OptDim}{p} \newcommand{\optimand}{\theta} \newcommand{\altoptimand}{\optimand^{\prime}} \newcommand{\ObjFunc}{M} \newcommand{\outputoptimand}{\optimand_{\mathrm{out}}} \newcommand{\optimum}{\optimand^*} \newcommand{\Hessian}{\mathbf{h}} \newcommand{\Penalty}{\Omega} \newcommand{\Lagrangian}{\mathcal{L}} \]

Previously

Thinking about ordinary least squares

Thinking about ordinary least squares (2)

\[ \hat{\beta} = (\mathbf{x}^T\mathbf{x})^{-1} \mathbf{x}^T \mathbf{y} \]
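
A minimal numerical sketch of this closed-form estimate (my own synthetic example, not from the slides; Python/numpy, with the intercept handled by a column of ones in \(\mathbf{x}\)):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Synthetic design matrix (first column of ones = intercept) and a known beta
x = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = x @ beta_true + rng.normal(scale=0.1, size=n)

# Closed-form OLS estimate: solve (x^T x) beta_hat = x^T y
# (solving the linear system is numerically preferable to forming the inverse)
beta_hat = np.linalg.solve(x.T @ x, x.T @ y)
print(beta_hat)  # should be close to beta_true
```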

Thinking about ordinary least squares (3)

Thinking about ordinary least squares (4)

Thinking about ordinary least squares (5)

Penalties

Penalties (2)

Some pictures

Some pictures (2)

Some pictures (3)

Some pictures (4)

Some pictures (5)

What does the penalty do?

What specifically does the \(L_2\) penalty do?

What about \(L_1\)?

What about \(L_1\)? (2)

What about \(L_1\)? (3)

\(\lambda=1/4\)

What about \(L_1\)? (4)

\(\lambda=4\)

What about \(L_1\) and \(L_2\)?

Penalties \(\Leftrightarrow\) Constraints

Constrained optimization in general

  1. Use the constraint equation \(\Penalty(\optimand) = c\) to eliminate a degree of freedom
    • i.e., write one coordinate in \(\optimand\) as a function of the others and of \(c\)
    • Do unconstrained optimization over the remaining degrees of freedom (a minimal sketch follows this list)
    • What about the \(\leq\) case?!?
  2. Add a new variable and do unconstrained optimization over a larger problem
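
As a minimal sketch of strategy 1 (eliminating a degree of freedom), take a toy problem of my own, not from the slides: minimize \(\ObjFunc(\optimand) = (\optimand_1 - 2)^2 + (\optimand_2 - 1)^2\) subject to \(\Penalty(\optimand) = \optimand_1 + \optimand_2 = c\). The constraint lets us write \(\optimand_2 = c - \optimand_1\) and do an unconstrained, one-dimensional optimization over \(\optimand_1\):

```python
import numpy as np
from scipy.optimize import minimize_scalar

c = 1.0  # constraint level: Omega(theta) = theta_1 + theta_2 = c

def objective(theta):
    # M(theta) = (theta_1 - 2)^2 + (theta_2 - 1)^2
    return (theta[0] - 2.0) ** 2 + (theta[1] - 1.0) ** 2

def reduced_objective(theta1):
    # Eliminate theta_2 via the constraint: theta_2 = c - theta_1
    return objective(np.array([theta1, c - theta1]))

res = minimize_scalar(reduced_objective)   # unconstrained 1-D optimization
theta_star = np.array([res.x, c - res.x])  # recover the eliminated coordinate
print(theta_star)                          # here, (1.0, 0.0)
```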

Lagrange multipliers

Lagrange multipliers (2)

Lagrange multipliers are prices

Lagrange multipliers vs. penalties

Lagrange multipliers turn constrained optimization into penalized optimization
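
One way to see this, in the notation above: hold the multiplier \(\lambda \geq 0\) fixed and look at

\[ \Lagrangian(\optimand, \lambda) = \ObjFunc(\optimand) + \lambda\left(\Penalty(\optimand) - c\right) \]

Minimizing \(\Lagrangian\) over \(\optimand\) at that fixed \(\lambda\) is exactly the penalized problem \(\argmin_{\optimand}{\left\{ \ObjFunc(\optimand) + \lambda \Penalty(\optimand) \right\}}\), since the constant \(-\lambda c\) doesn't change where the minimum is.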

Many constraints

Inequality constraints

Summing up on constraints and Lagrange multipliers

Mathematical programming

Mathematical programming (2)

What do constraints/penalties do to learning and risk?

Summing up

Backup: More about why \(L_1\) promotes sparsity but \(L_2\) doesn’t

Backup: \(L_q\) penalties

Backup: Intercepts, standardized variables

Backup: Inverting \(\mathbf{x}^T\mathbf{x}\) and eigenvalues

Backup: Interior point methods for convex programming

\[\begin{eqnarray*} \optimum & = & \argmin_{\optimand \in \OptDomain}{\ObjFunc(\optimand)}\\ & \text{subject to} &\\ \Penalty(\optimand) & \leq & c \end{eqnarray*}\]
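
A minimal log-barrier sketch for a problem of this form (a toy example of my own, not the slides' implementation): replace the hard constraint with a barrier term \(-\mu \log{\left(c - \Penalty(\optimand)\right)}\), minimize the smooth surrogate, and repeat while shrinking \(\mu\) toward zero, so the solution is pushed toward the constrained optimum.

```python
import numpy as np
from scipy.optimize import minimize

c = 1.0  # constraint level: Omega(theta) = ||theta||_2^2 <= c

def M(theta):
    # Toy objective whose unconstrained minimum, (2, 1), violates the constraint
    return (theta[0] - 2.0) ** 2 + (theta[1] - 1.0) ** 2

def Omega(theta):
    return np.sum(theta ** 2)

def barrier_objective(theta, mu):
    slack = c - Omega(theta)
    if slack <= 0:
        return np.inf  # infeasible points are "infinitely bad"
    # Smooth surrogate: M(theta) - mu * log(c - Omega(theta))
    return M(theta) - mu * np.log(slack)

theta = np.zeros(2)  # start strictly inside the feasible region
for mu in [1.0, 0.1, 0.01, 0.001]:
    # Re-solve the surrogate as mu shrinks, warm-starting from the last solution
    theta = minimize(lambda th: barrier_objective(th, mu), theta,
                     method="Nelder-Mead").x
print(theta)  # approaches the constrained optimum, roughly (0.89, 0.45)
```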

Backup: “Comrades, let’s optimize!”

References

Dorfman, Robert, Paul A. Samuelson, and Robert M. Solow. 1958. Linear Programming and Economic Analysis. New York: McGraw-Hill.

Gneezy, Uri, and Aldo Rustichini. 2000. “A Fine Is a Price.” Journal of Legal Studies 29:1–17. https://doi.org/10.1086/468061.

Kantorovich, L. V. 1965. The Best Use of Economic Resources. Cambridge, Massachusetts: Harvard University Press.

Spufford, Francis. 2010. Red Plenty. London: Faber and Faber.