# 10-725 Convex Optimization

Nearly every problem in machine learning and computational statistics can be formulated as the optimization of some function, possibly under a set of constraints. Since we obviously cannot solve every problem in machine learning, we also cannot generically (and efficiently) solve every optimization problem. Fortunately, many problems of interest in machine learning can be posed as optimization tasks with special properties—such as convexity, smoothness, sparsity, and separability—that permit standardized, efficient solution techniques.

This course is designed to give a graduate-level student a thorough grounding in these properties and their role in optimization, and a broad comprehension of algorithms tailored to exploit such properties. The focus will be on convex optimization problems (though we may also touch upon nonconvex optimization problems at some points). We will visit and revisit important applications in machine learning and statistics. Upon completing the course, students should be able to approach an optimization problem (often derived from a machine learning or statistics context) and:

- identify key properties such as convexity, smoothness, sparsity, etc., and/or possibly reformulate the problem so that it possesses such desirable properties;
- select an algorithm for this optimization problem, with an understanding of the advantages and disadvantages of applying one method over another, given the problem and properties at hand;
- implement this algorithm or use existing software to efficiently compute the solution.
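As a tiny illustration of that last outcome, here is a hedged sketch (not course material; the problem instance and all names are made up for illustration) of implementing gradient descent on a convex least-squares objective and checking the result against existing software:

```python
import numpy as np

# Hypothetical example: minimize the convex least-squares objective
# f(x) = (1/2) * ||A @ x - b||^2 using plain gradient descent.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
b = rng.standard_normal(50)

# Step size 1/L, where L = lambda_max(A^T A) is the Lipschitz
# constant of the gradient of f.
L = np.linalg.eigvalsh(A.T @ A).max()
x = np.zeros(5)
for _ in range(2000):
    grad = A.T @ (A @ x - b)  # gradient of f at the current x
    x = x - grad / L          # one gradient-descent step

# Compare against the closed-form solution from existing software.
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_star, atol=1e-5))
```

The point of the comparison is exactly the trade-off the course studies: the iterative method needs only matrix-vector products per step, while the library call solves the normal equations directly.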

### Instructors

- Sivaraman Balakrishnan
- Yuanzhi Li

### Education Associate

### TAs

### Course Syllabus

The syllabus provides information on grading, class policies, etc.

### Lecture Notes

#### Fundamentals of Convex Optimization

- Lecture 1: (1/17) Introduction, Convex Sets
- Lecture 2: (1/19) Convex Functions, Optimization Basics

#### First-Order Methods

- Lecture 3: (1/24) Gradient Descent
- Lecture 4: (1/26) More Gradient Descent and Subgradients
- Lecture 5: (1/31) The Subgradient Method and Oracle Lower Bounds
- Lecture 6: (2/2) Projected Gradient Descent and the Proximal Method
- Lecture 7: (2/7) More Proximal Method
- Lecture 8: (2/9) Stochastic Gradient Descent
- Lecture 9: (2/14) Mirror Descent

#### Duality

- Lecture 10: (2/16) LPs and Lagrangian Duality
- Lecture 11: (2/21) More Lagrangian Duality and KKT
- Lecture 12: (2/23) Fenchel Conjugates and Fenchel Duals

#### Advanced Topics

- Lecture 13: (2/28) (Nearly)-Convex Optimization
- Lecture 14: (3/2) Little Test I

#### Advanced Optimization Techniques

- Lecture 15: (3/14) Sums of Squares
- Lecture 16: (3/16) Second-Order Optimization: Newton’s Method and Preconditioned Gradient Descent
- Lecture 17: (3/21) Adaptive Optimization Algorithms
- Lecture 18: (3/23) Interior Point Methods
- Lecture 19: (3/28) Optimization over the Cloud: Asynchronous Optimization

#### Online Optimization

- Lecture 20: (3/30) Online Learning and (Online) Mirror Descent
- Lecture 21: (4/4) Reinforcement Learning and Variance Reduction

#### Introduction to Non-Convex Optimization

- Lecture 22: (4/6) Limitations of Convex Optimization
- Lecture 23: (4/11) Basics of Non-Convex Optimization and the (Noisy) Gradient Descent Algorithm
- Lecture 24: (4/18) Non-Convex Optimization Theory, with a Case Study: Optimizing a Transformer Block
- Lecture 25: (4/20) Optimization and sampling I
- Lecture 26: (4/25) Optimization and sampling II
- Lecture 27: (4/27) Little Test II