EN.553.662: Optimization for Data Science


Previous offerings: Fall 2024 · Fall 2023

Spring 2026

Course Information

Instructor: Nicolas Loizou

Lecture times: Monday & Wednesday, 12:00 – 1:15 pm (01/20 – 04/27)

Location: Homewood Campus, Bloomberg 274

Syllabus (PDF): Download

Teaching Assistants & Office Hours

Course Description

Optimization formulations and algorithms play a central role in data analysis and machine learning. In the era of big data, the need to solve large-scale optimization problems is ubiquitous across industry and science. This course is a mathematically rigorous introduction to large-scale optimization for data science and machine learning, emphasizing algorithms, convergence/complexity guarantees, and practical implementation. Applications include text analysis, page ranking, speech recognition, image classification, finance, and decision sciences.

Prerequisites / Corequisites
  • Linear algebra (vector spaces, quadratic forms, inner product, norms, etc.)
  • Multivariate calculus (gradient, Hessian, Taylor approximation, chain rule, etc.)
  • Probability theory (expectation, law of large numbers, tower property, etc.)
  • Numerical programming (MATLAB / Python / Julia or equivalent)

Some degree of mathematical maturity (ability to read/write proofs) is required. Background in optimization theory is highly recommended.

Course Materials
  • Primary resource: detailed slides (posted before each lecture)
  • Further useful references:
    • Convex Optimization: Algorithms and Complexity, Sébastien Bubeck
      Open access: PDF (author site).
    • First-Order Methods in Optimization, Amir Beck
      Publisher page (not open access): SIAM.
    • Understanding Machine Learning: From Theory to Algorithms, Shai Shalev-Shwartz & Shai Ben-David
      Open access (authors): PDF.
    • Advances and Open Problems in Federated Learning (Ch. 1–3), Peter Kairouz et al.
      Open access: arXiv PDF.

Learning Outcomes
  • Analyze scalable nonlinear optimization algorithms used in ML and data science.
  • Develop high-performance computing (HPC) implementations of optimization algorithms for selected applications.
  • Select algorithms based on problem structure (convex/nonconvex, smooth/nonsmooth, data regime, memory/distributed constraints).

Evaluation
  • Assignments: 50%
  • Exams: 50%

Course Outline

Part I

Stochastic Optimization

  1. Foundations of Large-Scale Optimization
    Core setup + tools; deterministic GD as baseline.
    • Stochastic optimization for machine learning
    • Basic tools from convex analysis, optimization, and probability
    • Deterministic gradient descent
  2. Stochastic Gradient Descent & Variants
    Step-size policies + guarantees across regimes (a toy step-size sketch follows this outline).
    • Assumptions
    • Constant step-size: strongly convex, convex, nonconvex guarantees
    • Decreasing step-size: strongly convex, convex, nonconvex guarantees
    • Mini-batches, sampling strategies (uniform vs. importance sampling)
  3. Variance Reduction Methods
    Faster convergence via control variates.
    • SVRG, L-SVRG, SAGA
    • Unified analysis of stochastic gradient methods
  4. Linear System Solvers
    Randomized solvers and links to SGD.
    • Sketch-and-project methods
    • Randomized Kaczmarz; randomized coordinate descent (Gauss–Seidel)
    • Stochastic reformulations and connections to SGD
  5. Acceleration with Momentum
    Heavy-ball and Nesterov; stochastic momentum variants.
    • Heavy-ball (Polyak) momentum
    • Nesterov acceleration
    • Momentum variants in stochastic settings
  6. Adaptive / Parameter-Free Methods
    Adaptive scaling and parameter-free step sizes.
    • AdaGrad, RMSProp, Adam
    • Stochastic Polyak step-size
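As a companion to items 2 and 6 above, the sketch below runs plain SGD on a small least-squares problem under three of the step-size policies named in the outline: a constant step, a decreasing O(1/sqrt(t)) schedule, and the stochastic Polyak step-size. The problem data, dimensions, iteration budget, and the base step size 0.05 are illustrative assumptions, not values taken from the course materials.

```python
# Minimal SGD sketch on f(x) = (1/2n) * ||Ax - b||^2 with a consistent linear system,
# comparing constant, decreasing, and stochastic Polyak step-size policies.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 20
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star                                    # consistent system, so each f_i attains the value 0

def sgd(policy, iters=2000, gamma0=0.05):         # gamma0 is a hand-tuned illustrative value
    x = np.zeros(d)
    for t in range(iters):
        i = rng.integers(n)                       # uniform sampling of one component f_i
        residual = A[i] @ x - b[i]
        g = residual * A[i]                       # stochastic gradient of f_i(x) = 0.5*(a_i^T x - b_i)^2
        if policy == "constant":
            gamma = gamma0
        elif policy == "decreasing":
            gamma = gamma0 / np.sqrt(t + 1)       # O(1/sqrt(t)) schedule
        elif policy == "polyak":
            f_i = 0.5 * residual ** 2             # stochastic Polyak step: (f_i(x) - f_i^*) / ||g||^2, with f_i^* = 0 here
            gamma = f_i / (g @ g + 1e-12)
        x -= gamma * g
    return np.linalg.norm(x - x_star)

for policy in ("constant", "decreasing", "polyak"):
    print(f"{policy:10s} distance to solution: {sgd(policy):.2e}")
```

Because the system is consistent (an interpolation-type regime), the constant and Polyak policies drive the iterates to the exact solution; with noisy data, constant-step SGD would instead stall in a noise-dependent neighborhood, which is exactly the trade-off the guarantees in item 2 quantify.
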
Part II

Variational Inequalities & Min-Max Optimization

  1. Background
    Formulations, assumptions, and ML connections.
    • Formulation and connections to optimization
    • Variational inequalities, min-max optimization, smooth games, saddle points
    • Applications in ML; key assumptions
  2. Popular Algorithms
    GDA, extragradient, and game-optimization methods (a toy comparison follows this outline).
    • Gradient descent–ascent (and stochastic variants)
    • Extragradient methods (and stochastic variants)
    • Hamiltonian gradient methods and consensus optimization
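As referenced in item 2 above, the sketch below contrasts gradient descent-ascent (GDA) with the extragradient method on the bilinear toy problem min_x max_y x*y, whose unique saddle point is (0, 0). The objective, the step size 0.1, and the 500-iteration budget are illustrative choices, not course-prescribed values.

```python
# Hedged sketch: GDA vs. extragradient on min_x max_y f(x, y) = x*y.

def grad_x(x, y):          # partial derivative of f(x, y) = x*y with respect to x
    return y

def grad_y(x, y):          # partial derivative of f(x, y) = x*y with respect to y
    return x

def gda(x, y, gamma=0.1, iters=500):
    for _ in range(iters):
        # simultaneous descent step in x and ascent step in y
        x, y = x - gamma * grad_x(x, y), y + gamma * grad_y(x, y)
    return x, y

def extragradient(x, y, gamma=0.1, iters=500):
    for _ in range(iters):
        # extrapolation ("lookahead") step
        x_half = x - gamma * grad_x(x, y)
        y_half = y + gamma * grad_y(x, y)
        # update step uses the gradient evaluated at the extrapolated point
        x, y = x - gamma * grad_x(x_half, y_half), y + gamma * grad_y(x_half, y_half)
    return x, y

print("GDA:          ", gda(1.0, 1.0))            # spirals away from the saddle point
print("extragradient:", extragradient(1.0, 1.0))  # contracts toward (0, 0)
```

On this bilinear problem, simultaneous GDA diverges for any constant step size, while extragradient contracts toward the saddle point; making this kind of separation precise is one goal of Part II.
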
Part III

Distributed / Decentralized / Federated Optimization

  1. Main Algorithms
    Why distribution matters; core methods and settings.
    • Motivation & applications
    • Distributed & decentralized methods
  2. Rules for Improved Communication Complexity
    Communication efficiency via compression and locality (a local-update sketch follows this outline).
    • Compressed operators
    • Local updates
    • Network topology
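As referenced in item 2 above, the sketch below illustrates the local-update rule: each client runs several SGD steps on its own data, and the server averages the client models only once per communication round (a local SGD / FedAvg-style scheme). The synthetic least-squares client data, the 10-client setup, the step size, and the 10-step local budget are illustrative assumptions, not values from the course materials.

```python
# Hedged sketch: local SGD with periodic averaging across simulated clients.
import numpy as np

rng = np.random.default_rng(0)
num_clients, d, n_local = 10, 5, 50
x_star = rng.standard_normal(d)

# each client holds its own noisy least-squares data (heterogeneous local objectives)
clients = []
for _ in range(num_clients):
    A = rng.standard_normal((n_local, d))
    b = A @ x_star + 0.1 * rng.standard_normal(n_local)
    clients.append((A, b))

def local_sgd(rounds=50, local_steps=10, gamma=0.02):
    x_global = np.zeros(d)
    for _ in range(rounds):                        # each round costs one communication
        local_models = []
        for A, b in clients:
            x = x_global.copy()
            for _ in range(local_steps):           # local SGD steps, no communication
                i = rng.integers(n_local)
                g = (A[i] @ x - b[i]) * A[i]       # stochastic gradient of one local sample
                x -= gamma * g
            local_models.append(x)
        x_global = np.mean(local_models, axis=0)   # server averages the client models
    return x_global

x_hat = local_sgd()
print("distance to planted solution:", np.linalg.norm(x_hat - x_star))
```

In this toy setup, averaging every 10 local steps uses 10x fewer communication rounds than communicating after every stochastic gradient; quantifying the price of that saving under data heterogeneity is a central theme of Part III.
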

Schedule

The materials links below will be updated as the semester progresses.

  • L1–L3: Foundations of Large-Scale Optimization · Materials: Part I
    Stochastic optimization for ML; convex/probability tools; deterministic gradient descent.
  • L4–L7: Stochastic Gradient Descent & Variants · Materials: Part I
    Assumptions; step-size policies; convex/strongly convex/nonconvex guarantees; mini-batching and sampling.
  • L8–L9: Variance Reduction Methods · Materials: Part I
    SVRG, L-SVRG, SAGA; unified analysis of stochastic gradient methods.
  • L10: Linear System Solvers · Materials: Part I
    Sketch-and-project methods; stochastic reformulations and connections to SGD.
  • L11–L12: Randomized Kaczmarz & Coordinate Descent · Materials: Part I
    Randomized Kaczmarz; randomized coordinate descent (Gauss–Seidel) for linear systems.
  • L13–L14: Acceleration with Momentum · Materials: Part I
    Heavy-ball (Polyak) momentum; Nesterov acceleration; stochastic momentum variants.
  • L15–L16: Adaptive / Parameter-Free Methods · Materials: Part I
    AdaGrad, RMSProp, Adam; stochastic Polyak step-size.
  • L17–L19: Variational Inequalities & Min-Max Optimization (Background) · Materials: Part II
    Formulation and connections to optimization; smooth games and saddle points; ML applications and assumptions.
  • L20–L21: Min-Max Algorithms · Materials: Part II
    Gradient descent–ascent; extragradient methods; Hamiltonian gradient and consensus optimization (and stochastic variants).
  • L22–L23: Distributed / Decentralized / Federated Optimization · Materials: Part III
    Core algorithms; communication-efficient rules (compression, local updates, topology).