IE 598 BIG DATA OPTIMIZATION (Fall 2016)
§ 12-16-2016: Project report is due at midnight.
§ 12-06-2016: Project Presentations III
Meghana Bande, Incremental Gradient Methods - SAGA
Harsh Gupta and Sai Kiran Burle, Fast Streaming Regression
§ 12-01-2016: Project Presentations II
Juan Xu and Kaiqing Zhang, Communication-Efficient Algorithms for Distributed Optimization
Shripad Gade, Matrix Completion
Peter Liu, Second-Order Methods for Large-Scale Problems
Joseph Lubars, Approximate Graph Matching with Convex Optimization
§ 11-29-2016: Project Presentations I
Samantha Thrust and Hongyi Li, Alternative Views of Accelerated Gradient Descent
Lucas Buccafusca, AlphaGo: The Program Ahead of Its Time
Heran Zhang, Optimization in Deep Reinforcement Learning
§ 11-22-2016 and 11-24-2016: Thanksgiving Break!
(SAG, SAGA, SVRG, etc)
(Stochastic Mirror Descent)
(Sample Average Approximation, Stochastic Gradient Descent)
§ 11-01-2016: Template for final report: download here
(Douglas-Rachford splitting, Peaceman-Rachford splitting, ADMM, etc.)
§ 10-25-2016: Lecture18-Mirror Prox for Saddle Point Problems
§ 10-18-2016: Lecture16-Smoothing Techniques I
§ 10-18-2016: 2-round scribing + presentation: Sign up for presentation and scribing here
§ 10-13-2016: Lecture15-From Subgradient Descent to Mirror Descent
§ 09-29-2016: Class cancelled due to Allerton
§ 09-22-2016: Lecture10-Projected Gradient Descent
§ 09-20-2016: Lecture9-Gradient Descent and Its Acceleration
§ 09-15-2016: Lecture8-Gradient Descent
§ 09-13-2016: Lecture7-Introduction to Optimization Algorithms
§ 09-08-2016: Lecture6-Conic Programming
§ 09-06-2016: Lecture5-Convex Optimization
§ 08-29-2016 and 09-01-2016: Lecture3&4-Convex Functions
§ 08-25-2016: Lecture2-Convex Sets
§ 08-23-2016: Lecture1-Introduction
Office: Transportation Building 211
Office Hour: Tuesday, 10-11AM or by appointment via email
Nearly every problem in machine learning and high-dimensional statistics can be formulated as the optimization of some function, possibly under constraints. In the wake of the Big Data era, there is an increasingly strong need for efficient and scalable algorithms for large-scale optimization - including problems with inherently high dimensions and problems involving truly massive data sets. The key aim of this course is to expose students to a wide range of modern algorithmic developments in optimization (mainly convex optimization) and to bring them near the frontier of research in large-scale optimization and machine learning.
Courtesy warning: The course will be theory-oriented and will emphasize a deep understanding of the structure of optimization problems and the computational complexity of numerical algorithms. The course will not cover software or tools for big data analytics and is not an applications course.
Students are expected to have strong knowledge of linear algebra, real analysis, and probability theory. Some prior exposure to optimization at a graduate level is preferred.
No book is required, but the following are highly recommended:
Ben-Tal & Nemirovski. Lectures on Modern Convex Optimization. SIAM, 2011.
Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic, 2004.
Boyd & Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
Sra, Nowozin, Wright (eds). Optimization for Machine Learning. MIT Press, 2011.
Bubeck. Convex Optimization: Algorithms and Complexity. Foundations and Trends in Machine Learning, 2015.
There will be no homework or exams. Grades will be based on
Scribing (25%): each student will be assigned to scribe lectures once or twice.
Project (50% report + 25% presentation): this could be writing a survey of a certain topic based on several papers, conducting a novel large-scale experiment, attacking a concrete open theoretical question, applying optimization techniques to your own field, formalizing an interesting new topic, or relating several problems. The end result should be a 10-15 page report and a 25-30 minute presentation. I will provide a few ideas for possible project topics.
Instructions for Scribe Notes
Remember that you are not scribing the notes for me, but for the entire class! Be responsible and rigorous: take careful notes during class and scribe them well.
Here is the rubric:
Minimal (5/10): your source file should compile; your notes must contain no serious typos or mistakes, should reflect everything discussed in class, and should be written in complete sentences; you should fill in any skipped proofs where expected.
Acceptable (8/10): your notes should contain enough explanation (between sections, after main theorems) that even a student who missed the class can understand the material reasonably well; the proofs should be succinct, clear, and correct; your notation and numbering should be consistent.
Exemplary (10/10): your notes should be a holistic piece of work, and ideally instructive. I encourage you to read the related reference books or papers to form a comprehensive understanding of the topic, and you may add your own thoughts, questions, or relevant material to the notes.
Please email me the LaTeX source and PDF of your notes, along with any included figures, within 3 days after the class for which you were the designated scribe. If your notes are unacceptable, you will be asked to revise them.
Module I: Introduction and Fundamentals (3 weeks)
Learning through the lens of optimization
Basics of convex analysis
Conic programming: LP, SOCP, SDP
Structured optimization: convexity, smoothness, separability, etc.
Module II: Smooth Convex Optimization (4 weeks)
Gradient descent method
Accelerated gradient descent methods (Nesterov's method and its variants)
Projection-free methods (Frank-Wolfe)
Coordinate descent methods
Applications in logistic regression, matrix completion, image processing, etc.
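To give a hedged taste of the first topic in this module, here is a minimal gradient-descent sketch on a least-squares objective; the random data, dimensions, step size, and iteration count are illustrative assumptions, not course material:

```python
import numpy as np

# Minimize f(x) = 0.5 * ||A x - b||^2 by plain gradient descent.
# The data and iteration budget here are illustrative choices.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
b = rng.standard_normal(50)

L = np.linalg.norm(A, 2) ** 2        # smoothness constant: largest eigenvalue of A^T A
x = np.zeros(5)
for _ in range(500):
    grad = A.T @ (A @ x - b)         # gradient of the objective
    x -= grad / L                    # classic 1/L step size

# for comparison: the closed-form least-squares solution
x_star = np.linalg.lstsq(A, b, rcond=None)[0]
```

With step size 1/L the iterates contract linearly here, since a random Gaussian A has full column rank, so a few hundred iterations suffice to match the closed-form solution.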
Module III: Nonsmooth Convex Optimization (4 weeks)
Subgradient descent/Mirror descent
Applications in support vector machines, sparsity learning, etc.
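As a small illustration of the mirror-descent idea in this module, the sketch below runs entropic mirror descent (the multiplicative-weights update) over the probability simplex; the quadratic objective, target vector, and step size are my own illustrative assumptions:

```python
import numpy as np

# Entropic mirror descent over the probability simplex for
# f(x) = 0.5 * ||x - y||^2; y and the step size are illustrative.
y = np.array([0.6, 0.3, -0.2, 0.1])   # hypothetical target point
x = np.full(4, 0.25)                  # start at the uniform distribution

eta = 0.5
for _ in range(2000):
    grad = x - y                      # gradient of the objective
    x = x * np.exp(-eta * grad)       # mirror step in the entropy geometry
    x /= x.sum()                      # normalization = Bregman projection onto the simplex
```

Unlike projected gradient descent, the multiplicative update keeps the iterates strictly positive, and the projection back onto the simplex is a one-line normalization.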
Module IV: Stochastic and Online Optimization (3 weeks)
Sample average approximation
Stochastic approximation (stochastic gradient descent)
Incremental gradient methods
Applications in empirical risk minimization, online learning, etc.
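As a sketch of stochastic approximation from this module, the code below runs SGD on a least-squares empirical risk, touching one sampled data point per step; the synthetic data, noise level, and step-size schedule are illustrative assumptions:

```python
import numpy as np

# SGD for the empirical risk (1/2n) * ||A x - b||^2, one sample per step.
# Data, noise level, and the shifted 1/t schedule are illustrative.
rng = np.random.default_rng(1)
n, d = 200, 5
A = rng.standard_normal((n, d))
x_true = np.ones(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

x = np.zeros(d)
for t in range(1, 20001):
    i = rng.integers(n)                  # sample one data point uniformly
    grad_i = (A[i] @ x - b[i]) * A[i]    # unbiased estimate of the risk gradient
    x -= grad_i / (10 + t)               # diminishing step size for a strongly convex risk
```

With a diminishing O(1/t) step size the iterates approach x_true even though no full gradient is ever computed, at a cost per step independent of n.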
Module V: Optimization Perspectives on Selected Topics in Machine Learning (2 weeks)
Large-scale kernel machines
You may work individually or in pairs (at most 2 students).
The project could be writing a survey of a certain topic based on several papers, conducting a novel large-scale experiment, attacking a concrete open theoretical question, applying optimization techniques to your own field, formalizing an interesting new topic, or relating several problems.
Your project can overlap with your research agenda or class project for other classes, but you must emphasize the role of optimization.
The end result should be a 10-15 page written report in LaTeX and a 25-30 minute presentation.
Proposal submission: Oct 15th, 2016
You should submit a one-page outline containing the project topic, the names of the team members, a brief description of the background, how you plan to approach the problem, and references.
Final presentation (25%): Nov 29th, Dec 1st, Dec 6th, 2016
Final report submission (50%): Dec 16th, 2016, Midnight
Below are some broad areas and a few specific ideas:
Survey of a research topic
(e.g., study 3-5 papers on the same topic)
Large-scale optimization algorithms and theory
(e.g., consider a new setting, propose a modified algorithm and analyze convergence)
Optimization modelling and applications
(e.g., formulate an optimization model from a practical problem in any field, apply algorithms discussed in class to solve it)
Numerical study and implementation
(e.g., implement and compare different algorithms and study their practical behaviors)
Interpretation of Nesterov's accelerated algorithms
W. Su, S. Boyd, and E. Candès. A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights. NIPS 2014.
L. Lessard, B. Recht, and A. Packard. Analysis and Design of Optimization Algorithms via Integral Quadratic Constraints. SIAM Journal on Optimization 2016.
A. Wibisono, A. Wilson, and M. Jordan. A Variational Perspective on Accelerated Methods in Optimization. arXiv 2016.
Z. Allen-Zhu and L. Orecchia. Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent. arXiv 2014.
S. Bubeck, Y. T. Lee, and M. Singh. A geometric alternative to Nesterov's accelerated gradient descent. arXiv 2015.
G. Lan and Y. Zhou. An optimal randomized incremental gradient method. arXiv 2015.
N. Flammarion and F. Bach. From Averaging to Acceleration, There is Only a Step-size. arXiv 2015.
W. Krichene, A. Bayen, and P. Bartlett. Accelerated mirror descent in continuous and discrete time. NIPS 2015.
Beyond convexity: non-convex optimization
S. Ghadimi and G. Lan. Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming. Mathematical Programming 2016.
E. Hazan, K. Levy, and S. Shalev-Shwartz. Beyond Convexity: Stochastic Quasi-Convex Optimization. arXiv 2015.
J. Lee, M. Simchowitz, M. Jordan, B. Recht. Gradient Descent Converges to Minimizers. COLT 2016.
J. Sun, Q. Qu, and J. Wright. When Are Nonconvex Problems Not Scary? arXiv 2015.
R. Ge, F. Huang, C. Jin, and Y. Yuan. Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition. arXiv 2015.
H. Karimi, J. Nutini, and M. Schmidt. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition. European Conference on Machine Learning and Knowledge Discovery in Databases, 2016.
J. Liang, J. Fadili, and G. Peyré. A Multi-step Inertial Forward-Backward Splitting Method for Non-convex Optimization. NIPS 2016.
Advances in incremental and stochastic algorithms
R. Johnson and T. Zhang. Accelerating stochastic gradient descent using predictive variance reduction. NIPS 2013.
A. Defazio, F. Bach, and S. Lacoste-Julien. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives. NIPS 2014.
G. Lan and Y. Zhou. An optimal randomized incremental gradient method. arXiv 2015.
S. Reddi, S. Sra, B. Poczos, and A. Smola. Fast Stochastic Methods for Nonsmooth Nonconvex Optimization. NIPS 2016.
H. Lin, J. Mairal, and Z. Harchaoui. A universal catalyst for first-order optimization. NIPS 2015.
Beyond first-order: second-order methods
G. Grapiglia and Y. Nesterov. Globally Convergent Second-order Schemes for Minimizing Twice-differentiable Functions. CORE Discussion 2016.
Y. Nesterov and B. Polyak. Cubic regularization of Newton method and its global performance. Mathematical Programming 2006.
R. Byrd, S. Hansen, J. Nocedal, and Y. Singer. A Stochastic Quasi-Newton Method for Large-Scale Optimization. SIAM Journal on Optimization 2016.
A. Rodomanov and D. Kropotov. A Superlinearly-Convergent Proximal Newton-Type Method for the Optimization of Finite Sums. ICML 2016.
N. Agarwal, B. Bullins, and E. Hazan. Second Order Stochastic Optimization in Linear Time. arXiv 2016.
R. Kolte, M. Erdogdu, and A. Özgür. Accelerating SVRG via second-order information. OPT 2015.
Beyond first-order: zero-order methods
S. Ghadimi and G. Lan. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming. SIAM Journal on Optimization, 2013.
J. Duchi, M. Jordan, M. Wainwright, and A. Wibisono. Optimal rates for zero-order convex optimization: the power of two function evaluations. IEEE Transactions on Information Theory, 2015.
A. Belloni, T. Liang, H. Narayanan, and A. Rakhlin. Escaping the Local Minima via Simulated Annealing: Optimization of Approximately Convex Functions. JMLR 2015.
Distributed and parallel optimization algorithms
B. Recht, C. Re, S. Wright, and F. Niu. Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. NIPS 2011.
J. Duchi, M. Wainwright, and Y. Zhang. Communication-Efficient Algorithms for Statistical Optimization. JMLR 2013.
O. Shamir, N. Srebro, and T. Zhang. Communication-Efficient Distributed Optimization using an Approximate Newton-type Method. ICML 2014.
Optimization in Machine Learning
L. Bottou, F. Curtis, and J. Nocedal. Optimization Methods for Large-Scale Machine Learning. arXiv 2016.
P. Zhao, T. Zhang. Stochastic Optimization with Importance Sampling for Regularized Loss Minimization. ICML 2015.
C. Wang, Y. Wang, E. Weinan, and R. Schapire. Functional Frank-Wolfe Boosting for General Loss Function. arXiv 2015.
F. Bach, G. Lanckriet, and M. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. ICML 2004.
Optimization in Statistical Inference
X. Nguyen, M. Wainwright, and M. Jordan. Estimating divergence functionals and the likelihood ratio by convex risk minimization. NIPS 2008.
A. Juditsky, A. Nemirovski. Nonparametric estimation by convex programming. The Annals of Statistics 2009.
A. Goldenshluger, A. Juditsky, and A. Nemirovski. Hypothesis testing by convex optimization. Electronic Journal of Statistics 2015.
M. Welling and Y. W. Teh. Bayesian Learning via Stochastic Gradient Langevin Dynamics. ICML 2011.
M. Hoffman, D. Blei, C. Wang, and J. Paisley. Stochastic Variational Inference. JMLR 2013.
Optimization in Signal/Image Processing
A. Chambolle, T. Pock. A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging. JMIV 2011.
E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust Principal Component Analysis? Journal of the ACM 2011.
Y. Cherapanamjeri, K. Gupta, and P. Jain. Nearly-optimal Robust Matrix Completion. arXiv 2016.
A. Juditsky, F. K. Karzan, and A. Nemirovski. Randomized first order algorithms with applications to L1-minimization. Math. Program., Ser. A 2012.
Optimization in Markov Decision Processes
Y. Chow, M. Ghavamzadeh. Algorithms for CVaR Optimization in MDPs. NIPS 2014.
R. Sutton, H. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesvári, and E. Wiewiora. Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation. ICML 2009.
T. Jaakkola, M. Jordan, and S. P. Singh. On the Convergence of Stochastic Iterative Dynamic Programming Algorithms. Neural Computation 1994.