All Hands Meetings on Big Data Optimization - Semester 1, 2018-2019

Venue: Al Khwarizmi Building, KAUST, ROOM: 2107 (2nd floor)
Time: Sundays 12:00 - 13:30 (lunch provided)

Date Speaker Paper
December 16, 2018

December 9, 2018 No meeting (exams)
December 2, 2018 No meeting (NIPS)
November 25, 2018

November 18, 2018

November 11, 2018

November 4, 2018

October 28, 2018

October 21, 2018

October 14, 2018

October 7, 2018

September 30, 2018 Dmitry Kovalev Update on ongoing research work
September 23, 2018 No meeting (Saudi National Day)
September 16, 2018

September 9, 2018 Alibek Sailanbayev Optimization of composition of functions
September 6, 2018 Samuel Horváth Stochastic nested variance reduction for nonconvex optimization (Zhou, Xu, Gu - 6/2018)
August 30, 2018 Sarah Sachs Generalizations of Jacobian sketching (summary of research work done during 6 months of internship at KAUST)

Organizers: Filip Hanzely, Aritra Dutta and Peter Richtárik


All Hands Meetings on Big Data Optimization - Semester 2, 2017-2018

Venue: Al Khwarizmi Building, KAUST, ROOM: 2107 (2nd floor)
Time: Tuesdays 12:00 - 13:30 (lunch provided)

Date Speaker Paper
May 27, 2018 El Houcine Bergou A line search algorithm inspired by the adaptive cubic regularization framework and complexity analysis (Bergou, Diouane, Gratton - 5/2018)
May 6, 2018 Matthias Mueller Optimization for deep learning
April 29, 2018 Samuel Horváth Second order stochastic optimization for machine learning in linear time (Agarwal, Bullins, Hazan - JMLR 2017)
April 15, 2018 Filip Hanzely On the convergence of Adam and beyond (Reddi, Kale, Kumar - ICLR 2018)
April 8, 2018 Konstantin Mishchenko A simple practical accelerated method for finite sums (Defazio - NIPS 2016)
March 25, 2018 Adel Bibi Analytic expressions for probabilistic moments of PL-DNN with Gaussian input
March 18, 2018 Konstantin Mishchenko Penalty formulation for constrained optimization
March 11, 2018 Alibek Sailanbayev SignSGD: Compressed optimization for non-convex problems (Bernstein, Wang, Azizzadenesheli, Anandkumar - ICML 2018)
March 4, 2018 Samuel Horváth Fast incremental method for nonconvex optimization (Reddi, Sra, Poczos, Smola - 3/2016)
February 27, 2018 El Houcine Bergou Random direct search method for unconstrained minimization
February 20, 2018 Filip Hanzely The implicit bias of gradient descent on separable data (Soudry, Hoffer, Nacson, Gunasekar, Srebro - 10/2017)
February 13, 2018 Nicolas Loizou Random inexact projection methods

Organizers: Filip Hanzely, Aritra Dutta and Peter Richtárik


All Hands Meetings on Big Data Optimization - Semester 1, 2017-2018

Venue: Al Khwarizmi Building, KAUST, ROOM: 2107 (2nd floor)
Time: Tuesdays 12:00 - 13:30 (lunch provided)

Date Speaker Paper
December 5, 2017 Konstantin Mishchenko SARAH: A novel method for machine learning problems using stochastic recursive gradient (Nguyen, Liu, Scheinberg, Takac - ICML 2017)
November 28, 2017 Filip Hanzely Relative continuity for non-Lipschitz non-smooth convex optimization using stochastic (or deterministic) mirror descent (Lu - 10/2017)
November 21, 2017 Robert Gower SAGA is a variant of stochastic gradient: new view and new proof
November 14, 2017 Nicolas Loizou First-order adaptive sample size methods to reduce complexity of empirical risk minimization (Mokhtari, Ribeiro - 9/2017)
November 7, 2017 Konstantin Mishchenko Proximal-proximal-gradient method (Ryu, Yin - 8/2017)
October 31, 2017 Nikita Doikov Regularized Newton methods for minimizing functions with Hölder continuous Hessians (Grapiglia, Nesterov - SIOPT 2017); Cubic regularization of Newton method and its global performance (Nesterov, Polyak - MAPR 2006)
October 24, 2017 Viktor Lukáček Dykstra's algorithm with Bregman projections: a convergence proof (Bauschke, Lewis - Optimization 1998)
October 17, 2017 Sebastian Stich Approximate steepest coordinate descent (Stich, Raj, Jaggi - ICML 2017)
October 10, 2017 Alibek Sailanbayev Breaking locality accelerates block Gauss-Seidel (Tu, Venkataraman, Wilson, Gittens, Jordan, Recht - ICML 2017)
October 3, 2017 Konstantin Mishchenko An asynchronous distributed prox-grad algorithm (Mishchenko, Iutzeler, Malick - 2017)
September 26, 2017 Konstantin Mishchenko An asynchronous distributed prox-grad algorithm (Mishchenko, Iutzeler, Malick - 2017)
September 19, 2017 Aritra Dutta Self-occlusion and disocclusion in causal video object segmentation (ICCV 2015)
September 12, 2017 Filip Hanzely Randomized methods for relative smooth optimization (Hanzely, Richtarik - 2017)
August 29, 2017 Filip Hanzely Relatively-smooth convex optimization by first-order methods, and applications (Lu, Freund and Nesterov - 10/2016)
August 22, 2017 Aritra Dutta A Batch-Incremental Video Background Estimation Model using Weighted Low-Rank Approximation of Matrices (Dutta, Li and Richtárik - 7/2017)

Organizers: Filip Hanzely, Aritra Dutta and Peter Richtárik


All Hands Meetings on Big Data Optimization - Semester 2, 2016-2017

Venue: James Clerk Maxwell Building ROOM: JCMB 5323 (5th floor)
Time: Tuesdays 12:15 - 13:30 (lunch provided)

We thankfully acknowledge support from the Head of School of Mathematics and the Center for Doctoral Training in Data Science

Date Speaker Paper
March 14, 2017 Filip Hanzely Finding Approximate Local Minima for Nonconvex Optimization in Linear Time (Agarwal, Allen-Zhu, Bullins, Hazan, and Ma - 11/2016)
March 7, 2017 Marcelo Pereyra Efficient Bayesian computation by proximal Markov chain Monte Carlo: when Langevin meets Moreau (Durmus, Moulines and Pereyra - 12/2016)
February 28, 2017 Jakub Konečný QSGD: Randomized quantization for communication-optimal stochastic gradient descent (Alistarh, Li, Tomioka and Vojnovic - 10/2016)
February 21, 2017 Nicolas Loizou Global convergence of the Heavy-ball method for convex optimization (Ghadimi, Feyzmahdavian and Johansson - 12/2014)
February 14, 2017 Kostas Zygalakis A differential equation for modeling Nesterov's accelerated gradient method: theory and insights (Su, Boyd and Candes - NIPS 2014)
February 7, 2017 László A. Végh (LSE) Rescaled first-order methods for linear programming (Dadush, Végh and Zambelli - 11/2016)
January 31, 2017 Filip Hanzely Relatively smooth convex optimization by first-order methods, and applications (Lu, Freund and Nesterov - 10/2016)
January 24, 2017 Ion Necoara (Bucharest) Linear convergence of first order methods for non-strongly convex optimization (Necoara, Nesterov and Glineur - 4/2015)
January 17, 2017 Armin Eftekhari (The Alan Turing Institute) The alternating descent conditional gradient method for sparse inverse problems (Boyd, Schiebinger and Recht - 7/2015)

Organizers: Nicolas Loizou and Peter Richtárik


All Hands Meetings on Big Data Optimization - Semester 1, 2016-2017

Venue: James Clerk Maxwell Building ROOM: JCMB 6207 (6th floor)
Time: Tuesdays 12:15 - 13:30 (lunch provided: thanks to the support of the Head of School)

Date Speaker Paper
December 13, 2016 Panos Parpas Using variational techniques to understand accelerated methods (Wibisono, Wilson and Jordan - 3/2016)
December 6, 2016 No meeting (NIPS)
November 29, 2016 Iain Murray Fitting real-valued conditional distributions.

Abstract: Neural networks can be used for regression. Given an input x, guess the output y. The standard optimization task is to minimize some regularized measure of mismatch between guesses and observed training outputs.

Neural networks can also express their own uncertainty. For example, we can fit two functions, a guess m(x) and an "error-bar" s(x), by maximizing the total log probability of training outputs under a Gaussian model: \sum_n \log N(y_n; m(x_n), s(x_n)^2).

Fitting functions representing Gaussian outputs by stochastic steepest descent can be hard: the gradients of the loss with respect to the mean depend strongly on the standard deviation, making it hard to adapt step-sizes.
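
To make the step-size issue concrete: for a single observation, the negative log density under N(m, s^2) has derivative (m - y)/s^2 with respect to the mean, so the same residual yields gradients that differ by orders of magnitude as s changes. The Python sketch below is an illustration only, not code from the talk; the function names and the values of y, m and s are assumptions chosen for the example.

```python
import math


def gaussian_nll(y, m, s):
    """Negative log density of y under a Gaussian N(m, s^2)."""
    return 0.5 * math.log(2.0 * math.pi * s * s) + (y - m) ** 2 / (2.0 * s * s)


def grad_nll_wrt_mean(y, m, s):
    """Analytic derivative of gaussian_nll with respect to the mean m."""
    return (m - y) / (s * s)


# Hypothetical values: one observation, one guess, three candidate error bars.
y, m = 1.0, 0.0
grads = {s: grad_nll_wrt_mean(y, m, s) for s in (0.1, 1.0, 10.0)}

for s, g in grads.items():
    print(f"s = {s:5.1f}   d(nll)/dm = {g:.4g}")
```

The residual y - m is the same in all three cases, yet the gradient magnitude scales as 1/s^2, so a fixed step size that suits s = 1 would be far too large for s = 0.1 and far too small for s = 10.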

Moving beyond the Gaussian assumption, we might represent p(y|x) with a mixture of Gaussians, or with quantiles. For multivariate y we can use multivariate Gaussians or RNADE. Gaussians are also fitted in stochastic variational inference, sometimes with diagonal covariances, sometimes low-rank + diagonal.

We are able to optimize all these things to some extent, but it's harder than with conventional neural networks, which hinders widespread adoption of the methods.

Relevant papers: Mixture Density Networks (MDNs), Multivariate MDN, RNADE, Bayesian MDN, matrix manifold optimization for Gaussian mixtures
November 22, 2016 Lukasz Szpruch An analytical framework for a consensus-based global optimization method (Carrillo, Choi, Totzek and Tse - 1/2016)
November 15, 2016 Dominik Csiba Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure (Bietti and Mairal - 10/2016)
November 8, 2016 Aretha Teckentrup Large-scale Gaussian process regression via doubly stochastic gradient descent (Yan, Xie, Song and Boots - 2015)
November 1, 2016 Filip Hanzely Variance reduction for faster non-convex optimization (Allen-Zhu and Hazan - 3/2016)
October 25, 2016 Dominik Csiba Linear coupling: an ultimate unification of gradient and mirror descent (Allen-Zhu and Orecchia - 1/2015)
October 18, 2016 Jakub Konečný Train faster, generalize better: Stability of stochastic gradient descent (Hardt, Recht and Singer - 7/2016)
October 11, 2016 Nicolas Loizou Convergence rates for greedy Kaczmarz algorithms, and faster randomized Kaczmarz rules using the orthogonality graph (Nutini, Sepehry, Laradji, Schmidt, Koepke, Virani - UAI 2016) [supplementary material, poster]
October 4, 2016 Jakub Konečný Differentially private empirical risk minimization (Chaudhuri, Monteleoni, Sarwate - JMLR 2011)
September 27, 2016 Dominik Csiba Online ad allocation via online optimization (Jenatton, Huang, Csiba and Archambeau - 6/2016)

Organizers: Dominik Csiba and Peter Richtárik


All Hands Meetings on Big Data Optimization - Semester 2, 2015-2016

Venue: James Clerk Maxwell Building ROOM: JCMB 4312 (4th floor)
Time: Tuesdays 12:15 - 13:30 (lunch provided: thanks to the support of the Head of School)

Date Speaker Paper
May 3, 2016 JC Pesquet (Paris) A stochastic majorize-minimize subspace algorithm with application to filter identification (Chouzenoux and Pesquet - 12/2015)
April 26, 2016 Robert M Gower Open-ended research discussion on the topic: "Newton-type methods for solving the empirical risk minimization problem"
April 19, 2016 Haihao Lu (MIT) Norm-free methods
April 12, 2016 Sebastian Stich (CORE) A simple, combinatorial algorithm for solving SDD systems in nearly-linear time (Kelner, Orecchia, Sidford, Allen-Zhu - 1/2013)
April 5, 2016 No meeting (Easter)
March 29, 2016 No meeting (Easter)
March 22, 2016 Nicolas Loizou Second order stochastic optimization in linear time (Agarwal, Bullins and Hazan - 2/2016)
March 15, 2016 Robert M Gower Sub-sampled Newton methods I: globally convergent algorithms (Roosta-Khorasani and Mahoney - 1/2016)
March 8, 2016 No meeting (I am in Oberwolfach...)
March 1, 2016 Dominik Csiba Local smoothness in variance-reduced optimization (Vainsencher, Liu and Zhang - NIPS 2015)
February 23, 2016 Jaroslav Fowkes Submodular function maximization (based on a survey of Krause and Golovin 2012)
February 16, 2016 Jakub Konečný Taming the wild: a unified analysis of Hogwild!-style algorithms (De Sa, Zhang, Olukotun, Re - NIPS 2015)
February 9, 2016 No meeting (Dominik, Jakub, Robert and I will be in Les Houches)
February 2, 2016 Nicolas Loizou Randomized gossip algorithms (Boyd, Ghosh, Prabhakar and Shah - IEEE Transactions on Information Theory 2006 and Dimakis, Kar, Moura, Rabbat and Scaglione - Proceedings of the IEEE)
January 26, 2016 Jakub Konečný On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants (Reddi, Hefny, Sra, Poczos and Smola - NIPS 2015)

Organizers: Jakub Konečný and Peter Richtárik

All Hands Meetings on Big Data Optimization - Semester 1, 2015-2016

Venue: James Clerk Maxwell Building ROOM: JCMB 6311 (6th floor)
Time: 12:15 - 13:15 (lunch provided: thanks to the support of the Head of School)

Date Speaker Paper
November 24, 2015 Nick Polydorides A quasi Monte Carlo method for large-scale inverse problems (Polydorides, Wang & Bertsekas - 2012) more resources: [regression, inverse, DP chapter]
November 17, 2015 Ran Zhang Path-following methods (Chapter 5 of Wright's "Primal-dual interior-point methods" book)
November 10, 2015 Jakub Konečný Why random reshuffling beats stochastic gradient descent (Gurbuzbalaban, Ozdaglar and Parrilo - 10/2015)
November 3, 2015 Nicolas Loizou Stochastic gradient descent, weighted sampling and the randomized Kaczmarz algorithm (Needell, Srebro and Ward - 10/2013)
October 27, 2015 Dominik Csiba A universal catalyst for first-order optimization (Lin, Mairal & Harchaoui - 6/2015)
October 20, 2015 No meeting
October 13, 2015 Robert M Gower Convergence rates of sub-sampled Newton methods (Erdogdu & Montanari - 8/2015)
October 6, 2015 Robert M Gower Newton sketch (Pilanci & Wainwright - 5/2015)
September 29, 2015 Dominik Csiba Beyond convexity: stochastic quasi-convex optimization (Hazan, Levy and Shalev-Shwartz - 7/2015)
September 22, 2015 Jakub Konečný Communication Complexity of Distributed Convex Learning and Optimization (Arjevani and Shamir - 6/2015)

Organizers: Jakub Konečný and Peter Richtárik


All Hands Meetings on Big Data Optimization - Semester 2, 2014-2015

Venue: James Clerk Maxwell Building ROOM: JCMB 4312 (4th floor)
Time: 12:15 - 13:15 (lunch provided: thanks to the support of the Head of School)

Date Speaker Paper
May 19, 2015 Ian Wallace HELM: Holomorphic Embedding Load flow Method (papers: 1 and 2)
May 12, 2015 Andreas Grothey Contingency generation for AC optimal power flow (Chiang and Grothey - 2012 [Optimization Online])
May 5, 2015 No meeting due to Optimization and Big Data 2015
April 28, 2015 Zheng Qu On lower and upper bounds for smooth and strongly convex optimization problems (Arjevani, Shalev-Shwartz and Shamir - 3/2015)
April 21, 2015 Alessandro Perelli Combining ordered subsets and momentum for accelerated X-ray CT image reconstruction (Kim, Ramani and Fessler - 1/2015, IEEE link)
April 14, 2015 Robert Gower Research discussion
April 7, 2015 No meeting due to Easter Break
March 31, 2015 Dominik Csiba Stochastic Dual Coordinate Ascent (SDCA): A Dual-Free Analysis (Shai Shalev-Shwartz - 2/2015)
March 24, 2015 Jakub Konečný Greedy coordinate descent vs randomized coordinate descent
March 17, 2015 Tom Mayo and Guido Sanguinetti Challenges for predictive modelling in high-throughput biology (papers: [1] and [2])
March 10, 2015 Zheng Qu Complexity bounds for primal-dual methods minimizing the model of objective function (Nesterov - 2/2015)
March 3, 2015 Kimon Fountoulakis Randomized numerical linear algebra meets big data optimization (Yang, Chow, Re and Mahoney - 2/2015 and Yang, Meng and Mahoney - 2/2015)
February 24, 2015 Robert M. Gower Action constrained quasi-Newton methods (Gower and Gondzio - 12/2014)
February 17, 2015 no meeting due to Innovative Learning Week
February 10, 2015 Chris Williams Linear dynamical systems applied to condition monitoring (papers [1] and [2]).

Abstract: We develop a Hierarchical Switching Linear Dynamical System (HSLDS) for the detection of sepsis in neonates in an intensive care unit. The Factorial Switching LDS (FSLDS) of Quinn et al. (2009) is able to describe the observed vital signs data in terms of a number of discrete factors, which have either physiological or artifactual origin. We demonstrate that by adding a higher-level discrete variable with semantics sepsis/non-sepsis we can detect changes in the physiological factors that signal the presence of sepsis. We demonstrate that the performance of our model for the detection of sepsis is not statistically different from the auto-regressive HMM of Stanculescu et al. (2013), despite the fact that their model is given "ground truth" annotations of the physiological factors, while our HSLDS must infer them from the raw vital signs data. Joint work with Ioan Stanculescu and Yvonne Freer.
February 3, 2015 Jakub Konečný Communication efficient distributed optimization using an approximate Newton-type method (Shamir, Srebro and Zhang - 12/2013)
January 27, 2015 Zheng Qu A lower bound for the optimization of finite sums (Agarwal and Bottou - 10/2014)
January 20, 2015 Ilias Diakonikolas Algorithms in Statistics (papers: long version [1] and short version [2])

Blurb: A broad class of big data – such as those collected from financial transactions, seismic measurements, neurobiological measurements, sensor nets, or network traffic records – is best modeled as samples from a probability distribution over a very large domain. One of the most basic statistical inference tasks in this setting is this: learn the underlying distribution that generated the data.

Organizers: Jakub Konečný, Zheng Qu and Peter Richtárik


All Hands Meetings on Big Data Optimization - Semester 1, 2014-2015

Venue: James Clerk Maxwell Building ROOM: 6311 (6th floor)
Time: Tuesdays, 12:15 - 13:15 (lunch provided: thanks to NAIS)

Date Speaker Paper
December 2, 2014 Charles Sutton Optimization in Modern Machine Learning: Four Vignettes (Exploratory data analysis: Mining transaction data, Unsupervised learning in neural networks, Signal disaggregation: Understanding household energy usage, Sampling from high dimensional distributions using continuous relaxations) (papers: [1] [2] [3])
November 25, 2014 Dominik Csiba Iterative Hessian sketch: fast and accurate solution approximation for constrained least-squares (based on Pilanci and Wainwright - 11/2014)
November 18, 2014 Xavier Cabezas Cycle bases in network synchronization problems (based on [1, 2, 3])
November 11, 2014 Zheng Qu Large-scale randomized-coordinate descent methods with non-separable linear constraints (Reddi, Hefny, Downey, Dubey and Sra - 10/2014)
November 4, 2014 Ademir Ribeiro Towards a direct search method with adaptive directions/geometry (Ademir will describe some challenges of his ongoing research in the area; paper to read: Konecny and Richtarik - 09/2014)
October 28, 2014 Amos Storkey Machine learning markets (abstract)
October 21, 2014 Dominik Csiba A stochastic PCA algorithm with an exponential convergence rate (Shamir - 09/2014)
October 14, 2014 Jakub Konečný Parallelism in optimization (this is a brainstorming session about the limits of parallelism in optimization and is not based on any papers)
October 7, 2014 Robert Gower A stochastic quasi-Newton method for large-scale optimization (Byrd, Hansen, Nocedal and Singer - 2014)
September 30, 2014 Jakub Konečný Trade-offs of large scale learning (papers: 1 - Bottou and Bousquet, 2 - Bottou and Bousquet, 3 - Bottou)
September 23, 2014 Zheng Qu SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives (Defazio, Bach and Lacoste-Julien - 2014)
September 16, 2014 Kimon Fountoulakis Robust block coordinate descent (Fountoulakis and Tappenden - 2014)

Organizers: Jakub Konečný, Zheng Qu and Peter Richtárik


All Hands Meetings on Big Data Optimization - Semester 2, 2013-2014

Venue: James Clerk Maxwell Building NEW ROOM: 4312 (4th floor)
Time: Tuesdays, 12:15 - 13:15 (refreshments provided: thanks to NAIS)

Date Speaker Paper
June 17, 2014 Mojmír Mutný Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization (Martin Jaggi - ICML 2013)
June 10, 2014 no meeting (due to this event)
June 3, 2014 Lukasz Szpruch Multilevel Monte Carlo methods for applications in finance (Giles and Szpruch)
May 27, 2014 Jakub Konečný Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning (Julien Mairal - 2014)
May 13, 2014 Zheng Qu First-order methods of smooth convex optimization with inexact oracle (Devolder, Glineur and Nesterov - 2011). Preprint here.
May 6, 2014 Robert M. Gower A Stochastic Quasi-Newton Method for Large-Scale Optimization (Byrd, Hansen, Nocedal and Singer - 2014). Plus maybe also some background from this paper.
April 30, 2014 Olivier Fercoq Adaptive Subgradient Methods for Online Learning and Stochastic Optimization (Duchi, Hazan and Singer - 2011)
April 22, 2014 no meeting (spring break)
April 15, 2014 no meeting (spring break)
April 8, 2014 no meeting (spring break)
April 1, 2014 Martin Takáč A Proximal Stochastic Gradient Method with Progressive Variance Reduction (Xiao and Zhang - 2014)
March 25, 2014 no meeting
March 18, 2014 Jakub Konečný Subgradient Methods for Huge-Scale Optimization Problems (Nesterov - 2012) [Mathematical Programming 2013]
March 11, 2014 Kimon Fountoulakis Parallel Coordinate Descent Newton for Efficient L1-Regularized Minimization (Bian, Li, Liu and Yang - 2013)
March 4, 2014 Mehrdad Yaghoobi Efficient Projections onto the L1-Ball for Learning in High Dimensions (Duchi, Shalev-Shwartz, Singer, Chandra - 2008)
Feb 25, 2014 Zheng Qu Finding the stationary states of Markov chains by iterative methods (Nesterov and Nemirovski - 2013)
Feb 18, 2014 no meeting as many of us will attend this event
Feb 11, 2014 Olivier Fercoq Efficient Accelerated Coordinate Descent Methods and Faster Algorithms for Solving Linear Systems (Lee and Sidford - 2013)
Feb 4, 2014 Rachael Tappenden Feature Clustering for Accelerating Parallel Coordinate Descent (Scherrer, Tewari, Halappanavar and Haglin - 2012)
Jan 28, 2014 Jakub Konečný Minimizing Finite Sums with the Stochastic Average Gradient (Schmidt, Le Roux and Bach - 2013)

Organizers: Jakub Konečný and Peter Richtárik