An important pillar is developing practical optimization algorithms for nonconvex optimization problems. I believe there is an urgent need for such algorithms, particularly in the context of nonconvex data-driven optimization problems for which off-the-shelf software is not available. I have worked on (i) developing exact combinatorial optimization algorithms for learning from high-dimensional data such as genomics data, and (ii) deriving faster first-order methods for specialized nonconvex optimization problems such as those arising in optimal power flow.
Exact Combinatorial Optimization for High-Dimensional Learning
Modern data sets often contain far more features than measured observations. Such high-dimensional data sets are particularly prevalent in genomics and other areas of computational biology. Cancer data sets, an important example, comprise the expression levels of many thousands of genes but typically concern only a few hundred individuals. Learning from high-dimensional data is particularly challenging, as most classical learning methods tend to fit the noise instead of the signal.
Simple and highly regularized learning methods are required here. Sparse linear models with a small number of nonzero coefficients are a popular choice when confronted with high-dimensional data. Indeed, it is often the case that only a very few features in high-dimensional data sets are truly relevant. As an example, most cancer types can be predicted based on a very limited number of genetic markers among all those measured. Equally important, sparse linear models are far more interpretable than most other models. Fast heuristic sparse learning methods such as the Lasso consequently rank among the most cited algorithms in the learning community.
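As a toy illustration of such a heuristic (a minimal sketch that is not taken from the publications below; the synthetic data, the regularization level, and the use of scikit-learn's Lasso are my own assumptions), one can fit the Lasso on a high-dimensional data set in which only a handful of features matter and count how many coefficients it actually selects:

\begin{verbatim}
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, k_true = 100, 1000, 5        # far more features (p) than observations (n)
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:k_true] = 1.0              # only the first five features are relevant
Y = X @ w_true + 0.1 * rng.standard_normal(n)

# Heuristic sparse regression: the Lasso typically recovers the signal but may
# also pick up spurious features, unlike an exact l0-constrained formulation.
lasso = Lasso(alpha=0.1).fit(X, Y)
print("Lasso selected", np.count_nonzero(lasso.coef_),
      "features; true support size:", k_true)
\end{verbatim}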
Unfortunately, heuristic sparse methods tend to include many more irrelevant features than their exact counterparts do. I developed scalable learning methods to find the best sparse regressor, i.e.,
\begin{equation} \min_{\Vert w \Vert_0 \leq k} ~~ \frac{1}{2} \Vert Y - X w \Vert^2 + \frac{1}{2\gamma} \Vert w \Vert^2 \end{equation}
or, similarly, the best classifier exactly using integer optimization. Moore's law, taken together with a comparable improvement in modern integer optimization solvers, indeed represents a true revolution in this context. By exploiting an exact convex dual formulation, exact sparse models can be found using modern integer optimization for data sets with up to 100,000 observations and features, which is two orders of magnitude larger than previously possible.
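For very small instances, the exact problem above can be illustrated by brute force (a minimal sketch of the objective in the equation above; this is emphatically not the convex-dual and integer-optimization algorithm from the publications below, which scales to roughly 100,000 observations and features). For each candidate support $S$ of size $k$ the inner problem is a ridge regression with a closed-form solution, so enumerating supports yields the exact optimum:

\begin{verbatim}
import itertools
import numpy as np

def exact_sparse_ridge(X, Y, k, gamma):
    """Exact minimizer of 0.5*||Y - Xw||^2 + ||w||^2/(2*gamma) s.t. ||w||_0 <= k,
    found by enumerating all supports of size k (tractable only for tiny p)."""
    n, p = X.shape
    best_obj, best_w = np.inf, None
    for S in itertools.combinations(range(p), k):
        XS = X[:, S]
        # Ridge solution restricted to the support S.
        wS = np.linalg.solve(XS.T @ XS + np.eye(k) / gamma, XS.T @ Y)
        resid = Y - XS @ wS
        obj = 0.5 * resid @ resid + 0.5 / gamma * wS @ wS
        if obj < best_obj:
            best_obj, best_w = obj, np.zeros(p)
            best_w[list(S)] = wS
    return best_w, best_obj
\end{verbatim}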
Related publications

Bertsimas, D., J. Pauphilet, and B.P.G. Van Parys (2021). “Sparse classification: A scalable discrete optimization perspective”. In: Machine Learning 110.11, pp. 3177–3209. Link

Bertsimas, D. and B.P.G. Van Parys (2020a). “Sparse hierarchical regression with polynomials”. In: Machine Learning 109.5, pp. 973–997. Link

Bertsimas, D., J. Pauphilet, and B.P.G. Van Parys (2020). “Sparse regression: Scalable algorithms and empirical performance”. In: Statistical Science 35.4, pp. 555–578. Link

Bertsimas, D. and B.P.G. Van Parys (2020b). “Sparse high-dimensional regression: Exact scalable algorithms and phase transitions”. In: Annals of Statistics 48.1, pp. 300–323. Link
Optimal First-Order Algorithms
Optimization formulations associated with learning problems based on huge data sets remain, however, out of reach for exact nonconvex optimization methods. At such huge scale, practical necessity dictates the use of first-order methods, which only use simple gradient information to determine a local optimizer. Most convex optimization problems are well studied and have associated optimal first-order methods. In stark contrast, even for simple classes of nonconvex functions, such as all smooth nonconvex functions, no optimal first-order method is known. I have made significant progress on designing faster first-order methods that solve nonconvex optimization problems to local optimality. More significantly, such faster first-order methods are not derived with pen and paper but rather found in a computer-assisted fashion by solving an auxiliary optimization formulation \begin{equation} \min_{a\in \mathcal A} \max_{f\in \mathcal F} ~P_N(f, a) \end{equation} where the decision variable $a \in \mathcal A$ is best thought of as characterizing a particular first-order algorithm, whereas the decision variable $f$ represents a particular function from a class of interest $\mathcal F$, e.g., all smooth nonconvex functions. The performance of an algorithm $a$ on a function $f$ is denoted here as $P_{N}(f, a)$ and may, for instance, represent the norm of the gradient of the best iterate observed among the first $N$ iterates. I believe that the automated discovery of better optimization methods for specialized classes of optimization problems using computer-assisted tools is a promising research direction, as it allows fast tailored methods to be determined algorithmically rather than analytically.
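To make the performance measure concrete, the following is a minimal sketch (under my own assumptions about the test function and step sizes, and using plain gradient descent rather than an algorithm produced by the computer-assisted methodology in the publications below) of evaluating $P_N(f, a)$ as the smallest gradient norm among the first $N$ iterates of a fixed step-size schedule:

\begin{verbatim}
import numpy as np

def performance(grad_f, x0, steps):
    """P_N(f, a): smallest gradient norm among the iterates of gradient descent
    with the given step-size schedule (the schedule plays the role of a)."""
    x = np.asarray(x0, dtype=float)
    best = np.linalg.norm(grad_f(x))
    for h in steps:
        x = x - h * grad_f(x)
        best = min(best, np.linalg.norm(grad_f(x)))
    return best

# A smooth nonconvex test function f(x) = sin(x1^2) - cos(x2), via its gradient.
grad = lambda x: np.array([2.0 * x[0] * np.cos(x[0] ** 2), np.sin(x[1])])
print(performance(grad, x0=[1.0, 2.0], steps=[0.1] * 20))
\end{verbatim}

The computer-assisted approach then optimizes the algorithm parameters against the worst case over the whole class $\mathcal F$, rather than evaluating a single function as done here.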
Related publications

Das Gupta, S., B.P.G. Van Parys, and E.K. Ryu (2023). “Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization methods”. In: Mathematical Programming. Also featured in Quanta Magazine. Link

Das Gupta, S., B. Stellato, and B.P.G. Van Parys (2022). “Exterior-point optimization for nonconvex learning”. In: SIAM Journal on Optimization. Submitted. Link