USC at the International Conference on Machine Learning (ICML)

Caitlin Dawson | July 25, 2023

USC researchers will present nine papers at the 40th International Conference on Machine Learning (ICML 2023), the leading international conference on machine learning.

This year, USC researchers will showcase nine papers at the 40th International Conference on Machine Learning (ICML 2023), one of the most prestigious machine learning conferences, taking place July 23–29 in Honolulu, Hawai’i. ICML brings together the artificial intelligence (AI) community to share new ideas, tools, and datasets, and make connections to advance the field.

Accepted papers with USC affiliation:

Moccasin: Efficient Tensor Rematerialization for Neural Networks
Tue, Jul 25, 17:00 — Poster Session 2

Burak Bartan, Haoming Li, Harris Teague, Christopher Lott, Bistra Dilkina

Abstract: The deployment and training of neural networks on edge computing devices pose many challenges. The low-memory nature of edge devices is often one of the biggest limiting factors in deploying large neural network models. Tensor rematerialization, or recompute, is a way to address the high memory requirements of neural network training and inference. In this paper, we consider the problem of minimizing the execution time of compute graphs subject to a memory budget. In particular, we develop a new constraint programming formulation called Moccasin with only O(n) integer variables, where n is the number of nodes in the compute graph. This is a significant improvement over recent works that propose formulations with O(n²) Boolean variables. We present numerical studies showing that our approach is up to an order of magnitude faster than recent work, especially for large-scale graphs.
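
For readers curious what a constraint programming model of this flavor looks like, here is a minimal sketch using Google OR-Tools CP-SAT. It is not Moccasin's formulation: it schedules a toy chain of operators under a peak-memory budget (all sizes, costs, and the budget below are made up) and does not model the rematerialization decisions themselves.

```python
from ortools.sat.python import cp_model

# Toy chain of 5 ops; each produces a tensor consumed by the next op.
n = 5
mem = [4, 2, 6, 3, 5]     # hypothetical tensor sizes
cost = [3, 1, 4, 2, 2]    # hypothetical compute times
budget, horizon = 12, 2 * sum(cost)

model = cp_model.CpModel()
start = [model.NewIntVar(0, horizon, f"start_{i}") for i in range(n)]
for i in range(1, n):     # chain dependency: op i needs op i-1's output
    model.Add(start[i] >= start[i - 1] + cost[i - 1])

# Tensor i stays resident from start_i until its consumer (op i+1) finishes.
alive, demands = [], []
for i in range(n - 1):
    end = model.NewIntVar(0, horizon, f"aend_{i}")
    model.Add(end == start[i + 1] + cost[i + 1])
    size = model.NewIntVar(0, horizon, f"alen_{i}")
    alive.append(model.NewIntervalVar(start[i], size, end, f"alive_{i}"))
    demands.append(mem[i])
model.AddCumulative(alive, demands, budget)   # peak memory <= budget

# Minimize total execution time (makespan).
makespan = model.NewIntVar(0, horizon, "makespan")
model.Add(makespan >= start[n - 1] + cost[n - 1])
model.Minimize(makespan)

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    print("makespan:", solver.Value(makespan))
```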

Refined Regret for Adversarial MDPs with Linear Function Approximation
Tue, Jul 25, 17:00 — Poster Session 2

Yan Dai, Haipeng Luo, Chen-Yu Wei, Julian Zimmert

Abstract: We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over K episodes and the state space can be arbitrarily large. We assume that the Q-function of any policy is linear in some known features, that is, a linear function approximation exists. The best existing regret upper bound for this setting (Luo et al., 2021) is of order Õ(K^{2/3}) (omitting all other dependencies), given access to a simulator. This paper provides two algorithms that improve the regret to Õ(√K) in the same setting. Our first algorithm makes use of a refined analysis of the Follow-the-Regularized-Leader (FTRL) algorithm with the log-barrier regularizer. This analysis allows the loss estimators to be arbitrarily negative and might be of independent interest. Our second algorithm develops a magnitude-reduced loss estimator, further removing the polynomial dependency on the number of actions in the first algorithm and leading to the optimal regret bound (up to logarithmic terms and dependency on the horizon). Moreover, we also extend the first algorithm to simulator-free linear MDPs, which achieves Õ(K^{8/9}) regret and greatly improves over the best existing bound Õ(K^{14/15}). This algorithm relies on a better alternative to the Matrix Geometric Resampling procedure of Neu & Olkhovskaya (2020), which could again be of independent interest.
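
To make the log-barrier FTRL ingredient concrete, here is a minimal sketch of the update over a finite action simplex. The paper's algorithm operates on policies with linear function approximation; this toy version only illustrates how the log-barrier regularizer yields an iterate that is closed-form up to a Lagrange multiplier, and how it tolerates negative loss estimates.

```python
import numpy as np

def ftrl_log_barrier_step(cum_loss, eta):
    """FTRL iterate over the probability simplex with a log-barrier regularizer:
    argmin_p  eta * <cum_loss, p> - sum_i log(p_i).
    Stationarity gives p_i = 1 / (eta * cum_loss_i + lam); we bisect on the
    multiplier lam until the iterate sums to one. Negative cumulative losses
    are fine, matching the abstract's point about negative loss estimators."""
    z = eta * np.asarray(cum_loss, dtype=float)
    lo, hi = -z.min() + 1e-12, -z.min() + len(z)  # bracket the multiplier
    for _ in range(100):
        lam = 0.5 * (lo + hi)
        if np.sum(1.0 / (z + lam)) > 1.0:
            lo = lam
        else:
            hi = lam
    p = 1.0 / (z + hi)
    return p / p.sum()

# Toy usage: three actions, one of the cumulative loss estimates is negative.
print(ftrl_log_barrier_step(np.array([0.5, -1.0, 2.0]), eta=1.0))
```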

Searching Large Neighborhoods for Integer Linear Programs with Contrastive Learning
Wed, Jul 26, 14:00 — Poster Session 3

Taoan Huang, Aaron Ferber, Yuandong Tian, Bistra Dilkina, Benoit Steiner

Abstract: Integer Linear Programs (ILPs) are powerful tools for modeling and solving a large number of combinatorial optimization problems. Recently, it has been shown that Large Neighborhood Search (LNS), as a heuristic algorithm, can find high-quality solutions to ILPs faster than Branch and Bound. However, how to find the right heuristics to maximize the performance of LNS remains an open problem. In this paper, we propose a novel approach, CL-LNS, that delivers state-of-the-art anytime performance on several ILP benchmarks as measured by metrics including the primal gap, the primal integral, survival rates, and the best-performing rate. Specifically, CL-LNS collects positive and negative solution samples from an expert heuristic that is slow to compute, and learns a new heuristic with a contrastive loss. We use graph attention networks and a richer set of features to further improve its performance.
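
As an illustration of the contrastive ingredient, here is a generic multi-positive, InfoNCE-style loss in PyTorch. It is a sketch of the general technique rather than CL-LNS's exact loss; the embedding dimension and sample counts are arbitrary.

```python
import torch
import torch.nn.functional as F

def multi_positive_info_nce(anchor, positives, negatives, tau=0.1):
    """Pull the embedding of the current search state toward expert-improving
    samples (positives) and away from unhelpful ones (negatives).
    Shapes: anchor (d,), positives (P, d), negatives (N, d)."""
    a = F.normalize(anchor, dim=-1)
    pos = F.normalize(positives, dim=-1) @ a / tau  # (P,) similarities
    neg = F.normalize(negatives, dim=-1) @ a / tau  # (N,) similarities
    # -log( sum(exp(pos)) / sum(exp(all)) ), the multi-positive InfoNCE form
    return torch.logsumexp(torch.cat([pos, neg]), 0) - torch.logsumexp(pos, 0)

# Toy usage with random 16-d embeddings: 4 positives, 8 negatives.
torch.manual_seed(0)
loss = multi_positive_info_nce(torch.randn(16), torch.randn(4, 16), torch.randn(8, 16))
print(loss)
```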

Fairness in Matching under Uncertainty
Wed, Jul 26, 17:00 — Poster Session 4

Siddartha Devic, David Kempe, Vatsal Sharan, Aleksandra Korolova

Abstract: The prevalence and importance of algorithmic two-sided marketplaces have drawn attention to the issue of fairness in such settings. Algorithmic decisions are used in assigning students to schools, users to advertisers, and applicants to job interviews. These decisions should heed the preferences of individuals and simultaneously be fair with respect to their merits (synonymous with fit, future performance, or need). Merits conditioned on observable features are always uncertain, a fact that is exacerbated by the widespread use of machine learning algorithms to infer merit from the observables. As our key contribution, we carefully axiomatize a notion of individual fairness in the two-sided marketplace setting that respects the uncertainty in the merits; indeed, it simultaneously recognizes uncertainty as the primary potential cause of unfairness and as an approach to address it. We design a linear programming framework to find fair utility-maximizing distributions over allocations, and we show that the linear program is robust to perturbations in the estimated parameters of the uncertain merit distributions, a key property for combining the approach with machine learning techniques.
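
The linear programming idea can be illustrated in a few lines. The toy below is our own simplification, not the paper's axioms: it optimizes a distribution over allocations of two interview slots among three applicants, subject to a hypothetical per-applicant fairness floor derived from (made-up) merit estimates.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import linprog

allocs = list(combinations(range(3), 2))  # which 2 of 3 applicants interview
util = np.array([3.0, 2.0, 1.0])          # employer utility per applicant
floor = np.array([0.6, 0.5, 0.3])         # fairness floor on P(i interviewed)

# Expected utility of each deterministic allocation.
exp_util = np.array([util[list(a)].sum() for a in allocs])
# P(i interviewed) = sum of probabilities of allocations containing i >= floor[i]
A_ub = np.array([[-(i in a) for a in allocs] for i in range(3)], dtype=float)
b_ub = -floor
A_eq, b_eq = np.ones((1, len(allocs))), [1.0]   # probabilities sum to one

res = linprog(-exp_util, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, 1)] * len(allocs))
print(dict(zip(allocs, np.round(res.x, 3))))    # fair utility-maximizing mix
```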

On Distribution Dependent Sub-Logarithmic Query Time of Learned Indexing
Thu, Jul 27, 13:30 — Poster Session 5

Sepanta Zeighami, Cyrus Shahabi

Abstract: A fundamental problem in data management is to find the elements in an array that match a query. Recently, learned indexes have been used extensively to solve this problem, where they learn a model to predict the location of items in the array. They have been empirically shown to outperform non-learned methods (e.g., B-trees or binary search, which answer queries in O(log n) time) by orders of magnitude. However, the success of learned indexes has not been theoretically justified. The only existing attempt shows the same O(log n) query time, but with a constant-factor improvement in space complexity over non-learned methods, under some assumptions on the data distribution. In this paper, we significantly strengthen this result, showing that under mild assumptions on the data distribution, and with the same space complexity as non-learned methods, learned indexes can answer queries in O(log log n) expected time. We also show that, allowing for a slightly larger but still near-linear space overhead, a learned index can achieve O(1) expected query time. Our results prove that learned indexes can be orders of magnitude faster than non-learned methods, theoretically grounding their empirical success.
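
The intuition behind learned indexes is easy to sketch: fit a model that approximates the data's cumulative distribution, predict a query's position from it, then search only a small window around the prediction. The toy below uses a deliberately crude linear fit with a full binary-search fallback; the paper's analysis concerns much more careful constructions and the distributional conditions under which the window stays small.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.sort(rng.normal(size=100_000))  # sorted keys

# "Model": a linear regression from key to rank, i.e. a crude CDF approximation.
a, b = np.polyfit(data, np.arange(data.size), 1)

def lookup(key, window=4096):
    # Predict the position, then binary-search a small window around it.
    pos = int(np.clip(a * key + b, 0, data.size - 1))
    lo, hi = max(0, pos - window), min(data.size, pos + window)
    if lo < hi and data[lo] <= key <= data[hi - 1]:
        i = lo + int(np.searchsorted(data[lo:hi], key))
    else:  # model error exceeded the window: fall back to a full search
        i = int(np.searchsorted(data, key))
    return i if i < data.size and data[i] == key else None

print(lookup(data[12345]))  # finds the key's index
```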

SurCo: Learning Linear SURrogates for COmbinatorial Nonlinear Optimization Problems
Thu, Jul 27, 16:30 — Poster Session 6

Aaron Ferber, Taoan Huang, Daochen Zha, Martin Schubert, Benoit Steiner, Bistra Dilkina, Yuandong Tian

Abstract: Optimization problems with nonlinear cost functions and combinatorial constraints appear in many real-world applications but remain challenging to solve efficiently compared to their linear counterparts. To bridge this gap, we propose SurCo, which learns linear surrogate costs that can be used in existing combinatorial solvers to output good solutions to the original nonlinear combinatorial optimization problem. The surrogate costs are learned end-to-end with the nonlinear loss by differentiating through the linear surrogate solver, combining the flexibility of gradient-based methods with the structure of linear combinatorial optimization. We propose three SurCo variants: SurCo-zero for individual nonlinear problems, SurCo-prior for problem distributions, and SurCo-hybrid to combine both distribution and problem-specific information. We give theoretical intuition motivating SurCo and evaluate it empirically. Experiments show that SurCo finds better solutions faster than state-of-the-art and domain-expert approaches on real-world optimization problems such as embedding table sharding, inverse photonic design, and nonlinear route planning.
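
Differentiating through a combinatorial solver can be approximated in several ways; the sketch below uses the blackbox-differentiation trick of Vlastelica et al. (2020) as a stand-in, since SurCo's own gradient estimator may differ. The "solver" here is a hypothetical pick-the-k-cheapest-items routine, and the quadratic objective stands in for a nonlinear downstream cost.

```python
import torch

LAM = 10.0  # interpolation strength for the blackbox gradient

def solver(costs):
    """Stand-in linear combinatorial solver: select the k cheapest items."""
    sol = torch.zeros_like(costs)
    sol[torch.topk(-costs, 2).indices] = 1.0
    return sol

class BlackboxSolve(torch.autograd.Function):
    """Gradient through the solver via re-solving at perturbed costs."""
    @staticmethod
    def forward(ctx, costs):
        sol = solver(costs)
        ctx.save_for_backward(costs, sol)
        return sol

    @staticmethod
    def backward(ctx, grad_out):
        costs, sol = ctx.saved_tensors
        pert = solver(costs + LAM * grad_out)  # solve the perturbed problem
        return -(sol - pert) / LAM

# Learn surrogate costs theta so the *linear* solver's output scores well
# under a *nonlinear* objective (a hypothetical quadratic fit to a target).
theta = torch.randn(5, requires_grad=True)
target = torch.tensor([1.0, 0.0, 1.0, 0.0, 0.0])
opt = torch.optim.Adam([theta], lr=0.1)
for _ in range(200):
    x = BlackboxSolve.apply(theta)
    loss = ((x - target) ** 2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(solver(theta.detach()))  # ideally selects items 0 and 2
```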

Emergent Asymmetry of Precision and Recall for Measuring Fidelity and Diversity of Generative Models in High Dimensions
Thu, Jul 27, 13:30 — Poster Session 5

Mahyar Khayatkhoei, Wael AbdAlmageed

Abstract: Precision and Recall are two prominent metrics of generative performance, proposed to separately measure the fidelity and diversity of generative models. Given their central role in comparing and improving generative models, understanding their limitations is crucially important. To that end, in this work, we identify a critical flaw in the common approximation of these metrics using k-nearest-neighbors: the very interpretations of fidelity and diversity that are assigned to Precision and Recall can fail in high dimensions, resulting in very misleading conclusions. Specifically, we empirically and theoretically show that as the number of dimensions grows, two model distributions with supports at equal point-wise distance from the support of the real distribution can have vastly different Precision and Recall regardless of their respective distributions, hence an emergent asymmetry in high dimensions. Based on our theoretical insights, we then provide simple yet effective modifications to these metrics that make them symmetric regardless of the number of dimensions. Finally, we provide experiments on real-world datasets illustrating that the identified flaw is not merely a pathological case, and that our proposed metrics are effective in alleviating its impact.
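
The k-nearest-neighbor approximation under scrutiny can be sketched compactly. The code below implements generic k-NN Precision/Recall estimates (in the spirit of Kynkäänniemi et al., 2019, not this paper's exact protocol) and runs the same toy "model" at two dimensionalities; the numbers merely illustrate how the estimates shift with dimension.

```python
import numpy as np

def knn_radii(X, k):
    # Distance from each point to its k-th nearest neighbor within X
    # (column 0 of the sorted distances is the point itself).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]

def knn_precision_recall(real, fake, k=3):
    """Precision: fraction of fake samples inside some real point's k-NN ball.
    Recall: fraction of real samples inside some fake point's k-NN ball."""
    r_real, r_fake = knn_radii(real, k), knn_radii(fake, k)
    d = np.linalg.norm(fake[:, None, :] - real[None, :, :], axis=-1)
    precision = float(np.mean((d <= r_real[None, :]).any(axis=1)))
    recall = float(np.mean((d.T <= r_fake[None, :]).any(axis=1)))
    return precision, recall

rng = np.random.default_rng(0)
for dim in (2, 64):
    real = rng.normal(size=(200, dim))
    fake = 0.7 * rng.normal(size=(200, dim))  # a slightly concentrated "model"
    print(dim, knn_precision_recall(real, fake))
```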

Conformal Inference is (almost) Free for Neural Networks Trained with Early Stopping
Thu, Jul 27, 13:30 — Poster Session 5

Ziyi Liang, Yanfei Zhou, Matteo Sesia

Abstract: Early stopping based on hold-out data is a popular regularization technique designed to mitigate overfitting and increase the predictive accuracy of neural networks. Models trained with early stopping often provide relatively accurate predictions, but they generally still lack precise statistical guarantees unless they are further calibrated using independent hold-out data. This paper addresses that limitation with conformalized early stopping: a novel method that combines early stopping with conformal calibration while efficiently recycling the same hold-out data. This leads to models that are both accurate and able to provide exact predictive inferences without multiple data splits or overly conservative adjustments. Practical implementations are developed for different learning tasks (outlier detection, multi-class classification, regression), and their competitive performance is demonstrated on real data.
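
For context, here is what standard split-conformal regression looks like; note that it assumes a calibration set separate from the early-stopping hold-out, which is exactly the extra split the paper's method avoids. The model below is a hypothetical stand-in for an early-stopped network.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_predict(X):
    # Hypothetical stand-in for an already-trained, early-stopped network.
    return 2.0 * X.squeeze()

# Held-out calibration data (separate from training and early-stopping data).
X_cal = rng.uniform(0, 1, (500, 1))
y_cal = 2.0 * X_cal.squeeze() + rng.normal(0, 0.3, 500)

# Conformity scores: absolute residuals on the calibration set.
scores = np.abs(y_cal - fit_predict(X_cal))
alpha = 0.1
level = np.ceil((1 - alpha) * (len(scores) + 1)) / len(scores)
q = np.quantile(scores, level, method="higher")

# Prediction interval with finite-sample 90% marginal coverage.
x_test = np.array([[0.5]])
pred = fit_predict(x_test)
print(pred - q, pred + q)
```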

A Critical View of Vision-Based Long-Term Dynamics Prediction Under Environment Misalignment
Tue, Jul 25, 17:00 — Poster Session 2

Hanchen Xie, Jiageng Zhu, Mahyar Khayatkhoei, Jiazhi Li, Mohamed Hussein, Wael AbdAlmageed

Abstract: Dynamics prediction, the problem of predicting future states of scene objects based on current and prior states, is drawing increasing attention as an instance of learning physics. To solve this problem, the Region Proposal Convolutional Interaction Network (RPCIN), a vision-based model, was proposed and achieved state-of-the-art performance in long-term prediction. RPCIN takes only raw images and simple object descriptions, such as the bounding box and segmentation mask of each object, as input. However, despite its success, the model’s capability can be compromised under conditions of environment misalignment. In this paper, we investigate two challenging conditions of environment misalignment, Cross-Domain and Cross-Context, by proposing four datasets designed for these challenges: SimB-Border, SimB-Split, BlenB-Border, and BlenB-Split. The datasets cover two domains and two contexts. Using RPCIN as a probe, experiments on combinations of the proposed datasets reveal potential weaknesses of vision-based long-term dynamics prediction models. Furthermore, we propose a promising direction for mitigating the Cross-Domain challenge and provide concrete evidence supporting it, which dramatically alleviates the challenge on the proposed datasets.

Published on July 25th, 2023

Last updated on July 26th, 2023
