USC at ICML ‘22 Conference

Lillian Goodwin | July 21, 2022

USC researchers present 8 papers on notable topics including model explainability, game theory and multi-agent reinforcement learning.

The International Conference on Machine Learning (ICML), a haven for innovations in machine learning. Photo/iStock.

USC students and faculty are presenting their latest research at the International Conference on Machine Learning (ICML), July 17-21, the premier academic conference in the field and a haven for innovations in machine learning. Every year, the conference attracts work from both industry and academic researchers across this branch of artificial intelligence, covering potential applications in everything from biology to robotics.

This year’s event in Baltimore marks the 39th ICML and will feature eight papers co-authored by USC students or professors in partnership with companies including Google, Amazon and Facebook. The papers cover a broad range of topics, such as game theory, language learning and mapping out neural networks.

We asked the authors to summarize their research and its potential impact. (Responses have been edited for clarity.) 

Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-Form Games

Gabriele Farina (Carnegie Mellon University) · Chung-Wei Lee (University of Southern California) · Haipeng Luo (University of Southern California) · Christian Kroer (Columbia University)

“We consider solving extensive-form games (EFGs), a general framework that models many strategy games, including card games such as Texas hold ’em or Blackjack and board games such as Monopoly or Chess. The main application is building better AI by playing these games. In this paper, we propose Kernelized Optimistic Multiplicative Weights Update (KOMWU), the first algorithm that simultaneously enjoys several important theoretical guarantees, including better convergence, lower dependence on the size of the game, and nearly optimal ‘regret.’” Chung-Wei Lee
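
For readers curious what multiplicative-weights dynamics look like in practice, here is a minimal sketch of the classical optimistic multiplicative weights update (OMWU) on a small normal-form zero-sum game, rock-paper-scissors. It only illustrates the building block that KOMWU generalizes, not the paper’s kernelized extensive-form algorithm; the step size and iteration count are arbitrary choices.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # for numerical stability
    e = np.exp(z)
    return e / e.sum()

def omwu_zero_sum(A, eta=0.1, T=2000):
    """Optimistic multiplicative weights update for a zero-sum matrix game
    with payoff matrix A; the row player maximizes x^T A y, the column
    player minimizes it."""
    n, m = A.shape
    Gx, Gy = np.zeros(n), np.zeros(m)    # cumulative gradients so far
    gx, gy = np.zeros(n), np.zeros(m)    # last round's gradients (the "optimism")
    avg_x, avg_y = np.zeros(n), np.zeros(m)
    for _ in range(T):
        x = softmax(eta * (Gx + gx))     # hedge on last round's gradient
        y = softmax(-eta * (Gy + gy))
        avg_x += x
        avg_y += y
        gx, gy = A @ y, A.T @ x          # this round's gradients
        Gx += gx
        Gy += gy
    return avg_x / T, avg_y / T          # average play approximates a Nash equilibrium

# Rock-paper-scissors: the unique equilibrium is uniform play for both players.
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
x_bar, y_bar = omwu_zero_sum(A)
print(np.round(x_bar, 3), np.round(y_bar, 3))   # both close to [1/3, 1/3, 1/3]
```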

Learning Infinite-horizon Average-reward Markov Decision Process with Constraints

Liyu Chen (University of Southern California) · Rahul Jain (University of Southern California) · Haipeng Luo (University of Southern California)

“This paper studies how to use Reinforcement Learning (RL) to learn a policy that maximizes long-term average reward while satisfying some constraints. For example, in logistics management, you want to minimize transportation cost while obeying traffic rules and meeting all deadlines. We propose a new algorithm that achieves better learning performance than existing work (measured by a notion called regret). We are also the first to study this problem under a more general condition known as the weakly communicating assumption, and we propose the first set of algorithms for this more general setting.” Liyu Chen

Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP

Liyu Chen (University of Southern California) · Rahul Jain (University of Southern California) · Haipeng Luo (University of Southern California)

“This paper studies how to solve goal-reaching tasks, such as car navigation or robotic manipulation, with Reinforcement Learning (RL) when some kind of linear structure is imposed on the environment. The goal is to leverage this structure to make learning possible. Studying this setting is an important step toward understanding reinforcement learning with function approximation (such as deep neural networks). We propose three algorithms in this direction. The first achieves state-of-the-art learning performance (measured by a notion called regret) while remaining computationally efficient. The second and third algorithms provide other forms of regret guarantees that are desirable for certain tasks.” Liyu Chen

A Rigorous Study of Integrated Gradients Method and Extensions to Internal Neuron Attributions

Daniel Lundstrom (University of Southern California) · Tianjian Huang (University of Southern California) · Meisam Razaviyayn (University of Southern California, ISE)

“Deep neural networks are very powerful tools for prediction. For example, they can help doctors read medical scans or enable self-driving cars to interpret what their exterior cameras see. The internal workings of these models are so complex that professionals have a hard time explaining them, and various tools have been developed to explain how neural networks work. Our paper is a deep-dive analysis of a popular method, Integrated Gradients, a model explainer that is claimed to be the only method satisfying a desirable set of properties.

We show that establishing the uniqueness of Integrated Gradients is more difficult than previously assumed, and work towards establishing it by introducing another key property, then proving key results with that property. We also introduce an algorithm to help experts interpret the role of internal components or neurons. With this algorithm, experts could understand what parts of the model are responding to a wheel when the model is identifying an image of a car, for example.” Daniel Lundstrom
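
For context, Integrated Gradients attributes a model’s prediction to its input features by accumulating the gradient along a straight path from a baseline to the input. The sketch below shows the standard Riemann-sum approximation on a toy logistic model with an analytic gradient; it is our own illustration of the method being analyzed, not the paper’s extension to internal neurons, and all weights and values are made up.

```python
import numpy as np

def model(x, w):
    """Toy differentiable model: a single logistic unit, standing in for any
    network that exposes gradients of its output with respect to its input."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

def model_grad(x, w):
    """Analytic gradient of the toy model's output with respect to x."""
    p = model(x, w)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, steps=100):
    """Riemann-sum approximation of Integrated Gradients:
    IG_i(x) = (x_i - baseline_i) * integral over a in [0, 1] of
              d f(baseline + a * (x - baseline)) / d x_i."""
    alphas = (np.arange(steps) + 0.5) / steps        # midpoints along the path
    grads = np.array([model_grad(baseline + a * (x - baseline), w) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

w = np.array([2.0, -1.0, 0.0])      # toy feature weights
x = np.array([1.0, 1.0, 1.0])       # the input to explain
baseline = np.zeros_like(x)         # the "absence of signal" reference point

attr = integrated_gradients(x, baseline, w)
print(np.round(attr, 3))
print(round(attr.sum(), 3), round(model(x, w) - model(baseline, w), 3))
# The attributions sum (approximately) to f(x) - f(baseline): the
# "completeness" property, one of the axioms discussed for such methods.
```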

No-Regret Learning in Time-Varying Zero-Sum Games

Mengxiao Zhang (University of Southern California) · Peng Zhao (Nanjing University) · Haipeng Luo (University of Southern California) · Zhi-Hua Zhou (Nanjing University)

“Learning from repeated play in a fixed two-player zero-sum game is a classic problem in game theory and online learning. In practice, however, the game is often not fixed but changes over time, because the environment changes and the players adjust their strategies. Motivated by this, we focus on a natural yet underexplored variant of this problem in which the game’s payoff matrix changes over time, possibly in an adversarial manner.

We first discuss what the appropriate performance measures are for learning in non-stationary games and propose three natural and reasonable measures for this problem. Then, we design a new parameter-free algorithm that simultaneously enjoys favorable guarantees under the three different performance measures. These guarantees are adaptive to different non-stationarity measures of the payoff matrices and, importantly, recover the best-known results when the payoff matrix is fixed. Empirical results further validate the effectiveness of our algorithm.” Mengxiao Zhang 
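
The paper’s three measures are defined precisely there; as a general flavor of how performance can be judged in a non-stationary game, the toy sketch below computes dynamic regret, where the learner’s payoff in each round is compared against the best response to that round’s payoff matrix rather than against a single fixed strategy. The drifting-matrix setup and the uniform strategies are illustrative assumptions only, not taken from the paper.

```python
import numpy as np

def dynamic_regret(payoffs, xs, ys):
    """Dynamic regret of the row player in a time-varying zero-sum game:
    each round is compared against the best response to that round's payoff
    matrix and opponent strategy, instead of against one fixed strategy."""
    total = 0.0
    for A, x, y in zip(payoffs, xs, ys):
        total += (A @ y).max() - x @ A @ y
    return total

# Example: a slowly drifting payoff matrix, with a learner and an opponent
# who both happen to play uniformly at random in every round.
rng = np.random.default_rng(1)
T, n = 100, 3
A0 = rng.normal(size=(n, n))
drift = rng.normal(size=(n, n))
payoffs = [A0 + 0.01 * t * drift for t in range(T)]
uniform = np.ones(n) / n
print(dynamic_regret(payoffs, [uniform] * T, [uniform] * T))
```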

UNIREX: A Unified Learning Framework for Language Model Rationale Extraction

Aaron Chan (University of Southern California) · Maziar Sanjabi (Meta AI) · Lambert Mathias (Facebook) · Liang Tan (Facebook) · Shaoliang Nie (Facebook) · Xiaochang Peng · Xiang Ren (University of Southern California) · Hamed Firooz (Facebook)

“Neural language models (NLMs), which make complex decisions based on natural language text, are the backbone of many modern AI systems. Nonetheless, NLMs’ reasoning processes are notoriously opaque, making it difficult to explain NLMs’ decisions to humans. This lack of explainability also makes it hard for humans to debug AI systems when they behave problematically. To address this issue, our ICML paper proposes UNIREX, a unified framework for data-driven rationale extraction, which explains an NLM’s decision for a given input text by highlighting the words that most influenced the decision.

Our extensive empirical studies show that UNIREX vastly outperforms other rationale extraction methods in balancing faithfulness, plausibility, and task performance. Surprisingly, UNIREX is still effective in real-world scenarios with limited labeled data, capable of achieving high explainability when training on a very small number of annotated rationales. Plus, UNIREX rationale extractors’ explainability can even generalize to datasets and tasks that are completely unseen during training!” Xiang Ren
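
As a rough illustration of what “highlighting the words that most influenced the decision” means, the toy below scores a sentence with a hand-written bag-of-words classifier and returns the tokens that pushed hardest toward the predicted label. It is a generic attribution sketch, not the UNIREX framework, which learns the rationale extractor jointly with the task model; the vocabulary and weights are made up.

```python
# Toy bag-of-words "sentiment model": the score of a sentence is the sum of
# per-word weights. A rationale extractor then highlights the words whose
# contributions pushed the model hardest toward its decision. (Illustrative
# weights only; UNIREX instead learns the extractor from data.)
WEIGHTS = {"great": 2.0, "love": 1.5, "boring": -2.0, "not": -1.0, "plot": 0.1}

def predict_with_rationale(text, top_k=2):
    tokens = text.lower().split()
    contributions = [(tok, WEIGHTS.get(tok, 0.0)) for tok in tokens]
    score = sum(c for _, c in contributions)
    label = "positive" if score > 0 else "negative"
    sign = 1.0 if score > 0 else -1.0
    # rationale: the top-k tokens that pushed hardest toward the predicted label
    rationale = [tok for tok, c in sorted(contributions, key=lambda p: -sign * p[1])[:top_k]]
    return label, rationale

print(predict_with_rationale("the plot was boring and not great"))
# -> ('negative', ['boring', 'not'])
```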

Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence

Dongsheng Ding (University of Southern California) · Chen-Yu Wei (University of Southern California) · Mihailo Jovanovic (University of Southern California) · Kaiqing Zhang (MIT)

“Can many independent agents learn good policies? This is an interesting question for real-world systems with multiple agents, from players in video games and robots in surveillance, to bidders in real-time bidding. Having multiple agents search for policies in tandem using reinforcement learning (RL) techniques has achieved great empirical performance in playing video games such as StarCraft. However, it is critical to scale existing RL methods in the number of agents and the size of the state space, since both are enormously large in real-world multi-agent systems.

We established a simple and natural method that solves a large-scale problem in multi-agent RL. No matter the number of agents or the size of the state space, agents can myopically maximize their private rewards by independently searching for better policies, without communicating with each other. This significantly advances the state of the art of RL for multi-agent systems and, more generally, the field of cooperative AI. Beyond being independent of other agents, we discovered that agents can learn good policies without knowing the type of game being played. This makes our method easy to use in either cooperative or competitive AI systems.” Dongsheng Ding
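
To make “independently searching for better policies” concrete, here is a minimal sketch of independent projected policy gradient in a tiny two-player common-interest matrix game, the simplest (single-state) instance of a Markov potential game. Each agent updates using only its own reward gradient, with no communication. This is an illustration of the general idea, not the paper’s large-scale algorithm, and the gradients are computed exactly rather than estimated from each agent’s own samples as they would be in practice.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0)

# Common-interest coordination game: both agents receive the same reward,
# the simplest (single-state) instance of a Markov potential game.
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

x = np.array([0.5, 0.5])   # agent 1's mixed strategy over its two actions
y = np.array([0.4, 0.6])   # agent 2's mixed strategy
eta = 0.1

for _ in range(500):
    # Each agent ascends its own expected reward x^T R y using only its own
    # gradient; there is no communication or coordination between the updates.
    gx = R @ y
    gy = R.T @ x
    x = project_simplex(x + eta * gx)
    y = project_simplex(y + eta * gy)

print(np.round(x, 3), np.round(y, 3))
# Both agents end up playing their second action, the higher-reward equilibrium.
```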

Personalization Improves Privacy-Accuracy Tradeoffs in Federated Optimization

Alberto Bietti (NYU) · Chen-Yu Wei (University of Southern California) · Miro Dudik (Microsoft Research) · John Langford (Microsoft Research) · Steven Wu (Carnegie Mellon University)

“We often rely on recommendation systems to make decisions, for example, to help us choose restaurants, movies, music, news, shopping and more. These systems need to aggregate feedback from users and build a collective model. Since every user has their own preferences, the system may further need a ‘personalized model.’ While the main goal is to make good recommendations, such a system is subject to privacy constraints. That is, users may not want to share their precise data, such as location and transaction information. Naturally, the less data the users are willing to share, the less accurate the model. How can we build a good system under such privacy constraints?

This is a relevant question in the field of ‘personalized federated learning.’ We propose a system structure that leverages personalized models to respect the users’ privacy requirements. The personalized model can be kept at the user side, and training it does not cause privacy leakage. On the other hand, the accuracy of the global model heavily depends on how much data users want to share. Therefore, to strike a balance between privacy and accuracy, we control the relative weights between the global model and the personalized models. We show in theory and experiments that by properly tuning the relative learning rates between the global and personalized models, the system can achieve a better accuracy under a fixed privacy constraint.” Chen-Yu Wei 
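
As a schematic of the structure Wei describes, the sketch below trains, for each user, a locally kept personalized model on top of a shared global model; only a noised update to the global part ever leaves the device, and the two learning rates set the balance between the shared and personalized components. The linear model, noise level and learning rates here are illustrative assumptions of ours, not the paper’s algorithm or its privacy analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_round(w_global, w_personal, X, y, eta_g=0.1, eta_p=0.5, noise_std=0.05):
    """One round of training on a single user's device (toy sketch).

    Predictions come from the sum of a shared global model and a local
    personalized model. The personalized part is updated and kept on the
    device; only a noised update to the global part is sent to the server.
    The ratio eta_p / eta_g controls how much of the fit stays local."""
    residual = X @ (w_global + w_personal) - y            # squared-loss residuals
    grad = X.T @ residual / len(y)
    w_personal = w_personal - eta_p * grad                # never leaves the device
    shared_update = -eta_g * grad + noise_std * rng.normal(size=grad.shape)
    return w_personal, shared_update                      # only this is shared

# Toy federation: users share a common trend but each has a personal offset.
d, n_users, n_points = 3, 5, 40
w_true = rng.normal(size=d)
users = []
for _ in range(n_users):
    X = rng.normal(size=(n_points, d))
    offset = rng.normal(scale=0.5, size=d)                # user-specific preference
    users.append((X, X @ (w_true + offset)))

w_global = np.zeros(d)
personals = [np.zeros(d) for _ in range(n_users)]
for _ in range(200):
    results = [local_round(w_global, personals[i], X, y) for i, (X, y) in enumerate(users)]
    personals = [p for p, _ in results]
    w_global = w_global + np.mean([u for _, u in results], axis=0)  # server-side averaging

mse = np.mean([np.mean((X @ (w_global + p) - y) ** 2) for (X, y), p in zip(users, personals)])
print(round(float(mse), 4))   # small: the combined models fit each user well
                              # despite the noise added to the shared updates
```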
