With a new DARPA grant, the growing field of transfer learning has come to USC Viterbi’s Ming Hsieh Department of Electrical and Computer Engineering. The $1.5 million grant was awarded to three professors: Salman Avestimehr, professor of electrical and computer engineering; Antonio Ortega, professor of electrical and computer engineering; and Mahdi Soltanolkotabi, the Andrew and Erna Viterbi Early Career Chair and assistant professor of electrical and computer engineering and computer science. The trio, working in collaboration with Ilias Diakonikolas, professor of computer science at the University of Wisconsin–Madison, will address the theoretical foundations of this field.
Modern machine learning models are breaking new ground in data science, achieving unprecedented performance on tasks such as classifying images into a thousand different categories. This is achieved by training gigantic neural networks. “Neural networks work really well because they can be trained on huge amounts of pre-existing data that has previously been tagged and collected,” Avestimehr, who is the PI of the project, said. “But how can we train a neural network in scenarios with very limited samples, for example by leveraging (or transferring) the knowledge from a related problem that we have already solved? This is called transfer learning.”
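The core idea Avestimehr describes can be illustrated with a toy linear-regression problem. This is only a minimal sketch of the concept, not the team's method: it assumes a source task with abundant data and a closely related target task with few samples, and transfers knowledge by regularizing the target model toward the source solution.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50  # number of model parameters

# Source task: abundant labeled data with true weights w_src_true.
w_src_true = rng.normal(size=d)
X_src = rng.normal(size=(2000, d))
y_src = X_src @ w_src_true

# Target task: related weights, but only 20 labeled samples.
w_tgt_true = w_src_true + 0.1 * rng.normal(size=d)
X_tgt = rng.normal(size=(20, d))
y_tgt = X_tgt @ w_tgt_true

# Step 1: "pre-train" on the source task (plain least squares).
w_src, *_ = np.linalg.lstsq(X_src, y_src, rcond=None)

# Step 2: transfer by solving a ridge problem biased toward w_src:
#   min_w ||X_tgt w - y_tgt||^2 + lam * ||w - w_src||^2
lam = 1.0
A = X_tgt.T @ X_tgt + lam * np.eye(d)
w_transfer = np.linalg.solve(A, X_tgt.T @ y_tgt + lam * w_src)

# Baseline: ordinary ridge (shrinking toward zero) on the same 20 samples.
w_scratch = np.linalg.solve(A, X_tgt.T @ y_tgt)

err_transfer = np.linalg.norm(w_transfer - w_tgt_true)
err_scratch = np.linalg.norm(w_scratch - w_tgt_true)
print(f"error with transfer: {err_transfer:.3f}")
print(f"error from scratch:  {err_scratch:.3f}")
```

With 50 parameters and only 20 target samples, the from-scratch fit is badly underdetermined, while the transferred solution starts from a nearby source model and needs far fewer samples to do well; quantifying exactly how much data transfer saves is the kind of question the project studies.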
Situations that humans can easily adapt to still cause problems for neural networks. Take navigation, for example. A robot may be trained to navigate effectively in New York City, but drop that same robot on the streets of Shanghai and it usually fails. Faced with new data in the form of unfamiliar street signs, geography, and language, this highly advanced neural network is suddenly rendered useless.
DARPA is supporting fourteen research teams to tackle the challenge of transfer learning in NLP, most of which are focused on the application side. The USC Viterbi researchers are one of only three teams focusing on the theoretical foundations.
“We are particularly excited to have an opportunity to focus on solving fundamental questions: what makes it possible to transfer information from one task to another? How much data is needed for a specific task, in addition to information that was transferred? Answers to these questions are essential to accelerate progress in machine learning problems for which large amounts of data are not available,” Ortega said.