Behind every good robot is a human with serious smarts, a big imagination, and a penchant for problem solving. Or, in the case of Google DeepMind’s Robotic Transformer 2 (or RT-2), a team of humans—including three recent USC computer science doctoral graduates.
“In movies and TV, successful research is almost always portrayed as the work of a lone genius, toiling away in spite of the doubters,” said Ryan Julian (’21). “In reality, research is a team sport, and it takes all kinds of people to make a great team. I’m so proud that I got to be a part of the incredible team that made RT-2 a reality.”
Julian and his Trojan teammates Karol Hausman (’18) and Yevgen Chebotar (’19) are research scientists at Google DeepMind’s headquarters in Mountain View, CA. The trio, who were advised by Professor Gaurav Sukhatme, contributed to the first-of-its-kind vision-language-action (VLA) model, which The New York Times described as “a quiet revolution.”
According to Google DeepMind’s Head of Robotics Vincent Vanhoucke in the company’s official blog: “Just like language models are trained on text from the web to learn general ideas and concepts, RT-2 transfers knowledge from web data to inform robot behavior. In other words, RT-2 can speak robot.” This means it can follow instructions and make abstract connections–the holy grail for robotics.
We sat down with Julian, Hausman and Chebotar to find out more about their journeys from USC to Google, their path to robotics, and their visions of the future.
Answers have been edited for style and clarity.
When did you realize you wanted to be a roboticist? Was there a lightbulb moment?
RJ Funny enough, I never dreamed I could have a career in robotics. It seemed too far-fetched. I tried several other options but nothing got me as excited and motivated as the robotics club I co-founded as an undergrad. So, I decided to try and make it work for me. It’s been a bumpy ride, but I’ve never looked back. To this day, I’m amazed and incredibly grateful that I get to do this for a living.
KH It was only after I started my master’s in Munich that I realized that I could work in the field of intelligent robots. I was late for my first class and started looking around for a bathroom when I heard strange mechanical noises coming from behind two big white doors in one of the labs. Being curious, I opened the door and saw two human-size humanoid robots making popcorn! It was surreal to realize that intelligent robots exist, and there are people who work on them.
YC The starting point for me was my master’s thesis project on robot arm manipulation using tactile sensors. I realized that teaching robots instead of pre-programming them is one of the most interesting things I could work on and make a difference. I also enjoy working on different things at the same time, so I liked how robotics brought everything together: math, control, learning, vision, etc.
Can you share any interesting or memorable moments during the development of the model? How does it feel to be involved in such a momentous project?
RJ When the model identified and grasped the objects correctly, it was clear we were dealing with something new. Then, we started really pushing for unknown tasks and objects. We saw the model could now understand pictures, read text, and manipulate objects based on higher-level concepts. Seeing the model performing tasks for which we hadn’t collected any robotic data seemed to open a completely new set of opportunities.
KH I remember the first time the model started outputting what looked like robot actions. Yevgen showed me the model running on a real robot, and we were both really surprised. One of us said, somewhat jokingly, that we would look back on this as a big step in robot learning research. I think we were both semi-serious at that time, but it felt like the beginning of something exciting.
YC For me, it was realizing how much the model could go beyond the robotic data. The model we ran on the real robot had 55 billion parameters, probably the largest ever used for robotic control.
How did you all end up working at Google on this particular project?
KH I applied for a research position at Google Brain right after my PhD at USC. In fact, I didn’t get into Google for an internship (I applied twice and got rejected both times) and instead interned at DeepMind in London, which I believe prepared me pretty well for my further career at Google and helped me build confidence that Google might be a place for me after all.
RJ While I was at USC, I actually started working with Karol, who had very similar interests to mine. He was a couple of years ahead of me and ended up joining Google after he graduated. We kept collaborating on research, and I joined him at Google as a summer intern. I got really lucky, and Google let me stay as a part-time researcher after my internship. Two years later, I finished my PhD, and joining Google again was the natural choice.
YC I also started my interaction with Google during my internships as a USC PhD student. My first internship was on a machine learning project in the speech recognition team. Later, I was lucky to intern on robotics teams at X (formerly Google X) and Google Brain where I could experience working on larger-scale robotics projects. After finishing my PhD, Google’s robotics team was the best match for me in terms of my research interests.
Looking back, what advice would you give students who aspire to follow a similar career path?
RJ If you want a career in robotics, or any other emerging field or technology, ask yourself if you’re ready to be stubborn, and focus on the long term. Emerging fields are immature and poorly understood, which means career paths working with them are not well-defined or straightforward. Rather than trying to find the single path to your goal, your central task is to keep working in your chosen field however you can make it work, even if that means making short-term sacrifices and ignoring the zeitgeist. Success is not guaranteed, but it is certainly possible.
What’s your long-term dream for robotics and language models?
RJ We have a very good toolbox for teaching robots to follow the rules of the road, or how to assemble exactly the same part over and over again. What we don’t have—and what I hope language models can provide—is a toolbox for giving robots grounded intuition for how the real world works. Armed with that intuition, they can start from the facts and expectations the language model contains about the world, and work from there to figure out how to do new things—just like humans do.
KH I would like to see robots being helpful companions in human-centric environments where they can understand the requests of humans around them, rather than these big bulky machines in factories behind safety glass.
YC As our models start to understand the world by transferring knowledge from language and vision, we hope to overcome one of the biggest limitations in robot learning: the scarcity and difficulty of collecting robot data. I hope we will continue to see new generalization results in robotics over the next few years, making robots increasingly useful for society. In the long term, I think melding robotic models with other modalities, such as vision and language, can help us to develop robots that are both capable and easy to communicate with.
Published on October 23rd, 2023
Last updated on May 16th, 2024