“If you want to know the future, look at the past,” goes a saying often attributed to Albert Einstein. Paleoclimatologists agree.
These scientists use data from geological samples to study past environments, climates, and ecosystems in order to better understand the climate of today and of the future. A significant challenge in the field is that different sub-disciplines describe their data in different ways. For instance, some researchers study past climate through ice cores, while others use tree rings or sediment layers, each with its own terminology and methods. Researchers at USC’s Information Sciences Institute (ISI) are helping Earth scientists make the most of all that data with LinkedEarth, an initiative that brings AI and paleoclimate research together to build a cohesive picture of historical climate data by changing the way that data is described, managed, and analyzed.
LinkedEarth originated in 2016 through a grant from the National Science Foundation (NSF). “It was the brainchild of Julien Emile-Geay at USC Dornsife, Yolanda Gil at ISI, and Nicholas McKay, a colleague at Northern Arizona University,” said Deborah Khider, a paleoclimatologist and research scientist currently working on LinkedEarth at ISI.
Gil is the Director of New Initiatives in AI and Data Science at the USC Viterbi School of Engineering, Senior Director for Major Strategic AI and Data Science Initiatives at ISI, and an expert in AI. She met Emile-Geay, Professor of Earth Sciences at the USC Dornsife College of Letters, Arts & Sciences, at an NSF workshop aimed at fostering collaborations between Earth scientists and computer scientists. Emile-Geay is an expert in climate science, including global warming and the climate of the past millennium.
Brought Together by the Hockey Stick
The “hockey stick graph” is the informal name for a famous graph of average global temperatures over the past 500 to 2,000 years, published in the reports of the Intergovernmental Panel on Climate Change (IPCC). First published in 1999, it shows a sharp rise in temperatures in the 20th century, which indicates the impact of humans on global temperature. Since then, much of climate science has been devoted to reconstructing this graph with new data in order to better understand it.
Gil said, “This graph tells you how the climate has evolved over the last 2,000 years. What’s really interesting is that it takes many months to put it together and to include new data sets that have emerged. When we met, Julien was looking to reduce that time to get to a better global picture more quickly.” She continued, “I had been working on AI techniques to standardize data descriptions through crowdsourcing, and he found the idea really exciting.”
Speaking the Same Language
The starting point was to standardize datasets so that researchers could easily access and analyze them with existing programming tools. “We wanted to create a crowdsourced platform that would allow geoscientists of any sub-discipline to curate their own data and add information, and it would all be open data and easily queried,” said Emile-Geay.
To do this, Gil and her team needed to understand the data. She said, “In AI, we use mathematical logic to represent objects and concepts. And in our initial discussions we saw that the way data was described was not consistent. So we started to work together to define in a precise way what different types of data represented.”
They began to develop an ontology: a structured framework for representing knowledge that describes concepts and the relationships between them. Ontology development is more than creating a glossary; it involves mapping out how different concepts interrelate. Gil’s group used semantic technologies and knowledge graphs to build AI tools that help paleoclimatologists describe their data with consistent terms.
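To make the idea concrete, here is a minimal Python sketch of an ontology stored as subject-predicate-object triples, the basic unit of a knowledge graph. Everything in it is hypothetical: the concept names (`Archive`, `IceCore`, `TreeRing`, `MarineSediment`), the synonym table, and the dataset IDs are invented for illustration and are not taken from the actual LinkedEarth ontology. It shows how mapping each sub-discipline's own wording onto shared concepts, plus `subClassOf` relationships between those concepts, lets a single query find data that contributors described in different ways.

```python
# Minimal sketch of an ontology as (subject, predicate, object) triples.
# All concept names, synonyms, and dataset IDs are hypothetical examples,
# not the actual LinkedEarth ontology.

# Shared concepts and how they relate to one another.
ONTOLOGY = {
    ("IceCore", "subClassOf", "Archive"),
    ("TreeRing", "subClassOf", "Archive"),
    ("MarineSediment", "subClassOf", "Archive"),
}

# Different wordings used by different sub-disciplines, mapped to one concept.
SYNONYMS = {
    "ice core": "IceCore",
    "tree-ring": "TreeRing",
    "dendro": "TreeRing",
    "marine sediment": "MarineSediment",
    "sediment core": "MarineSediment",
}


def annotate(dataset_id: str, free_text_label: str) -> tuple[str, str, str]:
    """Turn a contributor's own label into a triple that uses a shared concept."""
    concept = SYNONYMS[free_text_label.strip().lower()]
    return (dataset_id, "archiveType", concept)


def is_a(concept: str, ancestor: str, triples: set) -> bool:
    """Follow subClassOf links transitively (this toy ontology has no cycles)."""
    if concept == ancestor:
        return True
    parents = {o for s, p, o in triples if s == concept and p == "subClassOf"}
    return any(is_a(parent, ancestor, triples) for parent in parents)


def find_datasets(concept: str, triples: set) -> list[str]:
    """Return IDs of datasets whose archive type is `concept` or a subclass of it."""
    return sorted(
        s for s, p, o in triples
        if p == "archiveType" and is_a(o, concept, triples)
    )


# Three contributors describe their data in their own words.
graph = set(ONTOLOGY)
graph.add(annotate("dataset_A", "Ice core"))
graph.add(annotate("dataset_B", "dendro"))
graph.add(annotate("dataset_C", "sediment core"))

print(find_datasets("TreeRing", graph))  # ['dataset_B']
print(find_datasets("Archive", graph))   # ['dataset_A', 'dataset_B', 'dataset_C']
```

A production system would typically rely on semantic web standards such as RDF and OWL, queried with SPARQL, rather than plain Python sets; the sketch only illustrates the underlying idea of shared concepts and relationships.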
They also created a reporting standard: a set of minimum requirements for describing datasets. Emile-Geay explained, “As a community, we needed to come up with a set of things that we agreed were important to archive. When scientists put their data online, some of them were very detailed, and some of them very bare bones. We created a set of rules to ensure people could make use of new data.”
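A reporting standard can be thought of as a checklist that every submitted dataset must satisfy. The Python sketch below assumes a hypothetical list of required fields (`archiveType`, `latitude`, `longitude`, `variableName`, `units`); the real LinkedEarth standard was defined by the community and is not reproduced here. The function simply reports which minimum requirements a dataset's metadata fails to meet, which is the difference between a "very detailed" submission and a "bare bones" one.

```python
# Sketch of checking metadata against a minimum reporting standard.
# REQUIRED_FIELDS is a hypothetical list, not the actual LinkedEarth standard.
REQUIRED_FIELDS = ("archiveType", "latitude", "longitude", "variableName", "units")


def missing_fields(metadata: dict) -> list[str]:
    """Return required fields that are absent or set to None or "", in standard order."""
    return [
        field for field in REQUIRED_FIELDS
        if metadata.get(field) in (None, "")
    ]


detailed = {
    "archiveType": "TreeRing",
    "latitude": 36.1,
    "longitude": -112.1,
    "variableName": "ring width",
    "units": "mm",
}
bare_bones = {"archiveType": "IceCore", "units": ""}

print(missing_fields(detailed))    # []
print(missing_fields(bare_bones))  # ['latitude', 'longitude', 'variableName', 'units']
```

Returning the specific gaps, rather than a simple pass/fail flag, tells a contributor exactly what to add before the data is archived.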
Letting Scientists Do Science
The next step was to create an analysis pipeline. The LinkedEarth team had created a new system to organize paleoclimate data, “but at the end of the day,” said Emile-Geay, “scientists want to do science.” So, over the years, Khider, Emile-Geay, and McKay have secured several grants and created software that the paleoclimatology community can use to analyze its data.
Additionally, with the help of AI, they’ve created a system that walks scientists through analysis methods step by step. Using AI to guide these workflows makes the database accessible to scientists who are not computer scientists. Khider envisions AI playing a pivotal role as a research assistant, offering recommendations and helping scientists navigate and interpret datasets more effectively.
Getting the Community on Board
“Once we put the data and the software in the hands of the community, the question became, ‘what do we do next?’” said Khider. The answer: training.
“It’s not enough to put this out there; you need to engage the community.” For the past few years, the team has run workshops every summer to help members of the LinkedEarth community build skills ranging from basic Python to software use, data analysis techniques, and moving between programming languages. The workshops also teach participants how to make their research findings reproducible for future use. They cater to different skill levels, from beginners to advanced users, and attendees include graduate students, postdocs, and faculty members.
FROGS (Facilitating Reproducible Open GeoScience) is LinkedEarth’s newest training initiative, linking scientific practice with publishing. The first FROGS session took place at ISI on June 3–6, 2024, and included researchers in hydrology, atmospheric science, and paleoclimate.
What’s Next
Over the past few months, Khider and the LinkedEarth team have received three awards to continue their work, among them one from NSF. In August 2024, NSF announced funding for AI technologies for the geosciences through its Collaborations in Artificial Intelligence and Geosciences (CAIG) program. This NSF investment aims to “advance the development and implementation of innovative AI techniques in geosciences while increasing technical capacity and expanding access to education and training opportunities for using AI approaches in geosciences research.” Among the 25 projects awarded was PaleoPAL: An AI Research Assistant for Paleoclimatology, from the LinkedEarth team.
Published on September 11th, 2024
Last updated on September 16th, 2024