As COVID-19 continues to disrupt normal life, Mayank Kejriwal, research lead at the Information Sciences Institute (ISI) and research assistant professor at USC Viterbi’s Daniel J. Epstein Department of Industrial and Systems Engineering, aims to use AI to fight the pandemic. Under Microsoft’s AI for Health initiative, Kejriwal is championing his project, “A COVID-19 Knowledge Graph Infrastructure for Assistive Expertise,” which seeks to go beyond the simple keyword searches that researchers rely on when combing the literature for a vaccine or cure. Kejriwal’s research findings were published in the Harvard Data Science Review in October.
Graphing the COVID-19 Crisis
In April 2020, Microsoft launched the AI for Health program, dedicating $20 million to help researchers develop technology that can play an important role in COVID-19 research, from using AI to crunch large datasets to identifying the impacts of treatments. For his project, Kejriwal was awarded $30,000 of sponsorship credits for Azure, a Microsoft hybrid cloud platform that allows users to run virtually any tools and applications across multiple clouds.
Sifting through the vast and constantly growing literature on the coronavirus family can be a formidable effort, even for experts, because simple keyword searches fall short in several ways. For example, a basic keyword search can’t provide a thorough overview of an entity like a protein or gene by analyzing relevant data across the literature. To address limitations like these, Kejriwal’s project proposes a Knowledge Graph (KG) infrastructure on Azure that experts can easily query and that is continually updated as new scientific findings come out.
As Kejriwal explained, a KG is a model that helps machines represent ‘knowledge,’ which is more than just 0’s and 1’s. “Knowledge is fundamental to human intelligence – it allows us to think about entities, events and relationships, and reason about them in our daily lives,” he said. “A KG is a way of bridging the gap between human conception of knowledge, and the 0’s and 1’s that machines like to work with.”
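To make the idea concrete, here is a minimal sketch of how a KG can be stored as (subject, relation, object) triples and queried for an overview of a single entity. It is an illustration only, not the project’s actual infrastructure: the triples, the paper identifiers, and the helper names `build_index` and `entity_overview` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples, for illustration only.
# "paper_A" and "paper_B" are made-up identifiers, not real publications.
TRIPLES = [
    ("SARS-CoV-2", "has_protein", "spike protein"),
    ("spike protein", "binds_to", "ACE2"),
    ("ACE2", "is_a", "human receptor"),
    ("paper_A", "mentions", "spike protein"),
    ("paper_B", "mentions", "ACE2"),
]


def build_index(triples):
    """Index triples by subject (outgoing edges) and by object (incoming edges)."""
    outgoing = defaultdict(list)
    incoming = defaultdict(list)
    for subject, relation, obj in triples:
        outgoing[subject].append((relation, obj))
        incoming[obj].append((relation, subject))
    return outgoing, incoming


def entity_overview(entity, outgoing, incoming):
    """Collect every relationship that touches one entity, in either direction."""
    return {
        "outgoing": sorted(outgoing.get(entity, [])),
        "incoming": sorted(incoming.get(entity, [])),
    }


outgoing, incoming = build_index(TRIPLES)
overview = entity_overview("spike protein", outgoing, incoming)
print(overview)
```

Unlike a keyword search, which returns documents containing a string, the lookup gathers everything the graph records about the entity itself: what it is connected to and which sources mention it.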
As more research is done on COVID-19, the KGs will be able to encode the expertise generated by subject matter experts into a form that AI algorithms can use. “In our research, KGs are central because we work in domains that are knowledge-rich,” Kejriwal said. “There’s already a lot of existing knowledge about COVID-19 (which is continuing to grow) and also other domains that I’ve worked in, such as fighting human trafficking, crisis informatics, and e-commerce.”
Working with KGs doesn’t come without its challenges, as KGs integrate data from multiple, non-standardized sources; however, most challenges tend to be more social than technical. “There’s sometimes a disconnect between what AI researchers think are challenges, and what subject matter experts (SMEs) think are challenges,” Kejriwal said. “This is more of a social challenge, but one I frequently deal with. For example, AI researchers are still very focused on ‘accuracy’ metrics, whereas SMEs want assurance that they can trust the results they are seeing.”
Additionally, as Kejriwal stated, full automation is overrated. “Most people are willing to put in some effort into ‘training’ an AI system if we make it easy for them to do so,” he said. “I think the primary challenges have less to do with AI and are more social in nature.”
There are at least two major projects that use KGs in an attempt to create an initial COVID-19 infrastructure. The Yahoo! COVID-19 KG project and the COVIDGraph project both utilize multiple datasets for COVID-19. Kejriwal’s project, however, will also integrate scientific literature on COVID-19, prioritizing quality and usability.
Generally, the vast majority of the information used for these models will come from publicly available data sources, which avoids anonymization issues. “In the event that individual data is involved (such as when we use polling datasets or social media), we never reveal the full message or metadata, but only present results in aggregate,” Kejriwal said. “For replication purposes, we’re usually very explicit about our methodology so others can achieve similar goals if they desire to do so.”
A key portion of Kejriwal’s project will require computational infrastructure and will focus on building and training machine learning models. “We hope to use Azure Machine Learning to simplify some of our ML pipelines, and to offset costs and inefficiencies that may have arisen had we not received this grant,” he said. “We also hope to build and test some novel models along the way.”
Kejriwal’s research has the potential to increase trust in the use of AI and KGs in general, as KGs have come a long way as an ecosystem of tools and practices. Developing a COVID-19 KG is still cutting-edge research, and Kejriwal hopes to make significant progress on it with the support of the Microsoft grant. The project will also be documented in meticulous detail to serve as a blueprint for similar efforts.
“What the current crisis has taught us is that we can never be too prepared for pandemics and crises that can cripple entire systems,” said Kejriwal. “Our hope is that with a replicable blueprint, future efforts will be even faster and higher-quality, and we hope to bring about a full discussion on costs, benefits, and best practices that’ll receive a stamp of approval from all stakeholders and lead to KGs and AI becoming more mainstream solutions for such crises in the future.”