Once upon a time,
There was a big data analyst who saw a special something
In the pattern of a beloved story foretold
And with his brains and valued open-source programming
Set off on a quest to uncover this bold
Shift from fairy tale to a bigger reality
Of a children’s story gone off rails.
September 22 marks an important date on the Shire calendar: it is the birthday of the hobbits Bilbo and Frodo Baggins, two fictional characters from J. R. R. Tolkien’s popular books “The Hobbit” and “The Lord of the Rings.” In the books, both were born on September 22, Bilbo in the year 2890 and Frodo in 2968 of the Third Age. Of course, for die-hard Tolkien fans, that’s 1290 and 1368, respectively, in Shire-reckoning.
On Hobbit Day 2016, Dave Kale, a Ph.D. candidate in USC Viterbi’s Department of Computer Science specializing in machine learning and hobbit lore, decided to give the halflings a special birthday gift: a computer that could read and analyze their books with the power of Big Data.
When people say “Big Data,” it’s tempting to imagine a behemoth of information stored inside a supercomputer, too overwhelming to understand, let alone sift through: the sort of thing presumed to be reserved for corporations of equally monstrous size and power. But in reality, Big Data isn’t necessarily such a behemoth. In fact, its techniques can be applied to a single book, as Dave Kale set out to prove in his spare time.
Kale’s main research, conducted under Greg Ver Steeg of the Information Sciences Institute, uses machine learning to extract insight from massive amounts of digital health care data. He is developing deep learning methods for precision diagnostic technologies that can scan a patient’s health data and immediately produce a diagnosis. Imagine a portable, wireless device in the palm of your hand that monitors and diagnoses your health conditions anytime, anywhere. His research is helping to shape this emerging technology.
Kale, co-founder of the annual Meaningful Use of Complex Medical Data (MUCMD) Symposium, is an expert not only in his own field but also in just about everything J. R. R. Tolkien ever wrote.
For years, he has hosted forums and podcasts, such as the hit all-things-Tolkien show “Riddles in the Dark,” exploring the significance of Tolkien’s high-fantasy storytelling one scene at a time. In 2015, he decided it was time to bring another companion along on his journey through Tolkien’s novels: a computer.
“It seemed like a good blending of my interests,” Kale said. “My area of research is machine learning and artificial intelligence, but one of my main avocations is fantasy and sci-fi. When it comes to fantasy, Tolkien is king.”
A bag of words
Kale’s passion translated into building a community of people who love the creator of Middle-earth. With a little curiosity and motivation, he set out to bring something new to that community, one project to rule them all: his computer would analyze “The Hobbit” page by page.
For several days, Kale fed a digital copy of the book through his algorithm, which he trained to break each page into a “bag of words,” a tally of the words on that page with their order ignored. The algorithm then scanned these bags for repeated words or phrases that, when grouped together, formed themes and topics. These themes were then edited and categorized to form the basis for a literary analysis.
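To make the “bag of words” idea concrete, here is a minimal Python sketch, assuming the book is available as plain-text pages. It builds one bag per page and then flags words that recur across pages, which is only the raw material for finding themes. The stop-word list, the function names `bag_of_words` and `recurring_words`, and the sample pages are all hypothetical; the article does not say which model Kale used to group words into topics, and a full topic model such as latent Dirichlet allocation (LDA) would go well beyond this simple count.

```python
import re
from collections import Counter

# Hypothetical stop-word list: very common words that would otherwise dominate every bag.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "his", "he", "it", "was"}


def bag_of_words(page: str) -> Counter:
    """Turn one page into an unordered bag: word -> number of times it appears."""
    tokens = re.findall(r"[a-z']+", page.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def recurring_words(pages: list[str], min_pages: int = 2) -> dict[str, int]:
    """Words that show up on at least `min_pages` pages, mapped to their total count."""
    page_freq: Counter = Counter()  # how many pages each word appears on
    totals: Counter = Counter()     # how many times each word appears overall
    for bag in (bag_of_words(p) for p in pages):
        page_freq.update(bag.keys())
        totals.update(bag)
    return {w: totals[w] for w, n in page_freq.items() if n >= min_pages}


# Toy pages invented for illustration, not text from the book.
sample_pages = [
    "A hobbit lived in a hole under a hill.",
    "The dwarves sang of gold and of a dragon.",
    "The dragon woke, and the hobbit crept toward the gold.",
]
print(bag_of_words(sample_pages[0]))
print(recurring_words(sample_pages))  # {'hobbit': 2, 'gold': 2, 'dragon': 2}
```

Because each bag discards word order, “the hobbit crept toward the gold” and “the gold crept toward the hobbit” produce the same bag; that loss of order is the trade-off that makes the approach cheap enough to run over a whole book.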
What happened next surprised Kale: the computer started taking its job as critic seriously. Kale hadn’t set out to find chapter breaks, but his algorithm identified them, along with contextual evidence of how deliberately Tolkien structured the progression of the plot. The computer’s main task had been to look for something more complex: the progression of tone.
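The article does not explain how Kale’s algorithm picked out chapter breaks. One simple, swapped-in heuristic that shows how a bag-of-words model could notice structure it was never told about is to compare each page’s bag with the previous page’s using cosine similarity and flag sharp drops in shared vocabulary. This is a sketch under stated assumptions: the stop-word list, the 0.1 threshold, the function names, and the sample pages are all hypothetical, and this is not presented as Kale’s method.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; without it, words like "the" make every page look alike.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "his", "by", "he", "it", "was"}


def bag(page: str) -> Counter:
    """Unordered word counts for one page (lowercased, punctuation and stop words dropped)."""
    return Counter(t for t in re.findall(r"[a-z']+", page.lower()) if t not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bags: 1.0 for proportional counts, 0.0 for no shared words."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def candidate_breaks(pages: list[str], threshold: float = 0.1) -> list[int]:
    """Indices i where page i shares almost no vocabulary with page i - 1 (hypothetical threshold)."""
    bags = [bag(p) for p in pages]
    return [i for i in range(1, len(bags)) if cosine(bags[i - 1], bags[i]) < threshold]


# Toy pages invented for illustration: two "Bag End" pages, then two "goblin" pages.
sample_pages = [
    "Bilbo smoked his pipe by the door of his hobbit hole.",
    "Bilbo sat in his hobbit hole and smoked another pipe.",
    "Goblins dragged the dwarves deep under the misty mountains.",
    "The goblins shouted in the tunnels under the mountains.",
]
print(candidate_breaks(sample_pages))  # [2]
```

Here the first two pages share most of their vocabulary, as do the last two, while pages 1 and 2 share none, so the sketch flags page 2 as a likely scene or chapter boundary. On a real novel, single pages are noisy, so comparing larger windows of text would likely give steadier results.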
“‘The Hobbit’ starts as a children’s fairytale,” explained Kale. “It’s very silly, but by the time they get to the end, what you have is a gritty war story where a lot of the main characters are dead and there’s a lot of fighting. It’s quite tragic. There’s this really interesting shift that goes gradually through the book, but it feels like it comes on suddenly. I figured if I threw some math at the book, it would find this.”
More to this than meets the eye
Although the computer did not immediately answer Kale’s burning questions about the tonal shift, it had detected plot progression and chapter structure without being told that either existed in the text. It even identified the main, recurring plot line across the entire novel, which set Kale wondering: “What else could this program find?” As Gandalf once said, “I think there’s more to this Hobbit than meets the eye.”
“That’s the exciting thing,” Kale said. “What could a serious student of literature do with these types of tools?”
Kale’s novel approach to the digital humanities shows just how broadly analytics can be applied, and what new kinds of understanding it could open up. A researcher, or simply an enthusiast, could teach the algorithm to search for even deeper and more complex patterns to support literary analysis.
In Kale’s mind, Big Data is not a monster. It is not some futuristic mumbo-jumbo that endangers the livelihood of every working man and woman, nor will it replace experts.
“It’s a tool – a lens to study an artifact,” Kale said.
This tool will allow us to look deeper into a field of expertise than ever before. And that, in itself, is magical.