As one of the oldest and most disseminated books of all time, the Bible has made its mark on the world throughout history and continues to do so. Its influence is partly because it has been translated into over 700 languages, making it the single most translated book ever.
However, there are still many languages the ancient text hasn’t reached yet. Some of these niche languages are on the brink of extinction, meaning a biblical translation could be their last hope for keeping the mother tongue alive.
The challenge is that Bible translation is a time-consuming and labor-intensive process. It’s still primarily done on the ground by humans, sometimes even by hand. For context, there are 66 books in the Bible, and the common King James English version has over 700,000 words. Imagine transcribing each passage line by line, word by word, or even character by character in some cases… talk about taxing!
Not to fear, though – Ulf Hermjakob, senior research scientist at USC Information Sciences Institute (ISI), a research institute of USC Viterbi School of Engineering, and Joel Mathew, research engineer at ISI, are building tools using natural language processing (NLP) to help increase the efficiency of this process and allow for more languages to be reached and translated at a faster rate.
While the relatively small text corpus of the Bible is a challenge for NLP, there are many high-quality translations into hundreds of languages, which provide great research opportunities.
A Match Made in Heaven
Mathew grew up in India with parents involved in Bible translation. “There were a lot of areas where I felt software technology could really speed up, improve, support and help them,” Mathew said. “It’s one of my passions to see the Bible translated in all languages.”
ISI was the main reason he joined USC in 2015 — because of its strong natural language processing group and its work on low-resource machine translation. Once at ISI, Mathew met Hermjakob in the AI division. They were both passionate about Christianity and interested in low-resource machine translation.
Hermjakob explained, “People don’t realize that there are about 7100 languages in the world. Google Translate covers about 100 of them.” He continued, “For this Bible translation, we’re really targeting very low-resource languages that are not even in the top 500.”
Spreading the Word
Historical methods of Bible translation were one-man jobs and sometimes took an entire lifetime to complete.
“Traditionally, you would have a Western missionary who is also a linguist be sent to the area,” Hermjakob explained. “They would spend a decade or two learning the local language and then painstakingly translate the Bible, making it their life’s project in a sense.”
This method is expensive, slow, and does not scale well. Recently, there has been a shift away from relying on these Western experts and toward teams of people from the community, who translate the Bible from a gateway language into their own native language. A “gateway language” is a regional language known to the translators of smaller languages; Hindi, for example, is one of the gateway languages in India. Local translators typically do not know the original Hebrew and Greek, so they work from the regional gateway language instead.
Some aspects of biblical translation are objective, meaning they can be easily automated, taking some of the burden off the on-the-ground translators’ shoulders.
Hermjakob and Mathew’s work, the Greek Room, seeks to cover the majority of these objective pieces so that the subjective, harder parts are left up to humans.
There Are No Words to Describe It (No, Really…)
One big subjective challenge is that some concepts simply don’t exist in certain languages.
“There is a community living in the mountains, and they live in huts without doors, so there’s no concept of a door in their culture,” Mathew said. “In the Bible there is a verse that says ‘behold I stand at the door and knock.’ The question is, how do you translate that for people so that it is meaningful for them?”
The rest of the verse goes on to say ‘if anyone hears my voice,’ which Mathew said reveals the bigger picture meaning behind the verse: the invitation itself.
“We try to then explain it as not specifically knocking at the door, but instead describe a scene where someone is standing at the entrance of your house and asking to be invited to come in,” he added.
The goal of the Greek Room is to let human translators focus their attention on these more complex translation challenges, such as finding ways to convey meaning, rather than on the more tedious parts.
It’s All About Flagging
The Greek Room consists of a collection of tools – spell checking, consistency, and suggestions, to name a few – that target these objective aspects of the translation process.
One of these tools, known as “Wildebeest,” scans scripts and checks for surface-level issues, such as misplaced or incorrect characters or punctuation. This is crucial, Mathew said, given that some of these scripts have specific guidelines for how they are written, and violations are hard to catch with the naked eye.
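As a rough illustration of what a surface-level character check might involve, here is a minimal Python sketch. It is not Wildebeest's implementation: the function name `flag_surface_issues` and the three checks it performs (invisible characters, repeated spaces, non-normalized text) are assumptions chosen for illustration.

```python
import unicodedata


def flag_surface_issues(line: str) -> list[tuple[int, str]]:
    """Return (position, description) pairs for suspicious characters.

    Hypothetical helper: the checks below are illustrative assumptions,
    not the rules any real tool applies.
    """
    issues = []
    for i, ch in enumerate(line):
        category = unicodedata.category(ch)
        # Cc = control, Cf = format (e.g. zero-width space). Some scripts use
        # format characters legitimately, so these are flags for human
        # review, not automatic errors.
        if category in ("Cc", "Cf") and ch not in "\t\n":
            name = unicodedata.name(ch, f"U+{ord(ch):04X}")
            issues.append((i, f"invisible character: {name}"))
        elif ch == " " and i > 0 and line[i - 1] == " ":
            issues.append((i, "repeated space"))
    # Text that looks identical on screen can be encoded in different ways;
    # position -1 marks an issue that applies to the whole line.
    if unicodedata.normalize("NFC", line) != line:
        issues.append((-1, "line is not in NFC normalized form"))
    return issues


if __name__ == "__main__":
    for position, description in flag_surface_issues("In the\u200b beginning"):
        print(position, description)
```

The sketch only reports what it finds; deciding whether a flagged character is actually wrong for a given script is left to the translator.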
Another tool involves spell checking, but it isn’t as simple as you may think.
“When you enter a misspelled word plus a space in Microsoft Word or Google Docs, you see a red squiggly line underneath the word, you see the suggestion, fix it and move on. But with languages that are being written down for the first time, nobody really knows what’s correct or incorrect,” Mathew explained.
Since the correct spelling can be open to interpretation, the goal of the spell check tool is more to “aid the translators in at least flagging inconsistencies,” which could mean identifying an actual spelling error, or highlighting words that are strikingly similar to ones used multiple times in the past. The spell check tool also makes note of new inputs, which helps to improve the model’s suggestions down the line.
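One simple way to picture this kind of inconsistency flagging: count how often each word form appears in the draft, then flag rare forms that differ by a single character from a frequent form. The Python sketch below works under that assumption. The function names, the thresholds, and the toy word list are all hypothetical, and the Greek Room's actual approach may differ.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]


def flag_inconsistent_spellings(words, min_common=3, max_rare=1, max_dist=1):
    """Return sorted (rare_form, frequent_form) pairs worth a second look.

    No dictionary is needed: the only reference is the draft itself.
    The thresholds are illustrative assumptions.
    """
    counts = Counter(words)
    common = [w for w, c in counts.items() if c >= min_common]
    flags = []
    for word, count in counts.items():
        if count > max_rare:
            continue
        for candidate in common:
            if edit_distance(word, candidate) <= max_dist:
                flags.append((word, candidate))
    return sorted(flags)


if __name__ == "__main__":
    # Toy word list, not taken from any real translation.
    draft = ["shalom"] * 4 + ["shalum"] + ["amen"] * 3 + ["yes"]
    print(flag_inconsistent_spellings(draft))
```

Because nothing here decides which form is "correct," the output is a list of pairs for the translation team to review, which matches the flagging role described above.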
Quality control tools, which help make sure that spelling, terminology and script usage are consistent throughout a translation, are especially important because often there are multiple people working on different parts, books, or even chapters of the same Bible.
On top of that, there are many concepts and meanings referenced across chapters and books that must be communicated with precision and accuracy. Otherwise, the reader may not get an accurate image of what is actually being conveyed.
Yet another component of the Greek Room makes suggestions based on other biblical translations, which also works to boost speed.
“Maybe there are phrases or words that the computer can just learn and say, ‘this is probably what you want to say next,’ so you hit tab enter and it’s in, simple as that,” Mathew said.
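To make the idea of next-word suggestions concrete, here is a deliberately simple Python sketch that learns which word most often follows another in text that has already been translated. A bigram count is used purely as a stand-in; it is an assumption for illustration, not the method the Greek Room uses, and the function names and sample sentences are hypothetical.

```python
from collections import Counter, defaultdict


def build_bigram_model(sentences):
    """Count, for each word, which words follow it in the given sentences."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev_word, next_word in zip(tokens, tokens[1:]):
            model[prev_word][next_word] += 1
    return model


def suggest_next(model, prev_word, k=3):
    """Return up to k of the most frequent followers of prev_word."""
    followers = model.get(prev_word.lower(), Counter())
    return [word for word, _ in followers.most_common(k)]


if __name__ == "__main__":
    # Toy sentences standing in for already-translated text.
    translated = [
        "the lord is my shepherd",
        "the lord is good",
        "the lord reigns",
    ]
    model = build_bigram_model(translated)
    print(suggest_next(model, "lord"))
```

A translator-facing editor would show such suggestions as optional completions; with only a few hundred sentences of data, counts like these are sparse, which is part of why very low-resource languages are hard.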
A Bag of Tricks
Some languages are so niche that they lack the data needed to develop a biblical translation. These so-called “ultra low-resource languages” sometimes have only a couple of hundred sentences recorded, which makes it almost impossible to give meaningful suggestions or do quality checking.
Believe it or not, it can get even more niche than that. For some of these languages there is legitimately nothing written – the Bible may be the first major book ever translated.
This is in stark contrast to the situation of high-profile translations. When going from English to Chinese, Hermjakob said, there are millions or even billions of training examples where translation already exists.
To create these tools with so little data, the team had to create some magic. “We’re overcoming the scarcity of the data for these low resource languages with a bag of tricks,” he said.
One trick? The alignment visualization tool, which can cross reference the target language script with similar languages that already have Bible translations and identify if words are left out or pinpoint potential errors. Related languages tend to have similar words and phonetic similarities.
It’s kind of like solving a puzzle with only half the pieces, but with a completed reference photo printed on the box propped up nearby.
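The cross-referencing idea can be sketched in a few lines of Python, assuming that related languages share similar-looking words: for each word in a reference verse from a related language, look for a similar-looking word in the draft verse, and flag reference words that have no counterpart. This is a toy approximation and not the team's alignment tool. The function names, the 0.6 threshold, and the invented example words are all assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity between two words, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def unmatched_reference_words(target_verse, reference_verse, threshold=0.6):
    """List reference words with no similar-looking word in the target verse.

    An unmatched word is only a hint that something may have been left out;
    it can also simply be a word the two languages do not share.
    """
    target_words = target_verse.lower().split()
    unmatched = []
    for ref_word in reference_verse.lower().split():
        best = max((similarity(ref_word, t) for t in target_words), default=0.0)
        if best < threshold:
            unmatched.append(ref_word)
    return unmatched


if __name__ == "__main__":
    # Invented words, not drawn from any real language.
    reference = "kala mira tovu sena"
    draft = "kala mera sena"
    print(unmatched_reference_words(draft, reference))
```

Surface similarity is a crude signal, so results like these are best treated as prompts for a human check rather than as detected errors.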
Chat Interface – A Translator’s Saving Grace
One future direction the team is working towards is trying to help translators answer questions using artificial intelligence. For example, if somebody comes across a concept in the Bible that is very complex or they do not have theological training to be able to understand exactly what it means, they will likely need to look it up.
The idea is to have a chat interface that can pull from resources and provide more clarification and detailed explanation much faster than if the human were to search through sources manually.
“Rather than looking at five or 10 different places, what if there was a chat interface that could quickly and more accurately summarize the concept and give them a meaningful answer,” Mathew explained.
Come One, Come All
Hermjakob and Mathew said they intend for the Greek Room to be open source – meaning available to translators everywhere.
“We want to make it so that other Bible translation efforts can use what we have built in for their own research as well, so one thing we decided early on is that we want to make our data and code public,” Hermjakob said.
Right now, the team is working on giving these tools a permanent place to live on an online platform, not as an end product, but as a way to give potential users space to explore the tools and see if they want to fuse them into their own software.
The team also hopes that in the near future parts of the Greek Room can be integrated into editing applications for spell checking and consistency checking. And though the Greek Room project focuses on Bible translation, its technology will benefit a wider range of books and applications in the future.
With the Greek Room, translators can be empowered with the right tools to bring the Bible to places it’s never been before – the ends of the earth might not be too far after all!
Published on June 19th, 2023
Last updated on June 21st, 2023