E-commerce sites are in a constant struggle to keep from drowning in an ever-growing tsunami of information about the products they sell.
How can they best keep a handle on all that information, and, on the consumer side, how can potential customers make sure they can quickly and accurately get what they’re looking for?
The answer is to create an effective and efficient taxonomy, a fancy word for a system used to classify information in a tree-like structure – think of the classic example of a company’s organizational chart, with the bigwig at the top and the titles of the people below indicating who reports to whom.
But when it comes to e-commerce, things can get complicated very quickly. New versions of products come out all the time, and people sometimes use different names to describe the same products.
The ongoing challenge for mega-retailers like Walmart and Target, as well as smaller companies, is to create and maintain a taxonomy that keeps their products organized in the most efficient way and makes it easy for website visitors to navigate and find what they want. The information the big players manage is staggering: According to Google, Amazon currently sells more than 350 million products (including third-party sellers) and Walmart currently sells 75 million.
USC Viterbi’s Mayank Kejriwal, a research assistant professor in the Daniel J. Epstein Department of Industrial and Systems Engineering and a research lead at the USC Information Sciences Institute, has created an algorithm he says allows e-commerce and other web-based companies to quickly and cheaply build a taxonomy that can be easily customized to their needs.
Think of the tool this way, he said: Imagine you’re a kid. Now imagine you’re given thousands of pieces of paper with an item written on each piece, such as “baby powder,” “Coca-Cola,” “PlayStation” and “T-shirt.”
Now, suppose you’re asked to build a “tree” out of these pieces of paper so that you could easily find any item when asked.
How long would it take you to build that tree?
“Our system does it in seconds,” Kejriwal said, “and our trees are of similar quality to any that you might be able to build.”
Such lightning-fast speed can save companies money, he added.
The AI tool, Kejriwal explained, has the potential to benefit large advertising companies like Google and media companies that have to “match” product categories to customers so they can get to the right websites, as well as aggregators like eBay and Pinterest where there are many independent third-party sellers. Even if such companies built a taxonomy manually, it would need constant updating.
In addition, Kejriwal said, the tool would work at the level of hashtags, so it would be useful even to social media companies that make their money on advertising.
Research paper published
Kejriwal’s algorithmic tool is an example of how AI is being used to refine intelligence analysis. The tool is detailed in a paper, “Transfer-Based Taxonomy Induction over Concept Labels,” recently published in the journal Engineering Applications of Artificial Intelligence.
Research paper co-author Ke Shen, one of Kejriwal’s Ph.D. students, presented the paper in November 2021 at the Institute of Electrical and Electronics Engineers/Association for Computing Machinery International Conference on Advances in Social Networks Analysis and Mining.
“A lot of artificial intelligence research out there is geared toward either classification problems like better face recognition or predicting the ‘sentiment’ of a movie review, for which many systems already exist,” Kejriwal said.
“But there’s less focus on research that directly affects enterprises and that can be used as a support tool designed with an actual domain and users in mind,” he added. “Our work aims to contribute to this space.”
Collaboration with industry
Kejriwal and Shen wrote the paper in collaboration with two researchers at Yahoo. They tested the algorithm they created, called Taxonomy Induction over Concept Labels (TICL), on three databases: Google Product Taxonomy, a publicly available list of thousands of product categories designed by Google to uniformly categorize products in a shopping feed; PriceGrabber, a “smart shopping” website for a broad category of products; and a taxonomy they created from Walmart’s website.
Result: The algorithm automatically built taxonomies from thousands of product labels, and the taxonomies were found to be comparable in quality to manually generated ones. The tool can also drill down to specific products – say, Pampers’ Swaddlers Diapers Size 1 – 192 ct. (weight 8-14 lb.) – and place such a product right below “diapers” in the taxonomy.
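To make that placement step concrete, here is a minimal Python sketch. It is not the TICL algorithm, which Kejriwal says uses AI that has learned from sources like Wikipedia. This toy version swaps in plain word overlap between labels as the similarity measure, and the small taxonomy, the function names and the crude plural handling are all assumptions made up for illustration.

```python
import re


def tokens(label: str) -> set[str]:
    """Lowercase word tokens with trailing 's' stripped (a crude plural
    normalizer, assumed here so 'Diapers' and 'diaper' compare equal)."""
    return {word.rstrip("s") for word in re.findall(r"[a-z]+", label.lower())}


def similarity(a: str, b: str) -> float:
    """Jaccard word overlap: a stand-in for a learned language model."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def place(product: str, taxonomy: dict) -> list[str]:
    """Return the path to the category whose name best matches the product.

    An empty list means no category shared any word with the product.
    """
    best_path: list[str] = []
    best_score = 0.0
    # Walk every node of the nested-dict tree, remembering the path to it.
    stack = [([name], children) for name, children in taxonomy.items()]
    while stack:
        path, children = stack.pop()
        score = similarity(product, path[-1])
        if score > best_score:
            best_path, best_score = path, score
        stack.extend((path + [name], sub) for name, sub in children.items())
    return best_path


# Hypothetical miniature taxonomy, not taken from any real retailer.
taxonomy = {
    "Baby": {"Diapers": {}, "Baby Powder": {}},
    "Electronics": {"Video Game Consoles": {}},
    "Clothing": {"T-Shirts": {}},
}

product = "Pampers Swaddlers Diapers Size 1 - 192 ct."
print(place(product, taxonomy))  # ['Baby', 'Diapers']
```

Word overlap fails as soon as a product and its category share no words: in this sketch, “PlayStation 5” matches nothing and comes back unplaced. Closing that kind of gap is the job of a model with a broader understanding of language.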
“This tool shows how AI can be used as a support tool for a business both for decision making as well as organizing and visualizing information,” Kejriwal said. “In seconds, it creates a taxonomy tree that you can interact with and that you can tweak. This tool will allow small e-commerce companies to set up a taxonomy quickly and cheaply. And the big e-commerce companies can benefit from it by improving their taxonomies by detecting any blind spots.”
One industry insider called the USC ISI-created tool very promising.
“Graphs are continuing to find new and bigger applications in industry,” said Russell Jurney, CTO and co-founder of Deep Discovery Inc., which developed an AI system for fighting financial crime. “This work is an example of how AI can be used to build graphs from raw data and help organizations manage information overload. It has great potential for impact.”
Kejriwal said what makes the TICL algorithm different from others is that it uses AI that has learned from sources like Wikipedia and has a good understanding of language. Not only that, but it also understands modern words that have entered our lexicon, such as iPhone and app.
And because the algorithmic tool can be applied to any set of data, Kejriwal added, it has applications beyond e-commerce – for example, in the fields of medical diagnostics, human resources, and project management.
“Information is powerful, but only if we can organize it right,” Kejriwal said. “Systems like TICL do the drudgery of organizing our information for us so we can focus on creative and strategic tasks that are, frankly, more fun.”
Published on January 19th, 2022
Last updated on April 12th, 2022