Demo - Shopify Product Understanding
TL;DR
Between July 2021 and May 2023, I led the Product Understanding team at Shopify. The team focused on creating standardized data assets across all Shopify merchants to power product features throughout the company. Two of these products are Shop App and Shop AI.
You can play around with category navigation, product search, and personalization in Shop.App. These features are powered by data assets that ML models extract from product images, titles, and descriptions.
You can also chat with the Shop AI shopping assistant, which uses the same assets to search for products, provide drill-downs, and discuss their properties. Check it out at Shop.AI.
Product Understanding at Shopify
The Problem
Have you ever wondered what happens when you type $\sqrt{783225}$ into a handheld calculator? I mean, what marvelous algorithm performs this complicated task with so few resources and gives you the exact result in milliseconds? The answer is binary search, but that’s not the point. The point is that the best engineering projects are like this: they just work, and you never notice them until they break.
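For the curious, the calculator aside can be written down as code. This is a minimal sketch of computing an integer square root by binary search over candidate answers; it illustrates the idea only, since real calculators may use other methods (such as digit-by-digit extraction or Newton's method), and the function name is hypothetical.

```python
def isqrt_binary_search(n: int) -> int:
    """Return floor(sqrt(n)) for a non-negative integer n using binary search."""
    if n < 0:
        raise ValueError("n must be non-negative")
    lo, hi = 0, n  # the answer always lies in [0, n]
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if mid * mid <= n:
            lo = mid  # mid is still a valid candidate
        else:
            hi = mid - 1  # mid is too large
    return lo


print(isqrt_binary_search(783225))  # 885
```

Each step halves the range of candidates, so even a large input needs only a few dozen comparisons.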
[Image: AI-generated image for “binary search on abacus”]
Online product discovery is similar. You never notice a good search engine until it starts returning bad results, and you never question category navigation until yellow shoes appear in the “white sneakers” category.
While Amazon requires GTINs for products [1] and Google Shopping requires Google’s product taxonomy [2], Shopify’s philosophy is to reduce the complexity of setting up an online store, so anything not crucial to selling a product is optional. In addition, Shopify merchants are not bound by any taxonomy or guidelines; they are free to use whatever taxonomy and product attributes work for them.
Without such restrictions, it’s easy to imagine a merchant tagging their yellow shoes with white embellishments as both “yellow” and “white” to improve SEO. That is fine for their own online store, but in a cross-merchant experience like Shop.App, those yellow shoes can easily end up in the white sneakers section or appear in search results for a “white sneakers” query.
To tackle this problem, Shopify needed a layer of ML-powered software that processes all unstructured and semi-structured product data and maintains an up-to-date snapshot of the computed product metadata in a structured form, using standardized language across the organization. This software was built and maintained by the Product Understanding team.
Underlying Data Assets
The types of product metadata that we focused on include product categories, product attributes, and product embeddings.
Product categories refer to a hierarchical category tree, similar to Google’s taxonomy, that describes what the product is. An example of a product category from Google’s taxonomy would be Business & Industrial > Food Service > Hot Dog Rollers.
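A category path like the one above is naturally stored as a tree, so that a product in a leaf category also belongs to every ancestor category, which is what lets category navigation drill down level by level. The following is a minimal sketch under assumptions: the in-memory `Category` class and the `add_path` and `is_within` helpers are hypothetical and are not Shopify’s implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, not by (recursive) field values
class Category:
    """A node in a hypothetical in-memory category tree."""
    name: str
    parent: Category | None = field(default=None, repr=False)
    children: dict[str, Category] = field(default_factory=dict, repr=False)

    def child(self, name: str) -> Category:
        """Return the named child, creating it on first use."""
        if name not in self.children:
            self.children[name] = Category(name, parent=self)
        return self.children[name]

    def path(self) -> list[str]:
        """Category names from the top level down to this node (the root is excluded)."""
        names: list[str] = []
        node: Category | None = self
        while node is not None and node.parent is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


def add_path(root: Category, path: str, sep: str = " > ") -> Category:
    """Insert a path such as 'A > B > C' into the tree and return its leaf node."""
    node = root
    for name in path.split(sep):
        node = node.child(name)
    return node


def is_within(node: Category | None, ancestor: Category) -> bool:
    """True if node is the ancestor itself or sits anywhere beneath it."""
    while node is not None:
        if node is ancestor:
            return True
        node = node.parent
    return False


root = Category("root")
leaf = add_path(root, "Business & Industrial > Food Service > Hot Dog Rollers")
food_service = root.children["Business & Industrial"].children["Food Service"]
print(" > ".join(leaf.path()))  # Business & Industrial > Food Service > Hot Dog Rollers
print(is_within(leaf, food_service))  # True
```

Because membership is inherited upward, browsing “Food Service” can include hot dog rollers without the merchant ever tagging them that way.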
[Image: A typical top has around 15 computed attributes]
Product attributes, on the other hand, describe everything the product has. These are key-value tags, such as Sleeve Length: Short. An attribute can apply to multiple product types; this one applies equally to tops, dresses, and jumpsuits.
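The key-value structure above, including the idea that a single attribute is shared by several product types, can be sketched as a small schema. This is a minimal illustration under assumptions: the `AttributeDefinition` class, the `is_valid` helper, and the list of allowed sleeve-length values are hypothetical, not Shopify’s actual attribute definitions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeDefinition:
    """Hypothetical schema entry: one attribute key, its allowed values, and where it applies."""
    key: str
    allowed_values: frozenset[str]
    applies_to: frozenset[str]  # product categories this attribute is defined for


# Illustrative values only; the real value list is not given in the text.
SLEEVE_LENGTH = AttributeDefinition(
    key="Sleeve Length",
    allowed_values=frozenset({"Sleeveless", "Short", "Three-Quarter", "Long"}),
    applies_to=frozenset({"Tops", "Dresses", "Jumpsuits"}),
)


def is_valid(attr: AttributeDefinition, category: str, value: str) -> bool:
    """True if 'key: value' is a legal tag for a product in the given category."""
    return category in attr.applies_to and value in attr.allowed_values


print(is_valid(SLEEVE_LENGTH, "Dresses", "Short"))  # True: shared across tops, dresses, jumpsuits
print(is_valid(SLEEVE_LENGTH, "Hot Dog Rollers", "Short"))  # False: attribute does not apply
```

Keeping the value vocabulary closed is what makes attributes comparable across merchants, in contrast to free-form tags.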
Product embeddings represent all other properties of the product that are not captured by the category and attribute knowledge graph. Some properties are vague, such as “summer dress” or “boyfriend jeans,” while others cannot be explicitly defined at all, such as “has a similar vibe to this product.”
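A property like “has a similar vibe to this product” is commonly answered by nearest-neighbor search in embedding space. The sketch below ranks products by cosine similarity over toy, hand-written vectors; the product IDs and numbers are made up, and a production system would use model-generated embeddings with an approximate nearest-neighbor index rather than a linear scan.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for zero vectors; treat as unrelated
    return dot / (norm_a * norm_b)


def most_similar(query: list[float], catalog: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the IDs of the k catalog items closest to the query embedding."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in catalog.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [pid for pid, _ in scored[:k]]


# Toy 3-dimensional vectors; real embeddings are produced by a trained model.
catalog = {
    "linen-sundress": [0.9, 0.1, 0.2],
    "floral-maxi-dress": [0.8, 0.2, 0.3],
    "wool-overcoat": [0.1, 0.9, 0.1],
}
query = [0.85, 0.15, 0.25]  # embedding of the product we want look-alikes for
print(most_similar(query, catalog))  # the two dresses rank above the overcoat
```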
While I cannot disclose how these models were trained, I can say that, unlike the approach taken at Donde Search, which focused mostly on automating supervised learning and speeding up manual annotation, at Shopify we leveraged various NLP methods, including zero-shot and few-shot models, to minimize our dependence on manual annotation.
The team has created dozens of models that have been deployed to Shopify’s production environment. These models run daily in a data pipeline over billions of products. The pipeline is efficient: it only predicts on the individual images and texts that need an update. It handles the relevant edge cases, such as a single image changing on a product, a single text field changing on a product, or a new model being released that applies to only a subset of products.
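One common way to get this per-input incremental behavior is to fingerprint each image or text together with the model version and predict only when the fingerprint differs from the one stored at the last prediction. This is a sketch under assumptions: the `InputItem` schema, `run_incremental`, and the in-memory dict cache are hypothetical and are not Shopify’s actual pipeline.

```python
from __future__ import annotations

import hashlib
from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class InputItem:
    """One model input: a single image or text field of a product (hypothetical schema)."""
    product_id: str
    item_id: str  # e.g. "image:1" or "text:title"
    content: bytes


def fingerprint(item: InputItem, model_version: str) -> str:
    """Hash of the model version plus the item's content; changes if either changes."""
    h = hashlib.sha256()
    h.update(model_version.encode("utf-8"))
    h.update(b"\x00")  # separator so version and content cannot run together
    h.update(item.content)
    return h.hexdigest()


def run_incremental(
    items: Iterable[InputItem],
    model_version: str,
    cache: dict[tuple[str, str], str],  # one cache per model: (product_id, item_id) -> fingerprint
    predict: Callable[[InputItem], object],
    applies_to: Callable[[InputItem], bool] = lambda item: True,
) -> int:
    """Run predict only on items that are new, edited, or not yet scored by this model version."""
    count = 0
    for item in items:
        if not applies_to(item):
            continue  # this model does not cover this kind of input
        key = (item.product_id, item.item_id)
        fp = fingerprint(item, model_version)
        if cache.get(key) == fp:
            continue  # unchanged since the last prediction: skip
        predict(item)
        cache[key] = fp  # record only after predict returns without raising
        count += 1
    return count


cache = {}
noop = lambda item: None  # stand-in for a real model call
items = [
    InputItem("p1", "image:1", b"<jpeg bytes>"),
    InputItem("p1", "text:title", b"Linen sundress"),
    InputItem("p2", "text:title", b"Wool overcoat"),
]
print(run_incremental(items, "v1", cache, noop))  # 3 (first run: everything is new)
items[1] = InputItem("p1", "text:title", b"Linen sundress, white")
print(run_incremental(items, "v1", cache, noop))  # 1 (only the edited title)
text_only = lambda item: item.item_id.startswith("text:")
print(run_incremental(items, "v2", cache, noop, applies_to=text_only))  # 2 (new version, text inputs only)
```

Because the model version is part of the fingerprint, releasing a new version re-scores exactly the inputs it applies to, while a single edited image or title triggers exactly one new prediction. In a real system the cache would live in a durable store rather than a Python dict.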
Conclusion
The Product Understanding data assets are utilized by numerous teams and contribute to many Shopify products. However, I chose to focus on two products that use these assets in their rawest form.
You can experiment with category navigation on Shop.App or search for a specific product type or attribute. Keep in mind that the relevance of the search results is no accident; the team spent a lot of time perfecting it!
Alternatively, you can try chatting with the Shop AI assistant to learn about the available products. You will quickly notice just how deep its knowledge of the products is.