Composed Item Retrieval: A Whole New World
Have you ever been shopping and found something you like, but it’s not quite right? Perhaps you want a slightly different cut, a different colour, or maybe you’d prefer a similar item from a different brand. Or maybe you like what you’ve found, and you need something to go along with it.
If you are shopping in a physical store, you might ask an assistant to help with these types of queries, but if you are shopping online you have to resort to browsing in the hope that you stumble upon what you’re after. What if there were a way to query online stores using a reference item, the same way you might in store?
What is Composed Retrieval?
In our last couple of blog posts we’ve spoken about the idea that e-commerce is naturally a multimodal environment. Products are represented by images, text descriptions, reviews, and more. Once we start to model e-commerce in this way, it opens up a lot of exciting possibilities for new types of shopping interfaces. One task in multimodal machine learning that we’ve mentioned in previous posts is Composed Image Retrieval (CIR). CIR is the task of using a reference image together with a text modification to retrieve a target image. For example, you might have an image of a blue car and ask the model to find an image of “the same car but in red”.
Following from this, we’ve been experimenting with the idea of Composed Item Retrieval, whereby instead of operating on images we operate in the multimodal product space. This means that instead of applying text modifiers to images, we apply them to products. Similarly, instead of retrieving images, we retrieve actual products from a catalogue. This is a subtle but important distinction, as in e-commerce we ultimately care about finding products that users can actually purchase.
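To make the idea concrete, here is a minimal sketch of what a composed query over a product catalogue could look like. It is not a description of our model: the additive fusion in `compose`, the `weight` parameter, and the hand-made four-dimensional embeddings are all assumptions chosen purely for illustration. In practice the reference product and the text modifier would be embedded by a learned multimodal model.

```python
import math

def normalise(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalise(a), normalise(b)))

def compose(reference, modifier, weight=0.5):
    """Fuse a reference embedding with a modifier embedding.

    Assumption: a simple weighted sum stands in for whatever
    fusion a real composed-retrieval model learns.
    """
    ref, mod = normalise(reference), normalise(modifier)
    return normalise([(1 - weight) * r + weight * m for r, m in zip(ref, mod)])

def composed_item_retrieval(reference_id, modifier, catalogue, top_k=3):
    """Rank catalogue products against the composed query, excluding the reference."""
    query = compose(catalogue[reference_id], modifier)
    scored = [
        (cosine(query, vec), product_id)
        for product_id, vec in catalogue.items()
        if product_id != reference_id
    ]
    scored.sort(reverse=True)
    return [product_id for _, product_id in scored[:top_k]]

# Hypothetical toy embeddings; dimensions are [trainer, blazer, blue, black].
catalogue = {
    "blue_trainer": [1, 0, 1, 0],
    "black_trainer": [1, 0, 0, 1],
    "blue_blazer": [0, 1, 1, 0],
    "black_blazer": [0, 1, 0, 1],
}
black = [0, 0, 0, 1]  # stand-in for the embedded modifier "this but in black"

results = composed_item_retrieval("blue_trainer", black, catalogue)
print(results)  # ['black_trainer', 'black_blazer', 'blue_blazer']
```

The point of the sketch is that the results are catalogue entries rather than images: the query starts from a product, is nudged by the text modifier, and everything returned is something a shopper could actually buy.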
Composed Retrieval in Action
Our early experiments with Composed Item Retrieval have been very promising. The rest of this blog will showcase some of the exciting use cases we’ve been exploring. All screenshots shown are from a prototype e-commerce site we’ve built to demonstrate the concept, but they are real results from our model searching over actual example catalogues.
1. Finding Variations of a Product
The most basic use case is the one I described above: asking for a product similar to a reference product but with some modifications.
For example, we can ask for similar products in a different colour, e.g. “this but in black”:

We can ask for similar products from another brand, e.g. “this but Nike”:

Or we can ask for “alternative kits” of a football jersey and the model finds previous seasons’ jerseys, different cuts, and the away and warm-up kits:

2. Finding Complementary Products
A more interesting application is to use a found product as the starting point to look for complementary products.
For example, say we’ve found a nice formal blazer. We can use this as a reference item to look for “matching pants”:

Or simply “shoes” to find footwear that goes well with the original item:

We can also ask the model for “pants like this” to transfer the style of the reference item to find products that match the look and feel of the original item:

3. Showing Off
Finally, because of the semantic understanding abilities of our multimodal model, we can get quite creative with our queries. My personal favourite is asking for the “South African version” when on the product page of a generic rugby jersey:

What’s really cool about this one is that the Springbok jerseys don’t have the words “South Africa” anywhere in their product descriptions or metadata, so the model is not only doing style transfer but also associating the concept of South Africa with the Springbok team. Neat!
What’s next?
While this feature still needs some polish, these early results are very promising and highlight the potential of Composed Item Retrieval to transform the online shopping experience. Furthermore, because of our multimodal modelling approach to e-commerce, we didn’t need to change much to get this working: it was a very natural extension of the existing capabilities of our model. What is really exciting is that this demonstrates the power of the approach, and its potential to unlock many more new shopping experiences in the future!
