productreleaseannouncements

Introducing Kimchi 1

Nick Wilkinson
0:00 / 0:00

Today we’re rolling out Kimchi 1, our latest and most capable model, across Solenya’s search and recommendations API.

Over the past few months, we've been working on something special at Solenya. Until now, the brains behind our zero-shot hyper-personalised discovery engine have been our Dill series of models. This was the first series of models we ever launched, and whilst very capable, they had a number of limitations.

The new Kimchi model has roughly 10x as many parameters as Dill, allowing for much better knowledge representation capacity. Whilst Dill 1.5 was an impressive achievement considering the model size, we were reaching a performance ceiling. With Kimchi 1, we have not only already surpassed our previous best, but we believe there are a number of performance gains still to come using this architecture and some new training methods.

Evaluations

Average relevance score (%) of Kimchi 1, the Dill series models, and some other search providers and algorithms.
Average relevance score (%) of Kimchi 1, the Dill series models, and some other search providers and algorithms.

We present our performance against two primary metrics, average relevance score (as a percent) and normalised discounted cumulative gain (NDCG). Together these metrics paint a picture of the retrieval and ranking performance of the model's product discovery abilities.

Average relevance is a retrieval metric. Using an LLM-as-a-judge with a carefully constructed rubric for scoring the relevance of each product, we score each item in a list of products suggested by the model. We then take the average of how relevant the returned results were for a given query.

NDCG is a standard metric for ranking performance. A score of 1 indicates the items are in perfect ranked order, and a score of 0 indicates none of the items were in the correct order.

Below is a table of results for our Solenya models, and some other search providers and algorithms.

Kimchi 1 and Dill performance compared against Shopify, BM25, and Algolia.

Kimchi 1Dill 1.5Dill 1Shopify
Default
BM25Algolia
Default
Avg Relevance94.0392.7591.3789.3088.6987.06
NDCG@100.91950.91350.90840.89860.85310.8306

What Customers Will Notice?

The metrics show how Kimchi 1 is a step above the previous best, but what do these improvements look like in practice? We've found improvements in two main areas: better use of visual information, and much stronger semantic understanding.

Visual Grounding

A comparison of Dill 1.5 (left) and Kimchi 1 (right) for the search term 'dark blue jeans', demonstrating visual capability, with failure cases marked in red.
A comparison of Dill 1.5 (left) and Kimchi 1 (right) for the search term 'dark blue jeans', demonstrating visual capability, with failure cases marked in red.

We see in the above example some clear improvements over the previous model. Of particular interest is Kimchi's ability to resolve conflicts between product text and product imagery, something multimodal models often struggle with. The sixth Dill 1.5 result (row 2, column 2) is a good example of this failure case: the product is labelled “dark blue,” but visually reads closer to grey. The seventh result of Kimchi 1 (row 2, column 3) shows an example of the model succeeding in this task. The text doesn't match the search directly, but the visual is a strong match, and so Kimchi 1 still ranks it above products with a stronger text match. This kind of image-text reconciliation is an area where Kimchi 1 consistently outperforms Dill 1.5.

Stronger Semantic Understanding

A comparison of Dill 1.5 (left) and Kimchi 1 (right) for the search term 'clothes for a man to wear to a summer wedding', demonstrating semantic capability, with failure cases marked in red.
A comparison of Dill 1.5 (left) and Kimchi 1 (right) for the search term 'clothes for a man to wear to a summer wedding', demonstrating semantic capability, with failure cases marked in red.

Here we look at an example with a more descriptive semantic query: "clothes for a man to wear to a summer wedding". Whilst Dill 1.5 is a capable semantic model, we see that for longer, more nuanced queries it struggles. The hard failure cases are marked in red, however, there are a number of other results that don't match the brief well.

Kimchi 1, on the other hand, has a much stronger understanding. This is particularly useful when combined with an agentic harness. As part of ongoing work, we're developing an MCP for discovery, and we've seen that Kimchi 1 allows agents to be more precise in their queries against a catalogue, yielding a better overall experience.

Kimchi 1 is now generally available for all Solenya customers. We're excited to share this next step in performance, and can't wait for the future performance improvements that will be unlocked by this model series.