Query, Key, Value: The Heartbeat of Search – And AI/LLM – Spaces

I’ve written about a lot in this blog – from the challenges of measuring AI/LLM search spaces, to research that exposes the non-classical behavior in these spaces, to potential measurement solutions through quantum mathematical machinery.
A few weeks ago I dug into some of the internal mechanisms of the transformer and attention function – and how the turn toward vector space models in search served as the seed to the emergent non-classical behavior we’re struggling to measure currently.
In that post, I mentioned how the construction and representation within both vanilla search and AI/LLM search share similar makeups – today I want to take an even closer look at the similarities between these two spaces and why we’re ultimately aiming at the same target through SEO.
These similarities also overlap with non-classical spaces (noted previously and important for future posts), but for this post I want to focus specifically on the overlaps between search & AI/LLM spaces.
Queries, Keys & Values In Search Spaces (Vector Space Retrieval)
Search spaces vary by platform, but at the very heart of what’s happening is retrieval – a query is made; that query is compared against a set of candidate keys (which represent some values); the values associated with the keys most relevant to the query are retrieved (through some decision-making mechanism); and a response is presented to the user.
Modern search is far more sophisticated than this, but step back a bit and these are still the essential ingredients – a query, a representational key and a value.
In modern search, the queries, keys and associated values can be projected into vector spaces — numerical representations that take positions in those spaces.
A user enters a query; it gets encoded into a vector; that vector is then compared to the closest (or best-matched) keys in the space; and the payload represented by those keys – the values: the actual links to documents, web pages, passages, etc. – is retrieved and returned to the user.
Again, the adjustments made before returning those results can get quite complex depending on the platform, but this is the general process – retrieval plus a decision-making mechanism that shapes the presented results.
As you can imagine, over time these spaces can shift as documents and other content become more (or less) relevant to certain queries (along with other tasks related to decision making mechanisms before presenting results), but the essence of Query, Key and Value processes remain at the core of these systems.
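The retrieval loop described above can be sketched as a toy in-memory index. Everything here is an illustrative stand-in: the 4-dimensional key vectors, the page names, and the choice of cosine similarity as the comparison measure are all assumptions, not any particular platform’s implementation.

```python
import numpy as np

# Toy index: each key is a vector; each value is the payload it represents.
# The embeddings below are hypothetical 4-dimensional stand-ins.
keys = np.array([
    [0.9, 0.1, 0.0, 0.0],   # hypothetical "fox behavior" page
    [0.0, 0.8, 0.6, 0.0],   # hypothetical "fence installation" page
    [0.7, 0.2, 0.1, 0.6],   # hypothetical "wildlife jumping" passage
])
values = ["fox-behavior.html", "fence-install.html", "wildlife-jumping.html"]

def cosine(a, b):
    """One possible comparison measure between a query and a key."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec, top_k=2):
    """Compare the query vector against every key and return the
    payloads (values) of the best-matched keys."""
    scores = [cosine(query_vec, k) for k in keys]
    ranked = np.argsort(scores)[::-1][:top_k]
    return [values[i] for i in ranked]

query = np.array([0.8, 0.15, 0.05, 0.3])  # encoded user query (made up)
print(retrieve(query))
```

In a real system the "decision-making mechanism" layered on top of this ranking is far more involved, but the query → key comparison → value skeleton is the same.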
Queries, Keys & Values In LLM/AI Spaces (Contextual, Recursive Vector Space Retrieval)
Looking directly into the heart of the transformer & attention mechanism of AI/LLM spaces, you’ll find the familiar Query, Key and Value terminology – directly lifted from the retrieval world.
Although the mechanism is not identical, its primary purpose is the same: establishing a relevance relation between objects. The objects here are simply more granular.
This can be hard to follow the first time you see it, so let’s walk through a concrete prompt:
“The fox jumped over the fence.”
As the prompt is ingested by the AI/LLM space, it passes through the transformer and self-attention mechanism: it is tokenized (each word – or sub-word – is assigned a single integer) and each token is cast into a high-dimensional vector representing a point in an embedding space. This process is called “encoding”.
So we have a high dimensional vector for each word – this is where we see the Query, Key and Value terminology return.
Each word vector is then projected into three new vectors: a Query vector, a Key vector and a Value vector (through pre-learned weights), yielding three new vectors for each input word vector.
The self-attention mechanism goes through each word vector in the sequence and compares its Query vector with the Key vector of every token in the sequence (including its own Key – thus the “self” in “self-attention”) to evaluate the relationship between those words.
Take the word “fox” – this is our Query.
This Query vector for “fox” is compared with the Key vectors of every token in the sequence (including “fox” itself, once again).
The more related the “Key” and “Query” are, the higher the attention score. In this example the “jumped” Key would have a high attention score with the “fox” Query, since it describes what the fox is doing.
These attention scores are scaled and normalized (via a softmax) into attention weights, which are then used to take a weighted sum of the Value vectors – forming an updated “Value” for each word vector.
This new “Value” vector for “fox” then has contextual remnants of the rest of the sequence, with “jumped” adding a significant amount of contribution to this new “Value” vector in this particular context. Repeating this process for each “Query” and “Key” combination above creates a new binding – a non-local connection (across distance) between these words, now with contextual awareness for this particular sequence.
When presented with an input in a similar context, it can leverage this updated contextual awareness to retrieve a relevant output response.
This response is retrieved recursively in the decoding process – a recursive relevance function, more or less, token by token (word by word). Each new word goes through the Query, Key and Value process above, is appended to the previous sequence of output words, and the extended sequence is used to retrieve the next word. The process repeats until the response is complete.
Again, this is simplifying the process of most modern AI/LLM platforms, but the heartbeat of search remains in the Query, Key and Value process used within these systems (a meta contextual awareness embedding of search, if you will).
Contextuality & Fidelity
Last week I wrote about how SEO is about context management: intrinsic context (on-site representation) and extrinsic context (off-site representation), and how keeping them aligned properly (high fidelity) is vital for accurate representation within those spaces.
This fidelity through context/relevance management is important in both spaces: it helps ensure the content, web pages and other related items for a website, brand or other entity are properly embedded (through the Query, Key and Value process above) within the right contexts, so they can be presented in a response (output) when similar contexts arise.
The challenge, then, turns to measuring how context changes over time for different users (observer-dependent outcomes) during a query/prompt session — and how those contextualized spaces change and influence the responses at any given moment.
This challenge represents the core of the measurement issues I write about in this blog – it’s not unsolvable, but I believe it requires a new, non-classical approach to be done properly.
Regardless of the measurement issues, you should now be able to see why core/foundational SEO is even more valuable – the heartbeat of search is still very much beating within these “new” spaces, so to speak.
More notes next week.



