Defining Surfaces: A Foundation For Rigor In AI/LLM Search Spaces

Surfaces.
If you’ve been following along with posts on this blog, you’ve likely seen me mention the word “surface” or “surfaces” quite a bit.
You probably interpreted that word in a few different, logical ways: a search result that "surfaces" given a query, a response that gets "surfaced" through a particular prompt, or perhaps the literal surface of your device or the app you use these platforms on.
While those are all valid, relevant and ultimately correct interpretations, if you dig into the heart of these AI/LLM search spaces there's a deeper meaning, connected to the internal machinery, that lets us ground our understanding of them with more mathematical rigor.
Last week I mentioned some of the challenges these AI/LLM search spaces present, challenges that ultimately require a deeper understanding of users to see how content, web pages, websites and other entities get included in user-dependent responses (responses that change depending on the context in which they were retrieved/generated).
While this is our job in the search world (ultimately digital marketing), it doesn't mean we can't take out our microscope to examine the shared representational space that is search/AI/LLMs and add a bit more rigor to SEO and digital marketing in general.
These search/AI/LLM spaces are getting more sophisticated, but fundamentally they all work with the same foundational mathematical ingredients: vectors, matrices, tensors, embeddings and manifolds.
Thanks to the world of Mathematics, these terms already come with well-known, trusted, built-in rigor. Mathematics is the brand guideline of almost every discipline on the planet, you'll find: shadow guardrails that guide the science, engineering, accounting and governance of our world.
Walking our SEO practices through the AI/LLM search spaces via the Mathematics world lets us leverage this built-in, shadow rigor, giving us everything we need to sharpen our vision of what matters most to being "surfaced" in the right places, at the right times and for the right set of people.
The Foundation: Vectors, Matrices & Tensors
I’ve written about vectors here often — these are simply points in space represented by an ordered sequence of numbers.
Take the ordered pair [3 5] on the x-y plane.
This represents a point in space that sits 3 units along the positive x-axis and 5 units along the positive y-axis. Drawing a line from the origin [0 0] to this point lets us visualize where this vector "points".
Adding another dimension, z, gives us [3 5 4], with 4 being the new coordinate along the z-axis.
Our world is limited to these three coordinates (at least what we can visualize), but vectors can scale up to any number of coordinates – [x y z … n], for example.
Matrices are the next object to understand — these are just a collection of vectors:
|3 5 4|
|2 1 7|
|5 9 8|
This represents a 3×3 matrix – 3 rows and 3 columns. These, too, can scale up to any number of rows and columns.
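If you want to play with these objects directly, here's a minimal sketch in Python using numpy (my tooling choice for illustration, not anything these platforms prescribe):

```python
import numpy as np

# A 2D vector: 3 units along x, 5 along y
v2 = np.array([3, 5])

# Adding the z coordinate gives a 3D vector
v3 = np.array([3, 5, 4])

# The length of the line drawn from the origin to the point
print(np.linalg.norm(v3))  # ~7.07

# A 3x3 matrix: a collection of three row vectors
M = np.array([
    [3, 5, 4],
    [2, 1, 7],
    [5, 9, 8],
])
print(M.shape)  # (3, 3) -- 3 rows, 3 columns
```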
Next up are tensors, which you can roughly think of as stacking the pattern one level deeper: here, a matrix whose entries are themselves vectors.
|[0 1] [3 4] [2 8]|
|[3 7] [9 1] [3 5]|
|[9 5] [7 2] [2 6]|
Every vector and matrix is also a tensor: vectors are 1-tensors, matrices are 2-tensors and a single number (scalar) is a 0-tensor.
If a vector is a list and a matrix is a list of lists, then a tensor is a list of lists of lists, so to speak.
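Continuing the numpy sketch, the "list of lists of lists" idea maps directly onto array ranks (the shapes here are mine, purely for illustration):

```python
import numpy as np

scalar = np.array(7)                 # 0-tensor: shape ()
vector = np.array([3, 5, 4])         # 1-tensor: shape (3,)
matrix = np.array([[3, 5], [2, 1]])  # 2-tensor: shape (2, 2)

# The example above: a 3x3 grid where each cell holds a
# length-2 vector, i.e. a 3-tensor of shape (3, 3, 2)
tensor = np.array([
    [[0, 1], [3, 4], [2, 8]],
    [[3, 7], [9, 1], [3, 5]],
    [[9, 5], [7, 2], [2, 6]],
])
print(scalar.ndim, vector.ndim, matrix.ndim, tensor.ndim)  # 0 1 2 3
```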
The Bridge: Embeddings
In the search/AI/LLM spaces, embeddings and vectors are sometimes used interchangeably and sometimes together (embedding vector, etc.), but the deeper meaning of the word "embedding" comes from the world of topology (topology will likely be another post altogether, but this is a very fun branch of the Mathematics world).
The strict, technical mathematical definition of an embedding is a bit more complicated and needs additional background knowledge, but the essence is a mapping that takes points in one space to points in another while preserving their structure.
The phrase “A cheetah sprints at opportune times” can be broken down into individual tokens:
[“A”, “cheetah”, “sprints”, “at”, “opportune”, “times”]
As this phrase is ingested by the search/AI/LLM spaces, each word is converted to the integer its vocabulary associates with that word (this tokenization process has a lot of issues which I'll write about one day, but this is a very simple breakdown).
After tokenization, the phrase becomes (using arbitrary numbers):
[59 1026 99 46 333 64]
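Here's a deliberately toy version of that lookup in Python (the vocabulary and IDs are arbitrary, just like the numbers above; real tokenizers split on subwords rather than whole words):

```python
# Toy word-level tokenizer; real systems use subword schemes (BPE, etc.)
vocab = {"A": 59, "cheetah": 1026, "sprints": 99,
         "at": 46, "opportune": 333, "times": 64}

phrase = "A cheetah sprints at opportune times"
token_ids = [vocab[word] for word in phrase.split()]
print(token_ids)  # [59, 1026, 99, 46, 333, 64]
```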
It then goes through an embedding process: a mapping from the raw tokens into high dimensional vectors. Each token is represented by a high dimensional vector (pre-learned, and simply retrieved during the embedding step) in this raw embedding space.
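In code, this raw embedding step is just a table lookup into a matrix, one learned row per vocabulary entry (the table below is random and tiny, purely a stand-in for the learned one):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 2000, 8  # real models use hundreds or thousands of dims
embedding_table = rng.normal(size=(vocab_size, embed_dim))  # learned in practice

token_ids = [59, 1026, 99, 46, 333, 64]
raw_embeddings = embedding_table[token_ids]  # one row retrieved per token
print(raw_embeddings.shape)  # (6, 8): 6 token vectors, 8 dimensions each
```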
As mentioned in a previous post, these high dimensional vectors are then fed into the transformer's attention mechanism, which transforms the raw text embeddings into contextualized embeddings (this is where matrices and tensors come into play). Each word embedding is updated with a new representation that includes a contextual understanding of the other words in the sequence; a new, contextual mapping, so to speak.
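For the curious, here's a bare-bones sketch of a single self-attention pass, with random stand-in weight matrices where a trained model would have learned ones, just to show where the matrices and tensors actually show up:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
seq_len, d = 6, 8
X = rng.normal(size=(seq_len, d))  # raw token embeddings (stand-ins)

# Query/key/value projection matrices; learned in a real model, random here
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

Q, K, V = X @ Wq, X @ Wk, X @ Wv        # project each embedding three ways
scores = softmax(Q @ K.T / np.sqrt(d))  # how much each token attends to the others
contextualized = scores @ V             # each row now mixes in the whole sequence
print(contextualized.shape)             # (6, 8): same shape, new meaning
```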
Collectively, the set of contextualized embeddings creates a learned embedding surface, and this surface exists as something known as a manifold (the raw embeddings live on manifolds too, if you look closely).
Side note: there is some disagreement in the search/AI/LLM world about whether these are truly manifolds. A few years ago I actually felt like there was something off there, which turned out to be true to a degree, but for the purposes of this post we'll maintain the manifold hypothesis. As with many things in the science/engineering world, the pure mathematics gets fudged, but I digress.
The Surface: Manifolds
Again, the finer details of manifolds need a bit more background (the topology mentioned above), but they are essentially a generalization of a curve, surface or volume to an arbitrary number of dimensions.
Globally it’s hard for us to imagine what a manifold looks like due to its high dimensionality, but if you look at each individual point on a manifold it looks very familiar — locally it looks just like some ordinary vector space.
Imagine a piece of rubber stretched over a sphere (a 2-manifold, technically). If you take a point on that sphere and cut out a piece of the rubber, it can be flattened into an ordinary, flat plane without twisting or tearing it (this is why topology is often called “rubber sheet geometry”).
Circling back to the learned embedding space or manifold, it's not as "pretty" as a sphere and it exists in a much higher dimension, but locally it looks just like the normal vector space we're used to and can visualize in 3 dimensions (with some additional machinery).
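That "additional machinery" can be as simple as a linear projection. Here's a hedged sketch: take a small neighborhood of points and flatten it down to 3 coordinates with PCA (the data is synthetic; a real neighborhood would come from actual model embeddings):

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic stand-in: 50 nearby points in a 256-dimensional embedding space
neighborhood = rng.normal(size=(50, 256)) * 0.1

# PCA by hand: center the points, then project onto the top 3 principal directions
centered = neighborhood - neighborhood.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
local_3d = centered @ Vt[:3].T
print(local_3d.shape)  # (50, 3): a local patch we can actually visualize
```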
This local characteristic means that related contextualized embeddings, like our phrase above, sit close together on that surface.
Should a search/AI/LLM space be presented with an input similar to the phrase above, say "when does a cheetah sprint?", it can (more or less) retrieve the answer from that location on the learned manifold: "a cheetah sprints at opportune times".
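A minimal sketch of that "more or less" retrieval: match a query embedding to the closest stored phrase by cosine similarity (the embeddings are random stand-ins; a real system would use learned ones and far more machinery):

```python
import numpy as np

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(3)
corpus = {
    "A cheetah sprints at opportune times": rng.normal(size=8),
    "Sloths move slowly through the canopy": rng.normal(size=8),
}
# Pretend the query landed near the cheetah phrase on the manifold
query = corpus["A cheetah sprints at opportune times"] + rng.normal(size=8) * 0.05

best = max(corpus, key=lambda text: cosine_sim(query, corpus[text]))
print(best)  # "A cheetah sprints at opportune times"
```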
As you can imagine, the shape of this manifold surface is not static; it changes with context and many other factors the system recognizes when retrieving tokens for a response. This is where the challenges in measurement begin: understanding how this surface changes given a certain context, and what the response spectrum looks like.
Imagine Your Web Pages And Other Content Represented On These Surfaces
When you consider your digital presence (your content, your web pages, your entities, anything you add to the digital space), once it gets digested by most of these AI/LLM search spaces you can think of it as eventually mapped onto some learned embedding manifold, a surface (or many, depending on context).
Your representation on these surfaces doesn't exist in isolation, either; it's learned collectively alongside everyone else's, which means what is written, mentioned and cited about you, your website and your business becomes equally important to that representation.
This intrinsic (on-site) and extrinsic (off-site) context, when managed properly and improved over time through SEO, creates a high-fidelity representation on this learned embedding manifold.
A clear, sharp representation can make it easier to be remembered/retrieved from that learned manifold in the important moments when these AI/LLM search spaces are presented with queries/prompts in different, relevant contexts.
Many, many factors can go into the final response/results from these spaces depending on platform or model (I’ll touch on Retrieval Augmented Generation soon), but the foundation mapped out above should suffice for a baseline mathematical representation in most cases.
Final Takeaway
Surfaces.
The next time you see this word in the search/AI/LLM media or used by different platforms, I hope that, with a little attention, you'll map it to this deeper, contextualized meaning. And with some of these well-known Mathematical elements in hand, we can dig even deeper into the "shadow rigor" that already exists.
Much more in the weeks to come (including a circle back around to measurements, contextuality and non-classical behavior – and associated mathematical machinery).

