Breaking Bell: Revealing AI Search Visibility’s (Classical) Limitations
“If I have seen further, it is by standing on the shoulders of giants.”
Last week’s post was many years in the making.
In that post I mentioned how I, over the course of my career in search, have been quietly watching things evolve – and how my original intended pathway in Mathematics and Particle Physics has allowed me to, perhaps, see that evolution from a slightly different perspective.
In particular, I’ve been watching the emergence of LLMs and AI from the search industry and the eventual shift from deterministic responses (lists of websites and other search features) to probabilistic responses – ones that are ultimately shaped not only by a query or prompt, but also by the context surrounding that query or prompt – and the user behind it.
This shift from deterministic to probabilistic responses has caused quite a stir in the industry – from folks trying to rename the search industry (SEO) to every acronym under the sun (AEO, AISEO, GEO, et al.) to new names for old search practices (I’ll go more into this in future posts).
Confusion, chaos – and uncertainty.
However, the source of this uncertainty isn’t any of our practices – as mentioned above, those are still useful and very relevant to this day (and for the foreseeable future).
The root cause of the uncertainty really lies in our ability to accurately measure these new systems defined by probabilistic outputs.
Our Current Situation In Search
If you’re reading this in 2026, you know about all of the research from leaders in our field – and the difficulties with getting reliably consistent responses from these systems.
The same queries or prompts by two different users can yield dramatically different responses from these systems.
Truth be told, this type of behavior was known to occur before the rollout of LLM/AI responses – anyone who’s worked in local search can tell you that – but now this behavior is amplified by the introduction of LLM responses, leading to even more dramatic variance in responses.
Ultimately, these responses are being shaped not just by the words that are used in these surfaces (prompts or queries), but also by context surrounding those words – user-dependent features that are either explicitly known or indirectly inferred by these surfaces.
Very small changes in either those words – or surrounding context – can lead to dramatically different results.
Results are shaped not just by the words that initiated them (queries or prompts), but also by the observers of those results – and the context associated with those observers.
Measurement of these results is now an act of creation – not just recording – something that is missed by many folks in our industry.
Observer-dependent outcomes – and measurement that creates on top of recording – while incredibly frustrating in the search world, are nothing new to the field of quantum physics.
Treading Carefully (And Thoughtfully)
This search-quantum field connection is something that I’ve been following now for quite some time (one of my earliest intuitions of my career), but it’s not an observation one can wield without adding more confusion to an already somewhat confusing field (at least to some outside of search).
Quantum mechanics has been around for a very long time and one of its defining traits is that even the most studied, experienced folks in the field often struggle with interpretations of it.
I won’t go into a history lesson here (others have done that better than I ever could), but one of the most famous quotes, attributed to Richard Feynman, is:
“I think I can safely say that nobody understands quantum mechanics.”
Richard Feynman, by the way, was one of the most prolific theoretical physicists of the 20th century.
Humility, it seems, is the best attribute you can have if you want to wield the concepts of the quantum world.
That said, I don’t pretend to be a quantum physicist (I’m not), nor do I have extensive lab experience (I don’t) – I’m simply connecting dots from the perspective of someone who has studied the quantum world (in college) and spent nearly two decades in search.
For the longest time I thought I was alone in making this connection.
Even up until last year I felt like the timing wasn’t right – adding more to the chaos that these new interfaces created didn’t seem right (when only measurement was changing, not our core fundamentals).
That’s when I came across some folks that had parallel interpretations and intuitions. I wasn’t alone – there were, as it turns out, dozens of others following along with me.
The Bell Test
In 2025, a research paper called “A Quantum Semantic Framework for Natural Language Processing” was published and, perhaps fatefully, came into my view earlier this year.
I want to be sure the authors of this paper get full credit for this work as I believe it to be the basis of a whole new school of thought when it comes to interpreting these new spaces – and ultimately measuring them.
I would not do them justice by explaining the finer details of their experiment here, but the premise is simple: the belief that these systems exhibit behavior that more closely resembles non-classical frameworks – in other words, they more closely resemble a quantum system.
Essentially, quantum theory provides mathematical machinery better equipped to explain the observer-dependent (contextual) outcomes now so prevalent with these new surfaces. They go on to try to prove this by conducting experiments across several different frontier models (Gemini, Claude, DeepSeek and OpenAI) with something they call a “Semantic Bell test”.
A Bell test (or Bell inequality test), in short, can help verify that the system you’re working on is, indeed, a quantum system.
The “Semantic” in Semantic Bell test refers to what is being measured: how these systems assign meanings to words under certain experimental conditions. These words hold many different meanings, so the experiment generates data on which meaning is resolved under each condition.
If the experimental results of the Bell inequality test exceed (violate) a predefined boundary – a limit that any classical system has to respect – it can certify the system as quantum, essentially.
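To make that “predefined boundary” a little more concrete: one common form of the Bell inequality is the CHSH inequality, which combines four correlation values into a single number S. Any classical (local hidden-variable) system keeps |S| at or below 2, while quantum systems can reach up to 2√2 (roughly 2.83). The Python sketch below only illustrates that arithmetic. Whether the paper uses exactly this form is an assumption on my part, as is the mapping of two competing word meanings to +1 and -1, and all function names are hypothetical.

```python
import math

# Bounds for the CHSH form of the Bell inequality.
CLASSICAL_BOUND = 2.0                      # limit for any classical system
TSIRELSON_BOUND = 2.0 * math.sqrt(2.0)     # maximum a quantum system can reach


def correlation(pairs):
    """Average product of paired outcomes, each outcome being +1 or -1.

    Assumption for illustration: +1 and -1 stand for the two meanings
    an ambiguous word was resolved to in a given trial.
    """
    if not pairs:
        raise ValueError("need at least one pair of outcomes")
    return sum(x * y for x, y in pairs) / len(pairs)


def chsh_value(e_ab, e_ab2, e_a2b, e_a2b2):
    """Combine four correlations (two settings per side) into S."""
    return e_ab + e_ab2 + e_a2b - e_a2b2


def violates_classical_bound(s):
    """True when |S| exceeds the classical limit of 2."""
    return abs(s) > CLASSICAL_BOUND


if __name__ == "__main__":
    # Perfectly correlated classical outcomes sit exactly on the bound.
    s_classical = chsh_value(1.0, 1.0, 1.0, 1.0)
    print(s_classical, violates_classical_bound(s_classical))

    # Correlations of magnitude 1/sqrt(2) with this sign pattern
    # reach the quantum maximum.
    r = 1.0 / math.sqrt(2.0)
    s_quantum = chsh_value(r, r, r, -r)
    print(s_quantum, violates_classical_bound(s_quantum))
```

The point of the sketch is the comparison at the end: a result above 2 is what “violating” the inequality means, and no classical assignment of outcomes can produce it.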
The Results
I wasn’t surprised by the results the first time I read through this paper, to be frank.
It was that thing that had been bothering me for years – that there was much more to these search spaces (and, by extension, LLM & AI spaces) than meets the eye.
The experimental results of this research seemingly have confirmed that intuition: they frequently and significantly violated the Bell inequality – essentially confirming these new spaces as more closely resembling quantum systems.
What This Means For (Properly) Measuring Our New World
Admittedly, there’s a lot to take in above.
It was part of the reason that I struggled so much with posting these notes – quantum mechanics and related concepts are notoriously difficult to grasp.
I won’t be imploring you or anyone in search to become quantum physics experts — there’s really no need (again, I’ll go into why our practices aren’t changing in next week’s post).
If you come away with one thing from these notes, it’s this: the one-off snapshots that many mass tracking or measuring tools offer in our world are no longer capable of giving us the full picture of what’s going on in these spaces.
This isn’t a new insight – nearly everyone in the search space agrees that measurement reliability has declined and needs more attention.
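As a rough illustration of why a single snapshot says so little about a probabilistic system, the Python sketch below treats each check of a prompt as a yes/no outcome (for example, “did the brand appear?”) and computes a standard Wilson confidence interval for the underlying rate. Treating checks as independent draws with a fixed rate is a simplifying assumption that ignores the context effects described above, the simulated data stands in for real responses, and the function names are hypothetical.

```python
import math
import random


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def simulate_checks(true_rate, n_checks, seed=0):
    """Simulated stand-in for repeated checks of one prompt.

    Returns how many of n_checks came back 'yes', assuming each check
    is an independent draw with probability true_rate.
    """
    rng = random.Random(seed)
    return sum(rng.random() < true_rate for _ in range(n_checks))


if __name__ == "__main__":
    # One snapshot that happened to come back 'yes'.
    print("1 check:   ", wilson_interval(1, 1))

    # Many checks of the same simulated prompt narrow the range.
    hits = simulate_checks(true_rate=0.6, n_checks=100)
    print("100 checks:", hits, wilson_interval(hits, 100))
```

With one positive snapshot, the interval still runs from about 21% to 100%, which is another way of saying the snapshot on its own tells you very little about how often that result actually appears.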
What I hope to have laid out above is a pathway for understanding the behavior of these spaces, so that we can get a reliable handle on measurement. Throwing a bunch of prompts or queries into these platforms won’t cut it anymore – it’s classical thinking in a non-classical environment. (I also have quite a few notes on why these systems behave the way they do, but that’s for another day.)
Raw prompts and queries only tell you part of the truth, from the generic perspective of whatever tool you happen to be using.
Much like these new surfaces, the correct answers will appear only when the right questions are asked.
And next week I’ll start asking the real questions we need to ask ourselves in this world – and why our search practices are even more valuable in this new era of AI search.




