Non-Determinism: Why it is a Feature, not a Bug, in LLMs. Plus, what Thinking Machines Lab’s quest for consistent output means for Responsible AI.

If you have ever asked an LLM the same question twice and received different answers, you have experienced non-determinism. While this might seem like a bug, it is actually a fundamental characteristic of these powerful models. To understand why, let’s contrast it with a more traditional, deterministic system.

Determinism vs. Non-Determinism: A Personal Story

As a grad student, I spent a lot of time with finite state machines (FSMs). FSMs are computational models that can be in exactly one of a finite number of states at any given time, transitioning from one state to another based on specific inputs. 💡 The light switch is a perfect example of a deterministic finite state machine. It’s “deterministic” because what it does is always 100% predictable. It’s a “finite state machine” because it has only a few, specific “states” it can be in.

Here’s how it works:

States: The light switch has only two states: On and Off.
Inputs: The only input is you flipping the switch.

The key is that an FSM is deterministic if, given the same state and input, it always produces the same output. There is no room for variation! Some real-world problems, especially in human language, follow similar patterns, even if they are a bit more complicated than the light switch example. My graduate school project was to build an FSM to model phonotactics, the rules governing which sounds can appear together in a language. As an illustrative example of a phonotactic rule in Nepali, we tend to avoid word-initial consonant clusters (https://www.ling.upenn.edu/Events/PLC/PLC35/abstracts/5b_Koirala.pdf). I was able to use Probabilistic Deterministic FSMs to model this aspect of human language.
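The light switch FSM above is small enough to sketch in a few lines of code. This is just an illustrative toy (the state and input names are mine), but it shows the defining property of determinism: the same (state, input) pair always maps to the same next state.

```python
# A deterministic finite state machine for the light switch.
# The transition table fully determines the next state: same state
# plus same input always yields the same result, with no randomness.
TRANSITIONS = {
    ("off", "flip"): "on",
    ("on", "flip"): "off",
}

def step(state: str, event: str) -> str:
    """Return the next state for a given (state, event) pair."""
    return TRANSITIONS[(state, event)]

# Flip the switch three times, starting from "off".
state = "off"
for _ in range(3):
    state = step(state, "flip")
print(state)  # "on" — an odd number of flips always lands here
```

Run it as many times as you like: the output never changes, which is exactly the property that made FSMs tractable for modeling rule-governed phenomena like phonotactics.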

However, other aspects of human language are more complex. Consider this sentence as an example: “Put the book on the table with the pencil.”

This sentence is a little bit of a puzzle. It has two possible meanings:

Meaning 1: Use the pencil to help you put the book on the table. (This sounds silly, right?)
Meaning 2: Put the book on the table where the pencil is already sitting. (This makes a lot more sense!)

A deterministic system might get confused by the word “with” because it might not know which meaning is the right one.

A non-deterministic system knows both interpretations exist, and if designed correctly, it can guess that in this situation when people say “with the pencil,” they likely mean a location, not a tool.

Okay, back to LLMs. LLMs generate text based on probabilities. When an LLM predicts the next word in a sentence, it considers a wide range of possibilities, each with a different likelihood. The model doesn’t just pick the most probable word every time; it samples from the distribution of likely words. This is where non-determinism comes in. By introducing a degree of randomness, an LLM can produce diverse, creative, and sometimes surprising outputs for the same prompt.
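The difference between always picking the most probable word and sampling from the distribution can be made concrete with a toy example. The probabilities below are made up for illustration, not taken from any real model, but the two functions capture the contrast between greedy (deterministic) decoding and sampling (non-deterministic) decoding.

```python
import random

# Toy next-word distribution for "The cat ..." (illustrative numbers only).
next_word_probs = {"sat": 0.55, "slept": 0.30, "jumped": 0.15}

def greedy(probs: dict) -> str:
    # Deterministic: always return the single most probable word.
    return max(probs, key=probs.get)

def sample(probs: dict) -> str:
    # Non-deterministic: draw a word in proportion to its probability.
    words, weights = zip(*probs.items())
    return random.choices(words, weights=weights, k=1)[0]

print(greedy(next_word_probs))  # always "sat"
print(sample(next_word_probs))  # usually "sat", but sometimes "slept" or "jumped"
```

Real LLMs add knobs like temperature and top-p on top of this basic idea, but the core mechanism is the same: sampling is where the run-to-run variability enters.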

Why Non-Determinism is a Feature of LLMs

This variability is not a flaw; it is a feature that makes LLMs so versatile. Think about it:

Creativity: For tasks like writing stories, poems, or marketing copy, you don’t want a static, predictable output. Non-determinism allows the model to explore different ideas and styles, which is essential for creative work.

Avoiding Repetition: If an LLM were perfectly deterministic, it would likely get stuck in repetitive loops or generate the same uninspired text over and over again. Non-determinism helps it break out of these ruts.

Adaptability: It allows the model to produce different outputs for the same prompt, which can be useful when you are iterating on a task and looking for new angles or perspectives.

The Quest for Determinism

However, for some applications, this variability is a significant problem. I worked in AI Governance at a healthcare organization for the past three years, and non-determinism was undoubtedly one of the biggest bottlenecks for LLM adoption beyond pilots.

A new paper from Mira Murati’s Thinking Machines Lab (https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/) claims to have defeated non-determinism in LLMs, at least on a technical level. They argue that the source of non-determinism isn’t just the random sampling we usually talk about, but a deeper issue with how GPU kernels handle parallel processing across different batch sizes. By re-engineering the kernels to be batch-invariant, they can ensure that the same input always produces the same output.
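The underlying numerical effect is easy to demonstrate, even without a GPU. Floating-point addition is not associative, so if the order in which partial sums are reduced changes (as it can when batch size changes how work is split across GPU threads), the result can change too. This toy example is my own illustration of that effect, not the paper’s actual kernel engineering:

```python
# Floating-point addition is not associative: summing the same numbers
# in a different order can give a different result. Batch-dependent
# reduction orders on GPUs are a source of exactly this kind of drift.
vals = [0.1] * 10 + [1e16, -1e16]

# Order 1: left to right. The tiny 0.1s are absorbed into 1e16
# (they are below its rounding precision), then cancelled away.
left_to_right = sum(vals)

# Order 2: largest magnitudes first. 1e16 and -1e16 cancel immediately,
# so the 0.1s survive and contribute to the total.
cancel_first = sum(sorted(vals, key=abs, reverse=True))

print(left_to_right)                  # 0.0
print(left_to_right == cancel_first)  # False — same numbers, different sums
```

Batch-invariant kernels, as described in the post, fix the reduction order so results no longer depend on how requests happen to be batched together.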

This is a great feat because it opens the door to using LLMs in more restricted and critical domains, such as healthcare. For example, a deterministic LLM could be used to:

Automate clinical documentation where consistency and accuracy are paramount.

Generate patient summaries that are always the same for a given set of medical records, which is essential for legal and auditing purposes.

Assist in diagnostic support where consistent outputs are necessary to build trust and ensure safety.

Why Rigorous Testing is Still Critical

The work by Thinking Machines is a crucial step toward giving us the best of both worlds, but a deterministic LLM is not a silver bullet. While the output may be consistent, the model can still be prone to hallucinations—generating information that is factually incorrect. But testing a model for safety and reliability goes far beyond just checking for hallucinations. Here is why rigorous testing is still non-negotiable, even with a deterministic model:

Accuracy: Consistency doesn’t equal correctness. A deterministic model might consistently provide an inaccurate diagnosis based on flawed training data. Testing must verify that the model’s outputs align with established facts and expert knowledge. In healthcare, this means validating the LLM’s outputs against medical guidelines and professional review.

Bias: The black box problem and the risk of bias don’t disappear with determinism. A deterministic LLM can consistently produce outputs that are biased against a particular gender, race, or socioeconomic group. This happens when the training data is not representative or reflects societal prejudices. Testing must include auditing the model for fairness and ensuring it doesn’t perpetuate or amplify existing biases.

Security: Even a deterministic model can be vulnerable to security risks. A malicious actor could exploit the system to inject harmful content or manipulate its outputs. In a healthcare setting, a security breach could expose sensitive patient data or compromise the integrity of the diagnostic tool. Testing must include robust security audits to protect against these threats.

Governance and Oversight: The advent of deterministic LLMs doesn’t eliminate the need for clear AI governance. We still need policies and procedures that define who is responsible for the model, how it will be monitored, and what happens in the event of a failure. These frameworks, as discussed in my previous blog post, are what ensure accountability in the long run.

In the end, it seems we need both. Non-determinism is valuable for creative tasks, but determinism is essential for high-stakes, consistent applications. The work by Thinking Machines is a crucial step, but it only solves one part of the puzzle. The true promise of AI can only be realized when we combine these technical advances with a steadfast commitment to rigorous testing, ethical governance, and a proactive approach to safety and security.

Tags

governance LLM