Notes on building trustworthy AI.
Field notes from the practitioners building AI governance for regulated enterprises. Long-form writing on accountability, evaluation, and the workflow that makes responsible AI possible.
RAG Eval Sets: The Secret to a Reliable AI Product
Imagine an AI system advising a doctor on treatment protocols or summarizing a patient’s medical history. A single hallucinated fact could be disastrous. This is why Retrieval-Augmented Generation (RAG) is essential: it anchors the LLM to a verified knowledge source (like internal clinical guidelines or legal documents), ensuring the answer is accurate, up-to-date, and fully […]
Non-Determinism: Why it is a Feature, not a Bug, in LLMs. Plus, what Thinking Machines Lab’s quest for consistent output means for Responsible AI.
If you have ever asked an LLM the same question twice and received different answers, you have experienced non-determinism. While this might seem like a bug, it is actually a fundamental characteristic of these powerful models. To understand why, let’s contrast it with a more traditional, deterministic system. Determinism vs. Non-Determinism: A Personal Story As […]
Humanizing AI.
Conversations with the academics, authors, and practitioners shaping how AI gets governed. Hosted by Cesar Koirala.
