Trustworthy AI Isn’t Accurate—It’s Auditable
Why GenAI’s hallucinations are a feature, not a flaw, and how lawyers can use it without falling into the trap of false authority
GenAI can do extraordinary things for lawyers: generate cross-examination outlines, suggest legal strategies, reframe arguments, craft compelling case narratives, and even help organize discovery. But there’s a catch, and it’s a well-known one, not a minor one.
It hallucinates.
And not occasionally. Hallucination is a structural feature of how GenAI works. As the technology is adopted by more and more lawyers, from Big Law to solo criminal defense, the risks grow with each passing month.
This problem isn’t going away anytime soon. Therefore, we need to stop waiting for GenAI to become accurate, and start designing legal workflows that make accuracy optional, or at least easily auditable.
The Fallacy of Fixing Hallucinations
Damien Charlotin, a research fellow at HEC Paris who earned his PhD in law from the University of Cambridge, maintains a running tally of federal cases involving hallucinated citations. As of July 2025, the count had surpassed 207 incidents.³ These aren’t just careless typos. They’re detailed, plausible-looking citations to cases that never existed, fabricated case quotes, and fallacious arguments, submitted in real court filings.
How is this happening?
Because GenAI doesn’t “know” anything. It’s not a database. It’s not a legal research engine. Large language models are built to identify patterns and to thereby predict the next most likely word. They are not built, or intended, to retrieve facts. They’re trained on enormous corpora of text, learning common word sequences and mimicking reasoning without any inherent understanding of meaning or truth.²
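To make that mechanism concrete, here is a toy sketch, in Python, of pattern-based next-word prediction. It is only an illustration, not how production LLMs are built: the tiny “training corpus,” the bigram counting, and the random choices are all invented for demonstration. The point is that a generator like this can emit a fluent, citation-shaped sentence that appears nowhere in its source material.

```python
import random
from collections import defaultdict

# Toy "training corpus" of citation-like sentences (invented for illustration;
# real models train on billions of documents, not a handful of strings).
corpus = [
    "Smith v. Jones , 410 U.S. 113 ( 1973 ) held that the stop was unlawful .",
    "Brown v. Davis , 384 U.S. 436 ( 1966 ) held that the confession was suppressed .",
    "Smith v. Davis , 392 U.S. 1 ( 1968 ) held that the search was reasonable .",
]

# Count which word tends to follow which -- a crude stand-in for "learning patterns".
follows = defaultdict(list)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        follows[current].append(nxt)

# Generate by repeatedly picking a likely next word: prediction, not retrieval.
random.seed(7)
word, output = "Smith", ["Smith"]
while word != "." and len(output) < 20:
    word = random.choice(follows[word])
    output.append(word)

# The result reads like a citation, but no such case exists anywhere in the corpus:
# the generator has stitched familiar fragments into something new and false.
print(" ".join(output))
```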
This is not to say that the question of whether these systems “think” or “understand” is settled, in part because current cognitive-science tools aren’t sufficient to answer that kind of question about large language models. But the consensus, according to researchers Melanie Mitchell and David Krakauer,¹ is that:
While LLMs exhibit extraordinary formal linguistic competence (the ability to generate grammatically fluent, humanlike language), they still lack the conceptual understanding needed for humanlike functional language abilities (the ability to robustly understand and use language in the real world).
This “lack of conceptual understanding” is why GenAI can produce elegant, well-structured legal arguments that are utterly false. These fabrications aren’t glitches. They’re the natural output of a system built on prediction rather than truth.
A Better Frame: When Accuracy Doesn’t Matter
In my recently published article for the National Association of Criminal Defense Lawyers’ Champion magazine,⁴ I suggest a different paradigm: rather than trying to make GenAI more accurate, lawyers should focus on using it in ways that don’t require accuracy at all.
Here’s the key idea:
The more important it is to be right, the less useful GenAI becomes.
The more freedom you have to be creative, the more powerful it gets.
That’s why some of the most trustworthy uses of GenAI are also the most imaginative: brainstorming arguments, simulating opposing perspectives, exploring ways to structure a cross. If the output is wrong, you discard it. If it sparks something, you refine it. Either way, nothing explodes.
A Risk Matrix for Legal Use
To use GenAI responsibly, lawyers need to ask two questions:
How important is it that the answer is correct?
How easily can I verify it myself?
These two axes produce four categories of risk and reward:
GenAI Legal Use Matrix
Accuracy matters + Easy to verify: Use with caution. GenAI can save time, but you must confirm every output. Think: summarizing discovery you’ve already reviewed or drafting motions you’re qualified to revise.

Accuracy matters + Hard to verify: Don’t use GenAI. This is where hallucinations do the most damage: when the stakes are high and you lack the expertise to catch errors.

Accuracy optional + Easy to verify: Perfect for GenAI. Use it for brainstorming, outlining, narrative modeling, and other creative tasks where you can quickly sense what’s useful or not.

Accuracy optional + Hard to verify: Use carefully. GenAI can still generate ideas, but you’ll need to bring your judgment. Don’t assume anything is reliable without a gut check.
Most serious hallucination risks live in the “Accuracy matters, hard to verify” quadrant, like when lawyers use GenAI to draft arguments in unfamiliar domains. That’s where trust breaks down. But when you stay in the right quadrants, GenAI becomes not just safe, but transformative.
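For readers who like their heuristics explicit, the two questions can be written down as a simple lookup. This is only a sketch of the matrix above: the recommendation strings are paraphrased, and the example tasks in the comments are hypothetical.

```python
def triage(accuracy_matters: bool, easy_to_verify: bool) -> str:
    """Map the two threshold questions onto the four quadrants of the use matrix."""
    matrix = {
        (True, True):   "Use with caution: confirm every output against the source.",
        (True, False):  "Don't use GenAI: high stakes plus output you can't check.",
        (False, True):  "Perfect for GenAI: brainstorm, outline, model narratives.",
        (False, False): "Use carefully: treat output as ideas, apply your own judgment.",
    }
    return matrix[(accuracy_matters, easy_to_verify)]

# Hypothetical tasks run through the triage:
print(triage(True, True))    # summarizing discovery you've already reviewed
print(triage(True, False))   # drafting arguments in an unfamiliar domain
print(triage(False, True))   # five ways to frame a defense theory
```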
What Trustworthy Use Actually Looks Like
In Rethinking Generative AI in Legal Practice: Toward a Trustworthy Paradigm, I outline three use patterns that mitigate hallucination risk and still deliver value:
1. Filtering
Use GenAI to sort through material you already have access to: discovery files, police reports, lab records. Ask it to extract, tag, or summarize, then verify against the source. Because the answer is grounded in your own materials, errors are easy to catch.
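As a rough sketch of what a filtering workflow can look like, the snippet below asks a GenAI tool for exact quotes from a document you already possess and then checks each quote verbatim against the source. The ask_model function is a placeholder for whatever tool you actually use; the prompt and the file handling are assumptions for illustration, not any specific product’s API.

```python
from pathlib import Path

def ask_model(prompt: str) -> list[str]:
    """Placeholder: send the prompt to your own GenAI tool and return quoted excerpts."""
    raise NotImplementedError("wire this to the GenAI tool you actually use")

def filter_report(path: str) -> list[str]:
    source = Path(path).read_text()
    prompt = (
        "From the police report below, list every sentence that mentions "
        "the timing of the traffic stop. Quote each sentence exactly.\n\n" + source
    )
    quotes = ask_model(prompt)
    # Grounding check: any "quote" that does not appear word-for-word in the
    # source document is flagged for hand review instead of trusted.
    verified, flagged = [], []
    for q in quotes:
        (verified if q in source else flagged).append(q)
    if flagged:
        print(f"{len(flagged)} excerpt(s) not found in the source -- review by hand.")
    return verified
```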
2. Navigation
Ask GenAI not to answer questions, but to direct you to where the answers live. Instead of saying “What does Rodriguez v. United States say?”, ask “Where in my brief bank is a motion that cites Rodriguez?” This keeps authority in your hands and makes hallucination irrelevant.
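The verification step here is deliberately trivial: whether the pointer comes from GenAI or anywhere else, it either leads to one of your own documents or it doesn’t. Below is a minimal sketch of that check, using a hypothetical brief-bank folder of plain-text files and the Rodriguez example above.

```python
from pathlib import Path

def find_citing_motions(brief_bank: str, case_name: str) -> list[Path]:
    """Return every document in the brief bank that mentions the case name."""
    hits = []
    for doc in Path(brief_bank).rglob("*.txt"):
        if case_name.lower() in doc.read_text(errors="ignore").lower():
            hits.append(doc)
    return hits

# The answer is a set of your own files, so a hallucinated pointer is harmless:
# either the file cites Rodriguez or it doesn't.
for path in find_citing_motions("brief_bank/", "Rodriguez v. United States"):
    print(path)
```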
3. Ideation
Use GenAI as a brainstorming tool. Ask for five different ways to frame a defense. Or simulate how a skeptical juror might react. This is where GenAI shines, when you’re not asking for a definitive answer, but a spark of insight.
Explaining Your Use to Clients and Courts
It’s increasingly common to hear clients, and even judges, ask: “Are you using ChatGPT for this?”
Here’s one way to answer:
“Yes, I use GenAI to help me think faster and more broadly. But I never rely on it for factual accuracy or legal citations unless I can independently verify them. I think of it like a whiteboard, not a law book.”
This kind of framing makes clear that you are the source of judgment. The AI is just scaffolding. Used this way, GenAI becomes more like a junior associate, one that’s fast, fluent, and occasionally delusional. You don’t trust it blindly. You supervise it carefully.
Hallucination-Resilient Lawyering
We need to stop asking, “Can I trust GenAI?” and start asking, “Can I check GenAI?”
Trustworthiness in legal practice isn’t about perfection; it’s about designing systems where errors are obvious and manageable. Hallucinations aren’t going away. But they don’t have to stop us from using this powerful technology, so long as we keep control over the output and never outsource our judgment.
Next up: how to harness GenAI for powerful prompting strategies that push creative boundaries without relying on accuracy at all.
Want to know more about Patrick Barone’s criminal defense law firm and their mission to help people win back their lives? Visit our website: Barone Defense Firm.
References
Jiwei Li et al., A Systematic Study of Hallucination in Large Language Models, 2023 Transactions of the Ass’n for Comput. Linguistics, https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00595/113039
Anthropic, Understanding Hallucinations in Language Models, Anthropic Research Report (2023), https://arxiv.org/abs/2306.00520
Damien Charlotin, Tracking Hallucinations in Legal Filings, https://www.damiencharlotin.com/hallucinations/ (last visited June 9, 2025)
Patrick T. Barone, Rethinking Generative AI in Legal Practice: Toward a Trustworthy Paradigm, NACDL/The Champion (July 2025)
Melanie Mitchell & David C. Krakauer, The Debate over Understanding in AI’s Large Language Models, 120 Proc. Nat’l Acad. Sci. U.S. e2215907120 (2023), https://doi.org/10.1073/pnas.2215907120.