Why AI Gets Forensic Evidence Wrong Even When the Answer Sounds Right
Competence, reflective prompting, and the hidden risk of confident output in criminal defense
I had an interesting experience recently while using generative AI for a non‑legal task, on a subject in which I am not particularly well versed. I started the way I often do, with a familiar prompting structure and clear instructions. The initial response was serviceable, so I continued refining the prompt. Yet no matter how I adjusted it, the output never quite aligned with what I was trying to accomplish.
Eventually, frustration set in and I abandoned the exercise. If you have spent any time working with AI, that experience will likely feel familiar.
After some reflection, the problem became obvious. I had violated the cardinal rule of trustworthy AI use: only use AI within your area of competence. That does not mean AI is unusable when the subject matter is unfamiliar. It does mean that the inquiry must be structured differently, with greater attention to how conclusions are reached rather than how quickly they appear.
That distinction matters most in criminal defense practice, where premature conclusions do not merely mislead but can quietly import prosecution‑friendly assumptions into advocacy and analysis. Nowhere is this risk more apparent than in cases involving forensic evidence, and blood alcohol testing provides a clear example.
Why Confident AI Answers Are the Riskiest Ones
A great deal of recent commentary has focused on getting AI systems to “think about their thinking,” prompting models to reflect on their own reasoning rather than racing to an answer. For lawyers, the value of this idea is not philosophical. It is practical. Reflective prompting can be used to impose discipline on AI output, forcing the model to follow the same analytical hierarchy a competent defense expert would use, rather than collapsing complex forensic questions into a single, deceptively simple conclusion.
A Toxicologist Does Not Start With Impairment
When lawyers ask AI whether a reported BAC supports the UBAL theory of DUI, the model is almost guaranteed to respond. It will reference statutory thresholds, general correlations between BAC and impairment, and often jury‑friendly language. What it will not do, unless constrained, is stop to ask whether the number itself deserves trust.
The reason is that appropriate guardrails were not imposed. Said differently, the prompt was not specific enough to produce the desired, or even useful, output.
Think about it this way: does a toxicologist begin with the question of impairment? Perhaps, if the toxicologist is a State expert. A competent defense expert begins by asking whether the test result is scientifically credible at all. That inquiry unfolds in layers, starting long before a chromatogram is ever produced.
Reflective prompting allows the lawyer to impose that same structure on the AI.
What Reflective Prompting Changes
A conventional prompt might ask whether a BAC of .14 supports the prosecutor’s theory of impairment. The AI will answer, and it will sound authoritative. A reflective prompt instead requires the model to identify the assumptions that must be true before that question can even be asked.
For example, a reflective prompt can require the AI to explain what assumptions must hold for a whole‑blood alcohol result to be considered scientifically reliable, and to articulate those assumptions before addressing any conclusions about impairment. Framed this way, the model is forced to slow down and surface pre‑analytical variables that are otherwise invisible.
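As a concrete illustration, with wording that is a sketch rather than a formula, such a prompt might read: “Before offering any opinion about impairment, list every assumption that must be true for this whole-blood alcohol result to be considered scientifically reliable, and explain each one. Do not address impairment until those assumptions have been stated.”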
This is where subject‑matter competence begins to matter in concrete ways. In Michigan, whole‑blood alcohol testing is typically performed using dual‑column headspace gas chromatography with an autosampler and flame ionization detection. Including those facts in a prompt immediately sharpens the analysis. So does identifying the instrument manufacturer, the laboratory’s standard operating procedures, and the specific collection materials used.
Variables that matter at this stage include who performed the blood draw, the technique used, the draw site, and the potential for contamination. They include the condition and integrity of the gray‑stoppered vials, the presence and concentration of preservative and anticoagulant, the possibility of clotting or fermentation, and issues related to storage, transportation, temperature, duration, and chain of custody.
None of these questions address impairment. All of them determine whether the reported number has scientific meaning. The more accurately those variables are identified and incorporated into the prompt, the more useful the output becomes. When that information is missing, the lawyer can use Socratic or step‑back prompting to surface it before moving forward.
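A step-back prompt for that purpose might read, again only as an illustration: “Before analyzing this result, list every fact you would need to know about the blood draw, the collection materials, the preservative and anticoagulant, storage and transport, and chain of custody, and explain why each one matters to reliability.”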
Once those assumptions are identified, reflective prompting can then require the model to explain why each factor matters, rather than merely listing them. At that stage, the lawyer is not outsourcing expertise, but rehearsing the reasoning process a competent expert would expect.
Only after that layer is explored does it make sense to move to analytical reliability. Here again, disciplined prompting prevents shortcut thinking. Instead of asking whether gas chromatography is reliable in the abstract, the model can be required to explain what assumptions are embedded in that claim, including calibration integrity, linearity, carryover, quality‑control results, analyst preparation of samples, adherence to laboratory SOPs, and instrument maintenance history. Variability is examined rather than presumed away.
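One illustrative formulation: “Do not tell me whether gas chromatography is reliable in the abstract. Identify every assumption embedded in that claim for this laboratory and this instrument, including calibration integrity, linearity, carryover, quality-control results, sample preparation, adherence to SOPs, and maintenance history, and explain how each could fail.”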
At this point, the AI begins to resemble a junior associate who has been instructed to show their work. The output becomes more cautious, more conditional, and more useful. The lawyer can see not only what the model knows, but what it is relying on.
Only then does it make sense to address chromatographic output itself, and even there, reflective prompting can require the model to explain what the reported number does and does not represent, how uncertainty is expressed or concealed, and what inferential steps are required to move from concentration to functional impairment. At that stage, the AI is no longer performing advocacy. It is mirroring the expert's process.
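An example of that final prompt, offered only as a sketch: “Explain what the reported number does and does not represent, how measurement uncertainty is expressed or concealed in this report, and what inferential steps are required to move from a blood alcohol concentration to a claim of functional impairment.”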
Competence Is the Constraint, Not the Model
What makes this approach especially valuable is that it exposes a truth lawyers sometimes resist.
AI can help expand a lawyer’s competence, but it cannot safely replace it.
The quality of AI output is inseparable from the lawyer’s subject‑matter understanding, and reflective prompting makes that relationship visible by revealing gaps rather than concealing them.
The point is not that every lawyer must become a chemist. The point is that every lawyer must recognize the order in which these questions must be asked. Reflective prompting allows lawyers to discover what they do not yet know, but it does not absolve them of responsibility for understanding what matters and why.
Using AI to Slow Down Instead of Speed Up
The broader implication is straightforward. AI should not be used to reach conclusions faster. It should be used to slow reasoning down. In criminal defense practice, speed is rarely the virtue we imagine it to be. Discipline is.
Used this way, reflective prompts become reusable tools. They can be modified to fit the facts of a particular case, whether the issue is blood alcohol testing, drug analysis, digital forensics, or crash reconstruction.
The structure remains constant. Identify assumptions. Explain why they matter. Sequence the analysis the way an expert would. Only then address the ultimate question.
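For lawyers who work with scripts or simple tools, that constant structure can even be captured in a reusable template. The sketch below is illustrative only: the placeholder names, the helper function, and the example facts are my assumptions, not a prescribed method, and the rendered text would simply be pasted into whatever AI tool the lawyer uses.

```python
# A minimal sketch of the four-step reflective structure as a reusable
# prompt template. Placeholder names and example facts are illustrative.

REFLECTIVE_TEMPLATE = """\
You are assisting a criminal defense lawyer. Before answering any ultimate
question, work through these steps in order and show your reasoning:

1. Identify every assumption that must hold for the {forensic_method}
   result to be considered scientifically reliable.
2. For each assumption, explain why it matters and how it could fail.
3. Sequence the analysis as a competent defense expert would:
   pre-analytical factors first, analytical methodology second,
   interpretation last.
4. Only after completing steps 1 through 3, address the ultimate question,
   stating how the answer depends on contested or unknown facts.

Case facts: {case_facts}
Do not assume facts not provided. Where information is missing, state what
additional information would be required.
"""


def build_reflective_prompt(forensic_method: str, case_facts: str) -> str:
    """Fill the template for a particular case; paste the result into
    whatever AI tool is being used."""
    return REFLECTIVE_TEMPLATE.format(
        forensic_method=forensic_method,
        case_facts=case_facts,
    )


if __name__ == "__main__":
    print(build_reflective_prompt(
        forensic_method="dual-column headspace GC-FID whole-blood alcohol",
        case_facts=(
            "Reported result of .14; gray-stoppered vials; hospital draw "
            "approximately two hours after the stop."
        ),
    ))
```

Keeping the four steps fixed means only the case-specific details change from matter to matter, which is the point: the discipline travels with the structure.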
Illustrative Reflective Prompt Examples
The following examples assume the analytical framework described above and are intended to apply it, not restate it. Each illustrates how reflective prompting can be used once the relevant assumptions and sources of variability have been identified.
Once pre-analytical issues have been identified, a reflective prompt can turn to methodology. At that stage, the model can be asked to explain how the specific method used, such as dual-column headspace gas chromatography with flame ionization detection, establishes accuracy and precision, and to identify where variability or error may be introduced if assumptions are not met.
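An illustrative version: “Assuming the pre-analytical issues identified above, explain how dual-column headspace gas chromatography with flame ionization detection establishes the accuracy and precision of a result, and identify where variability or error may enter if any of those assumptions fails.”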
A further prompt can require the model to separate analytical reliability from interpretive inference. For example, the model can be instructed to explain what the reported numerical result does and does not establish, what uncertainty is inherent in the measurement, and what additional assumptions are required to move from concentration to impairment.
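For example: “State what the reported numerical result does and does not establish, what uncertainty is inherent in the measurement itself, and what additional assumptions are required to move from concentration to impairment. Keep the analytical and interpretive answers separate.”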
Finally, a reflective prompt can be used as a check against premature conclusions. The model can be asked to explain how its analysis would change if a single assumption were altered or removed, such as uncertainty about preservative concentration, storage conditions, or analyst preparation. This forces the model to show how dependent its conclusions are on facts that may be contested or unknown.
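One way to phrase it: “Explain how your analysis would change if the preservative concentration in the vials were unknown. Then repeat the exercise assuming uncertainty about storage temperature, and again assuming uncertainty about how the analyst prepared the sample.”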
Capstone Reflective Prompt: Applying the Framework to the Case File
You have previously identified the assumptions and conditions necessary for a blood alcohol test result to be considered scientifically reliable, including pre-analytical factors, analytical methodology, and sources of variability.
Apply that framework to the attached toxicology case file, which includes chromatograms from the test run, calibration checks, blanks, and unknown samples.
For each category of data, explain what the materials do and do not demonstrate about analytical reliability. Identify any deviations, gaps, or ambiguities that affect confidence in the reported result.
Do not draw conclusions about impairment. Do not assume facts not contained in the file. Where information is missing or unclear, state what additional information would be required to evaluate reliability. Present the analysis sequentially and explain your reasoning at each step.
About the Author
Patrick T. Barone is a nationally recognized criminal defense attorney known for his work defending drinking and driving cases and other complex criminal matters. He is the founder of Barone Defense Firm and a thought leader on the responsible use of artificial intelligence in legal practice. His writing focuses on disciplined advocacy, forensic evidence, and practical ways lawyers can integrate emerging technology without sacrificing professional judgment.
Join the Conversation
If this article resonated with you, or if you have ever felt uneasy about an AI answer you could not quite explain, that reaction matters. Share your thoughts in the comments, pass this along to a colleague who is experimenting with AI, or subscribe to continue the conversation about using these tools thoughtfully and responsibly in criminal defense practice.


