LLM Hallucination Audit Implementation

// BASTION Technical Explanation

LLM Hallucination Audit Implementation

Cross-checking the numbers AI writes against real-device logs

Author: Hideyuki Chinda / BESTNET LLC

1. The Trigger: The AI Wrote “639.” The Real Count Was 0

Some time ago, our local LLM wrote “639 authentication failures” in an auto-generated monitoring report. When we actually checked the logs, the number of authentication failures in that time window was 0. The LLM had simply produced a plausible-looking number out of thin air (we wrote about this episode in “The Story of a Local LLM Fabricating Monitoring-Report Numbers, and the Countermeasures”).

This is not a defect of the LLM; it is its nature. An LLM excels at generating “contextually plausible text,” and it has no built-in mechanism that guarantees “factually accurate numbers.” It often writes correct numbers, but not because correctness is guaranteed — it just happens to be right.

In AI Ops, this cannot be overlooked. A monitoring report you cannot trust is as good as nonexistent, or worse, dangerous. If someone believes “639 authentication failures” and acts on it, they spend time on a problem that does not exist. Conversely, a real anomaly may be buried.

So we built a mechanism that does not blindly trust the numbers the LLM writes — hallucination auditing. This article describes the thinking behind its implementation.

2. The Principle: Facts Are Not Made by the LLM. Logs Make Them

The starting point of auditing is the separation of roles.

The “facts” that appear in a monitoring report (numbers, counts, IP addresses, timestamps) are obtained not from the LLM’s text generation, but from deterministic queries against real-device logs. The LLM’s role is limited to turning the confirmed facts into text that is easy for humans to read.

  • Facts (numbers, targets, timestamps) → made by queries against the logs (ground truth)
  • Narrative (summary, explanation, prose) → made by the LLM

This separation alone greatly reduces the room for fabrication. Asking an LLM to “read the logs and count” invites miscounting, which is itself a form of fabrication. If you instead aggregate the counts first, hand over the confirmed values, and ask the LLM to “explain the situation using these numbers,” the numbers are fixed before the LLM writes a single word.
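
A minimal sketch of this separation, in Python. The record layout, the `aggregate_facts` and `build_prompt` names, and the prompt wording are all assumptions made for illustration, not BASTION's actual implementation.

```python
from collections import Counter

# Hypothetical, already-parsed log records; a real system reads these from devices.
LOG_RECORDS = [
    {"host": "B", "event": "port_scan"},
    {"host": "B", "event": "port_scan"},
    {"host": "A", "event": "login_ok"},
]

def aggregate_facts(records, metrics):
    """Count events per (host, metric) deterministically; no LLM is involved."""
    counts = Counter((r["host"], r["event"]) for r in records)
    hosts = sorted({r["host"] for r in records})
    # Emit explicit zeros so "0 failures" is a confirmed fact, not a missing one.
    return {(h, m): counts.get((h, m), 0) for h in hosts for m in metrics}

def build_prompt(facts):
    """Hand the confirmed values to the LLM and ask only for wording."""
    lines = [f"- host={h} metric={m} value={v}" for (h, m), v in sorted(facts.items())]
    return (
        "Explain the situation using only these confirmed values. "
        "Do not add or change any number.\n" + "\n".join(lines)
    )

facts = aggregate_facts(LOG_RECORDS, metrics=["auth_fail", "port_scan"])
print(build_prompt(facts))
```

Emitting explicit zeros matters here: a host with no authentication failures gets a confirmed value of 0, which a later check can compare against.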

3. Catching the Fabrication That Still Remains, Through an “Audit”

Even with roles separated, the LLM may still slip in numbers it was never given while writing, or attach a number to the wrong target. So we add an audit step that, after the output is produced, cross-checks the LLM-generated report against the source logs once more.

The rough flow is as follows.

1. Aggregate logs to build the "confirmed facts"       <- ground truth
2. Hand the confirmed values to the LLM to generate the report text
3. Extract the quantitative claims from the generated report
   (e.g., "N authentication failures on host A")
4. Cross-check each claim against the source logs (the confirmed facts in step 1)
5. If an unsupported or contradictory claim is found,
   replace it with the confirmed value, or regenerate the report
6. Record every detected mismatch (when, where, what was fabricated)
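
The six steps can be sketched as a small orchestration loop. This is an illustrative sketch under assumptions: `generate` is a hypothetical stand-in for the LLM call, the retry limit is arbitrary, and falling back to a plain listing of confirmed values is only one possible reading of "replace it with the confirmed value."

```python
def run_pipeline(facts, generate, max_attempts=2):
    """facts: {(host, metric): value} built from log aggregation (step 1).
    generate: hypothetical callable standing in for the LLM; it returns
    (report_text, claims), where claims is {(host, metric): value}.
    """
    mismatches = []
    for attempt in range(1, max_attempts + 1):
        text, claims = generate(facts)                 # steps 2 and 3
        bad = {k: v for k, v in claims.items()         # step 4: plain equality check
               if facts.get(k) != v}
        for key, claimed in bad.items():               # step 6: record every mismatch
            mismatches.append({"attempt": attempt, "key": key,
                               "claimed": claimed, "truth": facts.get(key)})
        if not bad:
            return text, mismatches
    # Step 5 fallback: every attempt failed, so emit the confirmed values directly.
    fallback = "\n".join(f"{h} {m}: {v}" for (h, m), v in sorted(facts.items()))
    return fallback, mismatches

# Demo with a stub generator that fabricates a value on its first call only.
calls = {"n": 0}

def flaky_generate(facts):
    calls["n"] += 1
    claims = dict(facts)
    if calls["n"] == 1:
        claims[("A", "auth_fail")] = 639               # simulated fabrication
    text = "; ".join(f"{m} on host {h}: {v}" for (h, m), v in sorted(claims.items()))
    return text, claims

facts = {("A", "auth_fail"): 0, ("B", "port_scan"): 47}
report, log = run_pipeline(facts, flaky_generate)
print(report)
print(log)
```

In this demo the first attempt is rejected and logged, and the second attempt passes the check, so the fabricated 639 never reaches the final report.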

There are two key points.

The cross-check is deterministic. We do not let another LLM do the matching. Auditing a hallucination with something that can itself hallucinate is pointless. The cross-check is a mechanical check against an immovable fact — the source logs.

The "claim" is the unit of verification. Rather than fishing numbers out of free text afterward, we have the LLM emit its output from the outset in an easily verifiable structure (a "target / metric / value" tuple), which makes the matching far more robust. We check not the form of the prose, but whether each individual claim is backed by ground truth.
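
As an illustration of receiving claims in a verifiable structure, the sketch below parses a JSON payload into "target / metric / value" tuples. The JSON schema, the `Claim` class, and `parse_claims` are hypothetical names chosen for this example; the article does not specify the actual format.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    host: str      # the target
    metric: str
    value: int

def parse_claims(raw_json):
    """Parse the LLM's structured output into Claim tuples, rejecting malformed items."""
    claims = []
    for item in json.loads(raw_json)["claims"]:
        value = item["value"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"non-integer value in claim: {item!r}")
        claims.append(Claim(host=str(item["host"]), metric=str(item["metric"]), value=value))
    return claims

# Assumed output shape: prose in "summary", checkable claims listed separately.
raw = (
    '{"summary": "Host A saw authentication failures.", '
    '"claims": [{"host": "A", "metric": "auth_fail", "value": 639}]}'
)
print(parse_claims(raw))
```

Rejecting anything that is not a well-formed claim keeps the later comparison mechanical: a value like "many" fails at parsing instead of reaching the cross-check.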

An illustration of the matching (values are for explanation):

[AUDIT] report_id=████
  claim: host=A  metric=auth_fail  value=639
  truth: host=A  metric=auth_fail  value=0
  => MISMATCH  (replace with confirmed value / regenerate)

  claim: host=B  metric=port_scan  value=47
  truth: host=B  metric=port_scan  value=47
  => OK
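
A comparison like the one illustrated above could be written as follows. This is a sketch, not the production code: the function name, the tuple layout, and the extra `UNSUPPORTED` verdict (for a claim with no corresponding confirmed fact) are assumptions for this example.

```python
def audit(claims, truth):
    """Compare each (host, metric, value) claim with the confirmed facts.

    truth maps (host, metric) to the value aggregated from the logs.
    Returns a list of (verdict, claim, confirmed_value) tuples.
    """
    results = []
    for host, metric, value in claims:
        confirmed = truth.get((host, metric))
        if confirmed is None:
            verdict = "UNSUPPORTED"   # the LLM mentioned something never aggregated
        elif confirmed == value:
            verdict = "OK"
        else:
            verdict = "MISMATCH"
        results.append((verdict, (host, metric, value), confirmed))
    return results

truth = {("A", "auth_fail"): 0, ("B", "port_scan"): 47}
claims = [("A", "auth_fail", 639), ("B", "port_scan", 47), ("C", "auth_fail", 5)]

for verdict, (host, metric, value), confirmed in audit(claims, truth):
    print(f"claim: host={host} metric={metric} value={value} "
          f"truth={confirmed} => {verdict}")
```

The check is a dictionary lookup and an equality test, so the same inputs always produce the same verdicts.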

4. This Is Not About "Making the LLM Smarter"

This is the crucial point of the design philosophy. Hallucination auditing does not try to make the LLM smarter or more accurate. It keeps the premise that "the LLM makes mistakes" unchanged and makes the output trustworthy by construction.

Even if the LLM writes a wrong number, the audit cross-checks it against ground truth, so every number in the final report is backed by a confirmed value. We place our confidence in the verification mechanism, not in the LLM's cleverness.

5. Design Tradeoffs (Honestly)

It is not a silver bullet.

  1. Cost increases. Aggregation, generation, and auditing add passes. Still, we judge that the value of "being trustworthy" in a production monitoring report is worth this cost.
  2. What it catches is the fabrication of "facts and quantities." Verifiable claims such as "the count of authentication failures" can be cross-checked, but the validity of subjective narrative such as "this trend looks dangerous" is hard to verify mechanically. That is precisely why we limit the LLM's role to summarizing and formatting, and design it not to make judgments.
  3. It assumes the ground truth is correct. The audit rests on the premise that "the source logs are fact." The accuracy of log collection and device-level determination itself is the foundation (the collection and automatic-determination mechanisms are covered in separate articles).

6. The Same Principle as BASTION's Roots

This stance of "not trusting the numbers the LLM writes" is continuous with BASTION's overall design.

Decisions that actually move the system, such as the final determination of an attacking IP or the execution of a firewall block, are made deterministically from real-device logs, not by the LLM (see "How Multi-Layer Correlation Campaign Detection Works"). The LLM is good at summarizing and formatting natural language, not at production decision-making. Drawing the line there is the same verification discipline we described in "The More Beautiful the Design, the More You Doubt It with Real Data."

Hallucination auditing applies that same principle to the reporting layer. To us, both the LLM and the (potentially compromised) Agent are things to "distrust and verify."

7. Summary

The wider the scope we delegate to AI, the more "how we verify the AI's output" decides the product's trustworthiness.

  • Facts (numbers) are not made by the LLM; they are taken from the logs
  • The claims the LLM writes are cross-checked against ground truth after output
  • The cross-check is deterministic. Do not audit a hallucination with a hallucination
  • The LLM's role is limited to summary and formatting; it does not make judgments

Do not swallow the numbers the LLM writes whole. It is unglamorous, but it is the foundation for running AI Ops in production with confidence.

Contact Us

At BASTION, we progressively publish concept-level design decisions and operational know-how on our tech blog (specific customer information and patent-related formulas are not disclosed). Companies interested in adopting or jointly validating our closed-network AI Ops Platform are welcome to reach out via the contact form.

Free consultation / Contact us →

Updated on June 13, 2026
