The Incident in Which a Local LLM Fabricated Values in Monitoring Reports, and Our Countermeasures

2026.04 / Tech Blog / BASTION

Local LLM Fabricated Numerical Values in Monitoring Reports, and How We Fixed It

BASTION’s AI monitoring report stated “639 authentication failures.” The input data the model had been given said 0. This is a record of how we addressed hallucination, a problem that cannot be avoided in security monitoring with local LLMs, using spot checks and a multi-layer fallback mechanism.

When AI Lies

BASTION analyzes infrastructure logs with a local LLM every 15 minutes and sends a report to Slack. One day, the report said this:

incoming-webhook 17:08

In Windows AD, many authentication failures and account lockouts are occurring, but these are normal and are caused by Kerberos computer accounts and LDAP integration. On the VPN, OpenVPN recorded 239 authentication failures.

639 authentication failures. 239 VPN authentication errors. Taken at face value, these numbers look anomalous.

However, when we aggregated the raw logs ourselves, we found only 81 authentication failures and 102 VPN authentication errors. Digging further, we found that the input data we had passed to the LLM stated “authentication failures: 0 times” and “VPN errors: none.” The LLM had rewritten the input’s “0” as “639.”

The Spot-Check Method

To verify the numbers in the report, we ran spot checks: we aggregated the raw logs directly and compared the counts against the reported values.

Inspection procedure:
1. Extract each numerical value in the report, item by item
2. Obtain the measured value by counting the raw logs on the AI-SLOG server directly (grep and line counts)
3. Calculate the deviation rate between the reported and measured values
4. Judge: within ±10% is "pass"; anything beyond that is "requires investigation"
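
Steps 3 and 4 can be sketched in Python as follows. The deviation is computed as (actual - reported) / reported, which is the convention that reproduces the percentages in the results table (for example, (489 - 477) / 477 ≈ +2.5%). The names `deviation`, `judge`, and `TOLERANCE`, and the rule for an "N/A" or zero reported value, are assumptions for illustration rather than BASTION's actual tooling.

```python
from typing import Optional

TOLERANCE = 0.10  # step 4: within ±10% counts as "pass"


def deviation(reported: Optional[int], actual: int) -> Optional[float]:
    """Relative deviation (actual - reported) / reported, or None when undefined."""
    if not reported:  # None ("N/A") or 0: a percentage is meaningless
        return None
    return (actual - reported) / reported


def judge(reported: Optional[int], actual: int) -> str:
    dev = deviation(reported, actual)
    if dev is None:
        # No usable baseline (hypothetical rule): pass only if the logs also show zero events.
        return "pass" if actual == 0 else "requires investigation"
    return "pass" if abs(dev) <= TOLERANCE else "requires investigation"


if __name__ == "__main__":
    # (item, reported value, value counted directly from the logs), from the results table
    rows = [
        ("FW Block (Specific IP)", 477, 489),
        ("Authentication Failures", 639, 81),
        ("Account Lockouts", 669, 66),
        ("Authentication Errors", 239, 102),
        ("Cloud App (all items)", None, 0),
    ]
    for item, reported, actual in rows:
        dev = deviation(reported, actual)
        pct = "n/a" if dev is None else f"{dev:+.1%}"
        print(f"{item}: {pct} -> {judge(reported, actual)}")
```

Using the reported value as the denominator keeps the judgment anchored to what the report claims; a reported value of 0 needs special handling because the ratio is undefined.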

Results of inspecting 8 items:

| Item | Reported Value | Actual Value | Judgment |
|---|---|---|---|
| FW Block (Specific IP) | 477 cases | 489 cases | ✅ Pass (+2.5%) |
| Authentication Failures | 639 cases | 81 cases | ❌ Fabricated (-87%) |
| Account Lockouts | 669 cases | 66 cases | ❌ Fabricated (-90%) |
| Authentication Errors | 239 cases | 102 cases | ❌ Fabricated (-57%) |
| Cloud App (all items) | N/A | 0 cases | ✅ Pass |

Hallucinations (numerical fabrications) were detected in 4 out of 8 items.

Two Root Causes Coexisted

To isolate the causes, we examined the input data before it was passed to the LLM.

Root Cause A: The script was passing incorrect data. For some devices, the log summarization script passed the LLM cumulative counts for the entire log history instead of the last 60 minutes. Because it counted every line in the log file, several months of data were labeled “last 60 minutes.”
Root Cause B: The LLM rewrote “0 cases” as “639 cases.” Even though the input data explicitly stated “authentication failures: 0 times” and “VPN errors: none,” the LLM generated entirely fabricated numbers: 639, 669, and 239. Setting temperature to 0.2 did not prevent this.

Root Cause A is a script bug and can be fixed deterministically. Root Cause B reflects a fundamental limitation of LLMs.

Countermeasure: 3-Layer Fallback

Rather than trying to “prevent” LLM hallucinations, we chose to “detect and replace” them.

Layer 1: Ensure Accuracy of Input Data

We fixed the script bug so that only the last 60 minutes of data, not the cumulative data for all periods, is passed to the LLM. Correct input makes correct output from the LLM more likely.
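
The difference between the buggy count and the fix is essentially the difference between counting every line in a file and counting only lines inside the time window. Below is a minimal Python sketch, assuming each log line begins with an offset-aware ISO-8601 timestamp followed by a space; the real log format and summarization script are not shown in this article, and `count_recent` and the sample lines are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable

WINDOW = timedelta(minutes=60)


def count_recent(lines: Iterable[str], now: datetime, window: timedelta = WINDOW) -> int:
    """Count log lines whose leading timestamp falls in (now - window, now].

    Assumes each line starts with an offset-aware ISO-8601 timestamp and a space,
    and that `now` is timezone-aware as well.
    """
    start = now - window
    count = 0
    for line in lines:
        stamp, _, _ = line.partition(" ")
        try:
            ts = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # no parseable timestamp: not counted
        if ts.tzinfo is None:
            continue  # naive timestamps cannot be compared with an aware "now"
        if start < ts <= now:
            count += 1
    return count


if __name__ == "__main__":
    log = [
        "2026-01-10T09:00:00+00:00 sample event",  # months old
        "2026-04-01T16:30:00+00:00 sample event",
        "2026-04-01T17:05:00+00:00 sample event",
    ]
    now = datetime(2026, 4, 1, 17, 8, tzinfo=timezone.utc)
    print(len(log))                # buggy approach: every line in the file
    print(count_recent(log, now))  # only the last 60 minutes
```

The first print reproduces Root Cause A (3 lines, including the months-old one); the second counts only the 2 lines inside the window.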

Layer 2: Prohibit Numerical Fabrication via Prompt

We added the following rules to the LLM prompt:

【Absolute Rules for Numerical Values】
- Do not write any numerical values not present in the input data anywhere in the output
- Items marked as "0 cases" in the input must also be marked as 0 cases in the output
- Numerical values in the evidence section are permitted only as direct transcription from input data
- Speculation, completion, and approximation are prohibited
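
One way to combine these rules with the input is to prepend them to a rendering of the counts in which every item, including zero-count items, is written out. The Python sketch below reuses the rule text above; how BASTION actually assembles its prompt is not described here, so `build_prompt` and the "Input data" layout are hypothetical.

```python
# Rules copied from the article; the surrounding prompt layout is a hypothetical example.
RULES = """【Absolute Rules for Numerical Values】
- Do not write any numerical values not present in the input data anywhere in the output
- Items marked as "0 cases" in the input must also be marked as 0 cases in the output
- Numerical values in the evidence section are permitted only as direct transcription from input data
- Speculation, completion, and approximation are prohibited"""


def build_prompt(counts: dict[str, int], window_minutes: int = 60) -> str:
    """Prepend the numeric rules to the input data, listing every item explicitly."""
    # Design choice in this sketch: zero-count items are written out as "0 cases"
    # instead of being omitted, so no item's value is left for the model to infer.
    items = "\n".join(f"- {item}: {n} cases" for item, n in counts.items())
    return f"{RULES}\n\nInput data (last {window_minutes} minutes):\n{items}"


if __name__ == "__main__":
    print(build_prompt({"authentication failures": 0, "account lockouts": 0, "FW blocks": 489}))
```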

Layer 3: Detect Discrepancies by Comparing Output Numerical Values Against Input

After the LLM generates its output, a fallback mechanism compares it against the input data to detect contradictions. If an item marked “0 times” in the input appears with a non-zero value in the output, that text is automatically replaced with safe text.

Input: "authentication failures: 0 times"
LLM output: "639 authentication failures occurred"
  → Verification: Input is 0 times but output is 639 → Mismatch detected
  → Replacement: Automatically replaced with safe text
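
The check above can be sketched in Python. This is a simplified illustration, not BASTION’s implementation: it splits the output into sentences naively, replaces any sentence that gives a non-zero number for an item whose input count is 0, and also treats any integer that does not appear among the input values as fabricated (mirroring the first prompt rule). `verify_report`, `SAFE_TEXT`, and the wording of the replacement text are hypothetical, since the article does not give the actual safe text.

```python
import re

# Hypothetical placeholder; the article only says output is replaced with "safe text".
SAFE_TEXT = "[Removed: numeric value not found in input data]"


def verify_report(output: str, input_counts: dict[str, int]) -> tuple[str, list[str]]:
    """Compare numbers in LLM output with the input counts.

    Returns the sanitized output and the list of sentences that were replaced.
    """
    allowed = set(input_counts.values())
    flagged: list[str] = []
    result: list[str] = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", output.strip()):
        numbers = [int(n) for n in re.findall(r"\d+", sentence)]
        replacement = None
        # Check 1: an item that is 0 in the input must not appear with a non-zero number.
        for item, count in input_counts.items():
            if count == 0 and item.lower() in sentence.lower() and any(n != 0 for n in numbers):
                replacement = f"{item}: 0 cases (per input data)."
                break
        # Check 2: any number that does not occur in the input is treated as fabricated.
        if replacement is None and any(n not in allowed for n in numbers):
            replacement = SAFE_TEXT
        if replacement is None:
            result.append(sentence)
        else:
            flagged.append(sentence)
            result.append(replacement)
    return " ".join(result), flagged


if __name__ == "__main__":
    counts = {"authentication failures": 0, "FW blocks": 489}
    text = "639 authentication failures occurred. FW blocks: 489 cases."
    clean, flagged = verify_report(text, counts)
    print(clean)    # authentication failures: 0 cases (per input data). FW blocks: 489 cases.
    print(flagged)  # ['639 authentication failures occurred.']
```

Because the check is purely string-based, it will also flag legitimate numbers that are not counts (for example, a “60-minute” window mentioned in prose), so a practical version would need to add such constants to the allowed set.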

Correction Results

After implementing the 3-layer fallback, we re-verified under the same conditions.

| Item | Before Fix | After Fix |
|---|---|---|
| Authentication Failures (Input: 0 cases) | ❌ 639 cases (fabricated) | ✅ 0 cases (accurate) |
| Lockouts (Input: 0 cases) | ❌ 669 cases (fabricated) | ✅ 0 cases (accurate) |
| Authentication Errors (Input: none) | ❌ 239 cases (fabricated) | ✅ 0 cases (accurate) |
| Hallucination Detection Fallback | Detects only neologisms and symbols | Can also detect numerical fabrication |

The LLM followed the prompt rules and kept the zero values at zero, so the hallucination detection fallback never fired. Accurate input (Layer 1) and stronger prompts (Layer 2) resolved the issue on their own, without Layer 3 having to step in.

Hallucination Cannot Be Completely Eliminated

While these countermeasures suppressed numerical fabrication, hallucination cannot be fully prevented with a 14B-parameter local model. The key is not to “trust the LLM” but to “constrain the LLM” by design. The three layers (input accuracy assurance → prompt constraints → post-output verification) keep an LLM misjudgment from cascading through the entire system.

Regular Spot Checks Are Essential

This hallucination was first discovered through a spot check. LLM output is grammatically correct and reads naturally in context, so you cannot tell it is false just by reading it. Making regular spot checks that reconcile logs against reports part of routine operations is essential for assuring the quality of an AI monitoring system.

Agent Workflow Design Determines Everything

In BASTION, we limit the LLM’s role to classifying log patterns, while firewall operations, numerical aggregation, and block execution are all handled by shell scripts and Python. Even if the LLM fabricates numbers, actual block decisions are made by thresholds on the script side, so the business impact is limited to inaccurate notification text. Had we let the LLM write firewall rules directly, we might have blocked real IPs over fictitious attacks.
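
This division of responsibility can be illustrated with a small Python sketch: the block decision is a pure function of counts the script aggregated itself, and the LLM’s text is only attached to the notification afterward. The threshold value, the IP addresses (taken from documentation ranges), and the names `decide_blocks` and `build_notification` are hypothetical; the article does not publish BASTION’s actual thresholds.

```python
from dataclasses import dataclass

BLOCK_THRESHOLD = 100  # hypothetical value; the real script-side threshold is not published


@dataclass(frozen=True)
class Decision:
    ip: str
    failures: int
    block: bool


def decide_blocks(failures_by_ip: dict[str, int], threshold: int = BLOCK_THRESHOLD) -> list[Decision]:
    """Deterministic decision from script-aggregated counts; never reads LLM output."""
    return [Decision(ip, n, n >= threshold) for ip, n in sorted(failures_by_ip.items())]


def build_notification(decisions: list[Decision], llm_summary: str) -> str:
    """The LLM text only decorates the message; it cannot change which IPs are blocked."""
    blocked = [d.ip for d in decisions if d.block]
    return f"Blocked IPs: {', '.join(blocked) or 'none'}\nAI summary: {llm_summary}"


if __name__ == "__main__":
    decisions = decide_blocks({"192.0.2.10": 150, "198.51.100.7": 3})
    # Even a fabricated summary leaves the block list unchanged.
    print(build_notification(decisions, "639 authentication failures occurred"))
```

In this structure a hallucinated number can at worst make the summary line wrong; the first line of the notification is computed without it.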

Summary

In security monitoring with a local LLM, the LLM hallucinated, rewriting “0 cases” in its input as “639 cases.” Two root causes coexisted: a script bug (cumulative data for all periods mixed into the input) and numerical fabrication by the LLM.

As countermeasures, we implemented a three-layer fallback: input data accuracy assurance → numerical constraints in the prompt → post-output verification. Re-verification after the fix confirmed that the LLM kept zero values accurate without triggering the hallucination detection fallback, validating the countermeasures.

AI monitoring is not infallible. Rather than trusting AI output, constrain it, verify it, and prepare fallbacks. This design philosophy is the foundation of practical security monitoring with local LLMs.

BASTION provides AI security monitoring for closed networks.

Updated on June 9, 2026
