Implementation Record: Balancing Accuracy and Determinism with Qwen2.5-14B + GPUStack

Implementation Record: Balancing Accuracy and Determinism with Qwen2.5-14B + GPUStack

8 min read

// BASTION Technical Explanation

Implementation Record: Balancing Accuracy and Determinism with Qwen2.5-14B + GPUStack

Author: Hideyuki Chinoda / BESTNET LLC

1. Introduction — Why Local LLM #

The core of BASTION is local LLM (Qwen2.5-14B). Rather than cutting-edge external LLM APIs like GPT-4 or Claude, we chose Qwen2.5 running on our own GPU server.

The natural question is: “Why choose a local model with inferior performance when you could use the more accurate option?” This article explains the rationale behind this decision and how we actually operate it.

To state the conclusion upfront: Local LLM has three strong advantages distinct from “accuracy,” and in the infrastructure operations domain, those advantages are more important.

2. Three Barriers to Cloud LLM #

In the early stages of BASTION development, we naturally considered using OpenAI API or Anthropic API. However, we hit three barriers:

2.1 Data Sovereignty Barrier #

The logs BASTION handles contain sensitive customer information: attack source IPs, internal system architecture, usernames, authentication failure details, and more. Sending this to external LLM APIs is prohibited by many customer contracts.

Particularly in finance, government, and healthcare sectors, “sending logs externally constitutes contract breach” is not uncommon. Cloud LLM is simply not usable in these cases.

2.2 Cost Opacity #

External LLM APIs charge on a per-usage basis. Services like AWS DevOps Agent charge by the second. With BASTION, which runs continuously 24/7, costs scale linearly with log volume. Monthly budgets become unpredictable.

The result: “Months with more attacks blow the budget” and “quiet months waste allocated funds,” making cost optimization extremely difficult.

2.3 Network Dependency #

Cloud LLM requires internet connectivity. In closed networks (networks isolated by dedicated lines, etc.), it simply cannot be used.

Some BASTION targets want “a monitoring system that works in environments with no internet connection.” Capturing this segment was key to differentiation.

3. BASTION’s LLM Utilization Strategy — Three Principles #

Based on these factors, BASTION follows three principles for LLM usage:

PrincipleMeaning
① Local LLM as FoundationNever send sensitive logs externally. Operate Qwen2.5-14B via GPUStack
② Don’t Delegate Decisions to LLMCampaign determination and blocking execution are handled by deterministic logic
③ Use LLM as “Organizer”Primary use: log summarization, natural language conversion, human-readable report generation

In short, use LLM as an “intelligent secretary,” leaving judgment to deterministic code. This differs slightly from mainstream LLM usage in the industry.

4. Architecture — GPUStack + Qwen2.5-14B #

BASTION’s LLM foundation runs on a simple configuration:

BASTION Central Server (AI-SLOG)
  │
  ├── rsyslog receives logs from each device
  ├── Shell script group for preprocessing and summarization
  │     │
  │     ↓ HTTP API call (OpenAI-compatible)
  │
  └── GPUStack Cluster
        ├── Qwen2.5-14B (primary model)
        ├── Lightweight model (fallback)
        └── Automatic load balancing + automatic restart

Here’s why we chose this setup:

4.1 Why Qwen2.5-14B #

In BASTION, log summarization and natural language report generation are the primary jobs. These require:

  • Natural Japanese output — customer reports mixed with English are unusable
  • 14B model size — runs on a single 24GB GPU without quantization
  • Long-context support — log summarization demands sufficient context length
  • Open license — commercial use is explicitly permitted

We also evaluated Llama 3 series, but Qwen2.5 won on Japanese naturalness. This was our internal evaluation conclusion.

4.2 Why GPUStack #

GPUStack is an OSS that treats multiple GPU servers as a cluster. For BASTION:

  • OpenAI-compatible API for external access
  • Easy model switching (fallback when primary model fails)
  • Distributed operations possible (parallel processing across multiple GPU servers)
  • Operations dashboard included (model status visible at a glance)

These met our commercial environment requirements.

5. Pipeline — From Logs to Notification #

BASTION’s LLM usage spans multiple steps:

1. Receive device logs        (rsyslog)
        ↓
2. Classify and save by device (shell + filesystem)
        ↓
3. Summarize per device       (LLM call #1)
        ↓
4. Full correlation analysis  (LLM call #2)
        ↓
5. Severity determination     (deterministic logic)
        ↓
6. Slack notification         (auto-post only if CRITICAL)
        ↓
7. Detailed analysis          (operator calls on mention)

Critically, each step has clearly defined LLM responsibilities.

StepOwnerLLM Role
SummarizationLLMConvert massive logs to natural language “what happened”
Correlation analysisLLMCombine multiple device summaries into overall trend narrative
Severity determinationDeterministic codeNo LLM (misjudgment risk mitigation)
Block executionDeterministic codeNo LLM (irreversible action)
Notification text generationLLMFormat into readable Japanese for operators

“Don’t let LLM decide, only delegate formatting” — this principle is consistent throughout.

6. What to Delegate to LLM, What Not To #

This is the core of BASTION’s design philosophy, so I’ll elaborate:

BASTION separates, from the design stage, scenarios where LLM output is trusted versus not trusted:

6.1 What We Can Trust LLM With #

  • Natural language log summarization — minor phrasing variations are acceptable
  • Consolidating multiple device statuses into one paragraph — for human reading
  • Slack notification text formatting — emojis and emphasis by severity are delegated
  • Referencing similar past incidents — presenting “this happened last week”

6.2 What We Cannot Trust LLM With #

  • Final determination of whether an IP is malicious — decided mathematically
  • Decisions to execute blocks — conditions explicit in code
  • Decisions to unblock — 24-hour auto-expiry or operator action only
  • Production config changes — operator manual action only
  • Final security severity rating — rule-based determination

While operators often read LLM output to make decisions, LLM output almost never directly controls system behavior in BASTION.

7. Hybrid LLM Strategy — Local + External API Interplay #

As of May 2026, BASTION is not only leveraging local LLM, but exploring integration with high-performance external LLM APIs like Claude and GPT in an experimental phase.

However, this is not “abandoning local LLM for external migration.” We’re building a selection harness based on use case.

Use CaseLLM ChoiceReason
Daily log analysisLocal (Qwen2.5)Never externalize sensitive data
Routine task automation (report organization, etc.)External APINo sensitive data involved, superior structure capability
Complex reasoning for unexpected issuesExternal API (as needed)Complex scenario analysis when required
Production control judgmentNoneDeterministic logic only

For customers with strict data sovereignty requirements, we continue providing configurations where external LLM is disconnected, running local-only. The product is “extensible to use external LLM when desired” rather than “must use external LLM.”

8. Critical Design Decisions #

Several important decisions emerged from local LLM operations:

1. Version-control prompts: LLM prompts (queries) are version-controlled like code. Output quality varies significantly with prompt changes, so we maintain history.

2. Enforce JSON output: Freeform LLM output can’t be parsed by downstream code. BASTION enforces JSON format output with retry on parse errors.

3. Keep temperature low: For stable output from identical input, temperature stays low (0.0–0.3). Creativity unnecessary; determinism critical.

4. “No fabrication” prompt instructions: Always include directives like “never create information absent from logs” and “return ‘unknown’ explicitly when uncertain.” This won’t eliminate hallucinations completely, so we conduct hallucination audits downstream (details in a future article).

5. Fallback on model failure: If the primary model (Qwen2.5-14B) becomes unresponsive, we fallback to a lightweight model. Continued basic summarization beats complete shutdown.

9. Operational Results — Noise Reduction and Determinism #

After about one month of BASTION production operation, here’s the LLM operational record:

  • Log analysis automation rate: ~95% (replaced operator log tail with LLM summaries)
  • Slack notification noise reduction: Switching to CRITICAL-only notifications reduced routine notification frequency by ~8x
  • Judgment determinism: Identical input always yields identical judgment (LLM probabilistic output is eliminated)
  • False positive rate: 0% (zero incidents of blocking internal/partner IPs)
  • Hallucination detections: Audit logic continuously detects hallucinations; detected events are discarded and re-analyzed

“100% determinism” is particularly important. LLM-based judgment can yield different results for the same input. BASTION avoids using LLM for judgment, so this kind of “non-reproducible misbehavior” never occurs.

10. Design Tradeoffs #

To be honest, local LLM operation has constraints:

1. Initial GPU cost: Running Qwen2.5-14B comfortably requires a 24GB-class GPU. Minimal config runs tens of thousands of dollars. Unlike cloud APIs, there’s no “free tier” option.

2. Inference performance ceiling: 14B models don’t match GPT-4 or Claude 3.5 Sonnet reasoning capability. Complex multi-step reasoning may require external API fallback.

3. Model update tracking: New models require evaluation and switching decisions. In-house technical judgment capacity is needed (covered by BASTION maintenance contract).

4. Ongoing prompt refinement: Optimal prompts differ per environment. Initial tuning and continuous operational improvement are mandatory.

These are “unavoidable tradeoffs of choosing local LLM”. We accept them in exchange for data sovereignty, cost predictability, and closed-network capability.

11. Future Development #

The LLM foundation will continue evolving:

  • Tracking new-generation models: Qwen3 and next-generation model evaluation and migration
  • Operationalizing hybrid LLM strategy: Standardizing use-case-driven automatic routing
  • Multimodal support: Feeding network diagrams and traffic graphs to LLM
  • Per-customer environment tuning: Automating prompt optimization matched to each customer’s log characteristics

We’ll tackle these incrementally.

13. Contact #

For enterprises considering BASTION adoption or interested in collaborative proof-of-concept programs, please reach out via our contact form.

We can also provide consultation and support for building and operating local LLM infrastructure (GPUStack + Qwen2.5) in conjunction with BASTION adoption. We’ll offer proposals with individual quotations based on scope.

Free consultation and contact →
Updated on 2026年6月9日

What are your feelings

  • Happy
  • Normal
  • Sad