Implementation Record: Balancing Accuracy and Determinism with Qwen2.5-14B + GPUStack

8 min read

// BASTION Technical Explanation 2026-05-14

Author: Hideyuki Chinoda / BESTNET LLC

1. Introduction — Why Local LLM #

The core of BASTION is local LLM (Qwen2.5-14B). Rather than cutting-edge external LLM APIs like GPT-4 or Claude, we chose Qwen2.5 running on our own GPU server.

The natural question is: “Why choose a local model with inferior performance when you could use the more accurate option?” This article explains the rationale behind this decision and how we actually operate it.

To state the conclusion upfront: Local LLM has three strong advantages distinct from “accuracy,” and in the infrastructure operations domain, those advantages are more important.

2. Three Barriers to Cloud LLM #

In the early stages of BASTION development, we naturally considered using OpenAI API or Anthropic API. However, we hit three barriers:

2.1 Data Sovereignty Barrier #

The logs BASTION handles contain sensitive customer information: attack source IPs, internal system architecture, usernames, authentication failure details, and more. Sending this to external LLM APIs is prohibited by many customer contracts.

Particularly in finance, government, and healthcare sectors, “sending logs externally constitutes contract breach” is not uncommon. Cloud LLM is simply not usable in these cases.

2.2 Cost Opacity #

External LLM APIs charge on a per-usage basis. Services like AWS DevOps Agent charge by the second. With BASTION, which runs continuously 24/7, costs scale linearly with log volume. Monthly budgets become unpredictable.

The result: “Months with more attacks blow the budget” and “quiet months waste allocated funds,” making cost optimization extremely difficult.

2.3 Network Dependency #

Cloud LLM requires internet connectivity. In closed networks (networks isolated by dedicated lines, etc.), it simply cannot be used.

Some BASTION targets want “a monitoring system that works in environments with no internet connection.” Capturing this segment was key to differentiation.

3. BASTION’s LLM Utilization Strategy — Three Principles #

Based on these factors, BASTION follows three principles for LLM usage:

Principle	Meaning
① Local LLM as Foundation	Never send sensitive logs externally. Operate Qwen2.5-14B via GPUStack
② Don’t Delegate Decisions to LLM	Campaign determination and blocking execution are handled by deterministic logic
③ Use LLM as “Organizer”	Primary use: log summarization, natural language conversion, human-readable report generation

In short, use LLM as an “intelligent secretary,” leaving judgment to deterministic code. This differs slightly from mainstream LLM usage in the industry.

4. Architecture — GPUStack + Qwen2.5-14B #

BASTION’s LLM foundation runs on a simple configuration:

BASTION Central Server (AI-SLOG)
  │
  ├── rsyslog receives logs from each device
  ├── Shell script group for preprocessing and summarization
  │     │
  │     ↓ HTTP API call (OpenAI-compatible)
  │
  └── GPUStack Cluster
        ├── Qwen2.5-14B (primary model)
        ├── Lightweight model (fallback)
        └── Automatic load balancing + automatic restart

Here’s why we chose this setup:

4.1 Why Qwen2.5-14B #

In BASTION, log summarization and natural language report generation are the primary jobs. These require:

Natural Japanese output — customer reports mixed with English are unusable
14B model size — runs on a single 24GB GPU without quantization
Long-context support — log summarization demands sufficient context length
Open license — commercial use is explicitly permitted

We also evaluated Llama 3 series, but Qwen2.5 won on Japanese naturalness. This was our internal evaluation conclusion.

4.2 Why GPUStack #

GPUStack is an OSS that treats multiple GPU servers as a cluster. For BASTION:

OpenAI-compatible API for external access
Easy model switching (fallback when primary model fails)
Distributed operations possible (parallel processing across multiple GPU servers)
Operations dashboard included (model status visible at a glance)

These met our commercial environment requirements.

5. Pipeline — From Logs to Notification #

BASTION’s LLM usage spans multiple steps:

1. Receive device logs        (rsyslog)
        ↓
2. Classify and save by device (shell + filesystem)
        ↓
3. Summarize per device       (LLM call #1)
        ↓
4. Full correlation analysis  (LLM call #2)
        ↓
5. Severity determination     (deterministic logic)
        ↓
6. Slack notification         (auto-post only if CRITICAL)
        ↓
7. Detailed analysis          (operator calls on mention)

Critically, each step has clearly defined LLM responsibilities.

Step	Owner	LLM Role
Summarization	LLM	Convert massive logs to natural language “what happened”
Correlation analysis	LLM	Combine multiple device summaries into overall trend narrative
Severity determination	Deterministic code	No LLM (misjudgment risk mitigation)
Block execution	Deterministic code	No LLM (irreversible action)
Notification text generation	LLM	Format into readable Japanese for operators

“Don’t let LLM decide, only delegate formatting” — this principle is consistent throughout.

6. What to Delegate to LLM, What Not To #

This is the core of BASTION’s design philosophy, so I’ll elaborate:

BASTION separates, from the design stage, scenarios where LLM output is trusted versus not trusted:

6.1 What We Can Trust LLM With #

Natural language log summarization — minor phrasing variations are acceptable
Consolidating multiple device statuses into one paragraph — for human reading
Slack notification text formatting — emojis and emphasis by severity are delegated
Referencing similar past incidents — presenting “this happened last week”

6.2 What We Cannot Trust LLM With #

Final determination of whether an IP is malicious — decided mathematically
Decisions to execute blocks — conditions explicit in code
Decisions to unblock — 24-hour auto-expiry or operator action only
Production config changes — operator manual action only
Final security severity rating — rule-based determination

While operators often read LLM output to make decisions, LLM output almost never directly controls system behavior in BASTION.

7. Hybrid LLM Strategy — Local + External API Interplay #

As of May 2026, BASTION is not only leveraging local LLM, but exploring integration with high-performance external LLM APIs like Claude and GPT in an experimental phase.

However, this is not “abandoning local LLM for external migration.” We’re building a selection harness based on use case.

Use Case	LLM Choice	Reason
Daily log analysis	Local (Qwen2.5)	Never externalize sensitive data
Routine task automation (report organization, etc.)	External API	No sensitive data involved, superior structure capability
Complex reasoning for unexpected issues	External API (as needed)	Complex scenario analysis when required
Production control judgment	None	Deterministic logic only

For customers with strict data sovereignty requirements, we continue providing configurations where external LLM is disconnected, running local-only. The product is “extensible to use external LLM when desired” rather than “must use external LLM.”

8. Critical Design Decisions #

Several important decisions emerged from local LLM operations:

1. Version-control prompts: LLM prompts (queries) are version-controlled like code. Output quality varies significantly with prompt changes, so we maintain history.

2. Enforce JSON output: Freeform LLM output can’t be parsed by downstream code. BASTION enforces JSON format output with retry on parse errors.

3. Keep temperature low: For stable output from identical input, temperature stays low (0.0–0.3). Creativity unnecessary; determinism critical.

4. “No fabrication” prompt instructions: Always include directives like “never create information absent from logs” and “return ‘unknown’ explicitly when uncertain.” This won’t eliminate hallucinations completely, so we conduct hallucination audits downstream (details in a future article).

5. Fallback on model failure: If the primary model (Qwen2.5-14B) becomes unresponsive, we fallback to a lightweight model. Continued basic summarization beats complete shutdown.

9. Operational Results — Noise Reduction and Determinism #

After about one month of BASTION production operation, here’s the LLM operational record:

Log analysis automation rate: ~95% (replaced operator log tail with LLM summaries)
Slack notification noise reduction: Switching to CRITICAL-only notifications reduced routine notification frequency by ~8x
Judgment determinism: Identical input always yields identical judgment (LLM probabilistic output is eliminated)
False positive rate: 0% (zero incidents of blocking internal/partner IPs)
Hallucination detections: Audit logic continuously detects hallucinations; detected events are discarded and re-analyzed

“100% determinism” is particularly important. LLM-based judgment can yield different results for the same input. BASTION avoids using LLM for judgment, so this kind of “non-reproducible misbehavior” never occurs.

10. Design Tradeoffs #

To be honest, local LLM operation has constraints:

1. Initial GPU cost: Running Qwen2.5-14B comfortably requires a 24GB-class GPU. Minimal config runs tens of thousands of dollars. Unlike cloud APIs, there’s no “free tier” option.

2. Inference performance ceiling: 14B models don’t match GPT-4 or Claude 3.5 Sonnet reasoning capability. Complex multi-step reasoning may require external API fallback.

3. Model update tracking: New models require evaluation and switching decisions. In-house technical judgment capacity is needed (covered by BASTION maintenance contract).

4. Ongoing prompt refinement: Optimal prompts differ per environment. Initial tuning and continuous operational improvement are mandatory.

These are “unavoidable tradeoffs of choosing local LLM”. We accept them in exchange for data sovereignty, cost predictability, and closed-network capability.

11. Future Development #

The LLM foundation will continue evolving:

Tracking new-generation models: Qwen3 and next-generation model evaluation and migration
Operationalizing hybrid LLM strategy: Standardizing use-case-driven automatic routing
Multimodal support: Feeding network diagrams and traffic graphs to LLM
Per-customer environment tuning: Automating prompt optimization matched to each customer’s log characteristics

We’ll tackle these incrementally.

13. Contact #

For enterprises considering BASTION adoption or interested in collaborative proof-of-concept programs, please reach out via our contact form.

We can also provide consultation and support for building and operating local LLM infrastructure (GPUStack + Qwen2.5) in conjunction with BASTION adoption. We’ll offer proposals with individual quotations based on scope.

Free consultation and contact →

Updated on 2026/6/9

What are your feelings

Happy
Normal
Sad

Implementation Record: Balancing Accuracy and Determinism with Qwen2.5-14B + GPUStack

Implementation Record: Balancing Accuracy and Determinism with Qwen2.5-14B + GPUStack

1. Introduction — Why Local LLM #

2. Three Barriers to Cloud LLM #

2.1 Data Sovereignty Barrier #

2.2 Cost Opacity #

2.3 Network Dependency #

3. BASTION’s LLM Utilization Strategy — Three Principles #

4. Architecture — GPUStack + Qwen2.5-14B #

4.1 Why Qwen2.5-14B #

4.2 Why GPUStack #

5. Pipeline — From Logs to Notification #

6. What to Delegate to LLM, What Not To #

6.1 What We Can Trust LLM With #

6.2 What We Cannot Trust LLM With #

7. Hybrid LLM Strategy — Local + External API Interplay #

8. Critical Design Decisions #

9. Operational Results — Noise Reduction and Determinism #

10. Design Tradeoffs #

11. Future Development #

12. Related Articles #

13. Contact #

Share This Article :

Services

AI Solutions

Resources