Designing on a “Do Not Trust the Agent” Premise for DMZ Environments That May Be Compromised
1. Introduction: The DMZ has “blind spots” in security monitoring
For cloud and hosting providers, public web servers and reverse proxies in the DMZ (demilitarized zone) are among the most heavily targeted hosts.
Even on these hosts, configurations that aggregate logs via syslog forwarding are not uncommon. However, there is an important premise here:
BASTION’s DMZ Agent is designed with the premise that “the Agent itself might be compromised.” This article explains the “Agent untrust model” and the validation engine design that supports it.
2. Why existing EDR and SIEM Agents are insufficient
The Agent features of EDR (Endpoint Detection & Response) and SIEM products can detect endpoint anomalies. However, most of these products rest on the following premises:
- The Agent is trustworthy (= signed binary issued by server side)
- Events sent from the Agent are correct (= pre-validated inside the Agent)
- If the Agent is silent, the server is safe (= the Agent should notify if abnormal)
These premises are valid during normal times, but fail completely on a compromised DMZ server:
- Even with a signed binary, if root privilege is obtained, the Agent process itself can be modified
- If the attacker understands the event validation logic inside the Agent, they can generate “fake events” that pass validation
- By killing the Agent and starting a fake Agent, “normal silence” can be staged
BASTION is designed to close all these loopholes.
3. Agent Untrust Model: Three separated responsibilities
The relationship between BASTION’s Agent and the central server (AI-SLOG) is clearly separated into three responsibilities.
| Responsibility | Owner | Reason |
|---|---|---|
| Log collection and primary event generation | Agent (DMZ side) | Low latency and real-time responsiveness are important |
| Event validation and judgment | AI-SLOG (internal side) | Must be performed in a location that cannot be compromised |
| Defense action (blocking) | Agent (DMZ side) | Can only be blocked on the relevant server |
The key point is that “detection” and “validation” are physically separated. Even if the Agent reports “attack detected,” AI-SLOG does not take the report at face value. It always performs independent validation using raw logs obtained from the same server via a separate pathway.
DMZ Server                      AI-SLOG (Internal)
┌──────────────┐                ┌──────────────┐
│ Agent        │── WebSocket ──→│ Receive      │
│ ・Log Monitor│   (Events)     │              │
│ ・Primary    │                │      ↓       │
│   Detection  │                │ Validation   │←─ Raw logs
│              │                │ Engine       │   (via rsyslog)
│ ・Block      │←── WebSocket ──│      ↓       │
│   Execution  │  (block cmd)   │ Send Command │
│   Only       │                │              │
└──────────────┘                └──────────────┘
       ↑                               ↑
Agent cannot do more            Cross-reference with
than blocking even if lying     raw logs; discard event
                                if mismatch

4. Validation Engine: Cross-referencing Agent events with raw logs
The validation engine is a mechanism to independently verify whether events arriving from the Agent are “facts.”
Example event from Agent (attack detection notification):
{
  "agent_id": "dmz-web-01",
  "event_type": "vuln_scan_detected",
  "src_ip": "203.0.113.42",
  "target": "/.env",
  "timestamp": "2026-05-13T10:23:45+09:00",
  "evidence_lines": [
    "203.0.113.42 GET /.env HTTP/1.1 404",
    "203.0.113.42 GET /.git/config HTTP/1.1 404",
    "203.0.113.42 GET /wp-admin HTTP/1.1 404"
  ]
}

Upon receiving this event, AI-SLOG validates it through the following steps:
- Raw log acquisition: Apache/Nginx logs from the relevant DMZ server are already obtained separately via rsyslog
- Search around relevant timestamp: Search for requests from the same IP within ±30 seconds of the event’s timestamp
- Cross-reference with evidence_lines: Verify that the three lines the Agent claims actually exist in the real logs
- Discard if mismatch: If even one line does not match, this event is discarded as “possible fabrication,” and a warning is issued
If the Agent operates honestly, the evidence_lines will match the actual logs and pass validation without issue. Even if the Agent is compromised by an attacker and sends fake events, they are easily filtered by cross-referencing with raw logs.
The important part is that the raw log acquisition pathway and Agent event pathway are completely separated. Log forwarding via rsyslog runs on a separate pathway from the Agent process, and even if an attacker controls the Agent, they cannot modify the rsyslog reception on the internal side.
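The cross-referencing steps above can be sketched roughly as follows. This is a minimal illustration, not BASTION's actual validation logic (which the article deliberately does not publish). The function name `validate_event`, the in-memory `(timestamp, line)` form of the raw logs, and the exact string matching are all assumptions made for the example; real access log lines carry more fields and would need normalization before comparison.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=30)  # the ±30 second search window from the article


def validate_event(event, raw_log_entries):
    """Return True only if every evidence line appears in the raw logs.

    raw_log_entries: list of (timestamp, line) tuples received via the
    independent rsyslog pathway (hypothetical in-memory representation).
    """
    event_time = datetime.fromisoformat(event["timestamp"])
    # Keep only raw lines from the same source IP inside the time window.
    candidates = {
        line
        for ts, line in raw_log_entries
        if abs(ts - event_time) <= WINDOW
        and line.startswith(event["src_ip"] + " ")
    }
    evidence = event.get("evidence_lines", [])
    # An event without evidence cannot be verified; treat it as a mismatch.
    if not evidence:
        return False
    # A single missing line is enough to discard the event.
    return all(line in candidates for line in evidence)
```

Because the function only reads data that arrived over the rsyslog pathway, nothing the Agent sends can influence what counts as a match.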
5. Agent Permission Design: Minimized capabilities
Permissions granted to the Agent itself are also reduced to the absolute minimum.
| Operation | Permission | Note |
|---|---|---|
| IP blocking (ufw deny etc.) | Allowed | Necessary for server defense |
| IP block removal (ufw delete etc.) | Denied | Prevent attacker self-removal |
| Configuration file editing | Denied | Prevent config.yaml rewrite attacks |
| Arbitrary shell command execution | Denied | Only commands whitelisted in allowed_commands in config.yaml can run |
| Access to other servers | Denied | Prevent lateral movement |
Particularly important is the design that “the Agent cannot remove blocks itself.” Even if an attacker gains control of the Agent, they cannot unblock their own IP to re-enter.
Then how are legitimate users unblocked if falsely detected? This is addressed through two pathways: “automatic expiration after 24 hours” (described later) and manual unblock commands from the AI-SLOG side.
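One way to picture the "block only" permission model is a gate that turns a requested operation into a fixed command and refuses everything else. The sketch below is an assumption-laden illustration: the operation name `block_ip` and the function `build_command` are hypothetical, and the article does not describe how the Agent actually enforces its whitelist.

```python
import ipaddress

# Hypothetical operation names; the article only states which kinds
# of operation are allowed or denied.
ALLOWED_OPERATIONS = {"block_ip"}


def build_command(operation, ip):
    """Translate a requested operation into an argv list, or refuse it."""
    if operation not in ALLOWED_OPERATIONS:
        raise PermissionError(f"operation not permitted for Agent: {operation}")
    # Parsing the address rejects anything that is not a plain IP,
    # so no shell metacharacters can reach the command.
    addr = ipaddress.ip_address(ip)
    # Returned as an argv list so it can be executed without a shell.
    return ["ufw", "deny", "from", str(addr)]
```

With this shape, an unblock request fails before any command is constructed, which matches the intent that a compromised Agent cannot lift a block on its own IP.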
6. Heartbeat Freeze: A safety device for when the Agent goes silent
We mentioned earlier not to trust the premise “if Agent is silent, it is safe.” So how does BASTION actually handle the situation where an Agent is stopped?
Each Agent periodically sends heartbeats to AI-SLOG. When this is interrupted for a certain period, BASTION performs the following:
- Freeze all new events from the relevant Agent (discard without processing)
- Notify operators via Slack
- Stop accepting events until Agent restart and heartbeat recovery
This is a countermeasure against scenarios where a stopped or replaced Agent stays quiet about real activity while sending false information. An attacker could conceivably keep sending heartbeats while stopping actual log forwarding, but in that case the separate rsyslog pathway reveals that “logs have suddenly stopped.”
In other words, we intentionally have multiple pathways to judge Agent status.
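The freeze behaviour on the AI-SLOG side can be sketched as a small state machine. This is only an illustration: the class `AgentState` and the 90 second `HEARTBEAT_TIMEOUT` are hypothetical (the article does not state the actual interval), and operator notification is reduced to a comment.

```python
HEARTBEAT_TIMEOUT = 90.0  # seconds; hypothetical value, not given in the article


class AgentState:
    """Tracks one Agent's heartbeat and whether its events are frozen."""

    def __init__(self, now):
        self.last_heartbeat = now
        self.frozen = False

    def on_heartbeat(self, now):
        # A fresh heartbeat records recovery and lifts the freeze.
        self.last_heartbeat = now
        self.frozen = False

    def accept_event(self, now):
        """Return True if an event arriving at `now` should be processed."""
        if now - self.last_heartbeat > HEARTBEAT_TIMEOUT:
            # A real system would also notify operators at this point.
            self.frozen = True
        return not self.frozen
```

Events that arrive while the Agent is frozen are simply dropped, so a fake Agent started after the real one was killed gains nothing by sending events without first re-establishing heartbeats.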
7. 24-hour automatic expiration: Preventing permanent blocks
Blocks executed by the Agent (ufw deny etc.) are designed to automatically expire after 24 hours.
# Agent-side cron.hourly
/opt/bastion-agent/bastion-ufw-prune.sh
# → Judge elapsed time from timestamp embedded in ufw comments
# → Delete entries exceeding 24 hours with ufw delete
# → Do not wait for unblock instruction from AI-SLOG (autonomous local operation)

This design has three intentions:
- Prevent false positive persistence: Even if temporarily blocked by false detection, it self-recovers after 24 hours
- Agent needs no unblock authority: With automatic expiration, there is no need to grant the Agent unblock permissions
- Reduce AI-SLOG dependency: Expiration processing completes locally on the Agent; no problem if AI-SLOG is down
If the same attacker returns after 24 hours, they are naturally detected and blocked again. As long as the attacker continues activity, blocking continues to activate.
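The expiry decision itself is simple date arithmetic. The sketch below shows only that decision, written in Python for consistency with the other examples here rather than as the contents of `bastion-ufw-prune.sh`. The comment format `blocked_at=<ISO 8601 timestamp>` and the function `expired_blocks` are assumptions; the article says only that a timestamp is embedded in the ufw comment.

```python
from datetime import datetime, timedelta, timezone

BLOCK_TTL = timedelta(hours=24)


def expired_blocks(rules, now):
    """Return the IPs whose block is older than 24 hours.

    rules: list of (ip, comment) pairs; the comment format
    "blocked_at=<ISO 8601 timestamp>" is hypothetical.
    """
    marker = "blocked_at="
    expired = []
    for ip, comment in rules:
        if marker not in comment:
            continue  # not a rule managed by this script; leave it alone
        stamp = comment.split(marker, 1)[1].strip()
        blocked_at = datetime.fromisoformat(stamp)
        if now - blocked_at > BLOCK_TTL:
            expired.append(ip)
    return expired
```

Skipping rules without the marker keeps the pruning step from touching firewall entries that operators added by hand.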
8. Implementation Decision: Why WebSocket was chosen
Communication between Agent and AI-SLOG is implemented using WebSocket. Let me share the reasoning.
| Option | BASTION Decision |
|---|---|
| HTTP POST (Agent → AI-SLOG) | Not adopted. A separate pathway would be needed to send commands back |
| MQTT | Not adopted. Adding a broker increases operational burden; excessive in closed networks |
| gRPC | Not adopted. Maintaining protocol definitions adds a large operational burden; difficult to debug |
| WebSocket (bidirectional) | Adopted. Bidirectional communication in single connection, TLS, HTTP compatible |
Particularly important is that “Agent → AI-SLOG event sending” and “AI-SLOG → Agent block commands” can be handled in a single connection. This simplifies communication across firewalls and greatly reduces operational burden.
Additionally, because a WebSocket connection starts as an HTTP Upgrade handshake, it can be terminated by standard reverse proxies such as HAProxy, so TLS termination and authentication integration work with existing assets.
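To illustrate how both directions can share one connection, each message can carry a type tag so the receiver knows how to treat it. The envelope fields (`type`, `payload`) and the type names `event` and `block_cmd` below are hypothetical; the article does not document BASTION's wire format, and the transport itself is omitted.

```python
import json


def encode_event(agent_id, event):
    """Agent -> AI-SLOG: wrap an event in a typed envelope."""
    return json.dumps({"type": "event", "agent_id": agent_id, "payload": event})


def handle_server_message(raw):
    """Agent side: act only on block commands arriving on the connection.

    Returns the IP to block, or None for any other message type.
    """
    msg = json.loads(raw)
    if msg.get("type") != "block_cmd":
        return None  # every other message type is ignored in this sketch
    return msg["payload"]["src_ip"]
```

Keeping the Agent-side handler limited to a single command type mirrors the permission table: even over a bidirectional channel, the only thing the server can ask the Agent to do is block.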
9. Design tradeoffs and constraints
To be honest, this mechanism has constraints.
1. Validation engine computational cost: Since each Agent event is cross-referenced with raw logs, processing time increases with event volume. BASTION limits evidence_lines to 3-5 lines and narrows the search range by time window to control costs.
2. Raw log acquisition pathway redundancy: The validation engine uses logs via rsyslog, but if rsyslog itself stops, validation cannot occur. To address this, rsyslog health monitoring is separately implemented, with immediate alerts when it stops.
3. Validation logic transparency: Publishing detailed validation logic would allow attackers to design fake events that circumvent it. Therefore, specific validation algorithm details are not public (this article explains concepts only).
4. Block removal operational burden: Since the Agent lacks removal authority, immediately removing a false block requires an operation from the AI-SLOG side. In practice the choice is to wait out the 24 hours or have an operator intervene manually.
These are inevitable tradeoffs derived from the fundamental premise of “do not trust a compromised Agent.” The design deliberately tilts the balance toward security over convenience.
10. Effects in actual operations: Validation in our environment
BASTION’s DMZ Agent is currently running in production on three public web servers. As of this article’s writing:
- Agent events rejected: 0 (no events were discarded by validation; all events were legitimate)
- Agent heartbeat anomalies: 0 (zero heartbeat interruptions)
- False blocking incidents: 0 (zero blocking of company IPs/partner IPs)
- Unintended removal by 24-hour expiration: 0 (all functioning as expected)
This is a state where “the Agent operates honestly, so validation passes,” and the untrust model shows its true value only when compromise occurs. It operates quietly during normal times and protects operators when something happens.
11. Future development
The Agent and validation engine are currently in practical use, but room for improvement exists.
- Go language migration: Migrate the current Python Agent implementation to Go, enabling single-binary distribution (easier deployment to customer environments)
- Signature verification: Add Agent binary signature validation and self-verification logic at startup
- Audit log blockchain: Implement tamper-proof recording of validation engine decision history (for customer auditing)
- Multiple validation pathways: Multi-layered validation using not just rsyslog but also SNMP and network flow information
These will be implemented sequentially.
12. Related articles
- Multi-Layer Correlation Campaign Detection Mechanism — How primary events detected by Agent are judged as an overall attack campaign
- Taking BASTION from “Security Product” to “AI Ops Platform” — The evolution story of BASTION as a whole
- Coming soon: Infrastructure log automatic analysis using local LLM
- Coming soon: LLM hallucination auditing implementation
13. Contact us
For companies considering BASTION deployment or interested in joint validation programs, please contact us via the contact form.
For customers with DMZ or isolated environments, the Agent untrust model described in this article becomes a significant competitive differentiator. We will provide proposals with individual quotes tailored to your scope.