Monitoring AI Agents: Logging, Alerting, and Audit Trails
How to monitor what your AI agent is doing in real time. Set up logging, action auditing, and alerts for suspicious behavior.
AI agents are no longer just answering questions. They execute shell commands, read and write files, send messages on your behalf, make network requests, and interact with APIs. That level of autonomy demands serious monitoring. If you deploy an AI agent without logging, alerting, and audit trails, you are flying blind with a copilot that has root access.
This guide covers practical techniques for monitoring AI agents in production — from structured logging to kill switches.
Why Monitoring AI Agents Is Critical
A traditional chatbot produces text. A modern AI agent acts. It can:
- Execute arbitrary code via shell commands
- Read, create, modify, and delete files on disk
- Send emails, Slack messages, or social media posts
- Make HTTP requests to internal and external services
- Query databases and modify records
- Install packages and change system configuration
Every one of those actions carries risk. A hallucinated rm -rf / is not a hypothetical — it is one bad prompt away. An agent that curls data to an unknown external domain could be exfiltrating secrets. Without monitoring, you will not know until the damage is done.
The principle is simple: trust but verify, in real time.
What to Log
Log everything the agent does. Not just its final responses — every intermediate action. At minimum, capture:
- Tool calls: Every function invocation, its arguments, and its return value
- File access: Every read, write, create, and delete operation with full paths
- Network requests: URLs, methods, headers, response codes
- Messages sent: Any outbound communication (email, chat, webhooks)
- Command execution: Full shell commands, working directory, exit codes, stdout/stderr
- Authentication events: Token usage, API key access, privilege escalations
- Decision metadata: The prompt or context that led the agent to take the action
Structured Log Format
Use structured JSON logs. They are parseable, searchable, and supported by most log aggregation tools.
{
"timestamp": "2025-06-01T14:32:07.412Z",
"agent_id": "agent-main-01",
"session_id": "sess_a8f3c901",
"action": "exec",
"command": "cat /etc/passwd",
"working_dir": "/home/dev",
"exit_code": 0,
"risk_level": "high",
"trigger": "user_prompt",
"prompt_hash": "sha256:e3b0c44298fc..."
}
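One way to produce entries in this shape from a shell wrapper is sketched below. It is a minimal example rather than a complete logger: the function names json_escape and log_action are made up for illustration, only a subset of the fields above is emitted, and the escaping flattens newlines and tabs to spaces instead of encoding every control character.

```shell
# json_escape STRING: make STRING safe to place between double quotes in JSON.
# Newlines and tabs are flattened to spaces; other control characters are
# not handled in this sketch.
json_escape() {
    printf '%s' "$1" | tr '\n\t' '  ' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# log_action AGENT_ID SESSION_ID ACTION COMMAND EXIT_CODE
# Prints one JSON object per call, with a UTC timestamp, on a single line.
log_action() {
    printf '{"timestamp":"%s","agent_id":"%s","session_id":"%s","action":"%s","command":"%s","exit_code":%d}\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")" \
        "$(json_escape "$4")" "$5"
}
```

Calling log_action agent-main-01 sess_a8f3c901 exec 'cat /etc/passwd' 0 prints a single JSON line. Piping that output into logger -t ai-agent would hand it to syslog under the ai-agent tag.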
Store logs in append-only storage. Immutable logs are essential for forensic analysis and compliance. Ship them to a central aggregator — ELK stack, Loki, or even a dedicated S3 bucket with write-only IAM policies.
Real-Time Alerting
Logging without alerting is just archaeology. You need real-time detection of suspicious behavior.
Dangerous Patterns to Flag
Define rules for actions that should trigger immediate alerts:
- Destructive commands: rm -rf, mkfs, dd if=/dev/zero, DROP TABLE
- Data exfiltration signals: curl or wget to unknown external domains
- Sensitive file access: /etc/shadow, .env, private keys, ~/.ssh/
- Privilege escalation: sudo, chmod 777, chown root
- Unusual volume: More than N tool calls per minute (burst detection)
- Network anomalies: Connections to IP addresses outside an allow list
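The allow-list rule in the last bullet can be checked mechanically. The sketch below assumes the destination is available as a URL or host name (for example, pulled from a logged curl command). The hosts in ALLOWED_HOSTS and the function names are placeholders, and IPv6 literals are not handled.

```shell
# Hypothetical allow list: space-separated hosts the agent may contact.
ALLOWED_HOSTS="api.github.com pypi.org"

# extract_host URL: print the lower-cased host part of URL.
# Strips the scheme, then the path/query/fragment, then userinfo, then the port.
extract_host() {
    printf '%s\n' "$1" |
        sed -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' \
            -e 's|[/?#].*$||' \
            -e 's|^.*@||' \
            -e 's|:[0-9]*$||' |
        tr '[:upper:]' '[:lower:]'
}

# is_allowed_host URL: succeed only if the host is an exact match in the list.
is_allowed_host() {
    host=$(extract_host "$1")
    for allowed in $ALLOWED_HOSTS; do
        [ "$host" = "$allowed" ] && return 0
    done
    return 1
}
```

Exact matching is deliberate: a suffix or substring match would let a host such as api.github.com.evil.example slip through.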
Simple Bash Monitoring Script
Here is a lightweight watcher that tails your agent log and fires alerts on suspicious patterns:
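#!/bin/bash
# agent-monitor.sh: real-time AI agent action monitor
LOGFILE="/var/log/ai-agent/actions.log"
ALERT_WEBHOOK="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

# Extended regular expressions, matched case-insensitively against each log line.
# In ERE, \| is a literal pipe character, so the curl/wget patterns catch "curl ... | bash".
PATTERNS=(
  "rm -rf"
  "/etc/shadow"
  "/etc/passwd"
  "chmod 777"
  "curl.*\|.*bash"
  "wget.*\|.*sh"
  "DROP TABLE"
  "sudo su"
)
COMBINED=$(IFS="|"; printf '%s' "${PATTERNS[*]}")

# IFS= and -r keep leading whitespace and backslashes in the log line intact.
tail -F "$LOGFILE" | while IFS= read -r line; do
  if printf '%s\n' "$line" | grep -qEi -- "$COMBINED"; then
    # Escape backslashes and double quotes so the log line (itself JSON)
    # cannot break the JSON payload sent to the webhook.
    escaped=$(printf '%s' "$line" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    curl -s -X POST -H 'Content-type: application/json' \
      --data "{\"text\":\"🚨 SUSPICIOUS AGENT ACTION DETECTED:\\n$escaped\"}" \
      "$ALERT_WEBHOOK"
    logger -t agent-alert "SUSPICIOUS: $line"
  fi
done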
Run it as a systemd service and it becomes a persistent watchdog for your agent.
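As a rough sketch of what that service could look like, the helper below prints a unit file for the monitor script. The function name render_monitor_unit, the description text, and the restart delay are assumptions for illustration. Adjust them to your environment.

```shell
# render_monitor_unit SCRIPT_PATH: print a systemd unit that runs the monitor
# script and restarts it whenever it exits.
render_monitor_unit() {
    cat <<EOF
[Unit]
Description=AI agent action monitor
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=$1
Restart=always
RestartSec=2

[Install]
WantedBy=multi-user.target
EOF
}
```

For example, render_monitor_unit /usr/local/bin/agent-monitor.sh | sudo tee /etc/systemd/system/agent-monitor.service writes the unit (both paths are examples), after which sudo systemctl daemon-reload followed by sudo systemctl enable --now agent-monitor.service starts it and keeps it running across reboots.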
Audit Trails for Compliance
If you operate under SOC 2, GDPR, HIPAA, or any regulatory framework, you need provable audit trails. AI agent actions must be traceable back to a trigger (who asked, what prompt, what context).
Auditd Rules for Agent Processes
Use Linux auditd to monitor file access and command execution at the kernel level:
# /etc/audit/rules.d/ai-agent.rules
# Monitor agent workspace for all file operations
-w /home/agent/workspace/ -p rwxa -k agent_file_access
# Monitor sensitive directories (path watches fire for every user, not only the agent, so reads of /etc/ will be noisy)
-w /etc/ -p r -k agent_etc_read
-w /root/.ssh/ -p rwxa -k agent_ssh_access
# Track all execve calls by the agent user (uid 1001 is an example; substitute your agent's UID)
-a always,exit -F arch=b64 -S execve -F uid=1001 -k agent_exec
# Monitor network-related syscalls
-a always,exit -F arch=b64 -S connect -F uid=1001 -k agent_network
Load the rules with sudo augenrules --load, then query the recorded events by key, for example ausearch -k agent_exec.
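The rules above hard-code uid=1001, which silently stops matching if the agent account is recreated with a different UID. One possible approach is to generate the per-user rules from the account name, as sketched here. The function name is hypothetical, and the extra arch=b32 execve rule is an assumption based on the common auditd practice of covering 32-bit binaries on 64-bit hosts. Drop it if your systems cannot run 32-bit code.

```shell
# render_agent_audit_rules AGENT_USER WORKSPACE_DIR
# Prints auditd rules for the given account, resolving its numeric UID with id -u.
render_agent_audit_rules() {
    agent_uid=$(id -u "$1") || return 1
    workspace=$2
    cat <<EOF
-w $workspace -p rwxa -k agent_file_access
-a always,exit -F arch=b64 -S execve -F uid=$agent_uid -k agent_exec
-a always,exit -F arch=b32 -S execve -F uid=$agent_uid -k agent_exec
-a always,exit -F arch=b64 -S connect -F uid=$agent_uid -k agent_network
EOF
}
```

The output can be redirected into a file under /etc/audit/rules.d/ and loaded in the usual way.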
Syslog Configuration
Route agent logs to dedicated files, matched by program name, for separation and retention:
# /etc/rsyslog.d/50-ai-agent.conf
# Create a dedicated log for AI agent actions
if $programname == 'ai-agent' then /var/log/ai-agent/actions.log
if $programname == 'agent-alert' then /var/log/ai-agent/alerts.log
# Forward to a remote syslog server for tamper-resistant storage (@@ means TCP)
if $programname startswith 'ai-agent' then @@syslog.internal.corp:514
Because each entry is forwarded as it is written, the entries already shipped survive even if the agent later compromises the local machine.
Rate Limiting Agent Actions
An unconstrained agent can execute hundreds of commands per second. Implement rate limits as a safety layer:
- Per-action limits: Maximum 10 shell commands per minute
- Per-session limits: Maximum 100 tool calls per session
- Cooldown periods: Force a 5-second delay after any destructive action
- Escalation thresholds: After 3 flagged actions, pause the agent and notify an operator
Build this into your agent middleware. A token bucket or sliding-window counter in Redis works well; the snippet below uses the simplest variant, a fixed one-minute window:
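# Check rate limit before allowing agent action.
# AGENT_PID must hold the agent's process ID (set by whatever launches the agent).
KEY="agent:actions:$(date +%Y%m%d%H%M)"   # one key per calendar minute, computed once
CURRENT=$(redis-cli INCR "$KEY")
redis-cli EXPIRE "$KEY" 60 > /dev/null
if [ "$CURRENT" -gt 10 ]; then
  echo "RATE LIMIT: Agent exceeded 10 actions/minute. Pausing."
  kill -STOP "$AGENT_PID"   # resume later with: kill -CONT "$AGENT_PID"
  # Alert operator
fi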
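The Redis snippet above counts actions per calendar minute, so a burst that straddles a minute boundary can briefly exceed the limit. A sliding window avoids that. The sketch below keeps timestamps in a local file instead of Redis purely so it runs without extra services. The function name, the state-file layout, and the optional NOW argument (handy for testing) are assumptions, and it does no locking, so it suits a single agent process only.

```shell
# allow_action STATE_FILE LIMIT WINDOW_SECONDS [NOW]
# Succeeds and records the action if fewer than LIMIT actions happened in the
# last WINDOW_SECONDS; fails without recording anything otherwise.
allow_action() {
    state=$1
    limit=$2
    window=$3
    now=${4:-$(date +%s)}
    [ -f "$state" ] || : > "$state"
    # Keep only the timestamps that are still inside the window.
    awk -v now="$now" -v w="$window" '$1 > now - w' "$state" > "$state.tmp"
    mv "$state.tmp" "$state"
    count=$(awk 'END { print NR }' "$state")
    if [ "$count" -ge "$limit" ]; then
        return 1
    fi
    echo "$now" >> "$state"
    return 0
}
```

A caller would run something like allow_action /var/run/ai-agent/actions.ts 10 60 before each shell command (the path is an example) and pause the agent when it fails.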
Kill Switches
Every production AI agent needs an emergency stop mechanism. This is non-negotiable.
Implement kill switches at multiple layers:
- Application level: A flag in your config store (Redis key, environment variable) that the agent checks before every action
- Process level: A systemd service you can systemctl stop instantly
- Network level: Firewall rules that cut the agent off from external services
- Session level: Ability to terminate a specific agent session without affecting others
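The application-level check in the first bullet might look something like this. It is a sketch under stated assumptions: the flag-file path, the AGENT_KILL_SWITCH variable, and the function names are hypothetical, and a Redis key lookup could replace the file test if that is where your flag lives.

```shell
# Hypothetical flag file; creating it engages the kill switch.
KILL_SWITCH_FILE="${KILL_SWITCH_FILE:-/var/run/ai-agent/STOP}"

# kill_switch_engaged: succeed if the flag file exists or AGENT_KILL_SWITCH=1.
kill_switch_engaged() {
    [ -e "$KILL_SWITCH_FILE" ] || [ "${AGENT_KILL_SWITCH:-0}" = "1" ]
}

# guarded_run COMMAND [ARGS...]: run the command only while the switch is off.
guarded_run() {
    if kill_switch_engaged; then
        echo "kill switch engaged; refusing to run: $*" >&2
        return 1
    fi
    "$@"
}
```

Routing every tool call through guarded_run means an operator can halt the agent with a single touch of the flag file, without needing to locate or signal the process.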
A kill switch is not paranoia — it is basic operational hygiene. You would not run a CNC machine without an emergency stop button. Do not run an AI agent without one either.
Putting It All Together
A practical monitoring stack for AI agents looks like this:
- Structured JSON logging on every tool call, routed through syslog
- Auditd rules watching the agent user and workspace at the kernel level
- Real-time pattern matching on the log stream with alerts to Slack or PagerDuty
- Rate limiting middleware with automatic pause on threshold breach
- Append-only remote log storage for compliance and forensics
- Kill switch accessible via API, CLI, and dashboard
None of these components are complex individually. Together, they form a defense-in-depth monitoring system that lets you harness the power of AI agents without losing control.
The agents are getting more capable every month. Your monitoring should keep pace. Start logging today, because when something goes wrong, the only thing worse than an incident is an incident you cannot investigate.