Incident Response Playbook: From Detection to Recovery

When a breach happens, panic is your enemy. The difference between a minor incident and a catastrophic breach often comes down to one thing: preparation.

This playbook provides a structured approach to incident response that you can adapt to your organization.

The Incident Response Lifecycle

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│ Preparation │───►│  Detection  │───►│ Containment │
└─────────────┘    └─────────────┘    └──────┬──────┘
                                             │
┌─────────────┐    ┌─────────────┐    ┌──────▼──────┐
│   Lessons   │◄───│   Recovery  │◄───│ Eradication │
│   Learned   │    │             │    │             │
└─────────────┘    └─────────────┘    └─────────────┘

Phase 1: Preparation (Before Incidents)

Build Your IR Team

| Role | Responsibility |
|------|----------------|
| IR Lead | Coordinates response, makes decisions |
| Security Analyst | Technical investigation, forensics |
| IT Operations | System access, containment actions |
| Communications | Internal/external messaging |
| Legal | Regulatory compliance, liability |
| Management | Resource allocation, escalation |

Essential Documentation

## IR Runbook Contents
□ Contact list (24/7 phone numbers)
□ Escalation matrix
□ System inventory and owners
□ Network diagrams
□ Backup locations and procedures
□ Vendor contacts (ISP, cloud providers, security tools)
□ Legal/regulatory requirements
□ Communication templates

Tools Ready to Deploy

# Forensics toolkit
- Volatility (memory analysis)
- Autopsy/Sleuth Kit (disk forensics)
- Wireshark (network capture)
- KAPE (artifact collection)

# Response tools
- osquery (endpoint visibility)
- Velociraptor (DFIR at scale)
- TheHive (case management)
- MISP (threat intel sharing)
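"Ready to deploy" only holds if the tools are actually installed where responders will need them. Below is a minimal Bash sketch that checks a list of binaries against PATH. `check_tools` is a hypothetical helper. The binary names in the usage line are assumptions that vary by platform and install method: `vol` for Volatility 3, `tshark` for Wireshark's CLI, `osqueryi`, and `velociraptor`. GUI or Windows-only tools such as Autopsy and KAPE need a separate check.

```shell
#!/usr/bin/env bash
# Report which IR tools are available on PATH (hypothetical helper).
# Args: binary names to look for. Returns 0 only if all are present.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'OK       %s\n' "$tool"
    else
      printf 'MISSING  %s\n' "$tool"
      missing=$((missing + 1))
    fi
  done
  # Success only when nothing was missing
  [ "$missing" -eq 0 ]
}
```

Usage (tool names are examples): `check_tools vol tshark osqueryi velociraptor || echo "toolkit incomplete"`.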

Phase 2: Detection

Alert Triage Process

Alert Received
         │
         ▼
┌─────────────────┐
│ Is it a true    │──No──► Close as false positive
│ positive?       │        Document why
└────────┬────────┘
         │ Yes
         ▼
┌─────────────────┐
│ What's the      │
│ severity?       │
└────────┬────────┘
         │
    ┌────┴────┐
    ▼         ▼
Critical/  Low/
High       Medium
    │         │
    ▼         ▼
Page IR    Queue for
Team       investigation
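The triage flowchart can be expressed as a small Bash function, which helps when wiring it into alert-handling scripts. This is a sketch. The name `route_alert`, its yes/no input, and its output labels are illustrative. Treating High the same as Critical is an assumption, so adjust the `case` patterns to match your escalation matrix.

```shell
#!/usr/bin/env bash
# Route an alert the way the triage flowchart does (hypothetical helper).
# Args: $1 = true positive? (yes|no), $2 = severity (critical|high|medium|low)
# Assumption: high is paged like critical; adjust to your escalation matrix.
route_alert() {
  local true_positive=$1 severity
  # Normalize case so "Critical" and "critical" behave the same
  severity=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')

  if [ "$true_positive" != "yes" ]; then
    echo "close-false-positive"   # Close it, and document why
    return 0
  fi

  case "$severity" in
    critical|high) echo "page-ir-team" ;;
    medium|low)    echo "queue-for-investigation" ;;
    *)             echo "unknown severity: $2" >&2; return 1 ;;
  esac
}
```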

Severity Classification

| Severity | Description | Response Time | Example |
|----------|-------------|---------------|---------|
| P1 - Critical | Active breach, data exfiltration | Immediate (15 min) | Ransomware executing |
| P2 - High | Confirmed compromise | 1 hour | Malware on endpoint |
| P3 - Medium | Suspicious activity | 4 hours | Failed login spike |
| P4 - Low | Minor policy violation | 24 hours | Unauthorized software |
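The response-time column translates directly into a deadline calculation. Here is a minimal sketch that assumes alert times are recorded as Unix epoch seconds. `sla_minutes` and `response_deadline` are hypothetical helper names, and the minute values mirror the severity table.

```shell
#!/usr/bin/env bash
# Response-time target per severity, in minutes (values from the severity table).
sla_minutes() {
  case "$1" in
    P1) echo 15 ;;
    P2) echo 60 ;;
    P3) echo 240 ;;
    P4) echo 1440 ;;
    *)  echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

# Print the response deadline as epoch seconds.
# Args: $1 = severity (P1..P4), $2 = alert time in epoch seconds
response_deadline() {
  local minutes
  minutes=$(sla_minutes "$1") || return 1
  echo $(( $2 + minutes * 60 ))
}
```

With GNU `date`, `date -u -d "@$(response_deadline P2 "$(date -u +%s)")"` prints a P2 deadline in readable form. BSD and macOS `date` take the epoch value with `-r` instead.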

Initial Assessment Questions

## Quick Triage (5 minutes)
1. What systems are affected?
2. What data could be at risk?
3. Is the attack ongoing?
4. What's the potential blast radius?
5. Do we need to escalate immediately?

Phase 3: Containment

Short-Term Containment (Stop the Bleeding)

# Block all traffic to and from the attacker's IP
# (rules added this way do not survive a reboot unless saved)
iptables -I INPUT -s "$ATTACKER_IP" -j DROP
iptables -I OUTPUT -d "$ATTACKER_IP" -j DROP

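# Disable compromised account
net user compromised_user /active:no  # Windows (add /domain for a domain account)
usermod -L -e 1 compromised_user      # Linux: lock the password and expire the account, which also blocks SSH key logins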

# Isolate host (but keep it running for forensics)
# Option 1: Network isolation
ip link set eth0 down  # Run from the local console: this also cuts your own remote access

# Option 2: VLAN quarantine
# Move to isolated VLAN via switch config

# Option 3: EDR isolation
# Use your EDR's network containment feature
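When several attacker addresses are known, generating the blocking rules from a list and reviewing them before applying them reduces typos at a high-pressure moment. This is a dry-run sketch. It assumes a plain text file with one IPv4 address per line, and both `iptables_block_cmds` and that file format are hypothetical. It only prints commands and checks the shape of each address, not its octet ranges.

```shell
#!/usr/bin/env bash
# Dry run: print iptables commands that block each IPv4 address in a file.
# Nothing is executed. Blank lines and "#" comments are ignored.
# Args: $1 = file with one IPv4 address per line
iptables_block_cmds() {
  local ip
  # Strip comments and surrounding whitespace before reading each entry
  sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" |
  while IFS= read -r ip || [ -n "$ip" ]; do   # also handles a missing final newline
    [ -n "$ip" ] || continue
    # Shape check only (four dotted groups of 1-3 digits); octet ranges are not validated
    if ! printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
      echo "skipping invalid entry: $ip" >&2
      continue
    fi
    printf 'iptables -I INPUT -s %s -j DROP\n' "$ip"
    printf 'iptables -I OUTPUT -d %s -j DROP\n' "$ip"
  done
}
```

`iptables_block_cmds attacker_ips.txt` prints the commands. After review, `iptables_block_cmds attacker_ips.txt | sudo sh` applies them.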

Evidence Preservation

CRITICAL: Preserve evidence BEFORE eradication!

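# Memory dump (do this FIRST - volatile!)
# Linux: /dev/mem exposes only a small region on modern kernels (CONFIG_STRICT_DEVMEM),
# so use a dedicated acquisition tool such as AVML or LiME
sudo avml /mnt/forensics/memory.lime
# sudo insmod lime.ko "path=/mnt/forensics/memory.lime format=lime"   # LiME alternative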

# Windows (using winpmem)
winpmem_mini_x64.exe memory.raw

# Disk image (if system can be taken offline)
sudo dd if=/dev/sda of=/mnt/forensics/disk.img bs=4M status=progress

# Network capture
tcpdump -i eth0 -w /mnt/forensics/capture.pcap

# Collect logs (/var/log/ already includes auth.log and syslog)
sudo tar -czvf /mnt/forensics/logs.tar.gz /var/log/
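Evidence is only useful if you can show it has not changed since collection. Here is a minimal sketch that assumes GNU coreutils and findutils (`sha256sum`, `sort -z`, `xargs -r`) and that everything collected sits under one directory such as `/mnt/forensics`. The function names and the `SHA256SUMS` manifest name are illustrative.

```shell
#!/usr/bin/env bash
# Write SHA-256 hashes of all evidence files to SHA256SUMS in the evidence dir.
# Args: $1 = evidence directory, e.g. /mnt/forensics
hash_evidence() {
  (
    cd "$1" || exit 1
    # Hash regular files only, skip the manifest (and its temp file),
    # and sort the list so the manifest order is stable between runs
    find . -type f ! -name 'SHA256SUMS*' -print0 \
      | sort -z \
      | xargs -0 -r sha256sum > SHA256SUMS.tmp \
      && mv SHA256SUMS.tmp SHA256SUMS
  )
}

# Re-check every file listed in the manifest; exits non-zero on any mismatch
verify_evidence() {
  ( cd "$1" && sha256sum --quiet -c SHA256SUMS )
}
```

Run `hash_evidence /mnt/forensics` right after collection, and keep a copy of the manifest somewhere separate from the evidence itself.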

Document Everything

## Incident Timeline
| Timestamp (UTC) | Action | Actor | Notes |
|-----------------|--------|-------|-------|
| 2025-06-15 14:23 | Alert triggered | SIEM | Suspicious PowerShell |
| 2025-06-15 14:25 | Analyst assigned | @jsmith | P2 severity |
| 2025-06-15 14:32 | Confirmed malicious | @jsmith | C2 beacon identified |
| 2025-06-15 14:35 | Host isolated | @mchen | Network containment via EDR |
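Keeping the timeline current is easier when appending a row takes a single command. This sketch assumes the timeline lives in a Markdown file with the columns shown above. `log_event` is a hypothetical helper, and a `|` inside any field would need escaping to keep the table intact.

```shell
#!/usr/bin/env bash
# Append a row to a Markdown incident timeline, stamped with the current UTC time.
# Args: $1 = timeline file, $2 = action, $3 = actor, $4 = notes (optional)
# A "|" inside any field would break the table row.
log_event() {
  local file=$1 action=$2 actor=$3 notes=${4:-}
  # Write the table header the first time the file is used (missing or empty)
  if [ ! -s "$file" ]; then
    printf '| Timestamp (UTC) | Action | Actor | Notes |\n' > "$file"
    printf '|-----------------|--------|-------|-------|\n' >> "$file"
  fi
  printf '| %s | %s | %s | %s |\n' \
    "$(date -u '+%Y-%m-%d %H:%M')" "$action" "$actor" "$notes" >> "$file"
}
```

Example: `log_event incident-timeline.md "Host isolated" "@mchen" "Network containment via EDR"`.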

Phase 4: Eradication

Identify Root Cause

## Root Cause Analysis
1. How did the attacker get in?
   - [ ] Phishing
   - [ ] Vulnerable service
   - [ ] Stolen credentials
   - [ ] Supply chain
   - [ ] Insider

2. How did they move laterally?
   - [ ] Credential dumping
   - [ ] Exploiting trust relationships
   - [ ] Misconfigured permissions

3. What persistence mechanisms exist?
   - [ ] Scheduled tasks
   - [ ] Services
   - [ ] Registry run keys
   - [ ] Web shells
   - [ ] Backdoor accounts
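Two of the persistence checks lend themselves to quick Linux one-liners: extra UID 0 accounts (a classic backdoor) and recently modified files in a web root (a starting point for web shell hunting). This is a minimal sketch with hypothetical function names. A recent modification time is only a lead, because attackers can reset timestamps.

```shell
#!/usr/bin/env bash
# List accounts other than root with UID 0 (a classic backdoor).
# Args: $1 = passwd-format file (defaults to /etc/passwd)
find_uid0_accounts() {
  awk -F: '$3 == 0 && $1 != "root" { print $1 }' "${1:-/etc/passwd}"
}

# List regular files under a directory modified within the last N days,
# e.g. a web root when hunting for dropped web shells.
# Args: $1 = directory, $2 = number of days
recent_files() {
  find "$1" -type f -mtime "-$2" 2>/dev/null
}
```

`find_uid0_accounts` with no argument reads `/etc/passwd`. `recent_files /var/www 7` lists files changed in the last week under an example web root.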

Remove the Threat

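# Remove malware (only after a copy has been preserved as evidence)
rm -f /path/to/malware

# Kill malicious processes (-f matches the full command line, so preview matches first)
pgrep -af "malicious_process"
pkill -9 -f "malicious_process"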

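# Remove persistence
# Check crontabs (per-user and system-wide)
crontab -l -u compromised_user
ls -la /etc/crontab /etc/cron.d/
crontab -e -u compromised_user  # Delete only the malicious lines; crontab -r removes the entire crontab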

# Check systemd services
systemctl list-units --type=service
systemctl disable --now malicious.service  # Stop it now and prevent it from starting at boot
rm /etc/systemd/system/malicious.service
systemctl daemon-reload

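# Check startup scripts and shell profiles (read the contents, not just the listing)
ls -la /etc/init.d/
cat ~/.bashrc ~/.profile      # Look for injected commands; repeat for every user, including root
cat ~/.ssh/authorized_keys    # Look for attacker-added SSH keys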

# Reset compromised credentials
# ALL credentials the attacker could have accessed
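Deleting a sample with `rm -f` destroys evidence you may still need. As an alternative, this sketch moves the file into a quarantine directory, names it by its SHA-256 hash, removes all its permissions, and logs the original path. `quarantine_file`, the `.quarantined` suffix, and the `index.tsv` log are hypothetical conventions. The sketch also assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Quarantine a malicious file instead of deleting it (hypothetical helper).
# Args: $1 = file to quarantine, $2 = quarantine directory
# Prints the file's SHA-256 on success.
quarantine_file() {
  local src=$1 qdir=$2 hash dest
  [ -f "$src" ] || { echo "not a regular file: $src" >&2; return 1; }
  mkdir -p "$qdir" || return 1
  hash=$(sha256sum "$src" | cut -d' ' -f1)
  dest="$qdir/$hash.quarantined"
  # Move rather than copy so the original location no longer holds the sample
  mv "$src" "$dest" || return 1
  # With no execute bits set, the sample cannot be run by accident (even by root)
  chmod 000 "$dest"
  # Log when it was quarantined and where it came from
  printf '%s\t%s\t%s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$hash" "$src" \
    >> "$qdir/index.tsv"
  echo "$hash"
}
```

For example, `quarantine_file /path/to/malware /mnt/forensics/quarantine` prints the sample's hash, which can go straight into the incident timeline.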

Verify Eradication

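# Scan for remaining IOCs (skip /proc and /sys, which can stall a recursive grep;
# note --exclude-dir matches any directory with that base name)
grep -rls --exclude-dir=proc --exclude-dir=sys "malicious_string" /
find / -name "*.suspicious" 2>/dev/null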

# Check for unknown processes
ps auxf | grep -v "known_good"

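# Verify network connections
netstat -tunp | grep ESTABLISHED  # Without -l: with -l, netstat shows only listening sockets
ss -tulpn                         # Listening TCP/UDP sockets and owning processes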

# Run security scans (-i prints only infected files)
clamscan -r -i --exclude-dir="^/(proc|sys)" /
rkhunter --check --sk
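Instead of filtering `ps` output by hand against known-good names, you can compare the current process list with a baseline captured from a known-good build. This sketch assumes one entry per line in both files. `new_since_baseline` is a hypothetical name, and the same comparison works for listening ports or user lists.

```shell
#!/usr/bin/env bash
# Print lines in a current snapshot that are absent from a known-good baseline,
# each reported once, in the order they appear in the snapshot.
# Args: $1 = baseline file, $2 = current snapshot file
new_since_baseline() {
  awk '
    FILENAME == ARGV[1] { seen[$0] = 1; next }   # load the baseline
    !($0 in seen) && !printed[$0]++              # print unseen lines once
  ' "$1" "$2"
}
```

On a clean reference host, run `ps -eo comm= | sort -u > baseline.txt`. On the suspect host, run `ps -eo comm= > now.txt && new_since_baseline baseline.txt now.txt`. Expect some noise from short-lived processes and numbered kernel threads.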

Phase 5: Recovery

Restore Operations

## Recovery Checklist
□ Restore from clean backups (verified uncompromised)
□ Rebuild systems from known-good images
□ Reset ALL potentially compromised credentials
□ Patch vulnerabilities that enabled the attack
□ Increase monitoring on affected systems
□ Gradual return to production (staged)
□ Verify business functionality

Validation Period

# Enhanced monitoring for 30 days post-incident
# - Additional logging
# - More aggressive alerting thresholds
# - Daily IOC sweeps

# Example: Watch for reinfection
watch -n 60 'grep -c "IOC_STRING" /var/log/syslog'
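For the daily IOC sweeps, `grep -F -f` checks every indicator in a list in one pass instead of running one `grep` per string. This sketch assumes an IOC file with one literal string per line. `ioc_sweep` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Count log lines that match any indicator in an IOC list (hypothetical helper).
# Args: $1 = IOC file (one literal string per line), then one or more log files.
# Note: the IOC file must not contain blank lines, because an empty
# pattern matches every line.
ioc_sweep() {
  local iocs=$1 log count
  shift
  for log in "$@"; do
    if [ ! -r "$log" ]; then
      printf 'unreadable\t%s\n' "$log"
      continue
    fi
    # -F: fixed strings, not regexes; -f: read patterns from file; -c: count lines
    # grep exits 1 when nothing matches, so "|| true" keeps that from looking like an error
    count=$(grep -c -F -f "$iocs" "$log") || true
    printf '%s\t%s\n' "$count" "$log"
  done
}
```

For example, `ioc_sweep iocs.txt /var/log/syslog /var/log/auth.log` could run from a daily cron job during the validation window. Any non-zero count warrants a closer look.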

Phase 6: Lessons Learned

Post-Incident Review (48-72 hours after closure)

## PIR Template

### Incident Summary
- **ID**: INC-2025-0615
- **Duration**: 14:23 - 18:45 UTC (4h 22m)
- **Severity**: P2 - High
- **Impact**: 3 endpoints compromised, no data exfiltration confirmed

### Timeline
[Detailed timeline from documentation]

### What Went Well
- Detection within 5 minutes of initial activity
- Containment prevented lateral movement
- Clear communication throughout

### What Could Be Improved
- Initial triage took 15 minutes (target: 5)
- Backup restoration process unclear
- Missing runbook for this attack type

### Action Items
| Action | Owner | Due Date |
|--------|-------|----------|
| Create ransomware-specific runbook | @jsmith | 2025-06-30 |
| Improve backup restore documentation | @mchen | 2025-06-25 |
| Add detection for this TTP | @security | 2025-06-22 |

### Root Cause
Phishing email bypassed email security; the user clicked a malicious link,
and a macro then executed a PowerShell downloader.

### Recommendations
1. Implement macro blocking for external emails
2. Deploy browser isolation for risky clicks
3. Conduct phishing awareness training
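The Duration field is simple arithmetic: 18:45 minus 14:23 is 262 minutes, which is 4h 22m. The sketch below does the same for any two UTC `HH:MM` times. It assumes the incident spans less than 24 hours, so an end time earlier than the start is treated as falling on the next day. `incident_duration` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Compute an incident's duration from start and end times in HH:MM (UTC).
# Assumes the incident lasted less than 24 hours.
# Args: $1 = start, $2 = end
incident_duration() {
  awk -v s="$1" -v e="$2" 'BEGIN {
    split(s, a, ":"); split(e, b, ":")
    # awk reads "08" as decimal 8, so leading zeros are safe here
    m = (b[1] * 60 + b[2]) - (a[1] * 60 + a[2])
    if (m < 0) m += 24 * 60   # end time fell after midnight UTC
    printf "%dh %dm\n", int(m / 60), m % 60
  }'
}
```

`incident_duration 14:23 18:45` reproduces the 4h 22m in the template.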

Communication Templates

Internal Notification (to staff)

Subject: [ACTION REQUIRED] Security Incident - Password Reset

Team,

Our security team detected suspicious activity on our network. 
As a precaution, please reset your password immediately at [LINK].

What to do:
1. Reset your password now
2. Enable MFA if you haven't already
3. Report any suspicious emails to [SECURITY TEAM EMAIL]

What NOT to do:
- Don't click links in unexpected emails
- Don't share your credentials with anyone

We'll provide updates as we learn more.

- Security Team

External Notification (to customers)

Subject: Important Security Notice

Dear Customer,

We recently identified unauthorized access to some of our systems. 
We immediately took action to contain the incident and engaged 
cybersecurity experts to investigate.

What happened:
[Brief, factual description]

What information was involved:
[Specific data types]

What we're doing:
[Actions taken]

What you can do:
[Actionable steps for customers]

For questions, contact: [CONTACT EMAIL]

We sincerely apologize for any concern this may cause.

Quick Reference: IR Checklist

## Immediate (First 15 minutes)
□ Confirm the incident is real
□ Classify severity
□ Alert IR team
□ Begin documentation
□ Preserve volatile evidence

## Short-term (First hour)
□ Contain the threat
□ Identify affected systems
□ Collect additional evidence
□ Notify stakeholders
□ Establish communication channel

## Medium-term (First 24 hours)
□ Complete forensic collection
□ Identify root cause
□ Eradicate threat
□ Begin recovery planning
□ Legal/regulatory assessment

## Long-term (Week+)
□ Full recovery
□ Enhanced monitoring
□ Post-incident review
□ Implement improvements
□ Update documentation

Conclusion

Incident response is a skill developed through practice, not just reading. Key takeaways:

  1. Prepare before incidents — Have plans, tools, and contacts ready
  2. Document everything — Memory fades, logs don’t
  3. Preserve before you eradicate — Evidence is fragile
  4. Communicate clearly — Panic spreads faster than malware
  5. Learn from every incident — Each one makes you stronger

The goal isn’t to prevent all incidents — that’s impossible. The goal is to detect quickly, respond effectively, and emerge stronger.

