16 — Drift Detection

Automatic configuration change tracking between Ansible runs. Captures host fact snapshots, compares them to detect drift, and alerts when thresholds are exceeded.

Architecture

Job completes (fact cache enabled)
         │
         ▼
  finish_fact_cache()          ── updates host.ansible_facts
         │
         ▼
  capture_fact_snapshot        ── Celery task (async)
         │                        computes SHA-256 hash of facts
         │                        skips if hash matches previous snapshot
         ▼
  HostFactSnapshot created     ── full ansible_facts stored
         │
         ▼
  detect_drift                 ── Celery task (async)
         │                        compares with previous snapshot
         │                        categorizes changes (packages, kernel, ...)
         │                        assigns severity (low → critical)
         ▼
  DriftDetection records       ── one per changed fact key
         │
         ▼
  evaluate_drift_alerts        ── Celery task (async)
         │                        checks all enabled DriftAlertRules
         │                        threshold, window, cooldown
         ▼
  DriftAlert + notification    ── if threshold exceeded

The entire pipeline is asynchronous: RunJob.post_run_hook() calls capture_fact_snapshot.delay(job.id) after finish_fact_cache() completes, and each later stage runs as its own Celery task, so drift processing does not delay job completion.

Models

HostFactSnapshot

Point-in-time capture of ansible_facts for a single host.

Field	Type	Description
`host`	FK → Host	The host whose facts were captured
`job`	FK → Job (nullable)	The job that triggered the capture
`inventory`	FK → Inventory	Inherited from host
`organization`	FK → Organization	Inherited from inventory
`captured_at`	DateTime	When the snapshot was taken
`facts`	JSON	Full `ansible_facts` dictionary
`facts_hash`	CharField(64)	SHA-256 of sorted JSON — for quick equality check

Key behavior: A snapshot is only created when facts_hash differs from the most recent snapshot for the same host. This prevents storing duplicate snapshots when facts haven't changed.
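
The deduplication check can be sketched as follows. This is a minimal illustration, not the actual task code: the helper names `compute_facts_hash` and `should_create_snapshot` are hypothetical, and the exact JSON serialization options are an assumption (the source only states "SHA-256 of sorted JSON").

```python
import hashlib
import json
from typing import Optional


def compute_facts_hash(facts: dict) -> str:
    """Return a 64-character SHA-256 hex digest of the facts dictionary."""
    # sort_keys=True makes the serialization independent of key order,
    # so two dicts with the same content always produce the same hash.
    # The separators and default=str choices are assumptions of this sketch.
    canonical = json.dumps(facts, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_create_snapshot(new_facts: dict, previous_hash: Optional[str]) -> bool:
    """Create a snapshot for the first capture or when the hash has changed."""
    return previous_hash is None or compute_facts_hash(new_facts) != previous_hash
```

Because the digest is a fixed 64 hex characters, it fits the `facts_hash` CharField(64) and lets the equality check avoid comparing full fact dictionaries.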

DriftDetection

A single detected configuration change between two consecutive snapshots.

Field	Type	Description
`host`	FK → Host	Which host changed
`snapshot_before`	FK → HostFactSnapshot	Previous state
`snapshot_after`	FK → HostFactSnapshot	New state
`job`	FK → Job	Which job caused the change
`category`	CharField	packages, services, users_groups, network, mounts, kernel, other
`severity`	CharField	low, medium, high, critical
`fact_path`	CharField	Top-level fact key (e.g. `ansible_packages`)
`summary`	CharField	Human-readable change description
`detail`	JSON	`{before, after, diff_type}`
`acknowledged`	Boolean	Whether an admin has reviewed this change
`acknowledged_by`	FK → User	Who acknowledged
`acknowledged_at`	DateTime	When acknowledged
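
To illustrate how one record per changed top-level fact key could be derived, here is a rough sketch. The function name `diff_top_level_facts` is hypothetical, and the `diff_type` values (`added`, `removed`, `changed`) are assumptions; the source only names the `{before, after, diff_type}` keys.

```python
from typing import Any, Dict, List


def diff_top_level_facts(before: Dict[str, Any], after: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Compare two fact dictionaries key by key at the top level."""
    changes = []
    for key in sorted(set(before) | set(after)):
        if key not in before:
            diff_type = "added"      # assumed label
        elif key not in after:
            diff_type = "removed"    # assumed label
        elif before[key] != after[key]:
            diff_type = "changed"    # assumed label
        else:
            continue  # unchanged keys produce no record
        changes.append({
            "fact_path": key,
            "detail": {
                "before": before.get(key),
                "after": after.get(key),
                "diff_type": diff_type,
            },
        })
    return changes
```

Each returned entry corresponds to what would become one DriftDetection row, with `fact_path` and `detail` filled in and the remaining fields set by the caller.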

Category Classification

Category	Matched facts	Default severity
`packages`	ansible_packages, ansible_pkg_mgr, package, pip	medium
`services`	ansible_services, ansible_service_mgr, systemd	medium
`users_groups`	ansible_user_*, user, group, passwd*	high
`network`	ansible_all_ipv4_addresses, ansible_all_ipv6_addresses, ansible_interfaces, tcp, port	high
`mounts`	ansible_mounts, ansible_devices, disk, lvm	medium
`kernel`	ansible_kernel*, ansible_sysctl, ansible_selinux	critical
`other`	Everything else	low

Volatile keys skipped: ansible_date_time, ansible_uptime_seconds, ansible_local, module_setup, gather_subset.
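
The classification table can be approximated with an ordered rule list. This is only a sketch under stated assumptions: the source does not say how facts are matched, so the code treats `ansible_*` names as exact or glob matches and bare keywords (such as `pip` or `disk`) as substring matches, with the first matching row winning. `RULES` and `classify_fact` are hypothetical names.

```python
from fnmatch import fnmatchcase
from typing import Optional, Tuple

# Keys that are ignored entirely, as listed in the documentation.
VOLATILE_KEYS = frozenset({
    "ansible_date_time", "ansible_uptime_seconds", "ansible_local",
    "module_setup", "gather_subset",
})

# (category, default severity, glob patterns). The glob forms are assumptions.
RULES = [
    ("packages", "medium", ["ansible_packages", "ansible_pkg_mgr", "*package*", "*pip*"]),
    ("services", "medium", ["ansible_services", "ansible_service_mgr", "*systemd*"]),
    ("users_groups", "high", ["ansible_user_*", "*user*", "*group*", "passwd*"]),
    ("network", "high", ["ansible_all_ipv4_addresses", "ansible_all_ipv6_addresses",
                         "ansible_interfaces", "*tcp*", "*port*"]),
    ("mounts", "medium", ["ansible_mounts", "ansible_devices", "*disk*", "*lvm*"]),
    ("kernel", "critical", ["ansible_kernel*", "ansible_sysctl", "ansible_selinux"]),
]


def classify_fact(fact_key: str) -> Optional[Tuple[str, str]]:
    """Return (category, severity), or None if the key is volatile."""
    if fact_key in VOLATILE_KEYS:
        return None
    for category, severity, patterns in RULES:
        if any(fnmatchcase(fact_key, pattern) for pattern in patterns):
            return category, severity
    return "other", "low"
```

Returning `None` for volatile keys lets the caller skip them before any DriftDetection record is created.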

DriftAlertRule

User-defined rule for alerting when drift exceeds a threshold.

Field	Type	Description
`name`	CharField	Rule name (unique per org)
`organization`	FK → Organization	Scope
`enabled`	Boolean	Active or paused
`inventory`	FK → Inventory (optional)	Filter by inventory
`host_filter`	CharField	fnmatch pattern (e.g. `web-*`)
`categories`	JSON list	Which drift categories to match (empty = all)
`severity_min`	CharField	Minimum severity to count (low/medium/high/critical)
`threshold_count`	Integer	How many drift items trigger the alert
`threshold_window_minutes`	Integer	Time window for counting
`cooldown_minutes`	Integer	Minimum time between alert firings
`notification_template`	FK → NotificationTemplate	Where to send alert
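
The interaction of threshold, window, and cooldown can be sketched like this. The dataclasses and `should_fire` are hypothetical stand-ins for the Django models and the evaluate_drift_alerts task. The sketch assumes that an alert fires once the count reaches `threshold_count`, that drift is counted across all matching hosts, that an empty `host_filter` matches every host, and that each drift item carries a detection timestamp.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from fnmatch import fnmatchcase
from typing import List, Optional

SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass
class Rule:
    threshold_count: int
    threshold_window_minutes: int
    cooldown_minutes: int
    severity_min: str = "low"
    categories: List[str] = field(default_factory=list)  # empty = all
    host_filter: str = ""                                 # empty = all (assumed)


@dataclass
class Drift:
    host_name: str
    category: str
    severity: str
    detected_at: datetime


def should_fire(rule: Rule, drifts: List[Drift], now: datetime,
                last_fired_at: Optional[datetime] = None) -> bool:
    # Cooldown: stay silent if the rule fired too recently.
    if last_fired_at is not None and now - last_fired_at < timedelta(minutes=rule.cooldown_minutes):
        return False
    window_start = now - timedelta(minutes=rule.threshold_window_minutes)
    matching = [
        d for d in drifts
        if window_start <= d.detected_at <= now
        and SEVERITY_ORDER[d.severity] >= SEVERITY_ORDER[rule.severity_min]
        and (not rule.categories or d.category in rule.categories)
        and (not rule.host_filter or fnmatchcase(d.host_name, rule.host_filter))
    ]
    return len(matching) >= rule.threshold_count
```

Checking the cooldown first avoids counting drift at all for rules that cannot fire yet.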

DriftAlert

Immutable record of a triggered alert.

Field	Type	Description
`alert_rule`	FK → DriftAlertRule	Which rule triggered
`host`	FK → Host	Which host caused it
`drift_count`	Integer	How many drift items were counted
`summary`	Text	Human-readable description
`notification_status`	CharField	pending, sent, failed
`notification_error`	Text	Error message if send failed

API Endpoints

Fact Snapshots (read-only)

GET    /api/v2/fact_snapshots/                          # List (filterable by host, inventory, job)
GET    /api/v2/fact_snapshots/{id}/                     # Detail (includes full facts)

Drift Detections (read-only + acknowledge)

GET    /api/v2/drift_detections/                        # List (filter: host, inventory, category, severity, acknowledged)
GET    /api/v2/drift_detections/{id}/                   # Detail (includes before/after diff)
POST   /api/v2/drift_detections/{id}/acknowledge/       # Mark as acknowledged
POST   /api/v2/drift_detections/compare/                # Compare two snapshots: {snapshot_a, snapshot_b}
GET    /api/v2/drift_detections/export/                 # CSV compliance report (filter: host, inventory, date range)
GET    /api/v2/drift_detections/summary/                # Dashboard stats: {total, unacknowledged, by_category, by_severity}
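
As a rough illustration of the summary response shape, the statistics could be aggregated as below. The function `summarize` and the dictionary-based input are assumptions made for this sketch; the real endpoint presumably aggregates in the database.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize(detections: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build {total, unacknowledged, by_category, by_severity} from drift records."""
    return {
        "total": len(detections),
        "unacknowledged": sum(1 for d in detections if not d["acknowledged"]),
        "by_category": dict(Counter(d["category"] for d in detections)),
        "by_severity": dict(Counter(d["severity"] for d in detections)),
    }
```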

Drift Alert Rules (CRUD)

GET    /api/v2/drift_alert_rules/                       # List
POST   /api/v2/drift_alert_rules/                       # Create
GET    /api/v2/drift_alert_rules/{id}/                  # Detail
PATCH  /api/v2/drift_alert_rules/{id}/                  # Update
DELETE /api/v2/drift_alert_rules/{id}/                  # Delete
POST   /api/v2/drift_alert_rules/{id}/enable/           # Enable
POST   /api/v2/drift_alert_rules/{id}/disable/          # Disable

Drift Alerts (read-only)

GET    /api/v2/drift_alerts/                            # List (filter: alert_rule, host, notification_status)
GET    /api/v2/drift_alerts/{id}/                       # Detail

Host Drift History (nested)

GET    /api/v2/hosts/{id}/drift/                        # All drift for a specific host

Alert Rule — Create Example

curl -X POST https://forge.example.com/api/v2/drift_alert_rules/ \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Critical kernel changes",
    "organization": 1,
    "categories": ["kernel", "users_groups"],
    "severity_min": "high",
    "threshold_count": 1,
    "threshold_window_minutes": 60,
    "cooldown_minutes": 30,
    "notification_template": 5
  }'
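
The same request can be prepared from Python with only the standard library. This is a sketch: `build_create_rule_request` is a hypothetical helper, and the payload simply mirrors the curl example above.

```python
import json
import urllib.request


def build_create_rule_request(base_url: str, token: str, rule: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that creates an alert rule."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v2/drift_alert_rules/",
        data=json.dumps(rule).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


rule = {
    "name": "Critical kernel changes",
    "organization": 1,
    "categories": ["kernel", "users_groups"],
    "severity_min": "high",
    "threshold_count": 1,
    "threshold_window_minutes": 60,
    "cooldown_minutes": 30,
    "notification_template": 5,
}
request = build_create_rule_request("https://forge.example.com", "<token>", rule)
# To send it: urllib.request.urlopen(request)
```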

CSV Compliance Export

# Export all drift for inventory 3 detected since 2026-03-01 (set date_from to 30 days ago for a rolling 30-day report)
curl -H "Authorization: Bearer <token>" \
  "https://forge.example.com/api/v2/drift_detections/export/?inventory=3&date_from=2026-03-01" \
  -o drift_report.csv

Output columns: ID, Host, Detected At, Category, Severity, Fact Path, Summary, Diff Type, Acknowledged.
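
The exported file can be post-processed with the standard `csv` module. The sketch below lists unacknowledged rows at or above a given severity. The sample rows are invented for illustration, and the `True`/`False` rendering of the Acknowledged column is an assumption about the export format.

```python
import csv
import io
from typing import Dict, List, TextIO

SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}

# Invented sample data using the documented column names.
SAMPLE_REPORT = """\
ID,Host,Detected At,Category,Severity,Fact Path,Summary,Diff Type,Acknowledged
1,web-01,2026-03-02T10:15:00Z,kernel,critical,ansible_kernel,Kernel version changed,changed,False
2,web-01,2026-03-02T10:15:00Z,packages,medium,ansible_packages,Package list changed,changed,True
3,db-01,2026-03-05T08:00:00Z,users_groups,high,ansible_user_id,User changed,changed,False
"""


def unacknowledged_rows(csv_file: TextIO, min_severity: str = "high") -> List[Dict[str, str]]:
    """Return rows not yet acknowledged whose severity is at least min_severity."""
    reader = csv.DictReader(csv_file)
    return [
        row for row in reader
        if row["Acknowledged"].strip().lower() != "true"
        and SEVERITY_ORDER[row["Severity"]] >= SEVERITY_ORDER[min_severity]
    ]


rows = unacknowledged_rows(io.StringIO(SAMPLE_REPORT))
```

For a real report, pass `open("drift_report.csv", newline="")` instead of the in-memory sample.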

Snapshot Cleanup

The cleanup_old_snapshots task runs periodically under Celery beat:

Default retention: 90 days
Always keeps at least 2 snapshots per host
Run manually: forge-manage shell -c "from forge.main.tasks.drift import cleanup_old_snapshots; cleanup_old_snapshots()"
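
The retention rules above can be sketched as a pure selection function. `select_snapshots_to_delete` is a hypothetical name, and the sketch assumes that the snapshots always kept are the two most recent ones per host.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def select_snapshots_to_delete(
    snapshots: Iterable[Tuple[int, str, datetime]],  # (snapshot_id, host_id, captured_at)
    now: datetime,
    retention_days: int = 90,
    keep_per_host: int = 2,
) -> List[int]:
    """Return IDs of snapshots past retention, sparing the newest ones per host."""
    cutoff = now - timedelta(days=retention_days)
    by_host = defaultdict(list)
    for snapshot_id, host_id, captured_at in snapshots:
        by_host[host_id].append((captured_at, snapshot_id))

    to_delete = []
    for host_snapshots in by_host.values():
        host_snapshots.sort(reverse=True)  # newest first
        # The first keep_per_host entries are never deleted, whatever their age.
        for captured_at, snapshot_id in host_snapshots[keep_per_host:]:
            if captured_at < cutoff:
                to_delete.append(snapshot_id)
    return sorted(to_delete)
```

Keeping the newest two regardless of age means a host that has not changed in months still has a baseline pair for the next comparison.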

Frontend

Seven pages under the Compliance sidebar section:

Page	Route	Description
Drift Detections	`/drift_detections`	Filterable list with category/severity badges, CSV export
Drift Detection Detail	`/drift_detections/:id`	Before/after JSON diff, acknowledge button
Drift Alert Rules	`/drift_alert_rules`	CRUD list with create/edit/enable/disable
Drift Alert Rule Detail	`/drift_alert_rules/:id`	Config summary, recent triggered alerts
Drift Alerts	`/drift_alerts`	Read-only list of triggered alerts
Drift Alert Detail	`/drift_alerts/:id`	Alert summary, notification status/error
Fact Snapshots	`/fact_snapshots`	Browse captured snapshots by host/job

Security

RBAC: All endpoints require authentication. Non-admin users only see drift for hosts in their organization.
No write access to drift data: DriftDetection records are created automatically by the system — users can only acknowledge, not modify.
Immutable snapshots: HostFactSnapshot records cannot be updated or deleted through the API.