16 — Drift Detection

Automatic configuration change tracking between Ansible runs. Captures host fact snapshots, compares them to detect drift, and alerts when thresholds are exceeded.


Architecture

Job completes (fact cache enabled)
         │
         ▼
  finish_fact_cache()          ── updates host.ansible_facts
         │
         ▼
  capture_fact_snapshot        ── Celery task (async)
         │                       computes SHA-256 hash of facts
         │                       skips if hash matches previous snapshot
         ▼
  HostFactSnapshot created     ── full ansible_facts stored
         │
         ▼
  detect_drift                 ── Celery task (async)
         │                       compares with previous snapshot
         │                       categorizes changes (packages, kernel, ...)
         │                       assigns severity (low → critical)
         ▼
  DriftDetection records       ── one per changed fact key
         │
         ▼
  evaluate_drift_alerts        ── Celery task (async)
         │                       checks all enabled DriftAlertRules
         │                       threshold, window, cooldown
         ▼
  DriftAlert + notification    ── if threshold exceeded

The entire pipeline is asynchronous: RunJob.post_run_hook() calls capture_fact_snapshot.delay(job.id) after finish_fact_cache() completes, so drift processing does not slow down job completion.


Models

HostFactSnapshot

Point-in-time capture of ansible_facts for a single host.

Field         Type                 Description
host          FK → Host            The host whose facts were captured
job           FK → Job (nullable)  The job that triggered the capture
inventory     FK → Inventory       Inherited from host
organization  FK → Organization    Inherited from inventory
captured_at   DateTime             When the snapshot was taken
facts         JSON                 Full ansible_facts dictionary
facts_hash    CharField(64)        SHA-256 of sorted JSON, for quick equality check

Key behavior: A snapshot is only created when facts_hash differs from the most recent snapshot for the same host. This prevents storing duplicate snapshots when facts haven't changed.
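
The hash-based deduplication described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function names compute_facts_hash and should_create_snapshot are hypothetical, and the exact JSON serialization options (separators, default=str) are assumptions. Only "SHA-256 of sorted JSON" comes from the field description.

```python
import hashlib
import json
from typing import Optional


def compute_facts_hash(facts: dict) -> str:
    """Return the SHA-256 hex digest (64 chars) of facts serialized with sorted keys."""
    # sort_keys makes the digest independent of dict insertion order.
    # default=str is an assumption so non-JSON values do not raise.
    canonical = json.dumps(facts, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_create_snapshot(facts: dict, previous_hash: Optional[str]) -> bool:
    """True when the host has no previous snapshot or its facts have changed."""
    return previous_hash is None or compute_facts_hash(facts) != previous_hash
```

A caller would look up the facts_hash of the host's most recent snapshot, pass it as previous_hash, and skip the insert when the function returns False.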

DriftDetection

A single detected configuration change between two consecutive snapshots.

Field            Type                   Description
host             FK → Host              Which host changed
snapshot_before  FK → HostFactSnapshot  Previous state
snapshot_after   FK → HostFactSnapshot  New state
job              FK → Job               Which job caused the change
category         CharField              packages, services, users_groups, network, mounts, kernel, other
severity         CharField              low, medium, high, critical
fact_path        CharField              Top-level fact key (e.g. ansible_packages)
summary          CharField              Human-readable change description
detail           JSON                   {before, after, diff_type}
acknowledged     Boolean                Whether an admin has reviewed this change
acknowledged_by  FK → User              Who acknowledged
acknowledged_at  DateTime               When acknowledged
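
To make the "one record per changed fact key" idea concrete, the following sketch compares two fact dictionaries at the top level and emits a fact_path plus a detail payload shaped like {before, after, diff_type}. The function name diff_top_level_facts and the diff_type values "added", "removed", and "changed" are assumptions for illustration; the document does not list the real values.

```python
from typing import Any, Dict, List


def diff_top_level_facts(before: Dict[str, Any], after: Dict[str, Any]) -> List[dict]:
    """Return one change record per top-level fact key that differs."""
    changes = []
    for key in sorted(set(before) | set(after)):
        if key not in before:
            diff_type = "added"      # hypothetical label
        elif key not in after:
            diff_type = "removed"    # hypothetical label
        elif before[key] != after[key]:
            diff_type = "changed"    # hypothetical label
        else:
            continue                 # unchanged keys produce no record
        changes.append({
            "fact_path": key,
            "detail": {
                "before": before.get(key),
                "after": after.get(key),
                "diff_type": diff_type,
            },
        })
    return changes
```

Each returned entry would map onto one DriftDetection row, with snapshot_before and snapshot_after pointing at the two snapshots that were compared.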

Category Classification

Category      Matched facts                                                Default severity
packages      ansible_packages, ansible_pkg_mgr, package, pip              medium
services      ansible_services, ansible_service_mgr, systemd               medium
users_groups  ansible_user_, user, group, passwd*                          high
network       ansible_all_ipv4/6_addresses, ansible_interfaces, tcp, port  high
mounts        ansible_mounts, ansible_devices, disk, lvm                   medium
kernel        ansible_kernel*, ansible_sysctl, ansible_selinux             critical
other         Everything else                                              low

Volatile keys skipped: ansible_date_time, ansible_uptime_seconds, ansible_local, module_setup, gather_subset.
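
The classification table and the volatile-key list can be expressed as a small lookup. The sketch below assumes that each pattern is matched as a case-insensitive substring of the fact key and that the first matching row in table order wins; the document does not state the actual matching semantics, so treat both as assumptions. The names VOLATILE_KEYS, CATEGORY_RULES, and classify_fact are hypothetical.

```python
from typing import Optional, Tuple

# Keys listed as volatile in this document; changes to them are ignored.
VOLATILE_KEYS = frozenset({
    "ansible_date_time", "ansible_uptime_seconds", "ansible_local",
    "module_setup", "gather_subset",
})

# (category, patterns, default severity), in the order of the table above.
# "ipv4/6" is expanded into the two standard Ansible fact names.
CATEGORY_RULES = [
    ("packages", ("ansible_packages", "ansible_pkg_mgr", "package", "pip"), "medium"),
    ("services", ("ansible_services", "ansible_service_mgr", "systemd"), "medium"),
    ("users_groups", ("ansible_user_", "user", "group", "passwd"), "high"),
    ("network", ("ansible_all_ipv4_addresses", "ansible_all_ipv6_addresses",
                 "ansible_interfaces", "tcp", "port"), "high"),
    ("mounts", ("ansible_mounts", "ansible_devices", "disk", "lvm"), "medium"),
    ("kernel", ("ansible_kernel", "ansible_sysctl", "ansible_selinux"), "critical"),
]


def classify_fact(fact_path: str) -> Optional[Tuple[str, str]]:
    """Return (category, severity), or None for volatile keys that are skipped."""
    if fact_path in VOLATILE_KEYS:
        return None
    key = fact_path.lower()
    for category, patterns, severity in CATEGORY_RULES:
        # Assumption: substring match, first matching row wins.
        if any(pattern in key for pattern in patterns):
            return category, severity
    return "other", "low"
```

Substring matching is deliberately loose here; a real implementation might prefer exact names or prefix matching to avoid accidental hits on short patterns such as "port".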

DriftAlertRule

User-defined rule for alerting when drift exceeds a threshold.

Field                     Type                       Description
name                      CharField                  Rule name (unique per org)
organization              FK → Organization          Scope
enabled                   Boolean                    Active or paused
inventory                 FK → Inventory (optional)  Filter by inventory
host_filter               CharField                  fnmatch pattern (e.g. web-*)
categories                JSON list                  Which drift categories to match (empty = all)
severity_min              CharField                  Minimum severity to count (low/medium/high/critical)
threshold_count           Integer                    How many drift items trigger the alert
threshold_window_minutes  Integer                    Time window for counting
cooldown_minutes          Integer                    Minimum time between alert firings
notification_template     FK → NotificationTemplate  Where to send alert
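
How the threshold, window, and cooldown fields interact is easiest to see in code. The sketch below is an assumption-laden illustration rather than the real evaluate_drift_alerts task: Rule and Drift are hypothetical stand-ins for the Django models, an empty host_filter is assumed to match every host, and drift is counted across all matching hosts rather than per host.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from fnmatch import fnmatchcase
from typing import List, Optional

SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass
class Rule:  # hypothetical stand-in for DriftAlertRule
    threshold_count: int
    threshold_window_minutes: int
    cooldown_minutes: int
    severity_min: str = "low"
    host_filter: str = ""  # assumption: empty means all hosts
    categories: List[str] = field(default_factory=list)  # empty = all


@dataclass
class Drift:  # hypothetical stand-in for DriftDetection
    host_name: str
    category: str
    severity: str
    detected_at: datetime


def count_matching(rule: Rule, drifts: List[Drift], now: datetime) -> int:
    """Count drift items inside the window that pass every rule filter."""
    window_start = now - timedelta(minutes=rule.threshold_window_minutes)
    return sum(
        1 for d in drifts
        if d.detected_at >= window_start
        and SEVERITY_ORDER[d.severity] >= SEVERITY_ORDER[rule.severity_min]
        and (not rule.categories or d.category in rule.categories)
        and (not rule.host_filter or fnmatchcase(d.host_name, rule.host_filter))
    )


def should_fire(rule: Rule, drifts: List[Drift], now: datetime,
                last_fired_at: Optional[datetime]) -> bool:
    """Apply the cooldown first, then compare the count with the threshold."""
    if last_fired_at is not None:
        if now - last_fired_at < timedelta(minutes=rule.cooldown_minutes):
            return False
    return count_matching(rule, drifts, now) >= rule.threshold_count
```

Checking the cooldown before counting keeps a noisy host from producing a new DriftAlert on every job run while the previous alert is still fresh.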

DriftAlert

Immutable record of a triggered alert.

Field                Type                 Description
alert_rule           FK → DriftAlertRule  Which rule triggered
host                 FK → Host            Which host caused it
drift_count          Integer              How many drift items were counted
summary              Text                 Human-readable description
notification_status  CharField            pending, sent, failed
notification_error   Text                 Error message if send failed

API Endpoints

Fact Snapshots (read-only)

GET    /api/v2/fact_snapshots/                          # List (filterable by host, inventory, job)
GET    /api/v2/fact_snapshots/{id}/                     # Detail (includes full facts)

Drift Detections (read-only + acknowledge)

GET    /api/v2/drift_detections/                        # List (filter: host, inventory, category, severity, acknowledged)
GET    /api/v2/drift_detections/{id}/                   # Detail (includes before/after diff)
POST   /api/v2/drift_detections/{id}/acknowledge/       # Mark as acknowledged
POST   /api/v2/drift_detections/compare/                # Compare two snapshots: {snapshot_a, snapshot_b}
GET    /api/v2/drift_detections/export/                 # CSV compliance report (filter: host, inventory, date range)
GET    /api/v2/drift_detections/summary/                # Dashboard stats: {total, unacknowledged, by_category, by_severity}

Drift Alert Rules (CRUD)

GET    /api/v2/drift_alert_rules/                       # List
POST   /api/v2/drift_alert_rules/                       # Create
GET    /api/v2/drift_alert_rules/{id}/                  # Detail
PATCH  /api/v2/drift_alert_rules/{id}/                  # Update
DELETE /api/v2/drift_alert_rules/{id}/                  # Delete
POST   /api/v2/drift_alert_rules/{id}/enable/           # Enable
POST   /api/v2/drift_alert_rules/{id}/disable/          # Disable

Drift Alerts (read-only)

GET    /api/v2/drift_alerts/                            # List (filter: alert_rule, host, notification_status)
GET    /api/v2/drift_alerts/{id}/                       # Detail

Host Drift History (nested)

GET    /api/v2/hosts/{id}/drift/                        # All drift for a specific host
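
The response shape of the summary endpoint listed above, {total, unacknowledged, by_category, by_severity}, amounts to a simple aggregation over drift records. The sketch below shows one plausible way to compute it in memory; summarize_drift is a hypothetical name, and a real view would more likely aggregate in the database.

```python
from collections import Counter
from typing import Iterable, Mapping


def summarize_drift(records: Iterable[Mapping]) -> dict:
    """Aggregate drift records into the dashboard summary shape."""
    records = list(records)
    return {
        "total": len(records),
        "unacknowledged": sum(1 for r in records if not r["acknowledged"]),
        "by_category": dict(Counter(r["category"] for r in records)),
        "by_severity": dict(Counter(r["severity"] for r in records)),
    }
```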

Alert Rule — Create Example

curl -X POST https://forge.example.com/api/v2/drift_alert_rules/ \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Critical kernel changes",
    "organization": 1,
    "categories": ["kernel", "users_groups"],
    "severity_min": "high",
    "threshold_count": 1,
    "threshold_window_minutes": 60,
    "cooldown_minutes": 30,
    "notification_template": 5
  }'

CSV Compliance Export

# Export all drift for inventory 3 detected since 2026-03-01
curl -H "Authorization: Bearer <token>" \
  "https://forge.example.com/api/v2/drift_detections/export/?inventory=3&date_from=2026-03-01" \
  -o drift_report.csv

Output columns: ID, Host, Detected At, Category, Severity, Fact Path, Summary, Diff Type, Acknowledged.
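
For readers who want to see how a report with these columns could be produced, here is a minimal sketch using Python's standard csv module. The column names come from the list above; the function name render_drift_csv and the assumption that each row arrives as a dict keyed by column name are illustrative only.

```python
import csv
import io
from typing import Iterable, Mapping

COLUMNS = [
    "ID", "Host", "Detected At", "Category", "Severity",
    "Fact Path", "Summary", "Diff Type", "Acknowledged",
]


def render_drift_csv(rows: Iterable[Mapping]) -> str:
    """Render drift rows (dicts keyed by column name) as CSV text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Using csv.DictWriter means summaries that contain commas or quotes are escaped correctly, which matters for free-text fields such as Summary.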


Snapshot Cleanup

Periodic task cleanup_old_snapshots runs as a Celery beat task:


Frontend

Four list pages under the Compliance sidebar section, three of them with a detail view:

Page                     Route                   Description
Drift Detections         /drift_detections       Filterable list with category/severity badges, CSV export
Drift Detection Detail   /drift_detections/:id   Before/after JSON diff, acknowledge button
Drift Alert Rules        /drift_alert_rules      CRUD list with create/edit/enable/disable
Drift Alert Rule Detail  /drift_alert_rules/:id  Config summary, recent triggered alerts
Drift Alerts             /drift_alerts           Read-only list of triggered alerts
Drift Alert Detail       /drift_alerts/:id       Alert summary, notification status/error
Fact Snapshots           /fact_snapshots         Browse captured snapshots by host/job

Security