06 — Database Schema

Forge uses PostgreSQL 15. This document covers key tables, relationships, maintenance, and useful queries for diagnostics.


ER Diagram — Main Relationships

Organization ──1:N──► Project
             ──1:N──► Inventory ──1:N──► Host
                                ──1:N──► Group (M:N with Host)
             ──1:N──► Credential
             ──1:N──► JobTemplate ──1:N──► Job ──1:N──► JobEvent
             ──1:N──► Team ──M:N──► User
             ──1:N──► NotificationTemplate
             ──M:N──► InstanceGroup ──M:N──► Instance

WorkflowJobTemplate ──1:N──► WorkflowJobNode
Schedule ──► UnifiedJobTemplate (any template type)

Organization ──1:N──► EventRule ──1:N──► EventLog
             ──1:N──► OutboundWebhook

Host ──1:N──► HostFactSnapshot ──1:N──► DriftDetection
Organization ──1:N──► DriftAlertRule ──1:N──► DriftAlert

Role ──M:N──► User
     ──M:N──► Team
     ──parent/child──► Role (hierarchy)

Key Tables

Core Resources

Table Description Growth rate
main_organization Tenant containers Slow (1-10)
auth_user User accounts Slow (10-500)
main_team User groups Slow
main_project Git repositories with playbooks Slow
main_inventory Host collections Slow
main_host Managed systems Medium (10-10,000+)
main_group Host groupings Medium
main_credential Encrypted secrets Slow
main_jobtemplate Templates for execution Slow

Event-Driven Automation

Table Description Growth rate
main_eventrule Webhook rules with conditions and actions Slow (admin-created)
main_eventlog Incoming webhook events and evaluation results Fast (depends on webhook traffic)
main_outboundwebhook Outbound webhook configurations Slow

Drift Detection

Table Description Growth rate
main_hostfactsnapshot Point-in-time host fact captures Fast (one per host per changed job run)
main_driftdetection Detected configuration changes Fast (depends on change frequency)
main_driftalertrule Alert rules for drift thresholds Slow (admin-created)
main_driftalert Triggered alert records Medium

Self-Service Portal

Table Description Growth rate
main_servicecatalogitem Curated portal entries wrapping JT/WFJT Slow (admin-curated)
main_servicerequest End-user requests with lifecycle (pending → approved/rejected → running → terminal) Medium

Authentication (WebAuthn / FIDO2)

Table Description Growth rate
main_webauthncredential FIDO2 credentials owned by users Slow (one or two per user)
main_webauthnregistrationchallenge Short-lived (5 min) registration challenges Volatile — purged on every begin call
main_webauthnauthenticationchallenge Short-lived (5 min) assertion challenges Volatile — purged on every begin call

Policy-as-Code

Table Description Growth rate
main_policy Stored Rego policies pushed to OPA on save Slow
main_policydecision One row per evaluation hit (warn or deny) Medium-fast

IaC Scanning & Supply Chain Security

Table Description Growth rate
main_scanner Configured scanner rows (ansible-lint / checkov / pip-audit) with severity threshold + enforcement. Columns: id, name, description, organization_id, tool, config (jsonb), severity_threshold, enforcement, enabled, applies_to (jsonb), trigger_count, last_run_at, last_run_status, created, modified, created_by_id, modified_by_id. Slow
main_scanresult One row per scanner execution at launch time. Columns: id, scanner_id (SET_NULL), scanner_name, unified_job_id (SET_NULL), unified_job_template_id (SET_NULL), organization_id, triggered_by_id, status (ok/warn/blocked/error/timeout), duration_ms, finding_count, highest_severity, message, raw_output, created, modified. Medium-fast
main_scanfinding One row per finding at or above threshold. Columns: id, scan_result_id (CASCADE), rule_id, severity, file_path, line, message, created, modified. Fast (bursts on noisy scans)

Multi-Tenancy (Tier 3.2)

The existing main_organization table gains 11 additive columns (all nullable or with safe defaults, zero-downtime migration): is_tenant_root, tenant_max_concurrent_jobs, tenant_max_daily_launches, tenant_max_hosts, tenant_max_storage_mb, tenant_isolation_strict, tenant_logo_url, tenant_primary_color, tenant_secondary_color, tenant_custom_domain (indexed), tenant_contact_email.

Table Description Growth rate
main_tenantusage One row per tenant Organization. Columns: id, organization_id (OneToOne), concurrent_jobs_count, launches_today_count, launches_today_window_start, hosts_count, storage_mb_used, last_recalculated_at, created, modified. Updated by the launch hook + Celery beat reconciliation task. Slow (bounded by tenant count)
main_tenantquotaevent One row per quota decision (allow or block) at job launch. Columns: id, organization_id (SET_NULL), organization_name (cached so the row survives org delete), quota_kind (concurrent_jobs/daily_launches/hosts/storage_mb), decision (allowed/blocked), current_value, limit_value, triggered_by_id, unified_job_template_id, message, created, modified. Medium
main_tenantisolationevent One row per cross-tenant read observed when tenant_isolation_strict=True (v1 audit only). Columns: id, user_id, user_organization_id, accessed_organization_id, resource_type, resource_id, request_path, blocked (always False in v1), created, modified. Medium-fast when strict mode is on

See docs/22-multi-tenancy.md.

Execution (GROW FAST — cleanup required)

Table Description Growth rate
main_job Execution records Fast (100/day+)
main_jobevent Output for every job (partitioned!) Very fast (500K/day+)
main_projectupdate Project sync records Medium
main_inventoryupdate Inventory sync records Medium
main_activitystream Audit log of all changes Fast
main_auditevent Immutable security audit log Fast
main_eventlog EDA webhook event logs Fast (depends on webhook volume)
main_notification Sent notifications Medium
main_hostfactsnapshot Host fact snapshots for drift detection Fast (cleanup via cleanup_old_snapshots)
main_driftdetection Detected configuration drift items Fast

Watch out


Polymorphic Models

Jobs use Django polymorphism — all types share a base table:

UnifiedJobTemplate (base table)
├── JobTemplate
├── Project
├── InventorySource
├── SystemJobTemplate
└── WorkflowJobTemplate

UnifiedJob (base table)
├── Job
├── ProjectUpdate
├── InventoryUpdate
├── AdHocCommand
├── SystemJob
└── WorkflowJob

The endpoint /api/v2/unified_jobs/ shows ALL job types in a single list.


Database Maintenance

Backup

# Automated backup (uses the built-in script)
docker compose exec forge-task bash /etc/forge/backup.sh

# Manual backup
docker compose exec postgres pg_dump -U forge forge | gzip > backup_$(date +%Y%m%d).sql.gz

Restore

# Stop the application
docker compose stop forge-web forge-task

# Restore
gunzip -c backup_20260310.sql.gz | docker compose exec -T postgres psql -U forge forge

# Restart
docker compose start forge-web forge-task
docker compose exec forge-web awx-manage migrate  # if version differs

Cleanup (MANDATORY in production)

# Delete jobs older than 90 days (and their events/partitions)
forge-manage cleanup_jobs --days=90

# Delete activity stream older than one year
forge-manage cleanup_activitystream --days=365

# Delete expired sessions and tokens
forge-manage cleanup_sessions
forge-manage cleanup_tokens

Recommendation: Set up a System Job in the Forge UI that runs cleanup_jobs daily.

Vacuum

# Reclaim dead tuples (run after large deletions)
docker compose exec postgres vacuumdb -U forge -z forge

PostgreSQL Tuning for Forge

Parameter Recommendation Why
shared_buffers 25% of RAM (e.g., 2GB) Data cache
effective_cache_size 75% of RAM (e.g., 6GB) Query planner hint
work_mem 64MB Per-query memory for sort/join
maintenance_work_mem 512MB For VACUUM and CREATE INDEX
random_page_cost 1.1 For SSD storage
log_min_duration_statement 1000 Log queries > 1 second

Useful Diagnostic Queries

System overview

SELECT
    (SELECT count(*) FROM main_organization) AS organizations,
    (SELECT count(*) FROM auth_user) AS users,
    (SELECT count(*) FROM main_host) AS hosts,
    (SELECT count(*) FROM main_jobtemplate) AS templates,
    (SELECT count(*) FROM main_job) AS jobs;

Jobs by status (last 7 days)

SELECT status, count(*)
FROM main_job
WHERE created > NOW() - INTERVAL '7 days'
GROUP BY status ORDER BY count(*) DESC;

Longest jobs (average by template)

SELECT jt.name, count(j.id) AS runs,
       round(avg(j.elapsed)::numeric, 1) AS avg_sec
FROM main_job j
JOIN main_jobtemplate jt ON j.job_template_id = jt.id
WHERE j.status = 'successful' AND j.created > NOW() - INTERVAL '30 days'
GROUP BY jt.name ORDER BY avg_sec DESC;

Hosts with failures

SELECT h.name, i.name AS inventory
FROM main_host h
JOIN main_inventory i ON h.inventory_id = i.id
WHERE h.has_active_failures = true;

Accessing the database

# Interactive shell
docker compose exec postgres psql -U forge forge

# Or through Django
docker compose exec forge-web awx-manage dbshell

Migrations

Forge has 252 migrations in forge/main/migrations/.