06 — Database Schema
Forge uses PostgreSQL 15. This document covers key tables, relationships,
maintenance, and useful queries for diagnostics.
ER Diagram — Main Relationships
Organization ──1:N──► Project
──1:N──► Inventory ──1:N──► Host
│ ──1:N──► Group (M:N with Host)
──1:N──► Credential
──1:N──► JobTemplate ──1:N──► Job ──1:N──► JobEvent
──1:N──► Team ──M:N──► User
──1:N──► NotificationTemplate
──M:N──► InstanceGroup ──M:N──► Instance
WorkflowJobTemplate ──1:N──► WorkflowJobNode
Schedule ──► UnifiedJobTemplate (any template type)
Organization ──1:N──► EventRule ──1:N──► EventLog
──1:N──► OutboundWebhook
Host ──1:N──► HostFactSnapshot ──1:N──► DriftDetection
Organization ──1:N──► DriftAlertRule ──1:N──► DriftAlert
Role ──M:N──► User
──M:N──► Team
──parent/child──► Role (hierarchy)
Key Tables
Core Resources
| Table |
Description |
Growth rate |
main_organization |
Tenant containers |
Slow (1-10) |
auth_user |
User accounts |
Slow (10-500) |
main_team |
User groups |
Slow |
main_project |
Git repositories with playbooks |
Slow |
main_inventory |
Host collections |
Slow |
main_host |
Managed systems |
Medium (10-10,000+) |
main_group |
Host groupings |
Medium |
main_credential |
Encrypted secrets |
Slow |
main_jobtemplate |
Templates for execution |
Slow |
Event-Driven Automation
| Table |
Description |
Growth rate |
main_eventrule |
Webhook rules with conditions and actions |
Slow (admin-created) |
main_eventlog |
Incoming webhook events and evaluation results |
Fast (depends on webhook traffic) |
main_outboundwebhook |
Outbound webhook configurations |
Slow |
Drift Detection
| Table |
Description |
Growth rate |
main_hostfactsnapshot |
Point-in-time host fact captures |
Fast (one per host per changed job run) |
main_driftdetection |
Detected configuration changes |
Fast (depends on change frequency) |
main_driftalertrule |
Alert rules for drift thresholds |
Slow (admin-created) |
main_driftalert |
Triggered alert records |
Medium |
Self-Service Portal
| Table |
Description |
Growth rate |
main_servicecatalogitem |
Curated portal entries wrapping JT/WFJT |
Slow (admin-curated) |
main_servicerequest |
End-user requests with lifecycle (pending → approved/rejected → running → terminal) |
Medium |
Authentication (WebAuthn / FIDO2)
| Table |
Description |
Growth rate |
main_webauthncredential |
FIDO2 credentials owned by users |
Slow (one or two per user) |
main_webauthnregistrationchallenge |
Short-lived (5 min) registration challenges |
Volatile — purged on every begin call |
main_webauthnauthenticationchallenge |
Short-lived (5 min) assertion challenges |
Volatile — purged on every begin call |
Policy-as-Code
| Table |
Description |
Growth rate |
main_policy |
Stored Rego policies pushed to OPA on save |
Slow |
main_policydecision |
One row per evaluation hit (warn or deny) |
Medium-fast |
IaC Scanning & Supply Chain Security
| Table |
Description |
Growth rate |
main_scanner |
Configured scanner rows (ansible-lint / checkov / pip-audit) with severity threshold + enforcement. Columns: id, name, description, organization_id, tool, config (jsonb), severity_threshold, enforcement, enabled, applies_to (jsonb), trigger_count, last_run_at, last_run_status, created, modified, created_by_id, modified_by_id. |
Slow |
main_scanresult |
One row per scanner execution at launch time. Columns: id, scanner_id (SET_NULL), scanner_name, unified_job_id (SET_NULL), unified_job_template_id (SET_NULL), organization_id, triggered_by_id, status (ok/warn/blocked/error/timeout), duration_ms, finding_count, highest_severity, message, raw_output, created, modified. |
Medium-fast |
main_scanfinding |
One row per finding at or above threshold. Columns: id, scan_result_id (CASCADE), rule_id, severity, file_path, line, message, created, modified. |
Fast (bursts on noisy scans) |
Multi-Tenancy (Tier 3.2)
The existing main_organization table gains 11 additive columns (all nullable
or with safe defaults, zero-downtime migration): is_tenant_root,
tenant_max_concurrent_jobs, tenant_max_daily_launches, tenant_max_hosts,
tenant_max_storage_mb, tenant_isolation_strict, tenant_logo_url,
tenant_primary_color, tenant_secondary_color, tenant_custom_domain
(indexed), tenant_contact_email.
| Table |
Description |
Growth rate |
main_tenantusage |
One row per tenant Organization. Columns: id, organization_id (OneToOne), concurrent_jobs_count, launches_today_count, launches_today_window_start, hosts_count, storage_mb_used, last_recalculated_at, created, modified. Updated by the launch hook + Celery beat reconciliation task. |
Slow (bounded by tenant count) |
main_tenantquotaevent |
One row per quota decision (allow or block) at job launch. Columns: id, organization_id (SET_NULL), organization_name (cached so the row survives org delete), quota_kind (concurrent_jobs/daily_launches/hosts/storage_mb), decision (allowed/blocked), current_value, limit_value, triggered_by_id, unified_job_template_id, message, created, modified. |
Medium |
main_tenantisolationevent |
One row per cross-tenant read observed when tenant_isolation_strict=True (v1 audit only). Columns: id, user_id, user_organization_id, accessed_organization_id, resource_type, resource_id, request_path, blocked (always False in v1), created, modified. |
Medium-fast when strict mode is on |
See docs/22-multi-tenancy.md.
Execution (GROW FAST — cleanup required)
| Table |
Description |
Growth rate |
main_job |
Execution records |
Fast (100/day+) |
main_jobevent |
Output for every job (partitioned!) |
Very fast (500K/day+) |
main_projectupdate |
Project sync records |
Medium |
main_inventoryupdate |
Inventory sync records |
Medium |
main_activitystream |
Audit log of all changes |
Fast |
main_auditevent |
Immutable security audit log |
Fast |
main_eventlog |
EDA webhook event logs |
Fast (depends on webhook volume) |
main_notification |
Sent notifications |
Medium |
main_hostfactsnapshot |
Host fact snapshots for drift detection |
Fast (cleanup via cleanup_old_snapshots) |
main_driftdetection |
Detected configuration drift items |
Fast |
Watch out
-
main_jobevent is the largest table. A single job with 100 hosts and 50 tasks generates
~5,000 rows. Without cleanup_jobs, this table can grow to hundreds of millions of rows.
-
Partitioning: main_jobevent uses PostgreSQL list partitioning by job_id.
Each job gets its own partition. A query for one job's events reads only one partition,
not the entire table.
-
main_activitystream also grows fast. Every create/update/delete on any
model creates a record. Use cleanup_activitystream --days=365 for cleanup.
Polymorphic Models
Jobs use Django polymorphism — all types share a base table:
UnifiedJobTemplate (base table)
├── JobTemplate
├── Project
├── InventorySource
├── SystemJobTemplate
└── WorkflowJobTemplate
UnifiedJob (base table)
├── Job
├── ProjectUpdate
├── InventoryUpdate
├── AdHocCommand
├── SystemJob
└── WorkflowJob
The endpoint /api/v2/unified_jobs/ shows ALL job types in a single list.
Database Maintenance
Backup
# Automated backup (uses the built-in script)
docker compose exec forge-task bash /etc/forge/backup.sh
# Manual backup
docker compose exec postgres pg_dump -U forge forge | gzip > backup_$(date +%Y%m%d).sql.gz
Restore
# Stop the application
docker compose stop forge-web forge-task
# Restore
gunzip -c backup_20260310.sql.gz | docker compose exec -T postgres psql -U forge forge
# Restart
docker compose start forge-web forge-task
docker compose exec forge-web awx-manage migrate # if version differs
Cleanup (MANDATORY in production)
# Delete jobs older than 90 days (and their events/partitions)
forge-manage cleanup_jobs --days=90
# Delete activity stream older than one year
forge-manage cleanup_activitystream --days=365
# Delete expired sessions and tokens
forge-manage cleanup_sessions
forge-manage cleanup_tokens
Recommendation: Set up a System Job in the Forge UI that runs cleanup_jobs daily.
Vacuum
# Reclaim dead tuples (run after large deletions)
docker compose exec postgres vacuumdb -U forge -z forge
PostgreSQL Tuning for Forge
| Parameter |
Recommendation |
Why |
shared_buffers |
25% of RAM (e.g., 2GB) |
Data cache |
effective_cache_size |
75% of RAM (e.g., 6GB) |
Query planner hint |
work_mem |
64MB |
Per-query memory for sort/join |
maintenance_work_mem |
512MB |
For VACUUM and CREATE INDEX |
random_page_cost |
1.1 |
For SSD storage |
log_min_duration_statement |
1000 |
Log queries > 1 second |
Useful Diagnostic Queries
System overview
SELECT
(SELECT count(*) FROM main_organization) AS organizations,
(SELECT count(*) FROM auth_user) AS users,
(SELECT count(*) FROM main_host) AS hosts,
(SELECT count(*) FROM main_jobtemplate) AS templates,
(SELECT count(*) FROM main_job) AS jobs;
Jobs by status (last 7 days)
SELECT status, count(*)
FROM main_job
WHERE created > NOW() - INTERVAL '7 days'
GROUP BY status ORDER BY count(*) DESC;
Longest jobs (average by template)
SELECT jt.name, count(j.id) AS runs,
round(avg(j.elapsed)::numeric, 1) AS avg_sec
FROM main_job j
JOIN main_jobtemplate jt ON j.job_template_id = jt.id
WHERE j.status = 'successful' AND j.created > NOW() - INTERVAL '30 days'
GROUP BY jt.name ORDER BY avg_sec DESC;
Hosts with failures
SELECT h.name, i.name AS inventory
FROM main_host h
JOIN main_inventory i ON h.inventory_id = i.id
WHERE h.has_active_failures = true;
Accessing the database
# Interactive shell
docker compose exec postgres psql -U forge forge
# Or through Django
docker compose exec forge-web awx-manage dbshell
Migrations
Forge has 252 migrations in forge/main/migrations/.
- All reference
forge.main.fields (not awx.main.fields)
- The init script automatically runs
migrate during deployment
- Never edit existing migrations — only add new ones
- Check status:
forge-manage showmigrations | grep "\[ \]"