Forge Platform — Administrator Handbook
Operational guide for installing, running, and maintaining a Forge Platform deployment. Each section is step-by-step with concrete example values you can copy. Click any item in the table of contents to jump straight to it.
For day-to-day UI usage, see the companion User Handbook.
Table of Contents
Install & First Boot
- Prerequisites
- Installation (Docker Compose)
- First-Time Setup
- TLS / SSL Setup
- Initial Admin Hardening
Day-2 Operations
Scaling & Topology
Observability
Security
Troubleshooting
- Stack Won't Come Up
- Database Connection Errors
- Jobs Stuck in Pending
- Receptor / Mesh Issues
- Frontend Returns 502
- Disk Filling Up
- Reset Admin Password
Reference
INSTALL & FIRST BOOT
Prerequisites
Hardware and software needed before installing Forge.
Step by step
- Provision a host (VM or bare metal) with:
  - 4 vCPU, 8 GB RAM, 50 GB disk (minimum)
  - 8 vCPU, 16 GB RAM, 200 GB disk (recommended for production)
- Install Docker ≥ 24 and Docker Compose plugin ≥ 2.20.
- Open inbound ports 80 and 443 in your firewall.
- Register a DNS A record pointing at the host (e.g. forge.example.com).
- Make sure the system clock is in sync with NTP.
Example — Ubuntu 24.04 fresh box
sudo apt update
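# Ubuntu's own archive ships the Compose v2 plugin as docker-compose-v2;
# docker-compose-plugin is only available from Docker's apt repository.
sudo apt install -y docker.io docker-compose-v2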
sudo systemctl enable --now docker
sudo usermod -aG docker $USER
newgrp docker
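Before moving on, it can help to confirm that the installed versions meet the minimums listed in the prerequisites. The following is a minimal sketch rather than part of the Forge tooling: version_ge and check_minimum are hypothetical helper names, and the comparison assumes GNU sort (for its -V option), which Ubuntu provides.

```shell
# version_ge A B: succeed when version A is greater than or equal to version B.
# Hypothetical helper (not shipped with Forge); relies on GNU sort -V.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_minimum NAME INSTALLED REQUIRED: print one status line for a tool.
check_minimum() {
  if version_ge "$2" "$3"; then
    echo "ok: $1 $2 (needs >= $3)"
  else
    echo "too old: $1 $2 (needs >= $3)"
    return 1
  fi
}
```

On a host with Docker installed, the helpers could be called as check_minimum docker "$(docker version --format '{{.Server.Version}}')" 24 and check_minimum compose "$(docker compose version --short)" 2.20.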
Installation (Docker Compose)
The standard installation uses the compose file shipped in this repo.
Step by step
- Clone or copy forge-deploy to the target host:
    git clone https://github.com/forgeplatform/forge-devops.git /opt/forge
    cd /opt/forge
- Copy the example env file and edit it:
    cp .env.example .env
    $EDITOR .env
- Fill in the required variables (see Environment Variables).
- Pull the images:
    docker compose pull
- Bring the stack up:
    docker compose up -d
- Tail the init container until it finishes:
    docker compose logs -f forge-init
- When you see ==> Init complete. you can hit https://<your-host> in a browser.
Example .env for first install
POSTGRES_PASSWORD=Sup3rS3cret-pg
FORGE_SECRET_KEY=$(openssl rand -hex 25)
FORGE_BROADCAST_WEBSOCKET_SECRET=$(openssl rand -hex 25)
FORGE_ADMIN_PASSWORD=ChangeMe!Now
FORGE_CSRF_TRUSTED_ORIGINS=https://forge.example.com
FORGE_ALLOWED_HOSTS=forge.example.com
FORGE_TAG=2026.04.0
Note: generate the two random secrets with openssl rand -hex 25 before pasting them into the file. Do not leave the literal $() in .env.
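One way to avoid pasting a literal $() into the file is to let the shell expand the command and write only the result. The sketch below assumes nothing about Forge beyond the KEY=value format of .env; set_env_var is a hypothetical helper name, not something shipped in the repo.

```shell
# set_env_var FILE KEY VALUE: replace KEY's line in FILE, or append it if absent.
# Hypothetical helper, not part of the Forge repo.
set_env_var() (
  file=$1 key=$2 value=$3
  touch "$file"
  tmp=$(mktemp) || exit 1
  # Keep every line except an existing assignment of KEY.
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back through the original file so its permissions are kept.
  cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  exit "$status"
)
```

For example, set_env_var .env FORGE_SECRET_KEY "$(openssl rand -hex 25)" stores the 50-character hex string itself, because the substitution runs in your shell before the helper is called. The same call works for FORGE_BROADCAST_WEBSOCKET_SECRET.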
First-Time Setup
What to do the first time you log in.
Step by step
- Open https://forge.example.com.
- Log in as admin / your FORGE_ADMIN_PASSWORD.
- Click Settings → License and upload the license file (if applicable).
- Click Settings → System → set Base URL to https://forge.example.com.
- Click Organizations → Add and create your first organization (e.g. Platform).
- Click Users → Add and create at least one named admin (do not keep using the default admin).
- Log out, log back in as the new admin, and disable the default admin from Users → admin → Disable.
Example values
| Field | Value |
|---|---|
| Base URL | https://forge.example.com |
| First org | Platform |
| Named admin | krstan / strong password |
TLS / SSL Setup
Forge ships with a self-signed cert. Replace it with a real one before exposing the box.
Step by step — Let's Encrypt with certbot
- Stop the nginx service so port 80 is free for the ACME challenge:
    docker compose stop nginx
- Run certbot in standalone mode:
    sudo certbot certonly --standalone -d forge.example.com
- Copy the new cert/key into the nginx mount point:
    sudo cp /etc/letsencrypt/live/forge.example.com/fullchain.pem nginx/ssl/forge.crt
    sudo cp /etc/letsencrypt/live/forge.example.com/privkey.pem nginx/ssl/forge.key
- Restart nginx:
    docker compose up -d nginx
- Verify:
    curl -I https://forge.example.com
Example renewal cron
0 3 * * 1 certbot renew --pre-hook "docker compose -f /opt/forge/docker-compose.yml stop nginx" --post-hook "cp /etc/letsencrypt/live/forge.example.com/fullchain.pem /opt/forge/nginx/ssl/forge.crt && cp /etc/letsencrypt/live/forge.example.com/privkey.pem /opt/forge/nginx/ssl/forge.key && docker compose -f /opt/forge/docker-compose.yml up -d nginx"
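Because the renewal job runs only once a week, an independent expiry check can catch a renewal that silently failed. The sketch below uses openssl x509 -checkend, which takes a horizon in seconds. The function name cert_valid_for and the 14-day threshold in the usage note are illustrative assumptions.

```shell
# cert_valid_for CERT DAYS: succeed if CERT is still valid DAYS days from now.
# openssl x509 -checkend exits non-zero when the certificate expires within
# the given number of seconds.
cert_valid_for() {
  openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" > /dev/null
}
```

A monitoring job could run cert_valid_for /opt/forge/nginx/ssl/forge.crt 14 || echo "forge.crt expires within 14 days" and alert on the message.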
Initial Admin Hardening
Things every new install should do on day one.
Step by step
- Change FORGE_ADMIN_PASSWORD in .env to a strong value, then run docker compose up -d forge-init to apply.
- In Settings → Authentication, disable any auth backend you don't use (LDAP, SAML, OIDC).
- In Settings → System, enable Session Cookie Secure and set Session Timeout to 3600.
- Create at least two named admin users so no single account is the only way in.
- Configure a backup cron before importing real data.
- Configure at least one Notification channel for failure alerts.
DAY-2 OPERATIONS
Starting and Stopping the Stack
Step by step
# Start everything
docker compose up -d
# Stop everything (no data loss)
docker compose stop
# Stop and remove containers (data in volumes survives)
docker compose down
# Restart a single service
docker compose restart forge-web
Example — restart only the task workers after editing settings
docker compose restart forge-task
Inspecting Logs
Step by step
# All services, follow
docker compose logs -f
# One service
docker compose logs -f forge-web
# Last 200 lines, no follow
docker compose logs --tail=200 forge-task
# Filter by timestamp
docker compose logs --since=1h forge-web
Example — find the most recent error in the web service
docker compose logs --since=24h forge-web | grep -i error | tail -20
Health Checks
The stack ships with two healthcheck scripts you can run manually or from monitoring.
Step by step
# Web service
docker compose exec forge-web /scripts/healthcheck-web.sh
# Task service
docker compose exec forge-task /scripts/healthcheck-task.sh
Example — uptime check from outside
curl -fsS https://forge.example.com/api/v2/ping/ && echo OK
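Right after a restart the ping endpoint may refuse connections for a short while, so a single curl can report a false failure. A small retry wrapper is one way to handle that. This is a generic sketch: retry is a hypothetical helper, and the attempt count and delay in the usage note are arbitrary.

```shell
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds or ATTEMPTS is used up.
# DELAY is the pause in seconds between attempts.
retry() (
  attempts=$1 delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "gave up after $n attempts: $*" >&2
      exit 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
)
```

For instance, retry 30 2 curl -fsS -o /dev/null https://forge.example.com/api/v2/ping/ waits up to roughly a minute for the API to answer before reporting failure.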
Backup
Daily backups are mandatory. Forge ships scripts/backup.sh which dumps Postgres and rotates old archives.
Step by step
- Run a one-off backup to verify it works:
    docker compose exec postgres /scripts/backup.sh
- Inspect the result:
    ls -lh /var/lib/awx/backups/
- Schedule a nightly cron on the host:
    0 2 * * * docker compose -f /opt/forge/docker-compose.yml exec -T postgres /scripts/backup.sh >> /var/log/forge-backup.log 2>&1
- Copy the backups off-host (S3, rsync, etc.):
    30 2 * * * aws s3 sync /var/lib/awx/backups/ s3://acme-forge-backups/
Example output
==> Starting backup...
==> Backup saved to /var/lib/awx/backups/forge_backup_20260411_020000.sql.gz
==> Removing backups older than 7 days...
==> Backup complete.
Retention is controlled by BACKUP_RETENTION_DAYS (default 7). Override in .env if you need longer.
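A backup that exists but cannot be decompressed is of little use, so it may be worth testing the newest archive after each nightly run. The sketch below assumes the forge_backup_YYYYMMDD_HHMMSS.sql.gz naming shown in the example output. latest_backup and verify_latest_backup are hypothetical helper names, and gzip -t only checks archive integrity, not that the SQL inside restores cleanly.

```shell
# latest_backup DIR: print the newest forge_backup_*.sql.gz in DIR.
# The timestamp in the file name sorts lexicographically, so the last
# glob match is the newest archive.
latest_backup() (
  newest=
  for f in "$1"/forge_backup_*.sql.gz; do
    [ -e "$f" ] && newest=$f
  done
  printf '%s\n' "$newest"
)

# verify_latest_backup DIR: fail if there is no backup or the newest
# one is not a valid gzip stream; otherwise print its path.
verify_latest_backup() (
  newest=$(latest_backup "$1")
  [ -n "$newest" ] || { echo "no backups found in $1" >&2; exit 1; }
  gzip -t "$newest" || { echo "corrupt archive: $newest" >&2; exit 1; }
  echo "$newest"
)
```

Running verify_latest_backup /var/lib/awx/backups/ from the same cron schedule, a few minutes after the backup job, gives a non-zero exit status that monitoring can alert on.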
Restore
scripts/restore.sh reads a .sql.gz and pipes it back into Postgres.
Step by step
- Stop the application services so nothing writes to the DB:
    docker compose stop forge-web forge-task
- Run the restore (omit the filename to use the most recent backup):
    docker compose exec -T postgres /scripts/restore.sh /var/lib/awx/backups/forge_backup_20260411_020000.sql.gz
- Restart the application:
    docker compose up -d forge-web forge-task
- Verify in the UI: log in and check Activity for the expected history.
Example — restore yesterday's backup
docker compose stop forge-web forge-task
docker compose exec -T postgres /scripts/restore.sh
docker compose up -d forge-web forge-task
Warning: Restore is destructive. The current database is overwritten by the dump. Always take a fresh backup before restoring an old one.
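The warning and the restore steps can be combined into one wrapper that always takes a fresh backup first. The sketch below is an assumption-laden illustration: run and safe_restore are hypothetical names, and the dump path is deliberately mandatory. If the filename were omitted after taking a fresh backup, restore.sh would pick that fresh backup as the most recent one and simply restore the current state.

```shell
# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# safe_restore DUMP: back up the current DB, then restore DUMP.
# Each step runs only if the previous one succeeded.
safe_restore() (
  dump=${1:?usage: safe_restore /path/to/dump.sql.gz}
  run docker compose exec -T postgres /scripts/backup.sh &&
  run docker compose stop forge-web forge-task &&
  run docker compose exec -T postgres /scripts/restore.sh "$dump" &&
  run docker compose up -d forge-web forge-task
)
```

Setting DRY_RUN=1 before calling safe_restore prints the four commands without touching the stack, which is a cheap way to review the plan first.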
Upgrade
Upgrading is a tag bump + pull + up.
Step by step
- Read the release notes for breaking changes.
- Take a backup (see Backup).
- Edit .env and bump FORGE_TAG:
    FORGE_TAG=2026.05.0
- Pull the new images:
    docker compose pull
- Bring the new stack up (forge-init runs migrations automatically):
    docker compose up -d
- Tail init:
    docker compose logs -f forge-init
- Smoke-test:
    curl -fsS https://forge.example.com/api/v2/ping/
- Watch Jobs for 10 minutes and make sure new launches work.
Example — minor version bump
sed -i 's/^FORGE_TAG=.*/FORGE_TAG=2026.04.1/' .env
docker compose pull && docker compose up -d
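The sed one-liner overwrites the old tag, which is exactly the value needed if the upgrade has to be rolled back. A slightly more defensive variant, sketched below, prints the previous tag and refuses to write latest. bump_tag is a hypothetical helper. It assumes GNU sed (for -i) and a tag without slashes, as in the version numbers used in this handbook.

```shell
# bump_tag ENV_FILE NEW_TAG: set FORGE_TAG in ENV_FILE and print the previous
# value, so the old tag is on record if a rollback is needed.
bump_tag() (
  env_file=$1 new_tag=$2
  case $new_tag in
    ''|latest) echo "refusing tag '$new_tag': pin a real version" >&2; exit 1 ;;
  esac
  old_tag=$(sed -n 's/^FORGE_TAG=//p' "$env_file" | tail -n 1)
  [ -n "$old_tag" ] || { echo "FORGE_TAG not set in $env_file" >&2; exit 1; }
  sed -i "s/^FORGE_TAG=.*/FORGE_TAG=${new_tag}/" "$env_file" || exit 1
  echo "$old_tag"
)
```

A typical call would be prev=$(bump_tag .env 2026.04.1) && docker compose pull && docker compose up -d, after which $prev holds the tag to return to.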
Rolling Back
If an upgrade fails, roll back the tag and restore the pre-upgrade backup.
Step by step
- Set the previous tag in .env:
    FORGE_TAG=2026.04.0
- Pull and bring up:
    docker compose pull
    docker compose up -d
- If migrations were applied during the failed upgrade, you must also restore the pre-upgrade DB dump (see Restore).
- Verify and notify users.
Example
Upgraded to 2026.05.0 at 02:30, jobs started failing at 02:45 → set FORGE_TAG=2026.04.0 → pull → up → restore forge_backup_20260411_020000.sql.gz → service restored at 02:55.
SCALING & TOPOLOGY
Adding an Execution Node
Execution nodes run Ansible jobs. Add more when you exhaust capacity.
Step by step
- Provision a new host with Docker.
- On the new host, install Receptor and bring it up as an execution node, pointing at the control node:
    docker run -d --name forge-receptor \
      -e RECEPTOR_NODE_TYPE=execution \
      -e RECEPTOR_PEER=tcp://control.forge.example.com:2222 \
      ghcr.io/forgeplatform/forge-receptor:2026.04.0
- In the UI: Admin → Instances → Add and register the new node:
| Field | Value |
|---|---|
| Hostname | worker-eu-03 |
| Node Type | execution |
| Instance Group | eu-west-pool |
- Wait until the node shows Ready in Topology.
- Existing templates pinned to eu-west-pool will start scheduling onto it.
Adding a Hop Node
Hop nodes relay traffic across network boundaries (e.g. DMZ → internal).
Step by step
- Provision the hop host inside the boundary.
- Run a Receptor container with RECEPTOR_NODE_TYPE=hop:
    docker run -d --name forge-receptor \
      -e RECEPTOR_NODE_TYPE=hop \
      -e RECEPTOR_PEER=tcp://control.forge.example.com:2222 \
      ghcr.io/forgeplatform/forge-receptor:2026.04.0
- In the UI: Admin → Instances → Add with Node Type = hop.
- On each execution node behind the hop, set its peer to the hop instead of the control node.
- Confirm in Topology that the hop appears between the control and the workers.
Tuning Capacity
Step by step
- Settings → Jobs, edit:
  - Max Concurrent Jobs: global ceiling
  - Max Forks: Ansible forks per job
- Per instance: Admin → Instances → (select an instance) → Capacity Adjustment slider (0.0–1.0).
- Save. Changes take effect within 30 seconds.
Example
16 vCPU control node → set capacity to 1.0 (use all). 4 vCPU shared dev box → set to 0.25.
Switching to Kubernetes
The k8s/ folder contains baseline manifests if you outgrow Docker Compose.
Step by step
- Read k8s/ (it includes Deployments, Services, ConfigMap, Secret, Ingress).
- Create a namespace:
    kubectl create namespace forge
- Create the secrets (translate your .env):
    kubectl -n forge create secret generic forge-env --from-env-file=.env
- Apply the manifests:
    kubectl -n forge apply -f k8s/
- Watch the rollout:
    kubectl -n forge get pods -w
Migration from compose to k8s is a one-shot: dump Postgres, import into the k8s-managed Postgres (or external RDS).
OBSERVABILITY
Enabling OpenTelemetry
The stack ships with forge-otel-collector. You only need to point it at your backend.
Step by step
- Edit otel/collector-config.yaml.
- Set the exporter endpoint:
    exporters:
      otlphttp:
        endpoint: https://otel.example.com
        headers:
          authorization: "Bearer YOUR_TOKEN"
- Restart the collector:
    docker compose restart forge-otel-collector
- In Settings → Observability, set:
  - OTLP Endpoint = http://forge-otel-collector:4318
  - Sampling rate = 0.1 (10%)
- Save. Within a minute, traces appear in your APM tool.
Grafana Dashboards
grafana/ contains pre-built dashboard JSON files.
Step by step
- Open Grafana → Dashboards → Import.
- Upload grafana/forge-overview.json.
- Pick your Prometheus datasource.
- Save. The dashboard shows job throughput, web/task latency, queue depth.
Audit Log Export
For SOC 2 / ISO 27001 evidence collection.
Step by step
- Open Audit Log in the UI.
- Filter by date range (e.g. last quarter).
- Click Export → CSV.
- Hash and store the CSV alongside your evidence pack:
    sha256sum audit-2026Q1.csv > audit-2026Q1.csv.sha256
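The stored hash is only useful if it is checked again when the evidence is presented. The following sketch pairs the hashing step with a verification step using sha256sum -c from GNU coreutils. seal_evidence and verify_evidence are hypothetical names, and both change into the file's directory so the recorded path is a bare file name that stays valid when the pair is moved together.

```shell
# seal_evidence FILE: write FILE.sha256 next to FILE.
seal_evidence() (
  cd "$(dirname "$1")" || exit 1
  name=$(basename "$1")
  sha256sum "$name" > "$name.sha256"
)

# verify_evidence FILE: succeed only if FILE still matches FILE.sha256.
verify_evidence() (
  cd "$(dirname "$1")" || exit 1
  sha256sum -c --quiet "$(basename "$1").sha256"
)
```

An auditor, or a scheduled job, can then run verify_evidence audit-2026Q1.csv and treat a non-zero exit status as a sign that the export changed after it was sealed.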
SECURITY
Rotating Secrets
Secrets to rotate periodically: FORGE_SECRET_KEY, POSTGRES_PASSWORD, FORGE_ADMIN_PASSWORD, FORGE_BROADCAST_WEBSOCKET_SECRET.
Step by step — rotate FORGE_SECRET_KEY
- Take a backup first.
- Generate a new key:
    openssl rand -hex 25
- Edit .env, replace FORGE_SECRET_KEY=....
- Restart web + task:
    docker compose up -d forge-web forge-task
- All sessions are invalidated, so users must log in again. Encrypted credentials in the DB are unaffected (they use a separate Fernet key).
Never rotate the database encryption key (used for credential storage) without first re-encrypting all credentials. That procedure is a separate runbook.
User & SSO Setup
Step by step — enable OIDC
- Open Settings → Authentication → OIDC.
- Fill in:
  - Provider URL: https://login.example.com
  - Client ID: forge-prod
  - Client Secret: (from IdP)
  - Redirect URI: https://forge.example.com/sso/complete/oidc/
- Save → click Test to verify discovery.
- Test login from a private browser window.
- Map IdP groups to Forge teams under Settings → Authentication → Group Mapping.
Firewall & Network Hardening
Step by step
- Allow inbound only on 443 (and optionally 80 for HTTP→HTTPS redirect).
- Restrict SSH to your bastion / admin range.
- Block Postgres (5432), Redis (6379), Receptor (2222) from the public internet — they should only be reachable from inside the Docker network.
- If running on cloud, also configure security groups, not just OS firewall.
Example — ufw on Ubuntu
sudo ufw default deny incoming
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow from 10.0.0.0/24 to any port 22
sudo ufw enable
Security Updates
Step by step
- Subscribe to the Forge release announcement channel.
- Run docker compose pull weekly, bumping FORGE_TAG when a newer release is available, to pick up base-image patches.
- Patch the host OS monthly (unattended-upgrades on Debian/Ubuntu).
- Run scheduled image scans:
    trivy image ghcr.io/forgeplatform/forge-backend:2026.04.0
TROUBLESHOOTING
Stack Won't Come Up
Step by step
- Check docker compose ps: which service is unhealthy?
- Run docker compose logs <service> for the failing one.
- Common causes:
  - Missing or wrong values in .env → look for KeyError / ImproperlyConfigured.
  - Port 80/443 already in use → sudo lsof -i :443.
  - Volume permissions → sudo chown -R 1000:1000 /var/lib/awx.
- Fix and run docker compose up -d again.
Database Connection Errors
Symptom: web service logs show could not connect to server: Connection refused.
Step by step
- Is postgres running? docker compose ps postgres
- Logs: docker compose logs --tail=100 postgres
- Can the web container reach it?
    docker compose exec forge-web pg_isready -h postgres -U forge
- If postgres is healthy but web cannot connect → check that POSTGRES_PASSWORD in .env matches the password the DB volume was initialized with.
- If postgres won't start → look for PANIC lines (disk full, corrupt WAL).
Jobs Stuck in Pending
Symptom: jobs sit in Pending and never start.
Step by step
- Check capacity: Admin → Instances → is total used == total capacity?
- Check task workers: docker compose ps forge-task (is it running?)
- Check Redis: docker compose exec redis redis-cli ping should return PONG.
- Check Receptor: docker compose exec forge-task receptorctl status.
- As a last resort, restart the task service:
    docker compose restart forge-task
Receptor / Mesh Issues
Step by step
- Open Topology — any red links?
- From the control node:
    docker compose exec forge-task receptorctl status
- From a worker:
    docker exec forge-receptor receptorctl status
- Verify TCP reachability between nodes on 2222.
- Restart the affected receptor container.
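The TCP reachability step can be done without installing extra tools by using bash's built-in /dev/tcp redirection. The sketch below is one possible check, run on the host rather than inside a container. tcp_open is a hypothetical helper name, the 3-second timeout is arbitrary, and it assumes bash and the coreutils timeout command are present.

```shell
# tcp_open HOST PORT: succeed if a TCP connection to HOST:PORT opens
# within 3 seconds. Uses bash's /dev/tcp pseudo-device, so it needs bash
# but no extra packages such as nc or telnet.
tcp_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

From a worker, tcp_open control.forge.example.com 2222 || echo "cannot reach control node on 2222" distinguishes a network or firewall problem from a Receptor problem.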
Frontend Returns 502
Symptom: browser shows nginx 502 Bad Gateway.
Step by step
- docker compose ps: is forge-frontend healthy?
- docker compose logs --tail=50 forge-frontend
- docker compose logs --tail=50 nginx
- Common cause: frontend container OOM-killed → bump memory in compose.
- Restart: docker compose restart forge-frontend nginx.
Disk Filling Up
Step by step
- df -h: which mount?
- If /var/lib/docker → prune unused images:
    docker image prune -af --filter "until=168h"
- If the backup directory → lower BACKUP_RETENTION_DAYS and rerun backup.
- If Postgres data dir → check for runaway audit log growth, vacuum:
    docker compose exec postgres psql -U forge -d forge -c "VACUUM FULL VERBOSE;"
Note: VACUUM FULL rewrites each table under an exclusive lock and needs temporary free space for the copy, so run it in a maintenance window.
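The first step, finding which mount is filling up, is easy to automate so that it raises an alert before the disk is actually full. The sketch below filters the POSIX-format output of df -P. mounts_over is a hypothetical helper, the threshold is an arbitrary example, and mount points containing spaces are not handled.

```shell
# mounts_over PCT: read `df -P` output on stdin and print every mount
# whose usage is at or above PCT percent, as "<mount> <usage>".
mounts_over() {
  awk -v limit="$1" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)                 # strip the percent sign
      if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

Running df -P | mounts_over 85 from cron prints nothing while every mount is below 85% and one line per offending mount otherwise.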
Reset Admin Password
Step by step
- Exec into the web container:
    docker compose exec forge-web bash
- Run the management command:
    awx-manage changepassword admin
- Enter the new password twice.
- Log in via the UI.
If admin was disabled and you have no other admin user:
    docker compose exec forge-web awx-manage createsuperuser
REFERENCE
Service Map
| Service | Image | Port (internal) | Purpose |
|---|---|---|---|
| postgres | postgres:15 | 5432 | Application database |
| redis | redis:7 | 6379 | Cache + Celery broker |
| forge-init | forge-backend | - | Migrations + initial setup, exits on success |
| forge-web | forge-backend | 8050 (uWSGI), 8051 (Daphne) | REST API + WebSocket |
| forge-task | forge-backend | - | Celery workers + dispatcher + ws relay |
| forge-frontend | forge-frontend | 80 | Static React UI |
| forge-opa | openpolicyagent/opa | 8181 | Policy-as-Code sidecar |
| forge-otel-collector | otel/collector | 4317/4318 | OpenTelemetry pipeline |
| nginx | nginx | 80 / 443 | TLS terminator + edge router |
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| POSTGRES_PASSWORD | yes | - | DB password |
| POSTGRES_USER | no | forge | DB user |
| POSTGRES_DB | no | forge | DB name |
| FORGE_SECRET_KEY | yes | - | Django SECRET_KEY (50+ chars) |
| FORGE_BROADCAST_WEBSOCKET_SECRET | yes | - | WS broadcast secret |
| FORGE_ADMIN_USER | no | admin | Bootstrap admin username |
| FORGE_ADMIN_PASSWORD | yes | - | Bootstrap admin password |
| FORGE_ADMIN_EMAIL | no | admin@example.com | Bootstrap admin email |
| FORGE_CSRF_TRUSTED_ORIGINS | yes | - | Comma-separated https://... origins |
| FORGE_ALLOWED_HOSTS | no | * | Django ALLOWED_HOSTS |
| FORGE_NODE_NAME | no | forge-node | This node's name in mesh |
| FORGE_NODE_TYPE | no | hybrid | control / execution / hybrid |
| FORGE_BACKEND_IMAGE | no | ghcr.io/forgeplatform/forge-backend | Backend image |
| FORGE_FRONTEND_IMAGE | no | ghcr.io/forgeplatform/forge-frontend | Frontend image |
| FORGE_TAG | no | latest | Image tag (use a real version, not latest) |
| BACKUP_RETENTION_DAYS | no | 7 | Days of backups to keep |
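Since missing or malformed values in .env are a common reason the stack fails to start, a quick lint before docker compose up can save a debugging round. The sketch below checks the variables marked as required in the table, flags a pasted literal $( ), and flags an unpinned FORGE_TAG. check_env is a hypothetical helper, not something shipped with Forge, and it only looks at simple KEY=value lines.

```shell
# check_env FILE: print each problem found in FILE; fail if there is any.
# The required list mirrors the variables marked "yes" under Required.
check_env() (
  status=0
  for key in POSTGRES_PASSWORD FORGE_SECRET_KEY FORGE_BROADCAST_WEBSOCKET_SECRET \
             FORGE_ADMIN_PASSWORD FORGE_CSRF_TRUSTED_ORIGINS; do
    # A key counts as set only if something follows the equals sign.
    if ! grep -q "^${key}=." "$1"; then
      echo "missing: $key"
      status=1
    fi
  done
  # A literal dollar-paren means a command substitution was pasted
  # instead of its output.
  if grep -q '^[A-Z_]*=.*\$(' "$1"; then
    echo "unexpanded \$( ) found"
    status=1
  fi
  # FORGE_TAG unset (defaults to latest) or set to latest is unpinned.
  if ! grep -q '^FORGE_TAG=' "$1" || grep -q '^FORGE_TAG=latest$' "$1"; then
    echo "unpinned: FORGE_TAG"
    status=1
  fi
  exit "$status"
)
```

Used as check_env .env && docker compose up -d, the stack only starts when the file passes all three checks.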
File Layout
/opt/forge/
├── docker-compose.yml          # primary stack definition
├── .env                        # local secrets, never commit
├── nginx/
│   ├── nginx.conf
│   └── ssl/
│       ├── forge.crt
│       └── forge.key
├── settings/                   # Django settings overrides mounted into forge-web/task
├── otel/
│   └── collector-config.yaml
├── grafana/
│   └── *.json                  # importable dashboards
├── scripts/
│   ├── backup.sh
│   ├── restore.sh
│   ├── healthcheck-web.sh
│   ├── healthcheck-task.sh
│   └── init.sh
└── k8s/                        # k8s manifests (alternative to compose)

/var/lib/awx/
├── projects/                   # synced project checkouts
├── public/                     # static web assets
└── backups/                    # nightly DB dumps
End of administrator handbook.