23 — Recommendations Engine

Tier 3.7 — NEW

Forge Platform ships a rule-based recommendations engine that surfaces actionable, scope-aware hints about platform health. Recommendations appear on the Dashboard and inside each wizard, and are designed to nudge operators toward best practices ("you have job templates but no IaC scanners", "the admin user is still using the default password") without ever blocking their work.

Everything is read-only and stateless. The engine has no Django models, no migrations, and no DB tables of its own — it computes its results live from existing tables, with a 60-second in-memory cache to keep the cost negligible.


Scope (v1)

In:

Out (deferred):


Module Layout

forge/main/recommendations/
├── __init__.py        # public exports: build_context, evaluate, Recommendation
├── types.py           # @dataclass RuleContext, Recommendation
├── rules.py           # 12 pure rule functions
└── engine.py          # build_context() + evaluate() + 60s cache

forge/api/
├── views/recommendations.py        # RecommendationsList APIView
├── serializers/recommendations.py  # RecommendationSerializer
└── urls/recommendations.py         # urlpatterns

tests_standalone/
└── test_recommendations.py         # 17 test classes

The module is not a Django app — it has no apps.py, no models.py, no migrations, and is not registered in INSTALLED_APPS. Its URL include is wired directly from forge/api/urls/urls.py.


Data Types

RuleContext

A frozen aggregation of everything the rules need. Built once per request (or once per minute when the cache is warm) by build_context(), which runs defensive lazy queries against the existing tables.

Field Type Source
scanners_enabled_count int Scanner.objects.filter(enabled=True).count()
policies_total int Policy.objects.count()
policies_enforce_count int Policy.objects.filter(mode='enforce').count()
organizations_total int Organization.objects.count()
tenancy_enabled bool settings.TENANCY_ENABLED
otel_enabled bool settings.OTEL_ENABLED
job_templates_total int JobTemplate.objects.count()
schedules_total int Schedule.objects.count()
catalog_items_total int CatalogItem.objects.count()
surveys_total int JobTemplate.objects.exclude(survey_spec=None).count()
drift_detections_total int DriftDetection.objects.count()
projects list[(name, last_sync)] Project.objects.values_list('name', 'last_update_failed')
tenant_usage list[(name, quota_pct)] TenantUsage aggregate
teams_count int Team.objects.count()
admin_default_password bool User.check_password('password') for admin

Each query is wrapped in try/except so a missing table (e.g. recommendations on a fresh install before migrations finish) returns a zero/false value instead of crashing.

Recommendation

@dataclass(frozen=True)
class Recommendation:
    id: str           # unique stable id, e.g. 'no_scanners'
    scope: str        # one of the 8 scopes
    severity: str     # 'critical' | 'warn' | 'info'
    title: str        # human title
    why: str          # human explanation, 1–2 sentences
    action_link: str  # frontend route the UI button should open

The 12 Rules

ID Scope Severity Triggers when…
default_admin_password dashboard critical The bootstrap admin user still has the seed password
no_scanners compliance warn At least one job template exists but no enabled scanners
multi_org_no_tenancy tenancy warn More than one Organization but TENANCY_ENABLED=False
tenant_near_quota tenancy warn Any tenant is using > 80% of any quota
all_policies_warn compliance info At least one policy exists but none are in enforce mode
no_observability dashboard info OTEL_ENABLED=False
stale_project automation info Any Project has not synced in 14+ days
no_schedules automation info Job templates exist but no Schedules are configured
no_drift compliance info Job templates exist but no DriftDetections configured
few_surveys self_service info Less than 50% of templates expose a survey
no_catalog_items self_service info Job templates exist but the Service Catalog is empty
only_default_team access info Only the seed Team exists

Each rule is a pure function (ctx: RuleContext) -> Recommendation | None. A rule that raises an exception is silently dropped — the engine never returns a partial error to the API.


HTTP API

GET /api/v2/recommendations/

Auth: IsAuthenticated (any logged-in user can read).

Query parameters

Name Type Default Description
scope str all Filter by scope. One of: all, dashboard, automation, self_service, tenancy, compliance, resources, access

Response 200

{
  "count": 3,
  "results": [
    {
      "id": "default_admin_password",
      "scope": "dashboard",
      "severity": "critical",
      "title": "Default admin password in use",
      "why": "The admin user is still using the default password. Change it immediately to secure your installation.",
      "action_link": "/users"
    },
    {
      "id": "multi_org_no_tenancy",
      "scope": "tenancy",
      "severity": "warn",
      "title": "Multiple organizations without tenancy",
      "why": "You have more than one organization but tenancy is disabled. Enable tenancy to enforce quotas and isolation.",
      "action_link": "/wizards/tenancy"
    },
    {
      "id": "no_observability",
      "scope": "dashboard",
      "severity": "info",
      "title": "Observability is not enabled",
      "why": "OpenTelemetry exporters are off. Enable them to get traces, metrics, and logs flowing to your backend.",
      "action_link": "/wizards/observability"
    }
  ]
}

Results are always sorted critical → warn → info, with stable ordering inside each severity bucket (the order in which the rule is registered).

Examples

All recommendations

curl -s -H "Authorization: Bearer $TOKEN" \
  https://forge.example.com/api/v2/recommendations/

Only compliance

curl -s -H "Authorization: Bearer $TOKEN" \
  "https://forge.example.com/api/v2/recommendations/?scope=compliance"
{
  "count": 1,
  "results": [
    {
      "id": "no_scanners",
      "scope": "compliance",
      "severity": "warn",
      "title": "No IaC scanners configured",
      "why": "You have job templates but no IaC scanners. Enable scanning to catch unsafe playbooks before they run.",
      "action_link": "/wizards/compliance"
    }
  ]
}

Only tenancy

curl -s -H "Authorization: Bearer $TOKEN" \
  "https://forge.example.com/api/v2/recommendations/?scope=tenancy"
{
  "count": 2,
  "results": [
    {
      "id": "multi_org_no_tenancy",
      "scope": "tenancy",
      "severity": "warn",
      "title": "Multiple organizations without tenancy",
      "why": "You have more than one organization but tenancy is disabled. Enable tenancy to enforce quotas and isolation.",
      "action_link": "/wizards/tenancy"
    },
    {
      "id": "tenant_near_quota",
      "scope": "tenancy",
      "severity": "warn",
      "title": "Tenant near concurrent-job quota",
      "why": "Tenant \"production-1\" is using 92% of its concurrent job limit. Consider raising the quota.",
      "action_link": "/tenants"
    }
  ]
}

Empty result

When nothing matches (best case — your platform is healthy), the API returns:

{ "count": 0, "results": [] }

Caching

build_context() is wrapped in a 60-second TTL memoizer. The cache key is constant (process-local), so:

This is intentional. There is no cross-process invalidation because recommendations are not load-bearing — being 60 s stale is fine.

To bypass the cache (for tests), call build_context(_cache_bust=True).


Severity → UI Mapping

The frontend uses these conventions; rules should pick severity to match the intended urgency.

Severity When to use UI treatment
critical Active security or data-loss risk Red badge, stays at the top, never auto-dismissable
warn Best-practice violation that will bite later Amber badge, dismissible per session
info Tip or suggestion Blue badge, dismissible per session

Today only default_admin_password is critical. Adding new criticals should require explicit review — they cannot be silenced from the UI.


Frontend Integration

The frontend hits the endpoint via TanStack Query in src/api/hooks/useRecommendations.ts. Two consumers:

  1. Dashboard banner (src/pages/Dashboard.tsx) — calls useRecommendations('dashboard'), displays the top 5 in a Getting Started card and groups the rest by severity.
  2. Wizard panel (src/components/wizard/RecommendationsPanel.tsx) — every wizard renders this panel at the top of its content area, scoped to that wizard's domain (e.g. useRecommendations('compliance') inside the ComplianceWizard).

Both consumers refetch every 60 seconds via TanStack's refetchInterval, matching the backend cache TTL.


Adding a New Rule

  1. Edit forge/main/recommendations/rules.py. Add a function:

python def rule_my_new_check(ctx: RuleContext) -> Recommendation | None: if not <my condition>: return None return Recommendation( id='my_new_check', scope='compliance', severity='warn', title='Short, action-oriented title', why='One or two sentences explaining why this matters.', action_link='/wherever', )

  1. Register it in the ALL_RULES list at the bottom of the same file.
  2. If your rule needs new context data, add a field to RuleContext in types.py and populate it in engine.build_context().
  3. Add a unit test in tests_standalone/test_recommendations.py. At minimum cover trigger and non-trigger cases.
  4. Run the tests: bash pytest tests_standalone/test_recommendations.py -v
  5. (Optional) add an i18n key on the frontend if the title/why should be translated rather than served verbatim.

Testing

Standalone (no Django):

pytest tests_standalone/test_recommendations.py

The 17 test classes cover every rule plus the evaluator (severity sort, scope filter, exception swallowing, empty context). Because the rules take a plain dataclass, no fixtures, no DB, no Django setup is required — the entire suite runs in well under a second.


Operational Notes


Roadmap


See Also