Deployment
Standalone Deployment
Run the assistant independently:
cd forge-assistant
docker compose up -d
This starts a single all-in-one container (forge-assistant) that bundles:
- Ollama — LLM server (internal, port 11434)
- ChromaDB — vector store (embedded, port 8000)
- FastAPI — API server (exposed on port 8100)
First-Time Setup
On first start, the entrypoint automatically pulls the LLM model (gemma3:1b) and embedding model (nomic-embed-text). Allow ~2 minutes for the initial model download.
# 1. Start the container
docker compose up -d
# 2. Wait for health check to pass (start_period is 120s)
docker compose logs -f forge-assistant
# 3. Index documentation
curl -X POST http://localhost:8100/api/v1/index
# 4. Verify
curl http://localhost:8100/api/v1/health
Integration with Forge Platform
Step 1: Add to Docker Compose
In your forge-deploy directory:
docker compose -f docker-compose.yml \
-f /path/to/forge-assistant/docker-compose.integration.yml \
up -d
Step 2: Configure Nginx
Add to your Forge nginx configuration:
# In forge-deploy/nginx/nginx.conf, inside the server block:
location /assistant/ {
proxy_pass http://forge-assistant:8100/;
proxy_http_version 1.1;
proxy_set_header Connection '';
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
# SSE support
proxy_buffering off;
proxy_cache off;
proxy_read_timeout 300s;
}
Step 3: Frontend Detection
The Forge frontend automatically detects the assistant by calling /assistant/api/v1/health. If the endpoint responds, the chat button appears in the UI.
GPU Support
For GPU-accelerated inference, uncomment the GPU section in docker-compose.yml:
services:
forge-assistant:
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
Requirements: - NVIDIA GPU with 8+ GB VRAM - nvidia-container-toolkit installed on the host - Docker configured with nvidia runtime
CPU-Only Deployment
For servers without GPU, use a smaller model:
FORGE_ASSISTANT_OLLAMA_MODEL=phi3:mini docker compose up -d
Response time will be 10-20 seconds instead of 2-5 seconds.
Removing the Assistant
To remove the assistant from a running Forge deployment:
# Stop assistant service
docker compose -f docker-compose.yml \
-f /path/to/forge-assistant/docker-compose.integration.yml \
down forge-assistant
# Or if running standalone
cd forge-assistant && docker compose down -v
The Forge platform continues to work normally. The chat button disappears automatically when the health check fails.
Backup and Restore
All persistent data (ChromaDB + Ollama models) is stored in the assistant_data volume mounted at /data:
# Backup
docker run --rm -v forge-assistant_assistant_data:/data -v $(pwd)/backups:/backup \
alpine tar czf /backup/assistant-data.tar.gz /data
# Restore
docker run --rm -v forge-assistant_assistant_data:/data -v $(pwd)/backups:/backup \
alpine tar xzf /backup/assistant-data.tar.gz -C /
Tip: Re-indexing docs (
curl -X POST http://localhost:8100/api/v1/index?rebuild=true) is fast and often easier than restoring ChromaDB data. Model re-download is automatic on first start if models are missing.
CI/CD Pipeline
The repository ships with a GitHub Actions workflow in .github/workflows/ci.yml:
- Lint — ruff check on Python code
- Test — pytest with JUnit XML reporting
- Build — Docker image build
- Scan — Trivy container vulnerability scan
- Push — Push to
ghcr.io/forgeplatform/forge-assistant(main branch and version tags only)
Tests must pass before any image is built or pushed. Releases use the built-in GITHUB_TOKEN with packages: write — no external secrets required.