Configuration
All configuration is done through environment variables that share the FORGE_ASSISTANT_ prefix.
Environment Variables
Ollama Settings
| Variable | Default | Description |
| --- | --- | --- |
| FORGE_ASSISTANT_OLLAMA_BASE_URL | http://localhost:11434 | Ollama API base URL (localhost, because Ollama runs inside the same container) |
| FORGE_ASSISTANT_OLLAMA_MODEL | gemma3:1b | LLM model for chat generation |
| FORGE_ASSISTANT_OLLAMA_EMBED_MODEL | nomic-embed-text | Model for generating embeddings |
| FORGE_ASSISTANT_OLLAMA_TIMEOUT | 120 | Timeout in seconds for Ollama requests |
ChromaDB Settings
| Variable | Default | Description |
| --- | --- | --- |
| FORGE_ASSISTANT_CHROMA_HOST | localhost | ChromaDB hostname (localhost, because ChromaDB is embedded in the same container) |
| FORGE_ASSISTANT_CHROMA_PORT | 8000 | ChromaDB port |
| FORGE_ASSISTANT_CHROMA_COLLECTION | forge_docs | Collection name for indexed documents |
RAG Settings
| Variable | Default | Description |
| --- | --- | --- |
| FORGE_ASSISTANT_RAG_TOP_K | 5 | Number of document chunks to retrieve per query |
| FORGE_ASSISTANT_RAG_CHUNK_SIZE | 500 | Character count per document chunk |
| FORGE_ASSISTANT_RAG_CHUNK_OVERLAP | 50 | Overlap between adjacent chunks |
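To illustrate how the chunk size and overlap settings interact, here is a minimal sketch of character-based chunking with a sliding window. It is not the indexer's actual code: the function name `chunk_text` is hypothetical, and it assumes the overlap is measured in characters, like the chunk size.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows; neighbours share `overlap` characters.

    Hypothetical helper for illustration only. The defaults mirror
    FORGE_ASSISTANT_RAG_CHUNK_SIZE and FORGE_ASSISTANT_RAG_CHUNK_OVERLAP.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # each new chunk starts this many characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# A 1200-character document with the default settings:
print([len(c) for c in chunk_text("x" * 1200)])  # → [500, 500, 300]
```

With the defaults, each chunk begins 450 characters after the previous one, so a larger overlap produces more chunks for the same document.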
Application Settings
| Variable | Default | Description |
| --- | --- | --- |
| FORGE_ASSISTANT_APP_NAME | Forge Assistant | Application display name |
| FORGE_ASSISTANT_APP_VERSION | 2026.05.0 | Version string |
| FORGE_ASSISTANT_LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
| FORGE_ASSISTANT_CORS_ORIGINS | * | Comma-separated list of allowed CORS origins |
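As a rough illustration of how prefixed variables map to typed settings, the sketch below reads a few of the variables listed above using only the Python standard library. The `Settings` class and the `env` and `load_settings` helpers are hypothetical names introduced for this example, not the application's real loader; the defaults are copied from the tables.

```python
import os
from dataclasses import dataclass

PREFIX = "FORGE_ASSISTANT_"


def env(name: str, default: str, environ=None) -> str:
    """Look up PREFIX + name, falling back to the documented default."""
    environ = os.environ if environ is None else environ
    return environ.get(PREFIX + name, default)


@dataclass(frozen=True)
class Settings:
    # Hypothetical container holding a subset of the documented settings.
    ollama_base_url: str
    ollama_model: str
    ollama_timeout: int
    rag_top_k: int
    log_level: str
    cors_origins: list[str]


def load_settings(environ=None) -> Settings:
    """Build Settings from a mapping (defaults to the process environment)."""
    raw_origins = env("CORS_ORIGINS", "*", environ)
    return Settings(
        ollama_base_url=env("OLLAMA_BASE_URL", "http://localhost:11434", environ),
        ollama_model=env("OLLAMA_MODEL", "gemma3:1b", environ),
        ollama_timeout=int(env("OLLAMA_TIMEOUT", "120", environ)),
        rag_top_k=int(env("RAG_TOP_K", "5", environ)),
        log_level=env("LOG_LEVEL", "INFO", environ).upper(),
        # Split the comma-separated list and drop empty entries.
        cors_origins=[o.strip() for o in raw_origins.split(",") if o.strip()],
    )


# Passing an explicit mapping keeps the example independent of the real environment.
overrides = {"FORGE_ASSISTANT_CORS_ORIGINS": "https://a.example, https://b.example"}
print(load_settings(overrides).cors_origins)
```

Accepting an optional mapping instead of always reading `os.environ` makes the loader easy to exercise in tests without touching the process environment.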
Model Selection
| Model | VRAM | Speed | Quality | Recommendation |
| --- | --- | --- | --- | --- |
| tinyllama:1.1b | 2 GB | Fastest | Basic | CPU-only, testing |
| phi3:mini | 4 GB | Fast | Good | CPU with 8+ GB RAM |
| gemma3:1b | 2 GB | Fast | Good | Default: works on CPU, good balance |
| mistral:7b | 6 GB | Medium | Excellent | GPU with 8+ GB VRAM |
| llama3.1:8b | 8 GB | Medium | Best | GPU with 10+ GB VRAM |
To change the model:
# Pull new model (exec into the all-in-one container)
docker compose exec forge-assistant ollama pull llama3.1:8b
# Restart with new model
FORGE_ASSISTANT_OLLAMA_MODEL=llama3.1:8b docker compose up -d
Document Sources
Place markdown files in docs_to_index/ to make them searchable:
docs_to_index/
├── api_reference/        # API endpoint documentation
│   ├── jobs.md
│   ├── templates.md
│   └── inventories.md
├── user_guide/           # User instructions
│   ├── getting_started.md
│   ├── schedules.md
│   └── workflows.md
└── errors/               # Known errors and solutions
    └── common_errors.md
After adding files, trigger re-indexing:
curl -X POST "http://localhost:8100/api/v1/index?rebuild=true"
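If re-indexing needs to be triggered from a script rather than the command line, the same request can be sketched with Python's standard library. The helper names `build_reindex_url` and `trigger_reindex` are hypothetical, and the shape of the response body is not documented here, so the sketch only returns the HTTP status code.

```python
import urllib.request
from urllib.parse import urlencode


def build_reindex_url(base_url: str = "http://localhost:8100") -> str:
    """Return the re-index endpoint shown in the curl command above."""
    query = urlencode({"rebuild": "true"})
    return f"{base_url.rstrip('/')}/api/v1/index?{query}"


def trigger_reindex(base_url: str = "http://localhost:8100", timeout: float = 30.0) -> int:
    """Send the POST request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
    urllib.error.URLError if the service cannot be reached.
    """
    request = urllib.request.Request(build_reindex_url(base_url), method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Only the URL is printed here; call trigger_reindex() once the service is running.
print(build_reindex_url())  # → http://localhost:8100/api/v1/index?rebuild=true
```

The 30-second timeout is an arbitrary choice for this example; a full rebuild of a large document set may need a longer one.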