Runbooks
Runbooks are step-by-step troubleshooting guides that Holmes follows when investigating issues. When a user asks a question or an alert fires, Holmes automatically matches relevant runbooks from its catalog and fetches them using the fetch_runbook tool. It then follows the runbook instructions step-by-step, calling tools to gather data and reporting results for each step.
Runbooks work with all Holmes interfaces — the CLI (ask and investigate commands), the HTTP server, and the Python SDK.
How It Works
- Holmes receives a question or alert
- Holmes compares the issue against runbook descriptions in the catalog
- If a runbook matches, Holmes fetches it with the fetch_runbook tool
- Holmes follows the runbook steps, calling tools to gather data at each step
- Holmes reports findings with a checklist showing completed and skipped steps
Built-in Runbooks
Holmes ships with a built-in runbook catalog at holmes/plugins/runbooks/. These are available automatically — no configuration needed.
Custom Runbook Catalogs
You can add your own runbooks by creating a catalog and pointing Holmes to it.
Creating a Catalog
A catalog consists of a catalog.json index file and one or more markdown runbook files:
my-runbooks/
├── catalog.json
├── database/
│   ├── postgres_troubleshooting.md
│   └── redis_connection_issues.md
└── networking/
    └── dns_resolution.md
catalog.json:
{
  "catalog": [
    {
      "id": "postgres-troubleshooting.md",
      "update_date": "2026-01-15",
      "description": "Troubleshooting PostgreSQL connection and performance issues",
      "link": "database/postgres_troubleshooting.md"
    },
    {
      "id": "redis-connection-issues.md",
      "update_date": "2026-01-15",
      "description": "Diagnosing Redis connection failures and timeout issues",
      "link": "database/redis_connection_issues.md"
    },
    {
      "id": "dns-resolution.md",
      "update_date": "2026-01-15",
      "description": "Troubleshooting DNS resolution failures in Kubernetes clusters",
      "link": "networking/dns_resolution.md"
    }
  ]
}
Each entry has:
- id: Unique identifier (typically the filename)
- update_date: Last updated date (YYYY-MM-DD)
- description: Used by the LLM to match the runbook to user questions; make this descriptive
- link: Relative path from catalog.json to the markdown file
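The four fields above are easy to get subtly wrong, especially the relative link. As a minimal sketch (not part of Holmes itself), a catalog could be checked before use with a small script like the one below. The function name `validate_catalog` and the exact checks are assumptions made for illustration; Holmes may apply different or additional validation.

```python
import json
import re
from pathlib import Path

# The four fields described for each catalog entry.
REQUIRED_FIELDS = ("id", "update_date", "description", "link")
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def validate_catalog(catalog_path):
    """Return a list of human-readable problems found in a catalog.json file.

    Hypothetical helper for illustration; not a Holmes API.
    """
    catalog_path = Path(catalog_path)
    data = json.loads(catalog_path.read_text(encoding="utf-8"))
    problems = []
    seen_ids = set()

    for index, entry in enumerate(data.get("catalog", [])):
        label = entry.get("id", f"entry #{index}")

        # Every entry needs the four documented fields, and none may be empty.
        missing = [field for field in REQUIRED_FIELDS if not entry.get(field)]
        if missing:
            problems.append(f"{label}: missing {', '.join(missing)}")
            continue

        # id is documented as a unique identifier.
        if entry["id"] in seen_ids:
            problems.append(f"{label}: duplicate id")
        seen_ids.add(entry["id"])

        # update_date is documented as YYYY-MM-DD (format check only).
        if not DATE_PATTERN.match(entry["update_date"]):
            problems.append(f"{label}: update_date is not YYYY-MM-DD")

        # link is relative to the directory that holds catalog.json.
        if not (catalog_path.parent / entry["link"]).is_file():
            problems.append(f"{label}: link target not found: {entry['link']}")

    return problems
```

Calling `validate_catalog("my-runbooks/catalog.json")` returns an empty list when every entry is complete and every link resolves to a file.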
Writing a Runbook
Runbooks are markdown files with a structured format that guides Holmes through troubleshooting steps:
# PostgreSQL Connection Troubleshooting
## Goal
Diagnose and resolve PostgreSQL database connection issues.
Follow the workflow steps sequentially.
## Workflow
1. **Check database pod status**
   * Verify pods are running and not restarting
   * Check resource usage (CPU, memory)
2. **Test database connectivity**
   * Verify the connection string and credentials
   * Check network policies and service endpoints
3. **Examine database logs**
   * Look for authentication failures
   * Check for max connection limit errors
4. **Review client configuration**
   * Validate connection pool settings
   * Check timeout configurations
## Synthesize Findings
Correlate the outputs from each step to identify the root cause.
## Recommended Remediation Steps
* **Authentication failures**: Verify credentials in the application Secret
* **Connection limit**: Increase `max_connections` or add connection pooling
* **Network issues**: Check NetworkPolicies and DNS resolution
The key sections are:
- Goal: What the runbook addresses
- Workflow: Sequential diagnostic steps Holmes will execute using its tools
- Synthesize Findings: How to interpret combined results
- Recommended Remediation Steps: Solutions based on findings
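When maintaining many runbooks, it can help to confirm that each file contains the four sections listed above. The following is a rough sketch under the assumption that the sections appear as level-2 markdown headings, as in the example runbook. The helper name `missing_sections` is hypothetical, and nothing in this page states that Holmes itself requires or enforces these headings.

```python
import re

# Section names taken from the list of key sections.
EXPECTED_SECTIONS = (
    "Goal",
    "Workflow",
    "Synthesize Findings",
    "Recommended Remediation Steps",
)


def missing_sections(markdown_text):
    """Return the expected level-2 headings that the runbook does not contain.

    Hypothetical helper for illustration; not a Holmes API.
    """
    # Collect the text of every "## Heading" line (deeper levels are ignored).
    found = {
        match.group(1).strip()
        for match in re.finditer(
            r"^##[ \t]+(.+?)[ \t]*$", markdown_text, re.MULTILINE
        )
    }
    return [name for name in EXPECTED_SECTIONS if name not in found]
```

An empty result means all four headings are present; otherwise the returned names show which sections still need to be written.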
Configuring Custom Catalogs
Add catalog paths to ~/.holmes/config.yaml:
Mount your catalog files and reference them in values:
Multiple catalogs are merged — entries from all catalogs are combined with the built-in catalog.
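To make the merge behaviour concrete, here is an illustrative sketch that combines the entries of several catalog.json files into one list. It assumes merging is a plain concatenation of each file's `catalog` array; how Holmes treats duplicate ids across catalogs is not described here, so the sketch keeps every entry. The function name `merge_catalogs` is hypothetical and this is not Holmes's actual implementation.

```python
import json
from pathlib import Path


def merge_catalogs(catalog_paths):
    """Combine the entries of several catalog.json files into one list.

    Hypothetical helper for illustration; assumes simple concatenation.
    """
    merged = []
    for path in map(Path, catalog_paths):
        data = json.loads(path.read_text(encoding="utf-8"))
        for entry in data.get("catalog", []):
            # Links are relative to their own catalog.json, so anchor them to
            # that directory before entries from different catalogs are mixed.
            anchored = dict(entry, link=str(path.parent / entry["link"]))
            merged.append(anchored)
    return merged
```

Anchoring each link to its own catalog directory keeps entries resolvable after merging, since two catalogs may use the same relative path for different files.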