Recommended Setup

After installing HolmesGPT and running your first investigation, connect your data sources so Holmes can perform deeper investigations.

How Holmes Works

HolmesGPT is an AI troubleshooting agent that investigates issues by pulling data from your existing observability stack. The more data sources you connect, the more thoroughly Holmes can investigate — correlating metrics with logs, tracing infrastructure changes to application failures, and building a complete picture of what went wrong.

Holmes works across cloud, on-premise, and hybrid environments. If you use Kubernetes, the Kubernetes toolsets are enabled automatically. But Kubernetes is not required — Holmes works equally well with Prometheus, Datadog, Elasticsearch, AWS, GCP, databases, and many other data sources. Configure the toolsets that match your stack.

1. Connect a Metrics Provider

Metrics give Holmes visibility into trends over time. Without them, Holmes can still investigate using logs and infrastructure state, but it is less able to spot gradual degradation or to correlate an incident with historical behavior. Metrics are also essential for answering numerical questions such as "What is the error rate for service xyz?"

Connect whichever metrics platform you already use:

| Platform | Setup Guide | Notes |
|---|---|---|
| Prometheus | Setup | Most common. Works with self-hosted, Grafana Cloud (Mimir), AWS AMP, Azure Managed Prometheus, Google Managed Prometheus, and Coralogix PromQL |
| Datadog | Setup | Enable datadog/metrics (and optionally datadog/logs, datadog/traces, datadog/general) |
| New Relic | Setup | Uses NRQL for metrics, traces, and logs in one toolset |
| Coralogix | Setup | For Coralogix-native log and metrics queries |

Quick example (Prometheus):

When using the Holmes CLI, add the following to ~/.holmes/config.yaml (create the file if it doesn't exist):

toolsets:
  prometheus/metrics:
    enabled: true
    config:
      prometheus_url: http://prometheus-server.monitoring:9090

When using the standalone Holmes Helm Chart, update your values.yaml:

toolsets:
  prometheus/metrics:
    enabled: true
    config:
      prometheus_url: http://prometheus-server.monitoring:9090

Apply the configuration:

helm upgrade holmes holmes/holmes --values=values.yaml

When using the Robusta Helm Chart (which includes HolmesGPT), update your generated_values.yaml:

holmes:
  toolsets:
    prometheus/metrics:
      enabled: true
      config:
        prometheus_url: http://prometheus-server.monitoring:9090

Apply the configuration:

helm upgrade robusta robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>
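
If you use Datadog rather than Prometheus, the toolset names listed in the table above fit the same toolsets layout. The fragment below is only a sketch: the toolset names come from the table, but the credential and site settings each toolset needs are not shown on this page, so they are deliberately left as a comment. Take them from the Datadog setup guide.

```yaml
toolsets:
  datadog/metrics:
    enabled: true
    # config: credentials and site settings go here
    # (keys omitted on purpose; see the Datadog setup guide)
  datadog/logs:        # optional
    enabled: true
  datadog/traces:      # optional
    enabled: true
  datadog/general:     # optional
    enabled: true
```

With the Robusta Helm Chart, the same block would sit under the holmes key, as in the Prometheus example.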

2. Connect Centralized Logging

Centralized logging gives Holmes access to historical logs, cross-service log correlation, and full-text search across your environment. This is especially important for investigating issues where logs from the affected service are no longer available — crashed processes, terminated containers, rotated log files, or services running on VMs and bare metal.

| Platform | Setup Guide | Notes |
|---|---|---|
| Loki | Setup | Can connect through Grafana or directly |
| Elasticsearch / OpenSearch | Setup | elasticsearch/data for log search, elasticsearch/cluster for cluster health |
| Datadog Logs | Setup | Enable datadog/logs alongside metrics |
| Splunk | Setup | Via MCP server |

Quick example (Loki via Grafana):

When using the Holmes CLI, add the following to ~/.holmes/config.yaml (create the file if it doesn't exist):

toolsets:
  grafana/loki:
    enabled: true
    config:
      api_key: <your-grafana-token>
      api_url: https://your-grafana.net
      grafana_datasource_uid: <loki-datasource-uid>

When using the standalone Holmes Helm Chart, update your values.yaml:

toolsets:
  grafana/loki:
    enabled: true
    config:
      api_key: <your-grafana-token>
      api_url: https://your-grafana.net
      grafana_datasource_uid: <loki-datasource-uid>

Apply the configuration:

helm upgrade holmes holmes/holmes --values=values.yaml

When using the Robusta Helm Chart (which includes HolmesGPT), update your generated_values.yaml:

holmes:
  toolsets:
    grafana/loki:
      enabled: true
      config:
        api_key: <your-grafana-token>
        api_url: https://your-grafana.net
        grafana_datasource_uid: <loki-datasource-uid>

Apply the configuration:

helm upgrade robusta robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>
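
The metrics and logging examples both live under the same top-level toolsets key, so when you configure both they belong in one mapping rather than in two separate toolsets blocks. YAML requires mapping keys to be unique, and many parsers silently keep only the last duplicate. A sketch of a combined ~/.holmes/config.yaml, reusing the placeholder values from the examples above:

```yaml
toolsets:
  # Metrics (step 1)
  prometheus/metrics:
    enabled: true
    config:
      prometheus_url: http://prometheus-server.monitoring:9090

  # Centralized logging (step 2)
  grafana/loki:
    enabled: true
    config:
      api_key: <your-grafana-token>
      api_url: https://your-grafana.net
      grafana_datasource_uid: <loki-datasource-uid>
```

For the Helm charts the same merged mapping applies: top-level in values.yaml for the standalone chart, or nested under holmes in generated_values.yaml for the Robusta chart.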

3. Connect Your Cloud Provider

Cloud provider access lets Holmes investigate infrastructure-level causes — misconfigured security groups, IAM permission changes, database failovers, load balancer issues, DNS misconfigurations, or resource quota limits. Many production incidents involve changes at the infrastructure layer that aren't visible from application metrics or logs alone.

| Platform | Setup Guide | Notes |
|---|---|---|
| AWS | Setup | Read-only access to EC2, RDS, ELB, CloudWatch, CloudTrail, and more via MCP server |
| GCP | Setup | Logging, monitoring, traces, gcloud CLI, and storage via MCP server |
| Azure | Setup | Azure resource management via MCP server |

4. Connect Grafana Dashboards (Bonus)

If you use Grafana, connecting the dashboards toolset lets Holmes see what you're already monitoring — it can find relevant dashboards, extract PromQL queries from panels, and use them during investigations.

| Platform | Setup Guide |
|---|---|
| Grafana Dashboards | Setup |
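
This page does not show the dashboards configuration, so the fragment below is a guess at its shape and not a confirmed reference. It assumes the toolset is named grafana/dashboards and that it accepts the same api_key and api_url settings as the grafana/loki example above. Both the name and the keys are assumptions; check them against the Grafana Dashboards setup guide before use.

```yaml
toolsets:
  # Assumed toolset name; confirm in the Grafana Dashboards setup guide
  grafana/dashboards:
    enabled: true
    config:
      # Assumed to mirror the grafana/loki settings shown earlier
      api_key: <your-grafana-token>
      api_url: https://your-grafana.net
```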

Verify Your Setup

After configuring your data sources, verify everything is connected:

# List all enabled toolsets
holmes toolset list

# Test with a real investigation
holmes ask "what is the health of my environment?"

Next Steps