HolmesGPT

Name: HolmesGPT
Author: HolmesGPT

SRE Agent - CNCF Sandbox Project

Price

Free

API key required

Works With

Claude Code, Cursor, Windsurf, VS Code, Developer tool

About

HolmesGPT — The CNCF SRE Agent

Open-source AI agent for investigating production incidents and finding root causes. Works with any stack — Kubernetes, VMs, cloud providers, databases, and SaaS platforms. We are a Cloud Native Computing Foundation sandbox project. Originally created by Robusta.Dev, with major contributions from Microsoft.

New: Operator Mode — Find Problems 24/7 in the Background

Most AI agents are great at troubleshooting problems, but still need a human to notice something is wrong and trigger an investigation. Operator mode fixes that — HolmesGPT runs in the background 24/7, spots problems before your customers notice, and messages you in Slack with the fix. Connect the GitHub integration and it can even open PRs to fix what it finds.

While the operator itself runs in Kubernetes, health checks can query any data source Holmes is connected to — VMs, cloud services, databases, SaaS platforms, and more.

[Deployment verification](https://holmesgpt.dev/operator/deployment-verification/) — Deploy a health check alongside your app to verify the new version is healthy
[Scheduled health checks](https://holmesgpt.dev/operator/scheduled-health-checks/) — Continuously monitor services and catch regressions automatically

Features

Petabyte-scale data: Server-side filtering, JSON tree traversal, and tool output transformers keep large payloads out of context windows
Memory-safe execution: Per-tool memory limits, streaming large results to disk, and automatic output budgeting prevent OOM kills when querying large observability datasets
[Deep integrations](https://holmesgpt.dev/data-sources/builtin-toolsets/): Prometheus, Grafana, Datadog, Kubernetes, and many more—plus any REST API
Bidirectional alert integrations: Fetch alerts from AlertManager, PagerDuty, OpsGenie, or Jira—and write findings back
[Any LLM provider](https://holmesgpt.dev/ai-providers/): OpenAI, Anthropic, Azure, Bedrock, Gemini, and more
No Kubernetes required: Works with any infrastructure — VMs, bare metal, cloud services, or containers
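
To make the first two features concrete, here is a minimal Python sketch of the general idea: pull only the needed fields out of a large JSON payload, then cap the result at a size budget before it reaches the model. This is not HolmesGPT's code or API. The helper names (`get_path`, `budget_output`), the dotted-path syntax, and the sample payload are all assumptions made for illustration.

```python
import json
from typing import Any


def get_path(doc: Any, path: str) -> Any:
    """Walk a dotted path such as 'pods.0.name' through nested dicts and lists.

    Hypothetical helper: the path syntax is an assumption for this sketch.
    """
    node = doc
    for key in path.split("."):
        if isinstance(node, list):
            node = node[int(key)]  # numeric segments index into lists
        else:
            node = node[key]
    return node


def budget_output(text: str, max_chars: int) -> str:
    """Cap tool output at max_chars and note how much was dropped."""
    if len(text) <= max_chars:
        return text
    dropped = len(text) - max_chars
    return text[:max_chars] + f"\n[truncated {dropped} chars]"


# Made-up payload with one very large field that should never reach the model.
payload = {
    "pods": [
        {"name": "api-7d9f", "state": "CrashLoopBackOff", "logs": "x" * 10_000}
    ]
}

# Keep only the fields the investigation needs, then enforce the budget.
summary = {
    "name": get_path(payload, "pods.0.name"),
    "state": get_path(payload, "pods.0.state"),
}
compact = budget_output(json.dumps(summary), max_chars=200)
print(compact)
```

The point of the design is that the 10,000-character log field is dropped before serialization, so the budget check acts as a safety net rather than the main filter.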

How it Works

HolmesGPT uses an agentic loop to query live observability data from multiple sources and identify root causes.
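
An agentic loop of this kind can be sketched in a few lines of Python. The example below is a toy, not HolmesGPT's implementation: the tools return canned strings, `fake_model` is a stand-in for a real LLM call, and every name here is hypothetical. It shows only the control flow: the model picks a tool, the result is added to the evidence, and the loop repeats until the model answers or a step limit is reached.

```python
from typing import Callable


# Hypothetical tools standing in for real data source queries.
def get_pod_status(name: str) -> str:
    return f"pod {name}: restarting, last exit code 137"


def get_memory_usage(name: str) -> str:
    return f"pod {name}: memory at 100% of limit"


TOOLS: dict[str, Callable[[str], str]] = {
    "get_pod_status": get_pod_status,
    "get_memory_usage": get_memory_usage,
}


def fake_model(question: str, observations: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM: pick the next tool from the evidence so far, or answer."""
    if not observations:
        return ("call", "get_pod_status")
    if len(observations) == 1 and "137" in observations[0]:
        # Exit code 137 means the process was killed; check memory next.
        return ("call", "get_memory_usage")
    return ("answer", "Likely root cause: container exceeded its memory limit.")


def investigate(question: str, target: str, max_steps: int = 5) -> str:
    """Run the loop: ask the model, execute the chosen tool, feed the result back."""
    observations: list[str] = []
    for _ in range(max_steps):
        action, value = fake_model(question, observations)
        if action == "answer":
            return value
        observations.append(TOOLS[value](target))
    return "No conclusion within the step budget."


print(investigate("Why is the api pod crashing?", "api"))
```

The `max_steps` limit is a common safeguard in loops like this, since it bounds cost and latency if the model never settles on an answer.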

🔗 Data Sources

HolmesGPT integrates with popular observability and cloud platforms through built-in data sources ("toolsets"). You can also add your own.

a-gnt's Take

Our honest review

SRE Agent - CNCF Sandbox Project. Best for anyone looking to make their AI assistant more capable in DevOps and monitoring. It's free and works across most major AI apps.

Tips for getting started

Tap "Get" above, pick your AI app, and follow the steps. Most installs take under 30 seconds.

Heads up: this needs an API key to work. You'll get one from the service's website (usually free). The setup guide tells you exactly where.

DevOps & Monitoring