Early access · Kafka 4.x · KRaft-native

Stop babysitting dashboards. Ask your cluster.

Gregor connects your AI agent to Apache Kafka and returns answers and diagnoses in plain language — grounded in real signals, explainable, and read-only by design. No new dashboard to learn.

No credit card. Self-host or use the managed cloud. We email you when your slot opens.

cluster prod-eu-1 · brokers 6 · topics 214 · partitions 3,128 · read-only

› consumer-group orders-billing lag is climbing on payments. why, and is it the brokers?

DIAGNOSIS · consumer-lag root cause · rule R-014

finding: Partition payments-3 lag 1,243,902 and rising at ~8.4k msg/s.

cause: One consumer left during an incremental rebalance (KIP-848). payments-3 reassigned to pod/orders-7, processing 4× slower (p99 1,840 ms vs 460 ms).

verdict: Not a broker issue. ISR healthy, no under-replicated partitions. Throughput-bound consumer.

detection: deterministic · evidence: 17 signals · llm role: grounded synthesis


Representative output. Every root cause is synthesized from real signals — never invented.
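The 4× figure in the representative output is plain arithmetic on the two p99 latencies it quotes. A minimal sketch of that calculation follows; `slowdown_ratio` is a hypothetical helper name used for illustration, not part of Gregor.

```python
def slowdown_ratio(slow_p99_ms: float, baseline_p99_ms: float) -> float:
    """Return how many times slower one consumer is than the baseline."""
    if baseline_p99_ms <= 0:
        raise ValueError("baseline latency must be positive")
    return slow_p99_ms / baseline_p99_ms


# Latencies taken from the representative output: 1,840 ms vs 460 ms.
ratio = slowdown_ratio(1840, 460)
print(f"{ratio:.1f}x slower")  # prints "4.0x slower"
```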

/ the problem

Another dashboard won't tell you what's wrong.

Control Center, Conduktor, AKHQ, Redpanda Console, Grafana stacks — they render the metrics beautifully and leave the hard part to you. Diagnosing lag spikes, under-replicated partitions and rebalances still means staring at graphs and leaning on tribal knowledge that lives in one engineer's head.

If you run Kafka without a dedicated SRE or observability team, that engineer is you — usually at 2 a.m.

dashboard-centric tools

You ask the question. You read ten panels. You correlate them in your head. You guess.

tribal knowledge

"Check ISR, then the rebalance logs, then the slow consumer" — undocumented, unowned, unscalable.

heavy footprint

Capture infra, agents, sidecars, retention — another system you now have to operate.

gregor

You ask in plain language. You get a root cause, the evidence, and a verdict.

/ how it works

From zero to grounded answers.

Stand Gregor up next to your cluster, connect the agent you already use, and start asking — or let us run the core for you.

1

Drop in the agent

Deploy one lightweight agent next to your Kafka cluster (Docker Compose, a single binary, or Helm) and give it a read-only Kafka principal. It connects outbound only, so there are no inbound ports to open. When self-hosted, your data never leaves your network.

read-only principal · outbound-only · Compose · binary · Helm · deploy options →
2

Connect your AI agent

Point the agent you already use — Claude, Cursor, or an internal agent — at Gregor's MCP (Model Context Protocol) server. One config block, and there's no new dashboard to learn.

configure your client →
3

Ask in plain language

Ask the way you'd ask a teammate. Your agent calls Gregor's read-only tools and answers — grounded in your cluster's real, live state.

› why is the orders consumer group lagging?

› are any partitions under-replicated?

// prefer managed?

Use Gregor Cloud

Don't want to operate the core yourself? Deploy only the lightweight collector in your network — still read-only, still outbound-only — and we host, run and upgrade everything else. Point your agent at your hosted Gregor endpoint and ask away. Same workflow, nothing to run or upgrade beyond the collector.

Transparency: In managed mode, metrics and cluster metadata are sent to Gregor Cloud over an encrypted, outbound-only connection. To keep all data inside your network, choose self-hosted.

/ why you can trust it

Grounded, not guessing.

A deterministic engine detects incidents and the signals behind them; the LLM synthesizes the root cause from that real evidence — it never invents one. Detection is explainable, and the model only ever reasons over signals Gregor actually observed.

01 · collect

Read the facts

Consumer lag, throughput, topology and ISR health, config anti-patterns — pulled live and continuously over a read-only principal.
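Consumer lag is conventionally the gap between a partition's log end offset and the offset the group has committed. The sketch below shows that calculation on in-memory data; the function name and the sample offsets are illustrative assumptions (chosen so that payments-3 reproduces the lag figure from the sample diagnosis), not Gregor's actual collector code.

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]


def partition_lag(
    end_offsets: Dict[TopicPartition, int],
    committed_offsets: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Lag per partition = log end offset minus the group's committed offset."""
    lag = {}
    for tp, end in end_offsets.items():
        # Simplification: a partition with no committed offset is treated
        # as unread from offset 0.
        committed = committed_offsets.get(tp, 0)
        # Clamp at zero in case the two offsets were sampled at different times.
        lag[tp] = max(end - committed, 0)
    return lag


# Hypothetical offsets for illustration only.
end = {("payments", 3): 5_000_000, ("payments", 4): 1_200}
committed = {("payments", 3): 3_756_098, ("payments", 4): 1_200}
print(partition_lag(end, committed))
```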

02 · detect

Surface the incident

Deterministic rules flag the incident and the signals behind it, continuously — consumer-lag, under-replication, offline partitions, rebalance fault. Reproducible, not a model's opinion.
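To make "deterministic" concrete, here is a minimal sketch of what two such rules could look like: the same partition state always yields the same findings. The `PartitionState` shape and the rule names are assumptions for illustration; Gregor's real rule set is not shown on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PartitionState:
    topic: str
    partition: int
    leader: int          # broker id, or -1 when the partition has no leader
    replicas: List[int]  # broker ids assigned to hold a replica
    isr: List[int]       # broker ids currently in sync


def detect(state: PartitionState) -> List[str]:
    """Return rule names that fire for this partition, in a fixed order."""
    findings = []
    if state.leader < 0:
        findings.append("offline-partition")
    if len(state.isr) < len(state.replicas):
        findings.append("under-replication")
    return findings


# Hypothetical state: one of three replicas has fallen out of sync.
print(detect(PartitionState("payments", 3, leader=1, replicas=[1, 2, 3], isr=[1, 3])))
```

Because the rules are pure functions of observed state, re-running them on the same snapshot reproduces the same result.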

03 · synthesize

Name the root cause

The LLM synthesizes the root cause from that evidence and explains it — grounded in what Gregor observed, never fabricated.

signals → deterministic detection → grounded synthesis · every claim traces back to real evidence
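One way to picture "every claim traces back to real evidence" is a prompt built only from observed signals, plus a check that the answer cites nothing else. The sketch below is an assumption about how such grounding could be enforced; the signal ids (`S1`, `S2`), function names and prompt wording are hypothetical and not taken from Gregor.

```python
import re
from typing import Dict, List


def build_prompt(incident: str, signals: Dict[str, str]) -> str:
    """Render only observed signals into the prompt, each with a citable id."""
    lines = [f"Incident: {incident}", "Evidence:"]
    lines += [f"[{sid}] {text}" for sid, text in sorted(signals.items())]
    lines.append("Explain the root cause. Cite evidence ids in brackets.")
    return "\n".join(lines)


def ungrounded_citations(answer: str, signals: Dict[str, str]) -> List[str]:
    """Return cited ids that do not correspond to any observed signal."""
    cited = re.findall(r"\[(S\d+)\]", answer)
    return [sid for sid in cited if sid not in signals]


# Hypothetical signals, paraphrasing the sample diagnosis.
signals = {
    "S1": "payments-3 lag 1,243,902 and rising",
    "S2": "ISR healthy, no under-replicated partitions",
}
answer = "Throughput-bound consumer [S1]; brokers are healthy [S2]; disk is full [S9]."
print(ungrounded_citations(answer, signals))  # prints ['S9']
```

An answer that cites an id outside the evidence set can then be rejected or regenerated instead of shown.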

/ capabilities

Four things to do with your cluster, in plain language.

diagnose

Root cause, not just symptoms

The deterministic Incident Diagnosis Engine finds the cause of consumer lag, detects under-replicated and offline partitions, and analyses rebalances with KIP-848 awareness.

observe

The whole picture, on demand

Consumer lag per group, topic and partition. Throughput, cluster topology, ISR health and historical trends — ask for exactly the slice you need.
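Asking for a "slice" amounts to rolling per-partition lag up to the level you care about. A minimal sketch of a group-and-topic roll-up follows; the row layout, the function name and the sample rows are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One row per partition: (group, topic, partition, lag)
LagRow = Tuple[str, str, int, int]


def lag_by_group_topic(rows: Iterable[LagRow]) -> Dict[Tuple[str, str], int]:
    """Sum per-partition lag into one total per (group, topic) pair."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for group, topic, _partition, lag in rows:
        totals[(group, topic)] += lag
    return dict(totals)


# Hypothetical rows for illustration only.
rows = [
    ("orders-billing", "payments", 3, 1_243_902),
    ("orders-billing", "payments", 4, 98),
    ("orders-billing", "refunds", 0, 0),
]
print(lag_by_group_topic(rows))
```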

inspect

Read the log directly

On-demand message inspection and replay straight from the log. No capture pipeline, no sidecars, no extra retention to operate.

connect

Agent-native, no dashboard

An MCP server your agent talks to. Ask natural-language questions from the tools you already work in — there's no console to babysit.

/ security

Secure and sovereign by design.

Gregor is built for teams that won't hand their cluster to a black box. It connects with the least privilege it can, never writes, and — self-hosted — keeps every byte inside your network.

read-only principal: It can describe and consume. It cannot produce, alter or delete. Full stop.
least-privilege ACLs: Scoped to exactly the topics and operations it needs — nothing more.
outbound-only agent: No inbound ports to open. The agent dials out; nothing dials in.
TLS / SASL: Encrypted and authenticated against your brokers, the way you already run them.

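# least-privilege, read-only
# broker-1:9092 is a placeholder; use your own bootstrap address
# (add --command-config <file> if your brokers require TLS/SASL)
# '*' grants Read/Describe on every topic and group; narrow it for tighter scoping

kafka-acls --bootstrap-server broker-1:9092 --add \
  --allow-principal User:gregor \
  --operation Read \
  --operation Describe \
  --topic '*' \
  --group '*'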

no Write, no Delete, no Alter — by construction
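If you want to verify the read-only claim yourself, you can compare the operations granted to the principal against an allowlist. The sketch below assumes you have already collected the granted operation names (for example from your ACL listing); the helper name is hypothetical.

```python
from typing import Iterable, List

# The only operations the ACL command above grants.
READ_ONLY_OPERATIONS = {"Read", "Describe"}


def forbidden_operations(granted: Iterable[str]) -> List[str]:
    """Return granted operations that fall outside the read-only allowlist."""
    return sorted(set(granted) - READ_ONLY_OPERATIONS)


print(forbidden_operations(["Read", "Describe"]))        # prints []
print(forbidden_operations(["Read", "Write", "Alter"]))  # prints ['Alter', 'Write']
```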

/ deploy your way

Run it wherever you already run things.

A single lightweight Go agent plus the core service — Kafka 4.x / KRaft native, no ZooKeeper assumptions. Self-host it and your data never leaves your network, or let us run the managed cloud.

// self-hosted, three ways

Docker Compose · fastest

Try it in one command

Postgres included. Point it at your cluster with read-only credentials and you're observing.

$ docker compose up

Single binary · air-gapped

Bare VMs, no runtime

One static Go binary for air-gapped environments. Bring your own Postgres.

$ ./gregor-agent --config gregor.yaml

Helm · Kubernetes

Native to your cluster

One chart for Kubernetes — runs happily alongside Strimzi and other operators.

$ helm install gregor gregor/agent

// fully managed cloud

We run it for you, without holding your messages.

Offload the service to us. Your collector stays inside your network and is outbound-only — nothing dials in, and your messages never leave.

Request managed access →

minimal footprint: one Go agent + core service · Postgres is the only dependency · no heavy observability stack to operate

/ works with your agents

Speaks MCP. Bring your own agent.

Gregor exposes a Model Context Protocol server, so it plugs into the agents your team already uses — no bespoke integration, no lock-in. Point any MCP client at your endpoint and start asking.

endpoint

https://gregor.your-net.internal/mcp

mcp client

claude_desktop_config.json

{
  "mcpServers": {
    "gregor": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://gregor.your-net.internal/mcp"]
    }
  }
}

self-hosted stays inside your network, managed cloud is a hosted URL · stdio clients use mcp-remote
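The client config has the same shape for any endpoint; only the URL changes. A small sketch that renders the block for a given endpoint, assuming the `mcp-remote` bridge shown in the config above (`mcp_client_config` is a hypothetical helper, not a Gregor tool):

```python
import json


def mcp_client_config(endpoint: str, name: str = "gregor") -> str:
    """Render an mcpServers entry that bridges a stdio client to a remote endpoint."""
    config = {
        "mcpServers": {
            name: {
                "command": "npx",
                "args": ["-y", "mcp-remote", endpoint],
            }
        }
    }
    return json.dumps(config, indent=2)


# The self-hosted endpoint from this page; substitute your own URL.
print(mcp_client_config("https://gregor.your-net.internal/mcp"))
```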

/ why now

Three things changed at once.

01
Kafka 4.0 dropped ZooKeeper

KRaft-only changes the operational model. New tooling should be born native to it — not retrofitted.
02
KIP-848 changed rebalances

The new consumer-group protocol shifts how lag and reassignment behave. The diagnosis rules have to know it.
03
Agents and MCP matured

Capable agents and a real interop protocol make agent-native ops practical — not a demo.

/ faq

Straight answers.

What is Gregor?

Gregor is an AI-agent-native observability and incident-diagnosis tool for Apache Kafka. You connect the AI agent you already use to your cluster over the Model Context Protocol (MCP) and ask questions in plain language — instead of reading another dashboard.

How is Gregor different from Control Center, Conduktor, AKHQ, Redpanda Console or Grafana?

Those tools are dashboard-centric: they render the metrics and leave the diagnosis to you. Gregor is agent-native — it surfaces the incident and the signals behind it, then answers in plain language, so you don't have to correlate ten panels and lean on tribal knowledge.

Does the LLM guess the diagnosis?

No. A deterministic engine detects incidents and the signals behind them; the LLM only synthesizes the explanation from that real evidence — it reasons over facts Gregor actually observed and never invents a cause.

What access does Gregor need? Is my data safe?

Gregor connects with a read-only Kafka principal and least-privilege ACLs. The agent is outbound-only, with no inbound ports to open, and traffic is encrypted with TLS and authenticated with SASL. It can describe and consume; it cannot produce, alter or delete.

Self-hosted or managed: where does my data live?

Self-hosted keeps all data inside your network. With Gregor Cloud you deploy only the lightweight collector — still read-only, still outbound-only — and metrics and cluster metadata are sent to Gregor Cloud over an encrypted, outbound-only connection. Choose self-hosted to keep everything in-network.

Which AI agents and Kafka versions are supported?

Gregor speaks MCP, so it works with Claude, Cursor, Cline, GitHub Copilot, Codex, Windsurf, opencode, and any MCP client. It's built for Kafka 4.x / KRaft — no ZooKeeper assumptions.

Is Gregor available now?

Gregor is in early access. Join the waitlist and we'll reach out when a slot opens — self-hosted or managed, your call.

early access · coming soon

Get your cluster on the waitlist.