How to Set Up Grafana Assistant for Instant Infrastructure Insights
Introduction
When an unexpected alert hits, engineers typically rush to ask their AI assistant for help. But without pre-loaded context, the assistant must ask you for details about data sources, services, connections, and metrics—every single time. This back-and-forth wastes precious minutes during an incident. Enter Grafana Assistant, an agentic observability assistant that studies your infrastructure ahead of time, building a persistent knowledge base. By the time you ask your first question, it already knows what's running, how things connect, and where to look. This guide walks you through setting up Grafana Assistant so you can skip context-sharing and dive straight into troubleshooting.
What You Need
- A Grafana Cloud stack (any tier that includes Grafana Assistant)
- At least one of the following data sources connected: Prometheus, Loki, or Tempo
- Admin or Editor permissions to enable and configure Grafana Assistant
- Basic familiarity with your infrastructure (services, deployments, metrics)
Step 1: Ensure Your Grafana Cloud Stack Is Ready
Grafana Assistant runs on top of your existing Grafana Cloud stack. Make sure your data sources are properly configured and accessible. You need at least one Prometheus data source for metrics discovery. For richer context, also add Loki for logs and Tempo for traces. If you haven't already, connect these sources via the Data Sources page in your Grafana instance. Once connected, verify that they are healthy and returning data.
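If you want to spot-check a Prometheus data source outside the UI, one quick approach is to run the `up` query against the Prometheus HTTP API and confirm the response succeeds with at least one series. The sketch below validates the standard response shape; the sample payload mirrors what `GET <prometheus>/api/v1/query?query=up` returns (the host is a placeholder).

```python
# Spot-check that a Prometheus data source is healthy by validating the
# standard Prometheus HTTP API response shape for an instant query.

def is_returning_data(response: dict) -> bool:
    """Return True if a /api/v1/query response succeeded with at least one series."""
    if response.get("status") != "success":
        return False
    result = response.get("data", {}).get("result", [])
    return len(result) > 0

# Example payload, as returned by GET <prometheus>/api/v1/query?query=up
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "node", "instance": "10.0.0.1:9100"},
             "value": [1715000000.0, "1"]},
        ],
    },
}

print(is_returning_data(sample))  # True for a healthy, data-returning source
```

A source that is connected but scraping nothing will return `"result": []`, which this check treats as unhealthy.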
Step 2: Enable Grafana Assistant
Grafana Assistant is enabled by default on most Grafana Cloud stacks. To confirm, go to Administration > Plugins and data > Assistant. If it’s not active, toggle the switch to enable it. No additional configuration is required; the assistant starts working in the background immediately.

Step 3: Let the AI Agents Discover Your Data Sources
Once enabled, a swarm of AI agents automatically scans your connected data sources. This process requires zero configuration from you. The agents perform the following:
- Data source discovery: They identify all Prometheus, Loki, and Tempo data sources in your stack.
- Metrics scans: Each Prometheus data source is queried in parallel to find services, deployments, and infrastructure components.
- Enrichments via logs and traces: Loki and Tempo data sources are correlated with corresponding metrics, adding context about log formats, trace structures, and service dependencies.
This discovery runs continuously, so any new data sources or changes are picked up automatically.
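To make the metrics-scan step concrete, here is an illustrative sketch of the kind of inventory that can be derived from the label sets attached to metrics. The label names `service` and `deployment` are common Prometheus conventions used for the example, not values Grafana guarantees; the actual agent logic is internal to the product.

```python
# Illustrative sketch: derive a service inventory from metric label sets.
# Label names ("service", "deployment") are conventional, not guaranteed.

def discover_services(label_sets: list[dict]) -> dict[str, set[str]]:
    """Group deployments under the service each label set belongs to."""
    inventory: dict[str, set[str]] = {}
    for labels in label_sets:
        service = labels.get("service")
        if not service:
            continue  # infrastructure metrics without a service label are skipped
        inventory.setdefault(service, set())
        if "deployment" in labels:
            inventory[service].add(labels["deployment"])
    return inventory

scraped = [
    {"service": "checkout", "deployment": "checkout-v2"},
    {"service": "checkout", "deployment": "checkout-canary"},
    {"service": "payment", "deployment": "payment-v1"},
    {"job": "node-exporter"},  # no service label: not part of the service map
]
print(discover_services(scraped))
```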
Step 4: Monitor the Knowledge Base Build
As the agents work, they generate structured documentation for each discovered service group. This documentation covers five areas:
- What the service is (name, purpose)
- Key metrics and labels (e.g., latency, error rate, deployment version)
- How it's deployed (Kubernetes, EC2, etc.)
- Dependencies (upstream and downstream services)
- Relevant log and trace sources
You can view the knowledge base by opening the Grafana Assistant panel and checking the “Environment” tab. Here you’ll see a map of your infrastructure, with services and connections clearly labeled. The assistant updates this knowledge base periodically, so it reflects your current setup.
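As a rough mental model, one entry in that knowledge base could be structured like the dataclass below, mirroring the five documentation areas listed above. The field names and values are illustrative; the assistant's actual internal format is not published.

```python
# Hypothetical shape of one knowledge-base entry, mirroring the five
# documentation areas: identity, metrics, deployment, dependencies, signals.
from dataclasses import dataclass, field

@dataclass
class ServiceDoc:
    name: str
    purpose: str
    key_metrics: list[str] = field(default_factory=list)
    deployment: str = ""                      # e.g. "Kubernetes", "EC2"
    upstream: list[str] = field(default_factory=list)
    downstream: list[str] = field(default_factory=list)
    log_sources: list[str] = field(default_factory=list)
    trace_sources: list[str] = field(default_factory=list)

checkout = ServiceDoc(
    name="checkout",
    purpose="Handles order checkout requests",
    key_metrics=["http_request_duration_seconds", "http_requests_total"],
    deployment="Kubernetes",
    downstream=["inventory", "payment", "shipping"],
    log_sources=["loki-prod"],
    trace_sources=["tempo-prod"],
)
print(checkout.downstream)
```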
Step 5: Query Your Infrastructure
Now you can ask questions directly. For example, type “Why is the checkout service slow?” or “What are the upstream dependencies of payment?” Because the assistant already knows your environment, it won’t ask for clarification about data sources. Instead, it retrieves relevant metrics, logs, and traces instantly. You’ll get answers like:
- “The checkout service depends on three downstream services: inventory, payment, and shipping.”
- “Latency metrics for checkout are stored in the `prod-prometheus` data source under the label `service="checkout"`.”
- “Recent logs show a spike in 5xx errors; the corresponding traces point to the inventory service.”
This eliminates the back-and-forth and accelerates root cause analysis.
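Answers like the ones above are ultimately backed by ordinary PromQL. The helpers below sketch two queries of the kind an assistant (or a human) would run; the metric names (`http_request_duration_seconds_bucket`, `http_requests_total`) and the `service` label are common Prometheus conventions assumed for the example.

```python
# Sketch of PromQL the answers above could be backed by. Metric and label
# names are conventional assumptions, not values Grafana guarantees.

def p95_latency_query(service: str, window: str = "5m") -> str:
    """PromQL for 95th-percentile request latency of one service."""
    return (
        "histogram_quantile(0.95, sum by (le) ("
        f'rate(http_request_duration_seconds_bucket{{service="{service}"}}[{window}])))'
    )

def error_rate_query(service: str, window: str = "5m") -> str:
    """PromQL for the 5xx error ratio of one service."""
    return (
        f'sum(rate(http_requests_total{{service="{service}", code=~"5.."}}[{window}]))'
        f' / sum(rate(http_requests_total{{service="{service}"}}[{window}]))'
    )

print(p95_latency_query("checkout"))
print(error_rate_query("checkout"))
```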
Step 6: Use Context for Incident Response
During an incident, every second counts. Use Grafana Assistant to rapidly assess the situation. For example, when an alert fires, you can ask:
- “What other services depend on this?”
- “What changed in the last hour?”
- “Show me the corresponding logs and traces for this metric spike.”
Because the assistant already has the full context, it can correlate data across multiple sources without you having to jump between dashboards. This is especially powerful for teams where not everyone knows the entire infrastructure. A developer unfamiliar with the payment system can ask about upstream dependencies and get accurate, complete answers immediately.
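A question like “What other services depend on this?” reduces to a reverse lookup over the dependency map the assistant has already built. The sketch below walks a small hypothetical graph (not data Grafana exposes directly) to find every direct and transitive dependent.

```python
# Find all services that directly or transitively depend on a given service,
# by reversing a downstream-dependency map. The graph is a hypothetical example.
from collections import deque

# service -> services it calls (downstream dependencies)
deps = {
    "frontend": ["checkout"],
    "checkout": ["inventory", "payment", "shipping"],
    "payment": ["fraud-check"],
}

def dependents_of(service: str, graph: dict[str, list[str]]) -> set[str]:
    """All services that directly or transitively depend on `service`."""
    reverse: dict[str, list[str]] = {}
    for src, targets in graph.items():
        for t in targets:
            reverse.setdefault(t, []).append(src)
    seen: set[str] = set()
    queue = deque(reverse.get(service, []))
    while queue:
        s = queue.popleft()
        if s not in seen:
            seen.add(s)
            queue.extend(reverse.get(s, []))
    return seen

print(dependents_of("payment", deps))  # {'checkout', 'frontend'}
```

Breadth-first traversal over the reversed edges keeps the lookup correct even when a service is reachable through several paths.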
Tips for Best Results
- Zero setup required: You don’t need to configure anything beyond enabling the assistant—just let it run.
- Keep data sources healthy: Regularly check that your Prometheus, Loki, and Tempo endpoints are up and responsive. The assistant relies on them.
- Leverage the knowledge base map: Review the Environment tab periodically to ensure the assistant has captured your entire stack. If something is missing, verify the corresponding data source is connected and scoped properly.
- Train your team: Encourage everyone to ask questions freely. The assistant provides consistent answers regardless of the user’s experience level.
- Combine with alerts: If your workflow supports it, wire alert rules to trigger assistant questions automatically so that answers are pre-populated in incident channels.
By following these steps, you’ll turn Grafana Assistant into your team’s instant infrastructure expert, cutting response times and eliminating repetitive context-sharing.