Respond to incidents faster with automatic context gathering. This workflow triggers on PagerDuty alerts, pulls relevant metrics from Datadog, uses AI to analyze the situation, and delivers enriched incident reports to Slack with everything responders need to start debugging immediately.

Receive incident alert from PagerDuty

The workflow triggers when a new incident is created in PagerDuty. It captures the incident title, description, severity, affected service, and any alert details that triggered the incident for context.

Pull relevant metrics from Datadog

Based on the affected service, the workflow queries Datadog for relevant metrics including error rates, latency percentiles, request volumes, and resource utilization for the time window around the incident. It also checks for any related alerts.

Analyze incident context with OpenAI

Using OpenAI, the workflow analyzes the incident details and metrics to identify likely root causes, correlate with recent deployments or changes, and suggest initial investigation steps. The AI provides a hypothesis based on available data.

Generate incident response recommendations

The AI generates specific response recommendations including which dashboards to check, what commands to run, and what mitigation steps to consider. Recommendations are tailored to the incident type and affected service.

Deliver enriched incident to Slack

An enriched incident report is posted to the on-call Slack channel with incident summary, relevant metric graphs, AI analysis, and recommended actions. Responders can start debugging immediately without gathering context manually.

Why automate incident enrichment with AI?

When an incident strikes, every minute counts. On-call engineers waste precious time gathering context, checking dashboards, and correlating data before they can start fixing the problem. AI-powered enrichment delivers this context instantly.

Reduce time to first meaningful action

Instead of spending the first 10-15 minutes gathering information, responders get a comprehensive briefing immediately. They can start investigating the root cause right away.

Provide consistent incident context

Manual context gathering varies by person and time of day. AI enrichment provides the same comprehensive analysis for every incident, ensuring nothing important is missed.

Help less experienced on-call engineers

Junior engineers benefit from AI-suggested investigation steps and runbook guidance. They get expert-level starting points even for unfamiliar services.

How to set up AI incident enrichment

Setting up this PagerDuty enrichment workflow takes about 15 minutes. You'll connect your monitoring tools and configure service mappings.

What you need to get started

PagerDuty account with incident webhooks
Datadog account for metrics access
OpenAI API key for analysis
Slack workspace for incident channel

Configuring service mappings

Map PagerDuty services to Datadog dashboards and metrics
Define which metrics are relevant for each service type
Set up integration with your deployment tracking
Configure runbook references for common incident types

Customizing AI analysis

Provide context about your architecture and dependencies
Define common failure modes for your services
Specify your incident response procedures
Include any team-specific debugging approaches

Frequently asked questions about AI incident enrichment

Does this work with other monitoring tools besides Datadog?

Yes, you can integrate with any monitoring platform with API access including New Relic, Grafana, CloudWatch, or Prometheus. The AI analysis works with metrics from any source.

How does AI identify likely root causes?

The AI correlates incident timing with metric anomalies, recent deployments, and known failure patterns. It provides hypotheses to investigate rather than definitive diagnoses.

Can this integrate with our existing runbooks?

Yes, you can provide runbook content to the AI so it can reference relevant procedures in its recommendations. This ensures suggestions align with your documented processes.

What if the AI analysis is wrong?

The AI provides hypotheses and suggestions, not certainties. Experienced engineers use it as a starting point and apply their own judgment. Over time you can refine the AI context for better accuracy.

Enrich PagerDuty incidents with AI analysis and Datadog context

Steps in this workflow

Categories

Use Cases

Integrations Used