Respond to incidents faster with automatic context gathering. This workflow triggers on PagerDuty alerts, pulls relevant metrics from Datadog, uses AI to analyze the situation, and delivers enriched incident reports to Slack with everything responders need to start debugging immediately.
Receive incident alert from PagerDuty
The workflow triggers when a new incident is created in PagerDuty. It captures the incident title, description, severity, affected service, and any alert details that triggered the incident for context.
Pull relevant metrics from Datadog
Based on the affected service, the workflow queries Datadog for relevant metrics including error rates, latency percentiles, request volumes, and resource utilization for the time window around the incident. It also checks for any related alerts.
Analyze incident context with OpenAI
Using OpenAI, the workflow analyzes the incident details and metrics to identify likely root causes, correlate with recent deployments or changes, and suggest initial investigation steps. The AI provides a hypothesis based on available data.
Generate incident response recommendations
The AI generates specific response recommendations including which dashboards to check, what commands to run, and what mitigation steps to consider. Recommendations are tailored to the incident type and affected service.
Deliver enriched incident to Slack
An enriched incident report is posted to the on-call Slack channel with incident summary, relevant metric graphs, AI analysis, and recommended actions. Responders can start debugging immediately without gathering context manually.
Why automate incident enrichment with AI?
When an incident strikes, every minute counts. On-call engineers waste precious time gathering context, checking dashboards, and correlating data before they can start fixing the problem. AI-powered enrichment delivers this context instantly.
Reduce time to first meaningful action
Instead of spending the first 10-15 minutes gathering information, responders get a comprehensive briefing immediately. They can start investigating the root cause right away.
Provide consistent incident context
Manual context gathering varies by person and time of day. AI enrichment provides the same comprehensive analysis for every incident, ensuring nothing important is missed.
Help less experienced on-call engineers
Junior engineers benefit from AI-suggested investigation steps and runbook guidance. They get expert-level starting points even for unfamiliar services.
How to set up AI incident enrichment
Setting up this PagerDuty enrichment workflow takes about 15 minutes. You'll connect your monitoring tools and configure service mappings.
What you need to get started
- PagerDuty account with incident webhooks
- Datadog account for metrics access
- OpenAI API key for analysis
- Slack workspace for incident channel
Configuring service mappings
- Map PagerDuty services to Datadog dashboards and metrics
- Define which metrics are relevant for each service type
- Set up integration with your deployment tracking
- Configure runbook references for common incident types
Customizing AI analysis
- Provide context about your architecture and dependencies
- Define common failure modes for your services
- Specify your incident response procedures
- Include any team-specific debugging approaches
Frequently asked questions about AI incident enrichment
Does this work with other monitoring tools besides Datadog?
Yes, you can integrate with any monitoring platform with API access including New Relic, Grafana, CloudWatch, or Prometheus. The AI analysis works with metrics from any source.
How does AI identify likely root causes?
The AI correlates incident timing with metric anomalies, recent deployments, and known failure patterns. It provides hypotheses to investigate rather than definitive diagnoses.
Can this integrate with our existing runbooks?
Yes, you can provide runbook content to the AI so it can reference relevant procedures in its recommendations. This ensures suggestions align with your documented processes.
What if the AI analysis is wrong?
The AI provides hypotheses and suggestions, not certainties. Experienced engineers use it as a starting point and apply their own judgment. Over time you can refine the AI context for better accuracy.