6 Phases of a Cyber Incident Response Plan: An Azure-Focused Practitioner Guide

Last Updated on May 15, 2026 by Arnav Sharma

Incident response plans that live in SharePoint folders are not incident response plans. They are compliance artefacts. The difference between an organisation that recovers from a security incident in hours and one that recovers in weeks is almost never the quality of the documented plan. It is the degree to which the plan has been operationalised: tested, tooled, and practised before the incident occurs.

This guide covers the six phases of an effective incident response plan with specific focus on Azure environments, the Microsoft tooling that supports each phase, and the operational decisions that separate functional IR from theoretical IR.


Why Incident Response Fails in Practice

Before addressing the six phases, it is worth understanding the failure modes that make IR plans ineffective, because they are predictable and preventable.

The most common failure is detection latency. Mandiant’s 2024 M-Trends data puts the global median dwell time at 11 days. That is 11 days between initial compromise and the first alert that something is wrong. In Azure environments, that dwell time often reflects the absence of monitoring coverage on specific data sources, not a sophisticated attacker evading detection.

The second failure is playbook gaps. IR plans document what to do during a ransomware attack but not how to do it in the specific Azure environment the organisation operates. “Isolate the affected system” is a step. “Run this KQL query in Sentinel to identify all resources communicating with the suspicious IP, then apply this NSG rule via Azure CLI” is an operationalised playbook.

The third failure is decision authority gaps. During an active incident, the question “who has the authority to isolate this production workload?” should have a predetermined answer. Organisations that surface this question for the first time during an incident add hours of delay to containment decisions.


Phase 1: Preparation

Preparation is the only phase that happens before the incident. Every hour invested in preparation reduces the cost and duration of the incidents that will inevitably occur.

Tooling baseline for Azure environments

The minimum viable incident response tooling for Azure is Microsoft Sentinel (SIEM and SOAR), Microsoft Defender XDR (endpoint, identity, cloud, and email detection), and Defender for Cloud with enhanced security features enabled on all subscriptions. Without this foundation, the detection phase operates on intuition rather than data.

Beyond tooling, preparation includes:

Asset inventory. Know what is running in every Azure subscription. Azure Resource Graph enables queries across the entire tenant, returning an inventory of resources, their configurations, and their network exposure. Running az graph query -q "Resources | project name, type, location, resourceGroup, subscriptionId" from a scheduled task gives you a baseline to compare against during an incident. The command returns only the first 100 records by default, so raise the limit with --first (up to 1000 per call) and page with --skip-token to capture the full estate.

Contact lists. A current list of technical contacts, business owners, legal counsel, cyber insurers, and external IR retainer contacts. Contact lists that have not been validated in six months may have outdated information for key people needed in the first two hours of an incident.

Access verification. Every member of the IR team should have verified access to Sentinel, Defender XDR, Entra ID audit logs, and Azure subscriptions before an incident. An IR team member discovering they lack the required permissions at 2am during an active incident is a preventable operational failure.

Communication templates. Draft internal escalation messages, board notification templates, and regulatory notification templates before they are needed. Under the Australian Notifiable Data Breaches scheme, an organisation has 30 days to assess a suspected breach, and must notify the OAIC and affected individuals as soon as practicable once it has reasonable grounds to believe an eligible data breach has occurred. Having a template that requires only incident-specific detail reduces the risk of missing regulatory timelines during the stress of active response.
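The asset inventory baseline described above is only useful if it can be compared quickly. The following is a minimal sketch, assuming each scheduled snapshot has already been reduced to a list of records carrying the projected fields (name, type, location, resourceGroup, subscriptionId). The function names and the sample resources are hypothetical and used only for illustration.

```python
def resource_key(record):
    """Build a case-insensitive identity from the projected fields."""
    return tuple(
        str(record[field]).lower()
        for field in ("subscriptionId", "resourceGroup", "type", "name")
    )


def diff_inventory(baseline, current):
    """Return the keys of resources added or removed since the baseline."""
    base_keys = {resource_key(r) for r in baseline}
    curr_keys = {resource_key(r) for r in current}
    return {
        "added": sorted(curr_keys - base_keys),
        "removed": sorted(base_keys - curr_keys),
    }


# Hypothetical snapshots: one taken before the incident, one taken during it.
baseline = [
    {"name": "vm-app-01", "type": "microsoft.compute/virtualmachines",
     "location": "australiaeast", "resourceGroup": "rg-app", "subscriptionId": "sub-1"},
    {"name": "stlogs01", "type": "microsoft.storage/storageaccounts",
     "location": "australiaeast", "resourceGroup": "rg-logs", "subscriptionId": "sub-1"},
]
current = [
    {"name": "vm-app-01", "type": "microsoft.compute/virtualmachines",
     "location": "australiaeast", "resourceGroup": "rg-app", "subscriptionId": "sub-1"},
    {"name": "vm-unknown-07", "type": "microsoft.compute/virtualmachines",
     "location": "westeurope", "resourceGroup": "rg-app", "subscriptionId": "sub-1"},
]

report = diff_inventory(baseline, current)
for change, keys in report.items():
    for key in keys:
        print(change, "/".join(key))
```

Resources that appear under "added" with no matching change record, or under "removed" when nobody decommissioned them, are natural starting points for investigation.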


Phase 2: Detection and Analysis

Detection is where most IR effort is spent and where the greatest variance in organisational capability exists. Detection in Azure environments has two components: automated alerting through Microsoft Sentinel and Defender XDR, and human-initiated investigation following an anomalous signal.

What good detection looks like in Sentinel

A well-configured Sentinel deployment generates alerts for the most common Azure attack techniques: anomalous service principal activity, Entra ID sign-ins from impossible travel distances, mass file deletion or modification events (ransomware precursors), and Azure Resource Manager operations outside of change windows.

The KQL queries that matter most in the first 30 minutes of incident detection are:

Identifying all sign-ins from an anomalous IP address across the last 7 days:

kql

SigninLogs
| where TimeGenerated > ago(7d)
| where IPAddress == "SUSPECT_IP"
| project TimeGenerated, UserPrincipalName, AppDisplayName, ResultType, Location
| order by TimeGenerated desc

Identifying Azure resources modified by a suspect account:

kql

AzureActivity
| where Caller == "SUSPECT_UPN"
| where TimeGenerated > ago(7d)
| project TimeGenerated, OperationNameValue, ResourceGroup, Resource, ActivityStatusValue
| order by TimeGenerated desc

Severity classification

Not every Sentinel alert is a P1 incident. The classification framework should be defined in advance: P1 (active attack with confirmed data access or system impact, immediate response), P2 (confirmed initial access without lateral movement, same-day response), P3 (anomalous behaviour without confirmed malicious activity, next business day investigation).

Misclassifying a P1 as a P3 because the IR team is unsure how to interpret Sentinel alerts is a preparation failure. Teams that have conducted tabletop exercises using real Sentinel alert data develop the pattern recognition to classify accurately under pressure.
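The classification framework above can be written down as a small decision function, which makes the rules easy to rehearse in a tabletop exercise. This is a sketch under stated assumptions: the field names are hypothetical, and the text does not say how to rate confirmed initial access that has progressed to lateral movement, so this example escalates that case to P1.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    P1 = "immediate response"
    P2 = "same-day response"
    P3 = "next business day investigation"


@dataclass(frozen=True)
class Triage:
    confirmed_initial_access: bool  # an analyst has confirmed attacker access
    lateral_movement: bool          # access has spread beyond the entry point
    confirmed_impact: bool          # confirmed data access or system impact


def classify(triage: Triage) -> Severity:
    """Map triage findings to the P1/P2/P3 scheme described in the text."""
    if triage.confirmed_impact:
        return Severity.P1
    if triage.confirmed_initial_access and triage.lateral_movement:
        # Assumption: not covered by the written definitions, so escalate.
        return Severity.P1
    if triage.confirmed_initial_access:
        return Severity.P2
    # Anomalous behaviour without confirmed malicious activity.
    return Severity.P3


finding = Triage(confirmed_initial_access=True, lateral_movement=False, confirmed_impact=False)
print(classify(finding).name, "-", classify(finding).value)
```

Keeping the rules in one place like this also exposes gaps in the written definitions, which is exactly the kind of ambiguity worth resolving before an incident.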


Phase 3: Containment

Containment has two objectives: stop the active attack and preserve evidence. These objectives can conflict. The fastest containment action (disabling the compromised account and isolating the affected VM) may destroy forensic evidence if done without logging the pre-containment state.

Short-term containment in Azure

For a compromised Entra ID account: disable the account in Entra ID, revoke all active sessions (the “Revoke sessions” option in the Entra ID user properties), and review all role assignments and application consents associated with the account in the last 30 days.

For a compromised Azure VM: use Defender for Endpoint’s “Isolate device” action, which quarantines the VM from all network traffic except the Defender for Endpoint management channel, preserving the ability to continue investigation without removing the VM from management. Do not delete the VM. Do not run antivirus. Preserve the disk state for forensic analysis.

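For a compromised storage account: disable shared key access and rotate the account access keys, which invalidates every SAS token signed with those keys. Then apply a resource lock to prevent deletion of the storage account. Resource locks act only on management-plane operations, so they do not stop data inside the account from being deleted; protect the contents separately with blob soft delete, versioning, or an immutability policy. The storage account logs in the associated diagnostic settings should be immediately archived to a separate, read-only storage account controlled by the IR team.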

Long-term containment

Long-term containment covers the period between initial response and full remediation. In Azure, this typically involves deploying compensating controls: tightened NSG rules blocking the attacker’s observed communication patterns, Conditional Access policies requiring step-up MFA for all access until the investigation completes, and heightened Defender for Cloud alerting sensitivity to catch any re-establishment of attacker access.


Phase 4: Eradication

Eradication removes the attacker’s presence from the environment. In Azure, eradication is not complete until every persistence mechanism the attacker established has been identified and removed.

Persistence hunting in Azure

The persistence mechanisms most commonly left by attackers in Azure environments are: new service principals with credentials the attacker controls, new owner assignments on existing application registrations, modified Conditional Access policies that exclude attacker-controlled accounts, and automation runbooks scheduled to re-establish access.

The Entra ID audit log is the primary data source for persistence hunting. Filter to the incident timeframe and review all operations in these categories: Add service principal, Add owner to application, Update conditional access policy, and Add service principal credentials. Automation runbook creation is an Azure resource operation rather than a directory operation, so look for it in the Azure Activity log for the same timeframe.

Any operation from the incident timeframe that cannot be attributed to a specific, authorised administrative action should be treated as a potential persistence artefact and investigated before closing the incident.
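The attribution rule above can be sketched as a filter over exported log events. This is an illustration only: it assumes events have been exported and normalised into dictionaries with hypothetical keys (time, operation, target, change_id), and the operation labels below are illustrative, so match them to the exact strings your own logs emit.

```python
from datetime import datetime, timezone

# Illustrative labels for persistence-relevant operations (assumption:
# adjust to the exact operation names that appear in your exported logs).
PERSISTENCE_OPERATIONS = {
    "add service principal",
    "add owner to application",
    "update conditional access policy",
    "add service principal credentials",
    "create automation runbook",
}


def unattributed_operations(events, start, end, approved_change_ids):
    """Return persistence-relevant events in the window with no approved change."""
    hits = []
    for event in events:
        in_window = start <= event["time"] <= end
        relevant = event["operation"].lower() in PERSISTENCE_OPERATIONS
        attributed = event.get("change_id") in approved_change_ids
        if in_window and relevant and not attributed:
            hits.append(event)
    return hits


def utc(*args):
    """Shorthand for a timezone-aware UTC timestamp."""
    return datetime(*args, tzinfo=timezone.utc)


# Hypothetical sample events.
events = [
    {"time": utc(2025, 3, 2, 1, 15), "operation": "Add service principal",
     "target": "sp-backdoor", "change_id": None},
    {"time": utc(2025, 3, 2, 9, 0), "operation": "Update conditional access policy",
     "target": "ca-require-mfa", "change_id": "CHG-1042"},
    {"time": utc(2025, 2, 1, 8, 0), "operation": "Add owner to application",
     "target": "app-legacy", "change_id": None},
    {"time": utc(2025, 3, 3, 12, 0), "operation": "Update user",
     "target": "user-123", "change_id": None},
]

suspicious = unattributed_operations(
    events,
    start=utc(2025, 3, 1),
    end=utc(2025, 3, 5),
    approved_change_ids={"CHG-1042"},
)
for event in suspicious:
    print(event["time"].isoformat(), event["operation"], event["target"])
```

Everything the filter returns stays on the investigation list until someone can tie it to an authorised administrative action.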

Azure resource eradication

For Azure VMs confirmed as compromised, re-imaging from a known-good image is more reliable than attempting to clean the operating system. Defender for Endpoint’s post-isolation investigation capabilities support forensic analysis of the disk image before re-imaging.

For compromised storage accounts, the decision between account rotation (creating a new account with the same data) and continued use of the existing account depends on whether the access vector was the account credentials or the underlying network exposure. If an overly permissive SAS token was the entry point, revoking that token (by rotating the key that signed it or removing its stored access policy) addresses the eradication requirement. If the network ACL was the entry point, tightening the network ACL without rotating credentials leaves the existing credentials as a residual risk.


Phase 5: Recovery

Recovery returns affected systems to normal operation. The most common mistake in this phase is recovering too quickly, restoring systems before eradication is confirmed complete and before the detection controls are in place to identify re-infection.

Recovery sequencing for Azure workloads

The recovery sequence should follow the dependency structure of the affected workloads. Recover identity infrastructure (Entra ID configuration validation) before recovering application servers, and recover application servers before restoring database access. An application recovered before its identity dependencies are clean may re-establish connections to attacker-controlled infrastructure.
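The dependency-driven sequencing described above is a topological ordering problem. A minimal sketch using Python's standard-library graphlib module (Python 3.9 or later) follows; the workload names and the dependency map are hypothetical and mirror the identity, application, database example in the text.

```python
from graphlib import CycleError, TopologicalSorter


def recovery_order(dependencies):
    """Return workloads ordered so each one follows everything it depends on.

    `dependencies` maps a workload to the set of workloads that must be
    recovered (and validated clean) before it.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # A cycle means the dependency map itself is wrong and needs review.
        raise ValueError(f"circular recovery dependency: {exc.args[1]}") from exc


# Hypothetical dependency map for the example in the text.
dependencies = {
    "identity": set(),
    "app-servers": {"identity"},
    "database-access": {"app-servers"},
}

order = recovery_order(dependencies)
print(" -> ".join(order))
```

Maintaining the dependency map as data during preparation means the recovery order is derived, not debated, when the incident is live.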

Azure Business Continuity and Disaster Recovery planning (Azure Site Recovery and Azure Backup) should have pre-defined recovery time objectives (RTOs) and recovery point objectives (RPOs) per workload tier. An organisation that has never tested Azure Backup restoration for a production workload should not be discovering the restoration process for the first time during incident recovery.

Validation before returning to production

Before returning a recovered workload to production traffic, validate: no attacker-controlled credentials remain in Key Vault or application configuration, all NSG rules reflect the intended post-incident baseline and not the pre-incident configuration that may have permitted the initial access, Defender for Endpoint reports a healthy, active sensor on all recovered VMs, and Sentinel analytics rules are generating expected baseline alerts rather than unusual silence (which may indicate a data pipeline issue introduced during recovery).


Phase 6: Post-Incident Activity

The post-incident review is where the IR cycle improves. Without a structured post-incident review, the same incident patterns recur because the root causes are never addressed.

The 5-day post-incident review

Best practice is a post-incident review within 5 business days of incident closure, while the details are still fresh. The review should address four questions: What happened and why? What did the IR team do well? What would be done differently? What specific changes to detection, prevention, or response procedures will prevent recurrence?

The output of the review should be specific, assigned action items with owners and deadlines. “Improve our monitoring” is not an action item. “Enable Sentinel analytics rule ‘ServicePrincipalCreatedByNonPrivilegedAccount’ and validate it fires correctly in the test environment by [date]” is an action item.

Metrics worth tracking

Mean time to detect (MTTD), mean time to contain (MTTC), and mean time to recover (MTTR) provide longitudinal tracking of IR capability improvement over time. An organisation that tracks these metrics across incidents will see the impact of preparation investments: better playbooks reduce MTTC, better detection rules reduce MTTD, better backup processes reduce MTTR.

For Azure environments, Sentinel workbooks can track these metrics automatically from incident creation and closure timestamps, generating a rolling IR performance dashboard that supports both operational improvement and board-level reporting.
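The arithmetic behind these metrics is simple enough to sketch. The example below assumes each incident record carries four timestamps under hypothetical field names (compromised, detected, contained, recovered), and measures MTTD from compromise to detection, MTTC from detection to containment, and MTTR from containment to recovery. Definitions vary between organisations, so treat these intervals as one reasonable choice and not a standard.

```python
from datetime import datetime
from statistics import mean


def mean_hours(incidents, start_field, end_field):
    """Average elapsed hours between two timestamps across incidents."""
    durations = [
        (incident[end_field] - incident[start_field]).total_seconds() / 3600
        for incident in incidents
    ]
    return mean(durations)


# Hypothetical incident records.
incidents = [
    {
        "compromised": datetime(2025, 1, 1, 0, 0),
        "detected": datetime(2025, 1, 3, 0, 0),
        "contained": datetime(2025, 1, 3, 6, 0),
        "recovered": datetime(2025, 1, 4, 6, 0),
    },
    {
        "compromised": datetime(2025, 1, 10, 0, 0),
        "detected": datetime(2025, 1, 11, 0, 0),
        "contained": datetime(2025, 1, 11, 2, 0),
        "recovered": datetime(2025, 1, 11, 14, 0),
    },
]

mttd = mean_hours(incidents, "compromised", "detected")
mttc = mean_hours(incidents, "detected", "contained")
mttr = mean_hours(incidents, "contained", "recovered")
print(f"MTTD {mttd:.1f}h, MTTC {mttc:.1f}h, MTTR {mttr:.1f}h")
```

Whatever intervals are chosen, the definitions need to stay fixed across incidents, otherwise the trend line measures the change in definition instead of the change in capability.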


Building the Playbook Library

The six phases provide the framework. Playbooks provide the operationalised procedures for specific incident scenarios. The minimum playbook library for Azure environments should cover: ransomware (including Azure-specific containment and recovery steps), compromised Entra ID account, data exfiltration from Azure Storage, insider threat, and supply chain compromise via a third-party Entra ID application.

Microsoft Sentinel’s automation features (Playbooks, based on Azure Logic Apps) allow incident response actions to be automated for specific alert types: automatically isolating a device when Defender for Endpoint generates a high-severity alert, automatically disabling an Entra ID account when Identity Protection generates a high-risk user alert, and automatically posting incident details to a Teams channel for IR team notification.

The goal is not to automate the IR decision. It is to automate the mechanical actions that consume time without requiring human judgement, so the IR team’s cognitive capacity is focused on the decisions that actually matter.

Arnav Sharma

I help organisations secure their cloud infrastructure and stay ahead of evolving cyber threats. Microsoft MVP and Certified Trainer, author of Mastering Azure Security, and founder of arnav.au — a platform for practical Cloud, Cybersecurity, DevOps and AI content.
