Cloud incident response is fundamentally different from on-premises IR. You cannot walk into a data center, pull a hard drive, and image it. Volatile evidence disappears when instances are terminated. Logs are spread across dozens of services, each with its own retention policy. And the shared responsibility model means your cloud provider controls the infrastructure layer while you are responsible for everything above it.

Despite these differences, many organizations still use on-prem IR playbooks for cloud incidents. This guide covers the unique challenges of cloud IR and provides practical guidance for evidence collection, investigation, and response across AWS, Azure, and GCP.

How Cloud IR Differs from On-Prem

Ephemeral Infrastructure

Cloud workloads are designed to be disposable. Auto-scaling groups terminate instances automatically. Containers may run for seconds. Serverless functions leave no persistent compute to investigate. If you do not capture evidence before the workload disappears, it is gone.

This means evidence preservation must be automated and proactive. You cannot rely on post-incident manual collection.

Distributed Logging

On-prem environments typically centralize logs in a SIEM. Cloud environments generate logs across dozens of services, each with different formats, retention defaults, and access patterns.

Before an incident occurs, you must know where every relevant log lives, how long it is retained, and how to query it efficiently.

API-Driven Response

Cloud containment and remediation happen through API calls, not through console access or physical intervention. Disabling an IAM user, revoking sessions, modifying security groups, and snapshotting volumes are all API operations.

Your IR team needs cloud API fluency, and your SOAR playbooks need cloud-specific integrations.

Shared Responsibility

During an incident, the cloud provider is responsible for the security of the cloud (physical infrastructure, hypervisor, managed service internals), while you are responsible for security in the cloud (your configurations, data, identity, and workloads).

You cannot ask AWS to provide hypervisor-level forensics. You can request cloud provider support for confirmed compromises through their abuse or security response programs, but the investigation is primarily your responsibility.

Multi-Account and Multi-Region Complexity

Enterprise cloud deployments span dozens or hundreds of accounts, subscriptions, or projects across multiple regions. An attacker who compromises one account may pivot to others.

Your IR capability must have cross-account visibility and access. A single-account IR plan is insufficient.

Evidence Collection by Cloud Provider

AWS Evidence Sources

CloudTrail

CloudTrail is the single most important evidence source in AWS. It records API calls made to AWS services.

Management events are enabled by default in all accounts and record control plane operations (creating EC2 instances, modifying IAM policies, changing S3 bucket configurations). Data events must be explicitly enabled and record data plane operations (S3 GetObject, Lambda invocations, DynamoDB reads). Retention is 90 days in the CloudTrail console by default; for IR purposes, send CloudTrail logs to S3 and retain for at least 1 year. Key fields include eventName, sourceIPAddress, userIdentity, requestParameters, and responseElements. Enable an organization-wide trail in your management account to capture events across all member accounts.

CloudTrail tells you who did what, when, and from where. Every cloud IR investigation starts here.
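
As a rough sketch of that setup, the following AWS CLI calls create an organization-wide, multi-region trail that delivers to S3. They assume you run them from the organization's management account; the trail and bucket names are placeholders, and the bucket must already have a policy granting CloudTrail write access.

# Create an organization trail covering all member accounts and regions
aws cloudtrail create-trail --name org-ir-trail \
  --s3-bucket-name my-cloudtrail-archive \
  --is-organization-trail --is-multi-region-trail

# Start delivering events
aws cloudtrail start-logging --name org-ir-trail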

VPC Flow Logs

Network-level evidence showing traffic flows between resources. Enable on all VPCs, not just production. Capture accepted and rejected traffic. Send to S3 or CloudWatch Logs for retention. These are useful for identifying lateral movement, data exfiltration, and C2 communication.
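
A minimal sketch of enabling flow logs on a single VPC with the AWS CLI, assuming an existing S3 bucket for delivery (the VPC ID and bucket ARN are placeholders):

# Capture accepted and rejected traffic for the VPC and deliver to S3
aws ec2 create-flow-logs --resource-type VPC --resource-ids vpc-0123456789abcdef0 \
  --traffic-type ALL --log-destination-type s3 \
  --log-destination arn:aws:s3:::my-flow-log-archive/flow-logs/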

GuardDuty Findings

AWS GuardDuty provides managed threat detection across CloudTrail, VPC Flow Logs, DNS logs, and EKS audit logs. GuardDuty findings are often the first indicator of compromise in AWS. Finding types include unauthorized API calls, cryptocurrency mining, IAM credential exfiltration, and malicious IP communication. Severity levels (1-8) help prioritize investigation.
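
During triage, findings can be pulled directly with the AWS CLI; the detector and finding IDs below are placeholders:

# List detectors, then enumerate and retrieve findings for investigation
aws guardduty list-detectors
aws guardduty list-findings --detector-id <detector-id>
aws guardduty get-findings --detector-id <detector-id> --finding-ids <finding-id>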

Additional AWS Evidence Sources

| Source | What It Captures | IR Relevance |
| --- | --- | --- |
| S3 access logs | Object-level access to S3 buckets | Data exfiltration investigation |
| CloudWatch Logs | Application and system logs | Workload-level investigation |
| AWS Config | Configuration change history | Identifying misconfigurations and unauthorized changes |
| Route 53 DNS logs | DNS queries | C2 communication, DNS exfiltration |
| EKS audit logs | Kubernetes API server events | Container orchestration compromise |
| IAM Access Analyzer | External access to resources | Identifying exposed resources |

Azure Evidence Sources

Azure Activity Log

The Azure equivalent of CloudTrail, capturing control plane operations. Retained for 90 days by default. Send to a Log Analytics workspace or storage account for longer retention. Captures resource creation, modification, deletion, and RBAC changes.
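
A quick first pass can be done with the Azure CLI; the caller below is a placeholder for the identity under investigation:

# List the last seven days of Activity Log entries attributed to a specific caller
az monitor activity-log list --caller suspect.user@example.com --offset 7d --output table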

Entra ID (Azure AD) Sign-In and Audit Logs

Critical for identity-based investigations in Azure. Sign-in logs capture authentication events including IP address, location, device, conditional access policy evaluation, and MFA status. Audit logs capture directory changes including user creation, group membership changes, application registrations, and role assignments. Retention is 30 days by default (7 days for free tier), so send to Log Analytics for longer retention. Entra ID Protection flags suspicious authentication events as risky sign-ins.
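
If sign-in logs are already routed to a Log Analytics workspace, they can be queried with KQL from the CLI. The sketch below assumes that routing is in place and that the log-analytics CLI extension is installed; the workspace ID and user are placeholders.

# Pull a week of sign-in events for one user
az monitor log-analytics query --workspace <workspace-id> \
  --analytics-query "SigninLogs | where UserPrincipalName == 'suspect.user@example.com' | project TimeGenerated, IPAddress, Location, ResultType" \
  --timespan P7D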

Microsoft Defender for Cloud

Managed threat detection across Azure workloads. Provides security alerts for VMs, containers, storage, databases, and identity. Integrates with Microsoft Sentinel for SIEM correlation. Adaptive application controls detect anomalous process execution.

Additional Azure Evidence Sources

| Source | What It Captures | IR Relevance |
| --- | --- | --- |
| NSG Flow Logs | Network traffic flows | Lateral movement, exfiltration |
| Azure DNS Analytics | DNS query logs | C2 detection |
| Key Vault logs | Secret and certificate access | Credential theft investigation |
| Storage Analytics | Blob and file access | Data exfiltration |
| AKS audit logs | Kubernetes API events | Container compromise |

GCP Evidence Sources

Cloud Audit Logs

GCP’s equivalent of CloudTrail, with four log types. Admin Activity logs are always enabled, retained for 400 days, and record resource creation, modification, and IAM policy changes. Data Access logs must be enabled per service and record data reads, writes, and permission checks (can generate high volume). System Event logs are always enabled and record Google-initiated system events. Policy Denied logs are always enabled and record access attempts denied by security policies.
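
Audit log entries can be pulled with gcloud during an investigation; the project and principal email below are placeholders:

# Read the last seven days of Admin Activity entries attributed to one principal
gcloud logging read \
  'logName:"cloudaudit.googleapis.com%2Factivity" AND protoPayload.authenticationInfo.principalEmail="suspect@example.com"' \
  --project my-project --freshness 7d --limit 50 --format json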

VPC Flow Logs

Must be enabled per subnet. Captures 5-tuple flow records (source/dest IP, ports, protocol). Configurable sampling rate and aggregation interval.
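
A sketch of enabling flow logs on one subnet with gcloud (the subnet name and region are placeholders):

# Enable flow logs with full sampling and 5-second aggregation
gcloud compute networks subnets update my-subnet --region us-central1 \
  --enable-flow-logs --logging-flow-sampling 1.0 \
  --logging-aggregation-interval interval-5-sec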

Security Command Center

GCP’s managed threat detection platform. Provides findings for misconfigurations, vulnerabilities, and active threats. Event Threat Detection analyzes audit logs for suspicious activity. Container Threat Detection monitors GKE workloads.

Additional GCP Evidence Sources

| Source | What It Captures | IR Relevance |
| --- | --- | --- |
| Cloud DNS logs | DNS queries | C2 detection |
| Load Balancer logs | HTTP(S) request logs | Web application attacks |
| GKE audit logs | Kubernetes API events | Container compromise |
| Access Transparency logs | Google staff access to your data | Insider threat from provider |
| Cloud Storage access logs | Object access | Data exfiltration |

Container Forensics

Containers present unique forensic challenges: they are ephemeral, share the host kernel, and may leave no persistent artifacts after termination.

Capturing Container Evidence

For running containers, first pause the container (do not stop it since stopping destroys volatile state). Export the container filesystem with docker export <container_id> > container.tar. Capture the container’s process list, network connections, and environment variables. Copy relevant log files from the container. If using Kubernetes, capture the pod spec, events, and logs with kubectl logs <pod> --all-containers.
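
A minimal collection sketch for a running container, assuming Docker and kubectl access to the node and cluster (the container and pod names are placeholders):

# Freeze the container without destroying volatile state, then capture it
docker pause <container_id>
docker top <container_id> > container-processes.txt
docker inspect <container_id> > container-inspect.json   # config, env vars, mounts, network
docker export <container_id> > container.tar              # full filesystem

# Kubernetes-side evidence: pod spec, events, and logs
kubectl get pod <pod> -o yaml > pod-spec.yaml
kubectl get events --field-selector involvedObject.name=<pod> > pod-events.txt
kubectl logs <pod> --all-containers > pod-logs.txt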

For terminated containers, if the container image is still available, pull and inspect it for malicious layers. Check the container runtime logs (containerd, CRI-O) on the node. Check the orchestrator logs (Kubernetes audit logs, ECS task logs). If the node still exists, examine /var/lib/docker or /var/lib/containerd for residual data.
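
On a surviving node, a rough sketch of where to look, assuming a containerd-based runtime with crictl installed:

# Exited containers and their runtime metadata
crictl ps -a
crictl inspect <container_id>

# Runtime logs and residual layer/state data
journalctl -u containerd --since "-24h"
ls /var/lib/containerd /var/lib/docker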

Container-Specific Attack Patterns

| Attack Pattern | Evidence Sources | Response |
| --- | --- | --- |
| Container escape | Host system calls, kernel logs, container runtime logs | Isolate the node, investigate host-level compromise |
| Malicious image | Image layer analysis, Dockerfile, registry audit logs | Remove the image, scan registry, investigate supply chain |
| Kubernetes API abuse | K8s audit logs, RBAC configuration | Revoke compromised service account, audit RBAC permissions |
| Crypto mining | CPU utilization metrics, network connections to mining pools | Kill the pod, investigate initial access |
| Secrets exfiltration | K8s audit logs (Secret reads), container env vars | Rotate all secrets, audit Secret access policies |

IAM Compromise Response

IAM credential compromise is the most common cloud incident type. The response pattern is consistent across providers.

Indicators of IAM Compromise

Signs include API calls from unusual IP addresses or geographic locations, API calls at unusual times, actions inconsistent with the principal’s normal behavior (for example, a developer creating IAM users or accessing production databases), GuardDuty / Defender / SCC alerts for unauthorized API usage, access key usage after an employee’s departure, and programmatic access from regions where the organization has no presence.

Response Procedure

Step 1 is to scope the compromise. Identify the compromised principal (user, role, service account). Query audit logs for all actions taken by this principal in the relevant time window. Determine what resources were accessed, created, modified, or deleted. Check for persistence mechanisms: new IAM users, access keys, roles, or policies created by the compromised principal.
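
In AWS, a quick first pass at scoping can use CloudTrail's lookup-events API, which covers management events from the last 90 days; deeper history should come from the S3 archive. The username, access key ID, and time window below are placeholders.

# All recorded actions by the compromised user in the window of interest
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=Username,AttributeValue=<username> \
  --start-time 2026-02-01T00:00:00Z --end-time 2026-02-02T00:00:00Z

# The same query keyed on a specific access key
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=AccessKeyId,AttributeValue=<access-key-id> \
  --start-time 2026-02-01T00:00:00Z --end-time 2026-02-02T00:00:00Z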

Step 2 is to contain. Deactivate or delete compromised access keys; keep in mind that an attacker who still holds IAM write permissions can reactivate a deactivated key, so confirm those permissions are revoked as well. Revoke all active sessions (AWS: attach a deny-all inline policy with a date condition; Azure: revoke Entra ID sessions; GCP: disable the service account). If an IAM role was compromised, update the trust policy to block the attacker's assumed-role path. Block the attacker's source IP addresses at the network level (security groups, NACLs, NSGs).
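
The AWS commands for key deactivation and session revocation appear in the next subsection; for GCP and Azure, a rough containment sketch (the service account email and user object ID are placeholders):

# GCP: disable a compromised service account
gcloud iam service-accounts disable compromised-sa@my-project.iam.gserviceaccount.com

# Azure: invalidate a user's refresh tokens (and therefore active sessions) via Microsoft Graph
az rest --method POST \
  --url "https://graph.microsoft.com/v1.0/users/<user-object-id>/revokeSignInSessions"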

Step 3 is to eradicate. Remove any persistence mechanisms created by the attacker (backdoor users, roles, policies, Lambda functions, scheduled tasks). Rotate all credentials that the compromised principal had access to (database passwords, API keys, secrets in Secrets Manager/Key Vault). Review and revert any unauthorized configuration changes.

Step 4 is to recover. Re-enable legitimate access with new credentials. Verify that containment actions did not break legitimate workloads. Monitor the compromised principal and related resources for continued suspicious activity.

AWS-Specific IAM Response

# Identify all access keys for a user
aws iam list-access-keys --user-name <username>

# Deactivate an access key
aws iam update-access-key --access-key-id <key-id> --status Inactive --user-name <username>

# Attach deny-all policy to revoke active sessions
# This denies all actions for sessions issued before the specified time
# (set the timestamp to the current UTC time when you apply it)
aws iam put-user-policy --user-name <username> --policy-name DenyAll \
  --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Action":"*","Resource":"*","Condition":{"DateLessThan":{"aws:TokenIssueTime":"2026-02-02T00:00:00Z"}}}]}'

Shared Responsibility During Incidents

What You Can Expect from Your Cloud Provider

AWS Customer Incident Response Team (CIRT) can assist with confirmed security events. Contact through AWS Support (Business or Enterprise tier). AWS Artifact provides compliance reports but not incident-specific forensics.

For Azure, Microsoft Security Response Center (MSRC) handles vulnerabilities in Azure services. For customer incidents, use Microsoft Unified Support. Defender for Cloud provides managed detection.

For GCP, Google Cloud Incident Response team can assist for confirmed incidents. Contact through Cloud Support. Mandiant (Google-owned) provides additional IR services.

What Your Cloud Provider Will Not Do

They will not provide hypervisor-level or physical infrastructure forensics. They will not investigate incidents in your workloads on your behalf (unless you engage their professional services). They will not take containment actions in your accounts without your authorization (except for abuse cases). They will not extend log retention retroactively; if you did not configure retention before the incident, the data may be gone.

Cloud-Specific Playbooks

Playbook: Exposed S3 Bucket / Storage Account / GCS Bucket

First identify the exposed resource and the data it contains. Restrict public access immediately (bucket policy, ACL, or account-level block). Review access logs to determine if unauthorized parties accessed the data. If sensitive data was accessed, initiate breach notification procedures. Identify how the exposure occurred (misconfiguration, policy change, IaC error). Scan all storage resources across the organization for similar exposures. Implement preventive controls (S3 Block Public Access at the organization level, Azure Policy, GCP Organization Policies).
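
For S3, the immediate restriction step can be as simple as the following sketch (the bucket name and account ID are placeholders):

# Block public access on the exposed bucket...
aws s3api put-public-access-block --bucket exposed-bucket \
  --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

# ...and then at the account level
aws s3control put-public-access-block --account-id 123456789012 \
  --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true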

Playbook: Cryptocurrency Mining

First identify the affected resources (EC2 instances, containers, Lambda functions). Capture evidence: instance metadata, running processes, network connections, audit logs. Terminate the mining workloads. Investigate initial access to understand how the attacker deployed mining software (compromised credentials, vulnerable application, misconfigured service). Contain the initial access vector. Review billing for the affected period and request a billing adjustment from the CSP if the charges resulted from unauthorized access. Implement cost anomaly alerts and compute quotas to limit blast radius of future incidents.
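
For an EC2-based miner, a rough order of operations is snapshot first, isolate, then terminate. The IDs below are placeholders, and the security group is assumed to be a pre-built isolation group that permits no traffic.

# Preserve the volume before the instance disappears
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
  --description "IR evidence - cryptomining incident"

# Swap the instance into the isolation security group, then terminate
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 \
  --groups sg-0123456789abcdef0
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0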

Playbook: Compromised CI/CD Pipeline

First identify the compromised pipeline component (source repo, build server, artifact registry, deployment credentials). Halt all deployments from the affected pipeline. Review pipeline execution logs for unauthorized modifications. Audit all artifacts produced by the pipeline during the compromise window. Rotate all secrets and credentials used by the pipeline. Review deployed workloads for malicious changes (backdoors, modified dependencies). Rebuild the pipeline with hardened configurations (pinned dependencies, signed commits, restricted IAM roles).

Building Cloud IR Readiness

Pre-Incident Preparation Checklist

  • Centralized logging enabled across all accounts/subscriptions/projects with at least 1-year retention
  • CloudTrail organization trail / Azure Activity Log diagnostic settings / GCP organization audit logs configured
  • IR team has cross-account access through a dedicated security account with break-glass credentials
  • Cloud-specific IR playbooks documented and tested
  • Evidence collection automation deployed (auto-snapshot volumes, export container state on alert)
  • SOAR integrations configured for cloud provider APIs
  • Cost anomaly alerts configured to detect unauthorized resource usage
  • Regular IR tabletop exercises include cloud-specific scenarios

IR Tooling for Cloud

| Tool | Purpose | CSP Support |
| --- | --- | --- |
| AWS IR (service) | Automated evidence collection and triage in AWS | AWS |
| Cado Security | Cloud-native forensics and evidence capture | AWS, Azure, GCP |
| Wiz | Cloud security posture and investigation | AWS, Azure, GCP |
| Sysdig Secure | Container and cloud runtime security | AWS, Azure, GCP |
| Prowler | Open-source AWS/Azure/GCP security assessment | AWS, Azure, GCP |
| Steampipe | SQL-based querying of cloud APIs for investigation | AWS, Azure, GCP |

Training Your IR Team for Cloud

Traditional IR skills (disk forensics, memory analysis, network forensics) remain valuable but are insufficient for cloud environments. Your IR team needs fluency in at least one major CSP’s console, CLI, and API. They need understanding of IAM models (AWS IAM, Azure RBAC, GCP IAM) and how credentials work in cloud environments. Familiarity with cloud-native logging and monitoring services is essential. Experience with infrastructure as code (Terraform, CloudFormation) helps understand how environments are configured. Container and Kubernetes investigation skills are increasingly important. Cloud-specific certifications like AWS Security Specialty, AZ-500, and Google Professional Cloud Security Engineer demonstrate competency.