Cloud incident response is fundamentally different from on-premises IR. You cannot walk into a data center, pull a hard drive, and image it. Volatile evidence disappears when instances are terminated. Logs are spread across dozens of services, each with its own retention policy. And the shared responsibility model means your cloud provider controls the infrastructure layer while you are responsible for everything above it.
Despite these differences, many organizations still use on-prem IR playbooks for cloud incidents. This guide covers the unique challenges of cloud IR and provides practical guidance for evidence collection, investigation, and response across AWS, Azure, and GCP.
How Cloud IR Differs from On-Prem
Ephemeral Infrastructure
Cloud workloads are designed to be disposable. Auto-scaling groups terminate instances automatically. Containers may run for seconds. Serverless functions leave no persistent compute to investigate. If you do not capture evidence before the workload disappears, it is gone.
This means evidence preservation must be automated and proactive. You cannot rely on post-incident manual collection.
Distributed Logging
On-prem environments typically centralize logs in a SIEM. Cloud environments generate logs across dozens of services, each with different formats, retention defaults, and access patterns.
Before an incident occurs, you must know where every relevant log lives, how long it is retained, and how to query it efficiently.
API-Driven Response
Cloud containment and remediation happens through API calls, not through console access or physical intervention. Disabling an IAM user, revoking sessions, modifying security groups, and snapshotting volumes are all API operations.
Your IR team needs cloud API fluency, and your SOAR playbooks need cloud-specific integrations.
Shared Responsibility
During an incident, the cloud provider is responsible for the security of the cloud (physical infrastructure, hypervisor, managed service internals), while you are responsible for security in the cloud (your configurations, data, identity, and workloads).
You cannot ask AWS to provide hypervisor-level forensics. You can request cloud provider support for confirmed compromises through their abuse or security response programs, but the investigation is primarily your responsibility.
Multi-Account and Multi-Region Complexity
Enterprise cloud deployments span dozens or hundreds of accounts, subscriptions, or projects across multiple regions. An attacker who compromises one account may pivot to others.
Your IR capability must have cross-account visibility and access. A single-account IR plan is insufficient.
Evidence Collection by Cloud Provider
AWS Evidence Sources
CloudTrail
CloudTrail is the single most important evidence source in AWS. It records API calls made to AWS services.
- Management events: enabled by default in all accounts; record control plane operations (creating EC2 instances, modifying IAM policies, changing S3 bucket configurations).
- Data events: must be explicitly enabled; record data plane operations (S3 GetObject, Lambda invocations, DynamoDB reads).
- Retention: 90 days in the CloudTrail console (Event history) by default; for IR purposes, send CloudTrail logs to S3 and retain them for at least 1 year.
- Key fields: eventName, sourceIPAddress, userIdentity, requestParameters, and responseElements.
- Organization trail: enable one in your management account to capture events across all member accounts.
CloudTrail tells you who did what, when, and from where. Every cloud IR investigation starts here.
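For quick triage, the 90-day Event history can be queried directly from the CLI; anything older should be queried from the S3 copy (for example with Athena). A minimal sketch, with the username and time window as placeholders:
# Look up recent management events for a suspect principal (Event history, last 90 days)
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=Username,AttributeValue=<username> \
  --start-time 2026-01-01T00:00:00Z --end-time 2026-01-08T00:00:00Z \
  --max-results 50
# AttributeKey=AccessKeyId is often more useful when investigating a leaked key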
VPC Flow Logs
Network-level evidence showing traffic flows between resources. Enable on all VPCs, not just production. Capture accepted and rejected traffic. Send to S3 or CloudWatch Logs for retention. These are useful for identifying lateral movement, data exfiltration, and C2 communication.
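If flow logs are not yet enabled on a VPC involved in an incident, they can be turned on mid-investigation (they will not capture past traffic). A minimal sketch; the VPC ID and bucket name are placeholders:
# Enable flow logs for both accepted and rejected traffic, delivered to S3
aws ec2 create-flow-logs \
  --resource-type VPC --resource-ids <vpc-id> \
  --traffic-type ALL \
  --log-destination-type s3 \
  --log-destination arn:aws:s3:::<flow-log-bucket>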
GuardDuty Findings
Amazon GuardDuty provides managed threat detection across CloudTrail, VPC Flow Logs, DNS logs, and EKS audit logs. GuardDuty findings are often the first indicator of compromise in AWS. Finding types include unauthorized API calls, cryptocurrency mining, IAM credential exfiltration, and malicious IP communication. Severity levels (Low, Medium, High, and Critical) help prioritize investigation.
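Findings can be pulled directly from the CLI during triage. A minimal sketch; the detector and finding IDs are placeholders, and the severity filter JSON is just one example of the --finding-criteria syntax:
# Find the detector for the current region, list high-severity findings, then fetch details
aws guardduty list-detectors
aws guardduty list-findings --detector-id <detector-id> \
  --finding-criteria '{"Criterion":{"severity":{"Gte":7}}}'
aws guardduty get-findings --detector-id <detector-id> --finding-ids <finding-id>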
Additional AWS Evidence Sources
| Source | What It Captures | IR Relevance |
|---|---|---|
| S3 access logs | Object-level access to S3 buckets | Data exfiltration investigation |
| CloudWatch Logs | Application and system logs | Workload-level investigation |
| AWS Config | Configuration change history | Identifying misconfigurations and unauthorized changes |
| Route 53 DNS logs | DNS queries | C2 communication, DNS exfiltration |
| EKS audit logs | Kubernetes API server events | Container orchestration compromise |
| IAM Access Analyzer | External access to resources | Identifying exposed resources |
Azure Evidence Sources
Azure Activity Log
The Azure equivalent of CloudTrail, capturing control plane operations. Retained for 90 days by default. Send to a Log Analytics workspace or storage account for longer retention. Captures resource creation, modification, deletion, and RBAC changes.
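A minimal CLI sketch for pulling Activity Log entries during triage; the time window and caller are placeholders:
# List control plane operations in a time window, filtered by the acting identity
az monitor activity-log list \
  --start-time 2026-01-01T00:00:00Z --end-time 2026-01-08T00:00:00Z \
  --caller suspect.user@example.com \
  --output table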
Entra ID (Azure AD) Sign-In and Audit Logs
Critical for identity-based investigations in Azure. Sign-in logs capture authentication events including IP address, location, device, conditional access policy evaluation, and MFA status. Audit logs capture directory changes including user creation, group membership changes, application registrations, and role assignments. Retention is 30 days by default (7 days for free tier), so send to Log Analytics for longer retention. Entra ID Protection flags suspicious authentication events as risky sign-ins.
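Once sign-in logs are flowing into Log Analytics, they can be queried from the CLI. A sketch, assuming the log-analytics CLI extension is available; the workspace ID, user, and projected columns are placeholders/examples:
# Recent sign-ins for a suspect user, with IP, location, result, and conditional access status
az monitor log-analytics query \
  --workspace <workspace-id> \
  --analytics-query "SigninLogs | where UserPrincipalName == 'suspect.user@example.com' | project TimeGenerated, IPAddress, Location, ResultType, ConditionalAccessStatus | order by TimeGenerated desc" \
  --timespan P7D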
Microsoft Defender for Cloud
Managed threat detection across Azure workloads. Provides security alerts for VMs, containers, storage, databases, and identity. Integrates with Microsoft Sentinel for SIEM correlation. Adaptive application controls detect anomalous process execution.
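Alerts can also be listed from the CLI as a quick triage step (a sketch, assuming the az security command group is available in your CLI version):
# List current Defender for Cloud security alerts in the active subscription
az security alert list --output table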
Additional Azure Evidence Sources
| Source | What It Captures | IR Relevance |
|---|---|---|
| NSG Flow Logs | Network traffic flows | Lateral movement, exfiltration |
| Azure DNS Analytics | DNS query logs | C2 detection |
| Key Vault logs | Secret and certificate access | Credential theft investigation |
| Storage Analytics | Blob and file access | Data exfiltration |
| AKS audit logs | Kubernetes API events | Container compromise |
GCP Evidence Sources
Cloud Audit Logs
GCP’s equivalent of CloudTrail, with four log types:
- Admin Activity logs: always enabled, retained for 400 days; record resource creation, modification, and IAM policy changes.
- Data Access logs: must be enabled per service; record data reads, writes, and permission checks (can generate high volume).
- System Event logs: always enabled; record Google-initiated system events.
- Policy Denied logs: always enabled; record access attempts denied by security policies.
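Audit log entries can be pulled with gcloud during triage. A minimal sketch; the project, principal, and freshness window are placeholders:
# Admin Activity entries for a suspect principal over the last 7 days
gcloud logging read \
  'logName:"cloudaudit.googleapis.com%2Factivity" AND protoPayload.authenticationInfo.principalEmail="suspect@example.com"' \
  --project=<project-id> --freshness=7d --limit=100 --format=json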
VPC Flow Logs
Must be enabled per subnet. Captures 5-tuple flow records (source/dest IP, ports, protocol). Configurable sampling rate and aggregation interval.
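A sketch for enabling flow logs on a subnet mid-incident with full sampling (subnet and region are placeholders; this captures traffic only from the point of enablement onward):
# Enable flow logs with 100% sampling and a short aggregation interval
gcloud compute networks subnets update <subnet-name> --region=<region> \
  --enable-flow-logs \
  --logging-flow-sampling=1.0 \
  --logging-aggregation-interval=interval-5-sec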
Security Command Center
GCP’s managed threat detection platform. Provides findings for misconfigurations, vulnerabilities, and active threats. Event Threat Detection analyzes audit logs for suspicious activity. Container Threat Detection monitors GKE workloads.
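Findings can be listed from the CLI (a sketch; the organization ID is a placeholder and filter syntax varies slightly across gcloud versions):
# List active Security Command Center findings for the organization
gcloud scc findings list organizations/<org-id> --filter='state="ACTIVE"'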
Additional GCP Evidence Sources
| Source | What It Captures | IR Relevance |
|---|---|---|
| Cloud DNS logs | DNS queries | C2 detection |
| Load Balancer logs | HTTP(S) request logs | Web application attacks |
| GKE audit logs | Kubernetes API events | Container compromise |
| Access Transparency logs | Google staff access to your data | Insider threat from provider |
| Cloud Storage access logs | Object access | Data exfiltration |
Container Forensics
Containers present unique forensic challenges: they are ephemeral, share the host kernel, and may leave no persistent artifacts after termination.
Capturing Container Evidence
For running containers, first pause the container (do not stop it since stopping destroys volatile state). Export the container filesystem with docker export <container_id> > container.tar. Capture the container’s process list, network connections, and environment variables. Copy relevant log files from the container. If using Kubernetes, capture the pod spec (kubectl get pod -o yaml), the pod events (kubectl describe pod), and the container logs (kubectl logs <pod> --all-containers).
For terminated containers, if the container image is still available, pull and inspect it for malicious layers. Check the container runtime logs (containerd, CRI-O) on the node. Check the orchestrator logs (Kubernetes audit logs, ECS task logs). If the node still exists, examine /var/lib/docker or /var/lib/containerd for residual data.
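A minimal command sketch for the steps above, assuming a Docker runtime on the node and kubectl access to the cluster (container, pod, and output file names are placeholders):
# Running container: freeze it, export the filesystem, and capture volatile state
docker pause <container-id>
docker export <container-id> > container.tar
docker top <container-id>                      # process list
docker inspect <container-id> > inspect.json   # config, env vars, mounts, network settings
# Kubernetes: preserve the pod spec, events, and logs before the pod is rescheduled
kubectl get pod <pod> -o yaml > pod-spec.yaml
kubectl describe pod <pod> > pod-events.txt
kubectl logs <pod> --all-containers --timestamps > pod-logs.txt
# Terminated container on a containerd/CRI-O node: list exited containers the runtime still tracks
crictl ps -a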
Container-Specific Attack Patterns
| Attack Pattern | Evidence Sources | Response |
|---|---|---|
| Container escape | Host system calls, kernel logs, container runtime logs | Isolate the node, investigate host-level compromise |
| Malicious image | Image layer analysis, Dockerfile, registry audit logs | Remove the image, scan registry, investigate supply chain |
| Kubernetes API abuse | K8s audit logs, RBAC configuration | Revoke compromised service account, audit RBAC permissions |
| Crypto mining | CPU utilization metrics, network connections to mining pools | Kill the pod, investigate initial access |
| Secrets exfiltration | K8s audit logs (Secret reads), container env vars | Rotate all secrets, audit Secret access policies |
IAM Compromise Response
IAM credential compromise is among the most common cloud incident types. The response pattern is consistent across providers.
Indicators of IAM Compromise
Signs include:
- API calls from unusual IP addresses or geographic locations
- API calls at unusual times
- Actions inconsistent with the principal’s normal behavior (for example, a developer creating IAM users or accessing production databases)
- GuardDuty / Defender / SCC alerts for unauthorized API usage
- Access key usage after an employee’s departure
- Programmatic access from regions where the organization has no presence
Response Procedure
Step 1 is to scope the compromise. Identify the compromised principal (user, role, service account). Query audit logs for all actions taken by this principal in the relevant time window. Determine what resources were accessed, created, modified, or deleted. Check for persistence mechanisms: new IAM users, access keys, roles, or policies created by the compromised principal.
Step 2 is to contain. Deactivate or delete compromised access keys; prefer deletion once evidence is preserved, because an attacker who retains IAM permissions can reactivate a deactivated key. Revoke all active sessions (AWS: attach a deny-all inline policy with a date condition; Azure: revoke Entra ID sessions; GCP: disable the service account). If an IAM role was compromised, update its trust policy to cut off the attacker’s path to assuming the role. Block the attacker’s source IP addresses at the network level (AWS network ACLs, Azure NSGs, GCP firewall rules; AWS security groups are allow-only and cannot express an explicit deny).
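The AWS commands appear in the AWS-specific subsection below; for Azure and GCP, a minimal containment sketch (the user, service account email, and key ID are placeholders):
# Azure: disable the account and revoke refresh tokens via Microsoft Graph
az ad user update --id compromised.user@example.com --account-enabled false
az rest --method POST --url "https://graph.microsoft.com/v1.0/users/compromised.user@example.com/revokeSignInSessions"
# GCP: disable the compromised service account and remove its user-managed keys
gcloud iam service-accounts disable <sa-email>
gcloud iam service-accounts keys list --iam-account=<sa-email>
gcloud iam service-accounts keys delete <key-id> --iam-account=<sa-email>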
Step 3 is to eradicate. Remove any persistence mechanisms created by the attacker (backdoor users, roles, policies, Lambda functions, scheduled tasks). Rotate all credentials that the compromised principal had access to (database passwords, API keys, secrets in Secrets Manager/Key Vault). Review and revert any unauthorized configuration changes.
Step 4 is to recover. Re-enable legitimate access with new credentials. Verify that containment actions did not break legitimate workloads. Monitor the compromised principal and related resources for continued suspicious activity.
AWS-Specific IAM Response
# Identify all access keys for a user
aws iam list-access-keys --user-name <username>
# Deactivate an access key
aws iam update-access-key --access-key-id <key-id> --status Inactive --user-name <username>
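# Delete the key once evidence is preserved; unlike deactivation, deletion
# cannot be undone by an attacker who still holds IAM permissions
aws iam delete-access-key --access-key-id <key-id> --user-name <username>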
# Attach deny-all policy to revoke active sessions
# This denies all actions for temporary-credential sessions issued before the specified time;
# replace the timestamp below with the current UTC time
aws iam put-user-policy --user-name <username> --policy-name DenyAll \
--policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Action":"*","Resource":"*","Condition":{"DateLessThan":{"aws:TokenIssueTime":"2026-02-02T00:00:00Z"}}}]}'
Shared Responsibility During Incidents
What You Can Expect from Your Cloud Provider
AWS Customer Incident Response Team (CIRT) can assist with confirmed security events. Contact through AWS Support (Business or Enterprise tier). AWS Artifact provides compliance reports but not incident-specific forensics.
For Azure, Microsoft Security Response Center (MSRC) handles vulnerabilities in Azure services. For customer incidents, use Microsoft Unified Support. Defender for Cloud provides managed detection.
For GCP, Google Cloud Incident Response team can assist for confirmed incidents. Contact through Cloud Support. Mandiant (Google-owned) provides additional IR services.
What Your Cloud Provider Will Not Do
They will not provide hypervisor-level or physical infrastructure forensics. They will not investigate incidents in your workloads on your behalf (unless you engage their professional services). They will not take containment actions in your accounts without your authorization (except for abuse cases). They will not extend log retention retroactively; if you did not configure retention before the incident, the data may be gone.
Cloud-Specific Playbooks
Playbook: Exposed S3 Bucket / Storage Account / GCS Bucket
1. Identify the exposed resource and the data it contains.
2. Restrict public access immediately via bucket policy, ACL, or an account-level block (a CLI sketch follows this list).
3. Review access logs to determine whether unauthorized parties accessed the data.
4. If sensitive data was accessed, initiate breach notification procedures.
5. Identify how the exposure occurred (misconfiguration, policy change, IaC error).
6. Scan all storage resources across the organization for similar exposures.
7. Implement preventive controls (S3 Block Public Access at the organization level, Azure Policy, GCP Organization Policies).
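An AWS sketch of the immediate containment step (the bucket name and account ID are placeholders; Azure Policy and GCP Organization Policies provide the equivalent guardrails):
# Block public access on the exposed bucket, then across the whole account
aws s3api put-public-access-block --bucket <bucket-name> \
  --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
aws s3control put-public-access-block --account-id <account-id> \
  --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true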
Playbook: Cryptocurrency Mining
1. Identify the affected resources (EC2 instances, containers, Lambda functions).
2. Capture evidence: instance metadata, running processes, network connections, audit logs (a CLI sketch follows this list).
3. Terminate the mining workloads.
4. Investigate initial access to understand how the attacker deployed the mining software (compromised credentials, vulnerable application, misconfigured service).
5. Contain the initial access vector.
6. Review billing for the affected period and request a billing adjustment from the CSP if the charges resulted from unauthorized access.
7. Implement cost anomaly alerts and compute quotas to limit the blast radius of future incidents.
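An AWS sketch of the evidence-then-terminate sequence (volume, instance, and security group IDs are placeholders; the quarantine security group is assumed to already exist with no permissive rules):
# Preserve the volume before destroying the workload
aws ec2 create-snapshot --volume-id <volume-id> --description "IR evidence: mining incident"
# Isolate the instance by swapping its security groups for the quarantine group
aws ec2 modify-instance-attribute --instance-id <instance-id> --groups <quarantine-sg-id>
# Terminate once evidence is captured
aws ec2 terminate-instances --instance-ids <instance-id>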
Playbook: Compromised CI/CD Pipeline
1. Identify the compromised pipeline component (source repo, build server, artifact registry, deployment credentials).
2. Halt all deployments from the affected pipeline.
3. Review pipeline execution logs for unauthorized modifications.
4. Audit all artifacts produced by the pipeline during the compromise window (a CLI sketch follows this list).
5. Rotate all secrets and credentials used by the pipeline.
6. Review deployed workloads for malicious changes (backdoors, modified dependencies).
7. Rebuild the pipeline with hardened configurations (pinned dependencies, signed commits, restricted IAM roles).
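An AWS-flavored sketch of the artifact audit and secret rotation steps (repository and secret names are placeholders; rotate-secret assumes a rotation function is already configured, otherwise set a new value with put-secret-value):
# List images pushed to the registry, ordered by push time, to scope the compromise window
aws ecr describe-images --repository-name <repo> \
  --query 'sort_by(imageDetails,&imagePushedAt)[*].[imagePushedAt,imageTags[0],imageDigest]' --output table
# Rotate a pipeline credential stored in Secrets Manager
aws secretsmanager rotate-secret --secret-id <pipeline-secret-id>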
Building Cloud IR Readiness
Pre-Incident Preparation Checklist
- Centralized logging enabled across all accounts/subscriptions/projects with at least 1-year retention
- CloudTrail organization trail / Azure Activity Log diagnostic settings / GCP organization audit logs configured
- IR team has cross-account access through a dedicated security account with break-glass credentials
- Cloud-specific IR playbooks documented and tested
- Evidence collection automation deployed (auto-snapshot volumes, export container state on alert; see the sketch after this checklist)
- SOAR integrations configured for cloud provider APIs
- Cost anomaly alerts configured to detect unauthorized resource usage
- Regular IR tabletop exercises include cloud-specific scenarios
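As an example of the evidence-collection-automation item above, the sketch below routes high-severity GuardDuty findings to an SNS topic that an automation (Lambda, SSM runbook, or SOAR playbook) can subscribe to; the rule name, topic ARN, and severity threshold are placeholders:
# Route high-severity GuardDuty findings to an IR automation target
aws events put-rule --name ir-guardduty-high \
  --event-pattern '{"source":["aws.guardduty"],"detail-type":["GuardDuty Finding"],"detail":{"severity":[{"numeric":[">=",7]}]}}'
aws events put-targets --rule ir-guardduty-high \
  --targets 'Id=1,Arn=arn:aws:sns:<region>:<account-id>:ir-alerts'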
IR Tooling for Cloud
| Tool | Purpose | CSP Support |
|---|---|---|
| AWS Security Incident Response (service) | Automated evidence collection and triage in AWS | AWS |
| Cado Security | Cloud-native forensics and evidence capture | AWS, Azure, GCP |
| Wiz | Cloud security posture and investigation | AWS, Azure, GCP |
| Sysdig Secure | Container and cloud runtime security | AWS, Azure, GCP |
| Prowler | Open-source AWS/Azure/GCP security assessment | AWS, Azure, GCP |
| Steampipe | SQL-based querying of cloud APIs for investigation | AWS, Azure, GCP |
Training Your IR Team for Cloud
Traditional IR skills (disk forensics, memory analysis, network forensics) remain valuable but are insufficient for cloud environments. Your IR team needs:
- Fluency in at least one major CSP’s console, CLI, and API
- An understanding of IAM models (AWS IAM, Azure RBAC, GCP IAM) and how credentials work in cloud environments
- Familiarity with cloud-native logging and monitoring services
- Experience with infrastructure as code (Terraform, CloudFormation) to understand how environments are configured
- Container and Kubernetes investigation skills, which are increasingly important
- Cloud-specific certifications (AWS Security Specialty, AZ-500, Google Professional Cloud Security Engineer) to demonstrate competency