Threat hunting is the proactive, iterative search for adversaries that have evaded existing security controls. Unlike reactive alert-driven investigation, hunting assumes the adversary is already present and seeks to find evidence of their activity. CrowdStrike’s 2025 reports documented a sharp rise in cloud intrusions, with China-linked adversaries responsible for a significant share of the increased activity, reinforcing that automated detection alone is insufficient.
Why Threat Hunting Matters
Traditional security operations rely on alerts generated by detection rules. This approach has fundamental limitations:
- Alert fatigue: SOC teams are overwhelmed by false positives, with the average SOC processing 11,000+ alerts per day
- Detection gaps: novel techniques and living-off-the-land tactics bypass signature- and rule-based detection
- Dwell time: attackers operate undetected for a median of roughly 10 days before discovery, per recent industry reporting
- Reactive posture: defenders are perpetually playing catch-up against adversaries who innovate faster than detection rules are updated
- Cloud blind spots: traditional hunting focused on endpoints and the network, while cloud control-plane and identity attacks require new approaches
Threat hunting addresses these limitations by proactively searching for indicators of compromise and adversary behaviors that automated systems miss.
Hunting Process
1. Hypothesis Generation
Every hunt begins with a hypothesis about adversary behavior. Strong hypotheses are specific, testable, and based on evidence or intelligence.
Sources for hypotheses include:
- Threat intelligence: reports on active campaigns, actor TTPs, and industry-specific targeting
- MITRE ATT&CK v18: technique coverage gaps in your detection rules; the v18 release (late 2025) expanded coverage for Kubernetes, CI/CD pipelines, cloud databases, and supply chain attacks
- Environmental knowledge: understanding of your crown jewels, attack surface, and architectural weaknesses
- Anomaly analysis: unusual patterns in baseline data such as new processes, unexpected network connections, or authentication anomalies
- Incident lessons: post-incident findings from your organization or peers that indicate techniques not currently detected
- AI agent activity: MITRE ATLAS (October 2025) added 14 new attack techniques targeting AI agents and generative AI systems
Example hypotheses:
- “An adversary is using valid cloud credentials obtained through phishing to enumerate our S3 buckets from an unfamiliar IP range”
- “Attackers are persisting in our Kubernetes environment through backdoored container images in our private registry”
- “A compromised service account is being used for lateral movement between our production AWS accounts”
2. Tool and Data Selection
Identify the data sources and tools needed to test your hypothesis. The quality of your hunt is limited by the quality and completeness of your telemetry.
Data sources:
| Domain | Data Sources | Key Fields |
|---|---|---|
| Endpoint | EDR telemetry, Sysmon, Windows Event Logs | Process creation, file modification, registry changes, network connections |
| Network | NetFlow, Zeek, DNS logs, proxy logs | Source/dest IP, ports, protocols, domains, bytes transferred |
| Identity | Active Directory, Entra ID, Okta, cloud IAM logs | Authentication events, role changes, MFA status, session tokens |
| Cloud | CloudTrail, Azure Activity Log, GCP Audit Logs | API calls, resource modifications, IAM changes, data access |
| Application | Web server logs, database audit logs, API gateway logs | Request patterns, authentication failures, query anomalies |
| Email | Mail flow logs, phishing detection platforms | Sender analysis, URL/attachment indicators, delivery status |
Tools include:
- SIEM platforms (Splunk, Microsoft Sentinel, Google Chronicle, Elastic Security) for log analysis and correlation
- EDR/XDR (CrowdStrike Falcon, SentinelOne, Microsoft Defender) for endpoint investigation
- Network analysis tools (Zeek, Wireshark) and Stratoshark (Sysdig, 2025) for cloud-native network analysis
- Threat intelligence platforms (MISP, OpenCTI, Recorded Future, Mandiant Advantage)
- AI-assisted hunting tools (Charlotte AI, Purple AI, Copilot for Security) for natural language querying
3. Investigation
Execute your hunt using appropriate techniques.
Query-based hunting searches for specific indicators or behavioral patterns:
# Example: Hunt for suspicious cloud API calls from unusual locations
index=cloudtrail (eventName=GetSecretValue OR eventName=GetParameter)
| iplocation sourceIPAddress
| stats count by userIdentity.arn, sourceIPAddress, Country, City
| where Country != "United States"
| sort -count
Stack counting identifies outliers by counting occurrences of attribute combinations; rare combinations in high-volume data often indicate adversary activity:
# Find rare parent-child process relationships
index=edr
| stats count by parent_process_name, process_name
| where count < 5
| sort count
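The same stack-counting approach is easy to prototype outside a SIEM. A minimal Python sketch, assuming simplified EDR events with hypothetical `parent` and `process` fields:

```python
from collections import Counter

def stack_count(events, fields, threshold=5):
    """Count each combination of the given fields and return the rare ones.

    Rare parent/child process pairs in high-volume telemetry are hunting
    leads, not verdicts: review each hit before escalating.
    """
    stacks = Counter(tuple(e[f] for f in fields) for e in events)
    return sorted(
        ((combo, n) for combo, n in stacks.items() if n < threshold),
        key=lambda item: item[1],
    )

# Hypothetical EDR process-creation events (field names are illustrative).
events = (
    [{"parent": "services.exe", "process": "svchost.exe"}] * 500
    + [{"parent": "winword.exe", "process": "powershell.exe"}] * 2
)
rare = stack_count(events, ["parent", "process"])
# The Word-spawning-PowerShell pair surfaces as the only rare stack.
```

The value of stacking comes from volume: against days of telemetry, the common combinations drown out and only the oddities remain.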
Frequency analysis detects beaconing and periodic communication patterns:
# Identify potential C2 beaconing by connection regularity
index=proxy
| sort 0 _time
| streamstats current=f last(_time) as prev_time by dest_domain
| eval delta = _time - prev_time
| stats count, avg(delta) as interval, stdev(delta) as jitter by dest_domain
| where count > 20 AND jitter < 5
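Outside the SIEM, the regularity test behind beacon hunting reduces to inter-arrival statistics: near-constant time deltas (low jitter) suggest automation rather than human browsing. A self-contained Python sketch over synthetic timestamps:

```python
from statistics import mean, pstdev

def beacon_score(timestamps, min_events=10):
    """Return (mean interval, jitter) for a host's connection times.

    Low jitter relative to the interval suggests automated beaconing;
    returns None when there are too few events to judge.
    """
    ts = sorted(timestamps)
    if len(ts) < min_events:
        return None
    deltas = [b - a for a, b in zip(ts, ts[1:])]
    return mean(deltas), pstdev(deltas)

# Hypothetical proxy timestamps in seconds: one beacon-like, one human-like.
beacon = [i * 60.0 for i in range(20)]   # a connection every 60 s exactly
human = [0, 5, 210, 230, 900, 1800, 1805, 2400, 3100, 3111]
interval, jitter = beacon_score(beacon)  # jitter is ~0 for the beacon
```

Real beacons add deliberate jitter, so in practice you would threshold on the ratio of jitter to interval rather than on absolute zero.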
Visualization graphs relationships to identify suspicious patterns. Network connection graphs identify unusual communication paths. Process trees map execution chains from initial access to objectives. Authentication flow graphs detect lateral movement patterns. Cloud resource access graphs identify privilege escalation paths.
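Process trees in particular are cheap to reconstruct from process-creation telemetry. A minimal Python sketch, assuming hypothetical `pid`/`ppid`/`name` fields:

```python
from collections import defaultdict

def build_tree(events):
    """Index process-creation events into parent -> children edges."""
    children = defaultdict(list)
    for e in events:
        children[e["ppid"]].append((e["pid"], e["name"]))
    return children

def render(children, pid, name, depth=0, out=None):
    """Depth-first walk producing an indented execution chain."""
    out = [] if out is None else out
    out.append("  " * depth + name)
    for cpid, cname in children.get(pid, []):
        render(children, cpid, cname, depth + 1, out)
    return out

# Illustrative chain: Office spawning PowerShell spawning rundll32.
events = [
    {"pid": 2, "ppid": 1, "name": "winword.exe"},
    {"pid": 3, "ppid": 2, "name": "powershell.exe"},
    {"pid": 4, "ppid": 3, "name": "rundll32.exe"},
]
tree = render(build_tree(events), 1, "explorer.exe")
```

Rendering the chain as indented text is often enough to spot an anomaly; graph libraries add value once trees span hundreds of nodes.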
4. Documentation
Document findings regardless of whether you find malicious activity. A hunt that finds nothing is still valuable because it either confirms the hypothesis is false or reveals visibility gaps.
Hunt report template:
- Hypothesis: What adversary behavior were you looking for?
- Data sources used: What telemetry was queried?
- Queries executed: Exact queries for reproducibility
- Results and conclusions: What was found? Was the hypothesis confirmed or refuted?
- Visibility gaps: What data was missing or insufficient?
- Recommendations: New detections, logging improvements, or configuration changes
Hunting Use Cases
Persistence Mechanisms
Hunt for unauthorized persistence since adversaries establish multiple persistence methods to survive remediation.
Endpoint persistence indicators include:
- Scheduled tasks created by unusual parent processes (not svchost.exe, or schtasks.exe launched by administrators)
- Services with suspicious executable paths (temp directories, user profiles, other unusual locations)
- Registry run keys modified by non-standard processes
- WMI event subscriptions (a favorite of APT groups)
- Startup folder additions and modifications
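The scheduled-task indicator can be expressed as a simple filter over process-creation events; the field names and the expected-parent allowlist here are illustrative, not a vendor schema:

```python
# Parents you expect to create scheduled tasks -- tune per estate.
EXPECTED_PARENTS = {"svchost.exe", "explorer.exe", "mmc.exe"}

def suspicious_task_creations(events):
    """Flag schtasks.exe executions whose parent is outside the allowlist.

    Office apps or script hosts creating scheduled tasks are classic
    persistence tradecraft (ATT&CK T1053.005).
    """
    return [
        e for e in events
        if e["process"] == "schtasks.exe"
        and e["parent"].lower() not in EXPECTED_PARENTS
    ]

# Hypothetical process-creation events.
events = [
    {"parent": "explorer.exe", "process": "schtasks.exe"},
    {"parent": "winword.exe", "process": "schtasks.exe"},
]
hits = suspicious_task_creations(events)  # only the winword.exe parent hits
```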
Cloud persistence indicators include:
- IAM users or access keys created outside normal provisioning workflows
- Lambda/Cloud Functions created or modified by unexpected principals
- Backdoor roles with trust policies allowing cross-account access
- OAuth applications with excessive permissions added to identity providers
- Container images modified in private registries
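A sketch of the first cloud indicator: compare CloudTrail-style IAM mutation events against the principals your provisioning pipeline actually uses. The ARNs and the reduced field set are illustrative:

```python
def unauthorized_provisioning(events, approved_principals):
    """Return IAM-mutation events not initiated by an approved principal.

    Keys or users minted outside the provisioning pipeline are prime
    cloud-persistence candidates and warrant manual review.
    """
    watched = {"CreateUser", "CreateAccessKey", "CreateRole", "PutRolePolicy"}
    return [
        e for e in events
        if e["eventName"] in watched
        and e["userIdentity"]["arn"] not in approved_principals
    ]

# Hypothetical CloudTrail events, trimmed to the fields used above.
events = [
    {"eventName": "CreateAccessKey",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:role/provisioner"}},
    {"eventName": "CreateAccessKey",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:user/alice"}},
]
hits = unauthorized_provisioning(
    events, {"arn:aws:iam::111111111111:role/provisioner"})
```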
Lateral Movement
Detect movement between systems since adversaries rarely achieve their objectives from the initial foothold.
Traditional lateral movement indicators include:
- Remote service creation (PsExec patterns with a service EXE in Windows\Temp)
- WMI remote process execution (wmiprvse.exe spawning unexpected children)
- Windows Remote Management / PowerShell remoting from unexpected sources
- RDP connections from servers or non-standard workstations
- Pass-the-hash and pass-the-ticket indicators (NTLM type 3 logons from unexpected sources)
Cloud lateral movement indicators include:
- Role assumption chains (AssumeRole calls creating unexpected cross-account access)
- Service account token usage from unexpected IP addresses
- Kubernetes pod-to-pod lateral movement via service account tokens
- Cross-cloud credential usage (AWS credentials used from GCP, or vice versa)
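Role assumption chains can be surfaced by treating each AssumeRole event as a graph edge and walking caller-to-role paths. A hedged Python sketch over simplified events (the `caller`/`target` fields stand in for CloudTrail's fuller schema):

```python
from collections import defaultdict

def assume_role_chains(events):
    """Build a caller -> assumed-role graph, then walk it from entry
    points to expose multi-hop chains (possible lateral movement)."""
    graph = defaultdict(set)
    for e in events:
        if e["eventName"] == "AssumeRole":
            graph[e["caller"]].add(e["target"])

    def walk(node, path):
        path = path + [node]
        if node not in graph or len(path) > 10:  # leaf, or depth guard
            return [path]
        chains = []
        for nxt in sorted(graph[node]):
            if nxt in path:                      # cycle guard
                chains.append(path)
            else:
                chains.extend(walk(nxt, path))
        return chains

    # Entry points: callers that are never themselves an assumed role.
    roots = set(graph) - {t for targets in graph.values() for t in targets}
    return [chain for root in sorted(roots) for chain in walk(root, [])]

# Hypothetical events: a dev identity hopping through CI into prod admin.
events = [
    {"eventName": "AssumeRole", "caller": "user/dev", "target": "role/ci"},
    {"eventName": "AssumeRole", "caller": "role/ci", "target": "role/prod-admin"},
]
chains = assume_role_chains(events)  # one two-hop chain worth reviewing
```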
Data Exfiltration
Identify data theft before it’s complete.
Look for:
- Large outbound transfers to uncommon destinations (baseline normal egress first)
- Cloud storage downloads from new or unexpected IP addresses
- DNS tunneling patterns (high-entropy subdomains, excessive DNS query volume)
- Encrypted channels to recently registered domains
- Cloud storage replication to external accounts
- Git repository cloning of sensitive repositories by service accounts
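The high-entropy-subdomain heuristic for DNS tunneling is straightforward to implement with Shannon entropy; the threshold and minimum label length below are illustrative starting points, not tuned values:

```python
import math
from collections import Counter

def shannon_entropy(s):
    """Bits per character; tunnel- and DGA-style labels tend to score high."""
    counts = Counter(s)
    return -sum(
        (n / len(s)) * math.log2(n / len(s)) for n in counts.values()
    )

def suspicious_subdomains(queries, threshold=3.5, min_len=15):
    """Flag DNS queries whose leftmost label looks tunnel-encoded."""
    hits = []
    for q in queries:
        label = q.split(".")[0]
        if len(label) >= min_len and shannon_entropy(label) > threshold:
            hits.append(q)
    return hits

# Hypothetical query log; the long hex-like label mimics tunnel encoding.
queries = [
    "www.example.com",
    "mail.example.com",
    "4fa9bc01e77d2a8f3b6c.tunnel-domain.net",
]
hits = suspicious_subdomains(queries)  # only the tunnel-style label flags
```

Entropy alone false-positives on CDN and telemetry hostnames, so pair it with query volume and domain age before escalating.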
Identity-Based Hunting
Identity attacks are the fastest-growing threat vector, so hunt for credential abuse and identity manipulation.
Watch for:
- Impossible travel: authentication from geographically distant locations within impossible timeframes
- MFA fatigue: repeated MFA push notifications followed by an approval
- Token theft: session cookies used from different IP addresses or user agents
- Privilege escalation: IAM policy changes, role assignments, or group membership modifications outside change management
- Service account abuse: service accounts authenticating interactively or from unexpected systems
- Password spray: low-volume authentication failures distributed across many accounts
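Impossible travel is a pure geometry check: great-circle distance between consecutive logins divided by elapsed time. A self-contained Python sketch (the coordinates and the 900 km/h airliner ceiling are illustrative):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 6371 * 2 * math.asin(math.sqrt(a))

def impossible_travel(a, b, max_kmh=900):
    """True when two logins for one account imply faster-than-airliner travel."""
    hours = abs(b["ts"] - a["ts"]) / 3600
    dist = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
    return dist > hours * max_kmh  # also True for distance at zero elapsed time

# Hypothetical geo-resolved logins: New York, then Singapore two hours later.
login_a = {"ts": 0, "lat": 40.7, "lon": -74.0}
login_b = {"ts": 7200, "lat": 1.35, "lon": 103.8}
flagged = impossible_travel(login_a, login_b)  # True: ~15,000 km in 2 h
```

GeoIP resolution is the weak link here: VPN egress points and mobile carrier NAT produce legitimate "impossible" hops, so treat hits as leads.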
Building a Hunting Program
Team Structure
Three models exist, each with trade-offs:
| Model | Pros | Cons |
|---|---|---|
| Dedicated team | Deep expertise, consistent output | Expensive, limited headcount |
| Rotation | Broader coverage, skill development for all analysts | Inconsistent quality, training overhead |
| Hybrid | Core expertise + broad participation | Requires strong leadership to coordinate |
For most organizations, the hybrid model works best: 1-3 dedicated hunters supported by SOC analysts who rotate through hunting assignments quarterly.
AI-Augmented Hunting
AI investigation assistants are transforming threat hunting in 2025-2026.
AI assistance takes several forms:
- Natural language querying: Charlotte AI, Purple AI, and Copilot for Security let hunters ask questions in plain English rather than writing complex queries
- Automated hypothesis generation: AI suggests hunt hypotheses based on threat intelligence, environmental context, and detection gap analysis
- Pattern recognition: AI excels at identifying subtle statistical anomalies across large datasets that human analysts miss
- Investigation acceleration: AI summarizes large result sets, correlates findings across data sources, and drafts hunt reports
There are important limitations to understand. AI assistants can hallucinate findings, so always verify AI-generated analysis against raw data. AI lacks the contextual understanding of your environment that experienced hunters have. Adversaries are beginning to use AI to generate more sophisticated evasion techniques. AI tools work best as force multipliers for skilled hunters, not replacements.
Measuring Success
Track hunting program metrics to justify investment and improve quality:
| Metric | Description | What It Tells You |
|---|---|---|
| Hunts completed per period | Volume of hunting activity | Program activity level |
| Threats discovered | Confirmed malicious activity found | Direct program value |
| Detection rules created | Hunts converted to automated detections | Lasting security improvements |
| Coverage improvements | MITRE ATT&CK techniques with detection coverage gained | Reduced detection gaps |
| Visibility gaps identified | Missing or insufficient data sources discovered | Logging investment priorities |
| Mean time to hunt | Average duration from hypothesis to conclusion | Operational efficiency |
| False positive reduction | Detection rules refined through hunting insights | SOC efficiency improvement |
Operationalizing Findings
Turn hunting discoveries into lasting improvements:
- Create detection rules for discovered techniques, so every successful hunt produces at least one new detection
- Improve logging coverage for blind spots identified during hunts
- Update incident response playbooks with new attack patterns
- Share intelligence with peers, ISACs, and the community (Sigma rules, YARA rules, blog posts)
- Feed findings into threat modeling for development teams
Tools of the Trade
Commercial
CrowdStrike Falcon OverWatch provides managed hunting with Charlotte AI, covering endpoint, cloud, and identity. Microsoft Defender Threat Hunting offers KQL-based hunting with Copilot for Security natural language support. SentinelOne Singularity uses Purple AI-powered hunting across the Singularity Data Lake. Elastic Security provides open detection rules, EQL hunting, and ML-powered anomaly detection. Google Chronicle offers YARA-L hunting with Gemini AI at Google-scale data volumes.
Open Source
Sigma rules provide a vendor-agnostic detection format convertible to any SIEM query language. YARA rules enable pattern matching for file and memory analysis. Atomic Red Team provides technique-level adversary simulation for validating hunts and detections. HELK (Hunting ELK) is an Elasticsearch-based hunting platform with Jupyter notebooks. Velociraptor is an endpoint visibility and forensic collection tool. Falco provides runtime security for Kubernetes and cloud workloads (eBPF-based). Stratoshark from Sysdig (2025) is essentially Wireshark for cloud, offering packet-level cloud traffic analysis.
Getting Started
- Assess your data by inventorying available telemetry and identifying gaps, since you can’t hunt in data you don’t collect
- Start with known-bad by hunting for publicly documented threat actor TTPs relevant to your industry
- Build foundational queries by developing reusable queries for common techniques (persistence, lateral movement, credential access)
- Map to ATT&CK to track which techniques you can hunt for and which have gaps
- Hunt regularly by scheduling hunts even when there’s no indication of compromise, since consistency builds skill and coverage
- Operationalize everything by converting every finding into a detection rule, logging improvement, or configuration change
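The ATT&CK-mapping step above reduces to set arithmetic over technique IDs. A trivial Python sketch with hypothetical IDs, which you would feed from your real detection and hunt inventory:

```python
def coverage_gap(relevant_techniques, hunted_or_detected):
    """Techniques you care about (e.g. from threat intel) with no hunt or
    detection mapped to them -- the backlog for future hunts."""
    return sorted(set(relevant_techniques) - set(hunted_or_detected))

# Hypothetical ATT&CK technique IDs; pull real ones from your mapping.
relevant = {"T1078", "T1053.005", "T1098", "T1567"}
covered = {"T1078", "T1567"}
gap = coverage_gap(relevant, covered)  # ['T1053.005', 'T1098']
```

Reviewing this gap list on a fixed cadence is what turns ad hoc hunts into a program: each cycle, either close a gap or document why it stays open.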