Building a Security Operations Center (SOC) is one of the most impactful and expensive security investments an organization can make. A 24x7 in-house SOC costs $2-7 million per year in staffing, technology, and facilities, while managed detection and response (MDR) services offer comparable outcomes starting at around $50K/year for 100 endpoints. The WEF 2025 Global Cybersecurity Outlook found that 67% of organizations report a moderate-to-critical cybersecurity skill gap, making the build-vs-buy decision more consequential than ever.
This guide covers SOC design, technology stack, staffing, and the shift toward AI-augmented operations that is redefining what a SOC looks like in 2026.
SOC Models
In-House SOC
A fully staffed, organization-owned security operations function makes sense for large enterprises with complex, heterogeneous environments; for organizations with strict regulatory requirements that demand internal control (defense, finance, healthcare); when there is enough scale to justify the investment (typically 5,000+ endpoints); and when the organization-specific threat landscape requires deep institutional knowledge.
The cost structure includes 24x7 coverage requiring a minimum of 8-12 analysts (three shifts plus coverage for PTO and turnover); SIEM, EDR/XDR, SOAR, and threat intelligence platform licensing; and facility costs, training, and continuous professional development. Total: $2-7M/year depending on location, scale, and tooling.
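As a rough sanity check on those staffing numbers, the arithmetic below estimates how many analysts a single around-the-clock seat consumes; the availability factor and work week are illustrative assumptions, not benchmarks.

```python
# Back-of-envelope estimate of analysts needed for 24x7 coverage.
# All inputs are illustrative assumptions, not industry benchmarks.

HOURS_TO_COVER_PER_WEEK = 24 * 7   # 168 hours to keep one analyst seat staffed
WORK_WEEK_HOURS = 40
AVAILABILITY = 0.75                # assumed: PTO, training, sick leave, turnover

# Effective productive hours one analyst contributes per week
effective_hours = WORK_WEEK_HOURS * AVAILABILITY

# Analysts needed to keep a single seat staffed around the clock
analysts_per_seat = HOURS_TO_COVER_PER_WEEK / effective_hours
print(f"Analysts per 24x7 seat: {analysts_per_seat:.1f}")      # ~5.6

# Two concurrent analysts on shift (a common minimum) roughly doubles that
print(f"Analysts for two seats: {analysts_per_seat * 2:.1f}")  # ~11.2
```

Two always-on seats landing at roughly eleven people is what drives the 8-12 analyst floor cited above.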
MDR (Managed Detection and Response)
An external provider handles monitoring, detection, and response. This works well for mid-market organizations (500-5,000 endpoints), when there’s limited security hiring capacity, when you need 24x7 coverage without building a full team, or when there’s willingness to trust an external party with detection and containment decisions.
The cost structure runs around $50K/year for 100 endpoints and approximately $500K/year for 10,000 endpoints, with volume pricing driving the per-endpoint rate down at scale. This typically includes EDR/XDR technology, 24x7 monitoring, and response. Gartner predicted that 50% of organizations would be using MDR services by 2025.
Hybrid SOC
An internal security team augmented by MDR for off-hours coverage, specialized capabilities, or overflow. This makes sense when an organization wants internal control during business hours with 24x7 external coverage, when a core team handles tuning, threat hunting, and escalation while MDR handles alert triage and initial response, or when budget supports a small internal team but not full 24x7 staffing.
This is the most common model for organizations with 2-5 internal security staff.
Technology Stack
Core Components
| Component | Function | Examples |
|---|---|---|
| SIEM | Log aggregation, correlation, detection | Microsoft Sentinel, Splunk, Google Chronicle, Elastic Security |
| EDR/XDR | Endpoint/cross-domain detection and response | CrowdStrike Falcon, SentinelOne, Microsoft Defender, Cortex XDR |
| SOAR | Automation, orchestration, case management | Palo Alto XSOAR, Splunk SOAR, Tines, Torq |
| Threat Intelligence | IOCs, TTPs, campaign tracking | Recorded Future, Mandiant Advantage, MISP, OpenCTI |
| Ticketing | Incident tracking, workflow, metrics | ServiceNow SecOps, Jira, PagerDuty |
SIEM Selection Criteria
The SIEM is the SOC’s central nervous system. Key decision factors include: data ingestion cost model (per-GB like traditional Splunk, per-endpoint, workload-based like the newer Splunk/Cisco model, or fixed-price like Google Chronicle); detection engineering (quality of built-in rules, custom rule flexibility, MITRE ATT&CK mapping); investigation workflow (query language, dashboard quality, case management integration); AI capabilities (natural language investigation, automated triage, suggested remediation); and integration (data source connectors, SOAR integration, threat intelligence feeds).
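To make the cost-model differences concrete, here is a rough comparison under placeholder rates; the prices and the telemetry-per-endpoint figure are illustrative assumptions, not vendor quotes.

```python
# Rough comparison of SIEM ingestion cost models under assumed prices.
# All rates below are placeholders for illustration, not vendor pricing.

def per_gb_cost(gb_per_day: float, price_per_gb: float = 2.0) -> float:
    """Traditional per-GB ingestion pricing, annualized."""
    return gb_per_day * price_per_gb * 365

def per_endpoint_cost(endpoints: int, price_per_endpoint_year: float = 30.0) -> float:
    """Per-endpoint (or per-workload) pricing."""
    return endpoints * price_per_endpoint_year

def fixed_cost(annual_platform_fee: float = 150_000.0) -> float:
    """Fixed-price / capacity-based pricing."""
    return annual_platform_fee

endpoints = 5_000
gb_per_day = endpoints * 0.05   # assume ~50 MB of telemetry per endpoint per day

print(f"Per-GB model:       ${per_gb_cost(gb_per_day):,.0f}/yr")
print(f"Per-endpoint model: ${per_endpoint_cost(endpoints):,.0f}/yr")
print(f"Fixed-price model:  ${fixed_cost():,.0f}/yr")
```

The crossover points shift quickly with log volume, which is why the ingestion model matters as much as the sticker price.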
SOAR and Automation
SOAR platforms are evolving from script-heavy playbook engines to AI-driven orchestration.
Generation 1 (mid-2010s) featured rule-based automation with rigid playbooks. Generation 2 (2020-2023) added GenAI-augmented playbooks with natural language interfaces. Generation 3 (2024+) brings agentic AI platforms where AI agents autonomously investigate, triage, and take containment actions.
Leading agentic SOC platforms include Exaforce, Dropzone AI, Radiant Security, and D3 Security Morpheus. These platforms can investigate alerts end-to-end, correlate data across sources, and recommend or execute containment, reducing the analyst role from alert triage to automation oversight and exception handling.
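To illustrate the kind of decision an automated triage step makes, the sketch below encodes a simple disposition function; the alert fields, thresholds, and function names are hypothetical and not tied to any vendor's API.

```python
# Minimal sketch of an automated alert-triage step a SOAR playbook might run.
# Field names, thresholds, and dispositions are hypothetical examples.

from dataclasses import dataclass

@dataclass
class Alert:
    rule_name: str
    severity: str      # "low" | "medium" | "high" | "critical"
    host: str
    user: str
    ioc_matches: int   # hits against threat-intelligence feeds

def triage(alert: Alert) -> str:
    """Return a disposition: auto-close, auto-contain, or escalate to an analyst."""
    if alert.severity == "low" and alert.ioc_matches == 0:
        return "auto-close"       # benign and documented, no human touch
    if alert.severity in ("high", "critical") and alert.ioc_matches > 0:
        return "auto-contain"     # isolate the host, then notify L2 for review
    return "escalate"             # ambiguous: route to an analyst queue

print(triage(Alert("impossible_travel", "high", "wks-042", "jdoe", ioc_matches=2)))
```

Agentic platforms layer enrichment, cross-source correlation, and natural-language summaries on top of this basic pattern, but the analyst's oversight role is the same: review the dispositions, not every alert.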
Data Sources
A SOC is only as effective as its telemetry. Prioritize these data sources:
Tier 1 (Essential) includes endpoint telemetry (EDR/XDR), authentication logs (Active Directory, Entra ID, Okta, cloud IAM), email security alerts, and cloud audit logs (CloudTrail, Azure Activity Log, GCP Audit Logs).
Tier 2 (Important) includes network flow data (NetFlow, VPC Flow Logs), DNS logs, web proxy / CASB logs, and firewall logs.
Tier 3 (Valuable) includes application security logs (WAF, API gateway), DLP alerts, physical security (badge access), and vulnerability scanner findings.
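One way to keep that prioritization actionable is to encode it as configuration and check onboarding progress against it; the source names below are illustrative labels, not a prescribed schema or specific connector names.

```python
# Illustrative onboarding priority map for SIEM data sources (tiers from above).
# Source labels are examples; map them to the connectors your SIEM actually supports.

ONBOARDING_TIERS = {
    1: ["edr_telemetry", "ad_entra_signin", "okta_system_log", "email_security", "cloud_audit"],
    2: ["vpc_flow_logs", "dns_logs", "web_proxy_casb", "firewall"],
    3: ["waf", "api_gateway", "dlp", "badge_access", "vuln_scanner"],
}

def next_sources(onboarded: set[str]) -> list[str]:
    """Return the highest-priority sources not yet onboarded."""
    for tier in sorted(ONBOARDING_TIERS):
        missing = [s for s in ONBOARDING_TIERS[tier] if s not in onboarded]
        if missing:
            return missing
    return []

print(next_sources({"edr_telemetry", "cloud_audit"}))
```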
SOC Staffing
Roles and Responsibilities
| Role | Responsibility | Experience Level |
|---|---|---|
| SOC Analyst (L1) | Alert triage, initial classification, escalation | 0-2 years |
| SOC Analyst (L2) | Deep investigation, containment, forensics | 2-5 years |
| SOC Analyst (L3) | Advanced analysis, malware reverse engineering, threat hunting | 5+ years |
| Detection Engineer | Write and tune detection rules, reduce false positives | 3-5 years |
| Automation Engineer | Build and maintain SOAR playbooks and integrations | 3-5 years |
| SOC Manager | Operations management, metrics, reporting, hiring | 7+ years |
| Threat Hunter | Proactive hypothesis-driven hunting | 5+ years |
AI Impact on Staffing
AI copilots and agentic platforms are reshaping SOC staffing in 2026. L1 triage is automatable, with AI agents handling alert triage and reducing the need for dedicated L1 analysts. L2 analysts become AI supervisors, reviewing and validating AI-generated investigations rather than building them from scratch. Detection engineering grows, with demand for engineers who can write detection logic, tune AI models, and manage automation. The net effect is that SOC teams are getting smaller but more skilled, as junior alert-triage positions decline while senior engineering roles expand.
A Singaporean healthcare organization reported a 70% reduction in response times after deploying SOC AI copilots, while maintaining the same team size.
Shift Schedules
24x7 coverage options:
| Model | Staff Required | Pros | Cons |
|---|---|---|---|
| 4x12 (four rotating crews on 12-hour shifts) | 8-10 analysts | Fewer handoffs, simpler scheduling | Long shifts cause fatigue |
| Follow-the-sun (geographically distributed) | 6-9 analysts | Normal working hours for everyone | Coordination complexity |
| Hybrid (MDR off-hours) | 3-5 analysts + MDR | Most cost-effective 24x7 coverage | Dependency on external provider |
Detection Engineering
Detection-as-Code
Treat detection rules as software: version controlled, tested, and reviewed. Store detection rules in Git alongside infrastructure code. Use the Sigma format for vendor-agnostic rule writing. Implement CI/CD for detections: lint, test against sample data, then deploy to the SIEM. Map every detection to MITRE ATT&CK techniques and track coverage against the ATT&CK matrix to identify gaps.
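A minimal CI check for such a repository might look like the sketch below; the repository layout (detections/*.yml) and tag convention are assumptions, and a real pipeline would also replay the rules against sample log data.

```python
# Minimal sketch of a CI lint step for detection-as-code: every Sigma rule file
# must parse as YAML, define detection logic, and carry an ATT&CK technique tag.
# The detections/ layout and attack.tXXXX tag convention are assumptions.

import glob
import sys
import yaml   # pip install pyyaml

def lint_rule(path: str) -> list[str]:
    errors = []
    with open(path) as f:
        rule = yaml.safe_load(f) or {}
    if "detection" not in rule:
        errors.append(f"{path}: missing 'detection' block")
    tags = rule.get("tags", [])
    if not any(t.startswith("attack.t") for t in tags):
        errors.append(f"{path}: no MITRE ATT&CK technique tag (attack.tXXXX)")
    return errors

all_errors = [e for p in glob.glob("detections/**/*.yml", recursive=True) for e in lint_rule(p)]
for err in all_errors:
    print(err)
sys.exit(1 if all_errors else 0)   # fail the pipeline if any rule is malformed
```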
Tuning and False Positive Management
False positives are the primary driver of SOC inefficiency and analyst burnout. Track the false positive rate per detection rule; rules exceeding an 80% false positive rate should be tuned or disabled. Use allowlists judiciously, since every allowlist entry is a potential detection blind spot. Review tuning decisions quarterly, because environmental changes may invalidate previous tuning. AI-assisted noise reduction is following the same trajectory in adjacent tooling: SAST tools such as Semgrep and Fortify Aviator claim to reduce SAST false positives by up to 98%.
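Tracking per-rule false positive rates only requires a disposition export from the SIEM or ticketing system; the sketch below assumes a simple (rule, disposition) format and flags rules over the 80% threshold.

```python
# Sketch of per-rule false positive tracking from a closed-alert export.
# The (rule name, analyst disposition) input format is an assumption about
# what your SIEM or ticketing system can export.

from collections import Counter

closed_alerts = [
    ("impossible_travel", "false_positive"),
    ("impossible_travel", "true_positive"),
    ("psexec_lateral_movement", "false_positive"),
    ("psexec_lateral_movement", "false_positive"),
    ("psexec_lateral_movement", "false_positive"),
]

totals, fps = Counter(), Counter()
for rule, disposition in closed_alerts:
    totals[rule] += 1
    if disposition == "false_positive":
        fps[rule] += 1

for rule in totals:
    fp_rate = fps[rule] / totals[rule]
    flag = "  <- tune or disable" if fp_rate > 0.8 else ""
    print(f"{rule}: {fp_rate:.0%} false positives{flag}")
```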
SOC Metrics
Operational Metrics
| Metric | Description | Target |
|---|---|---|
| MTTD (Mean Time to Detect) | Time from compromise to detection | Under 24 hours |
| MTTR (Mean Time to Respond) | Time from detection to containment | Under 1 hour (critical) |
| Alert volume | Total alerts per day | Trending down through tuning |
| False positive rate | Percentage of alerts that are benign | Under 30% |
| Escalation rate | Percentage of alerts requiring L2+ investigation | 10-20% |
| SLA compliance | Percentage of incidents resolved within SLA | Over 95% |
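MTTD and MTTR are straightforward to compute once incident timestamps are recorded consistently; the field names in this sketch (compromised_at, detected_at, contained_at) are assumptions about what the case-management system captures.

```python
# Sketch of computing MTTD and MTTR from incident timestamps.
# Field names are assumptions; adapt to your case-management export.

from datetime import datetime
from statistics import mean

incidents = [
    {"compromised_at": datetime(2026, 1, 3, 2, 15),
     "detected_at":    datetime(2026, 1, 3, 9, 40),
     "contained_at":   datetime(2026, 1, 3, 10, 5)},
    {"compromised_at": datetime(2026, 1, 9, 22, 0),
     "detected_at":    datetime(2026, 1, 10, 14, 30),
     "contained_at":   datetime(2026, 1, 10, 15, 10)},
]

mttd = mean((i["detected_at"] - i["compromised_at"]).total_seconds() for i in incidents) / 3600
mttr = mean((i["contained_at"] - i["detected_at"]).total_seconds() for i in incidents) / 3600
print(f"MTTD: {mttd:.1f} h (target: < 24 h)")
print(f"MTTR: {mttr:.1f} h (target: < 1 h for critical)")
```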
Maturity Metrics
| Metric | Description | Indicates |
|---|---|---|
| ATT&CK coverage | Percentage of ATT&CK techniques with detection rules | Detection breadth |
| Automation rate | Percentage of alerts handled without human intervention | Operational efficiency |
| Hunting cadence | Number of completed hunts per month | Proactive maturity |
| Detection-to-production time | Time from threat intel to deployed detection | Engineering agility |
| Analyst retention | Average tenure and turnover rate | Team health |
Incident Response Integration
The SOC is the first line of defense, but major incidents require escalation to a broader IR team. Define clear escalation criteria establishing when a SOC alert becomes an IR incident. Establish handoff procedures between SOC analysts and IR team. Pre-define containment authorities specifying what SOC analysts can do without approval. Integrate SOC ticketing with IR case management. Conduct joint exercises between SOC and IR teams quarterly.
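Escalation criteria are easier to audit when written down as data rather than held as tribal knowledge; the sketch below shows one illustrative encoding, with placeholder thresholds that an actual IR plan would define.

```python
# Illustrative encoding of SOC-to-IR escalation criteria as data, so the rules
# are explicit and reviewable. Thresholds and field names are placeholders.

ESCALATION_CRITERIA = [
    lambda inc: inc["severity"] == "critical",
    lambda inc: inc["confirmed_data_exfiltration"],
    lambda inc: inc["hosts_affected"] >= 10,
    lambda inc: "domain_controller" in inc["asset_tags"],
]

def should_escalate_to_ir(incident: dict) -> bool:
    """A SOC incident becomes an IR incident if any criterion is met."""
    return any(rule(incident) for rule in ESCALATION_CRITERIA)

incident = {"severity": "high", "confirmed_data_exfiltration": True,
            "hosts_affected": 3, "asset_tags": ["workstation"]}
print(should_escalate_to_ir(incident))   # True: confirmed exfiltration
```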
Getting Started
Phase 1: Foundation (Months 1-3)
Select and deploy SIEM and EDR/XDR platforms. Onboard Tier 1 data sources (endpoint, identity, email, cloud). Hire initial team (minimum 3 analysts for business-hours coverage) or engage MDR. Establish basic detection rules and alert triage procedures.
Phase 2: Operationalization (Months 4-8)
Deploy SOAR for repetitive triage and response automation. Onboard Tier 2 data sources (network, DNS, proxy). Implement detection-as-code workflow. Establish metrics tracking and regular reporting. Expand to 24x7 coverage (MDR hybrid if not fully staffed).
Phase 3: Maturation (Months 9-14)
Launch threat hunting program. Deploy AI copilots for analyst augmentation. Map and track MITRE ATT&CK detection coverage. Implement advanced automation (AI-assisted triage, automated containment). Optimize based on metrics by tuning detections, reducing false positives, and improving MTTD/MTTR.