Building a Security Operations Center (SOC) is one of the most impactful and expensive security investments an organization can make. A 24x7 in-house SOC costs $2-7 million per year in staffing, technology, and facilities, while managed detection and response (MDR) services offer comparable outcomes starting at around $50K/year for 100 endpoints. The WEF 2025 Global Cybersecurity Outlook found that 67% of organizations report a moderate-to-critical cybersecurity skill gap, making the build-vs-buy decision more consequential than ever.

This guide covers SOC design, technology stack, staffing, and the shift toward AI-augmented operations that is redefining what a SOC looks like in 2026.

SOC Models

In-House SOC

A fully staffed, organization-owned security operations function makes sense for large enterprises with complex, heterogeneous environments, for organizations with strict regulatory requirements demanding internal control (defense, finance, healthcare), when there’s enough scale to justify the investment (typically 5,000+ endpoints), and when the organization-specific threat landscape requires deep institutional knowledge.

The cost structure includes 24x7 coverage requiring a minimum of 8-12 analysts (three shifts plus coverage for PTO and turnover); SIEM, EDR/XDR, SOAR, and threat intelligence platform licensing; plus facility costs, training, and continuous professional development. Total: $2-7M/year depending on location, scale, and tooling.

MDR (Managed Detection and Response)

An external provider handles monitoring, detection, and response. This works well for mid-market organizations (500-5,000 endpoints), when there’s limited security hiring capacity, when you need 24x7 coverage without building a full team, or when there’s willingness to trust an external party with detection and containment decisions.

The cost structure runs around $50K/year for 100 endpoints and approximately $500K/year for 10,000 endpoints. This typically includes EDR/XDR technology, 24x7 monitoring, and response. Gartner estimated that 50% of organizations use MDR as of 2025.
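The build-versus-buy math is largely a function of endpoint count. Here is a rough sketch using only the ballpark figures above; the linear interpolation between the two quoted MDR price points is a simplifying assumption, not vendor pricing:

```python
# Rough build-vs-buy comparison using the ballpark figures in this guide.
# Assumption: MDR cost interpolated linearly between the two quoted price points
# ($50K/yr at 100 endpoints, $500K/yr at 10,000 endpoints); real pricing is tiered.

def mdr_annual_cost(endpoints: int) -> float:
    e1, c1 = 100, 50_000
    e2, c2 = 10_000, 500_000
    endpoints = max(e1, min(e2, endpoints))           # clamp to the quoted range
    return c1 + (c2 - c1) * (endpoints - e1) / (e2 - e1)

IN_HOUSE_LOW, IN_HOUSE_HIGH = 2_000_000, 7_000_000    # annual in-house range from above

for n in (500, 2_000, 5_000, 10_000):
    print(f"{n:>6} endpoints: MDR ~${mdr_annual_cost(n):,.0f}/yr "
          f"vs in-house ${IN_HOUSE_LOW:,}-${IN_HOUSE_HIGH:,}/yr")
```

Even at 10,000 endpoints the MDR estimate stays well below the in-house floor, which is why in-house builds are usually justified by control, regulation, or institutional knowledge rather than raw cost.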

Hybrid SOC

An internal security team augmented by MDR for off-hours coverage, specialized capabilities, or overflow. This makes sense when an organization wants internal control during business hours with 24x7 external coverage, when a core team handles tuning, threat hunting, and escalation while MDR handles alert triage and initial response, or when budget supports a small internal team but not full 24x7 staffing.

This is the most common model for organizations with 2-5 internal security staff.

Technology Stack

Core Components

| Component | Function | Examples |
| --- | --- | --- |
| SIEM | Log aggregation, correlation, detection | Microsoft Sentinel, Splunk, Google Chronicle, Elastic Security |
| EDR/XDR | Endpoint/cross-domain detection and response | CrowdStrike Falcon, SentinelOne, Microsoft Defender, Cortex XDR |
| SOAR | Automation, orchestration, case management | Palo Alto XSOAR, Splunk SOAR, Tines, Torq |
| Threat Intelligence | IOCs, TTPs, campaign tracking | Recorded Future, Mandiant Advantage, MISP, OpenCTI |
| Ticketing | Incident tracking, workflow, metrics | ServiceNow SecOps, Jira, PagerDuty |

SIEM Selection Criteria

The SIEM is the SOC’s central nervous system. Key decision factors include the data ingestion cost model (per-GB like traditional Splunk, per-endpoint, workload-based like the new Splunk/Cisco model, or fixed-price like Google Chronicle); detection engineering (quality of built-in rules, custom rule flexibility, MITRE ATT&CK mapping); investigation workflow (query language, dashboard quality, case management integration); AI capabilities (natural language investigation, automated triage, suggested remediation); and integration (data source connectors, SOAR integration, threat intelligence feeds).

SOAR and Automation

SOAR platforms are evolving from script-heavy playbook engines to AI-driven orchestration.

Generation 1 (mid-2010s) featured rule-based automation with rigid playbooks. Generation 2 (2020-2023) added GenAI-augmented playbooks with natural language interfaces. Generation 3 (2024+) brings agentic AI platforms where AI agents autonomously investigate, triage, and take containment actions.

Leading agentic SOC platforms include Exaforce, Dropzone AI, Radiant Security, and D3 Security Morpheus. These platforms can investigate alerts end-to-end, correlate data across sources, and recommend or execute containment, reducing the analyst role from alert triage to automation oversight and exception handling.
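Whatever the generation, the skeleton of an automated triage flow is the same: enrich the alert, decide, and only involve a human for the ambiguous or high-stakes cases. Below is a minimal sketch; every function, field, and threshold is illustrative rather than any vendor’s API.

```python
# Minimal sketch of an automated alert-triage flow of the kind a SOAR playbook or
# agentic platform runs before a human ever sees the alert. All names and thresholds
# are illustrative, not a vendor API.
from dataclasses import dataclass

@dataclass
class Alert:
    id: str
    host: str
    user: str
    indicator: str   # e.g. a file hash or domain from the detection
    severity: str    # "low" | "medium" | "high"

# --- Stub integrations; a real playbook calls TIP, CMDB, SIEM, and EDR APIs here ---
def lookup_threat_intel(indicator: str) -> str:
    return "malicious" if indicator.endswith(".bad") else "unknown"

def lookup_asset_criticality(host: str) -> str:
    return "crown_jewel" if host.startswith("dc-") else "standard"

def isolate_host(host: str) -> None:
    print(f"[containment] isolating {host} via EDR")

def triage(alert: Alert) -> str:
    """Enrich, then decide: auto-close, contain automatically, or escalate to a human."""
    verdict = lookup_threat_intel(alert.indicator)
    asset_tier = lookup_asset_criticality(alert.host)
    if verdict == "unknown" and alert.severity == "low":
        return "auto_close"                    # low-risk noise never reaches an analyst
    if verdict == "malicious" and asset_tier != "crown_jewel":
        isolate_host(alert.host)               # containment within pre-approved authority
        return "contained_pending_review"
    return "escalate_to_l2"                    # humans keep the ambiguous and high-value cases

print(triage(Alert("A-1042", "ws-finance-07", "jdoe", "payload.bad", "high")))
```

Agentic platforms effectively replace the hand-written branching with an AI-driven investigation, but the containment authority and escalation boundaries still have to be defined by the SOC.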

Data Sources

A SOC is only as effective as its telemetry. Prioritize these data sources:

Tier 1 (Essential) includes endpoint telemetry (EDR/XDR), authentication logs (Active Directory, Entra ID, Okta, cloud IAM), email security alerts, and cloud audit logs (CloudTrail, Azure Activity Log, GCP Audit Logs).

Tier 2 (Important) includes network flow data (NetFlow, VPC Flow Logs), DNS logs, web proxy / CASB logs, and firewall logs.

Tier 3 (Valuable) includes application security logs (WAF, API gateway), DLP alerts, physical security (badge access), and vulnerability scanner findings.

SOC Staffing

Roles and Responsibilities

| Role | Responsibility | Experience Level |
| --- | --- | --- |
| SOC Analyst (L1) | Alert triage, initial classification, escalation | 0-2 years |
| SOC Analyst (L2) | Deep investigation, containment, forensics | 2-5 years |
| SOC Analyst (L3) | Advanced analysis, malware reverse engineering, threat hunting | 5+ years |
| Detection Engineer | Write and tune detection rules, reduce false positives | 3-5 years |
| Automation Engineer | Build and maintain SOAR playbooks and integrations | 3-5 years |
| SOC Manager | Operations management, metrics, reporting, hiring | 7+ years |
| Threat Hunter | Proactive hypothesis-driven hunting | 5+ years |

AI Impact on Staffing

AI copilots and agentic platforms are reshaping SOC staffing in 2026. L1 triage is automatable, with AI agents handling alert triage and reducing the need for dedicated L1 analysts. L2 analysts become AI supervisors, reviewing and validating AI-generated investigations rather than building them from scratch. Detection engineering grows, with demand for engineers who can write detection logic, tune AI models, and manage automation. The net effect is that SOC teams are getting smaller but more skilled, as junior alert-triage positions decline while senior engineering roles expand.

A Singaporean healthcare organization reported a 70% reduction in response times after deploying SOC AI copilots, while maintaining the same team size.

Shift Schedules

24x7 coverage options:

| Model | Staff Required | Pros | Cons |
| --- | --- | --- | --- |
| 4x12 (four 12-hour shifts) | 8-10 analysts | Fewer handoffs, simpler scheduling | Long shifts cause fatigue |
| Follow-the-sun (geographically distributed) | 6-9 analysts | Normal working hours for everyone | Coordination complexity |
| Hybrid (MDR off-hours) | 3-5 analysts + MDR | Most cost-effective 24x7 coverage | Dependency on external provider |
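The headcounts in the table follow from simple coverage arithmetic: each always-staffed seat represents 168 hours per week, and each analyst covers roughly 40 scheduled hours minus absences. A quick sketch of that math (the 25% absence buffer is an assumption, not a benchmark):

```python
# Back-of-the-envelope headcount for 24x7 coverage. The absence buffer (PTO, training,
# sick leave, turnover) is an assumed 25%, not an industry benchmark.
HOURS_PER_WEEK = 24 * 7      # 168 seat-hours per always-staffed position
SCHEDULED_HOURS = 40         # hours one analyst is scheduled per week
ABSENCE_BUFFER = 0.25        # fraction of scheduled time lost to PTO, training, turnover

def analysts_needed(concurrent_seats: int) -> float:
    effective_hours = SCHEDULED_HOURS * (1 - ABSENCE_BUFFER)   # ~30 productive hours/week
    return concurrent_seats * HOURS_PER_WEEK / effective_hours

for seats in (1, 2):
    print(f"{seats} analyst(s) on shift around the clock -> ~{analysts_needed(seats):.1f} heads")
# One seat works out to ~5.6 analysts and two seats to ~11.2, which is where the
# 8-12 analyst range for continuous single-to-double coverage comes from.
```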

Detection Engineering

Detection-as-Code

Treat detection rules as software: version controlled, tested, and reviewed. Store detection rules in Git alongside infrastructure code. Use the Sigma format for vendor-agnostic rule writing. Implement CI/CD for detections by linting, testing against sample data, and deploying to the SIEM. Map every detection to MITRE ATT&CK techniques and track coverage against the ATT&CK matrix to identify gaps.
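A minimal CI gate for a Sigma-style rule repository might look like the sketch below: load each rule, check the fields the repo’s policy requires, and confirm a MITRE ATT&CK technique tag is present. The tag convention follows public Sigma practice, but the script itself is an illustration, not a drop-in pipeline.

```python
# Simplified CI gate for a detection-as-code repo: lint Sigma-style YAML rules for
# required fields and a MITRE ATT&CK technique tag before deployment to the SIEM.
import glob
import re
import sys
import yaml   # pip install pyyaml

REQUIRED_FIELDS = {"title", "logsource", "detection", "level", "tags"}   # this repo's policy
ATTACK_TAG = re.compile(r"^attack\.t\d{4}(\.\d{3})?$")   # e.g. attack.t1059.001

def lint_rule(path: str) -> list[str]:
    errors = []
    with open(path) as f:
        rule = yaml.safe_load(f) or {}
    missing = REQUIRED_FIELDS - set(rule)
    if missing:
        errors.append(f"{path}: missing fields {sorted(missing)}")
    if not any(ATTACK_TAG.match(tag) for tag in rule.get("tags", [])):
        errors.append(f"{path}: no MITRE ATT&CK technique tag (attack.tXXXX)")
    return errors

if __name__ == "__main__":
    problems = [e for p in glob.glob("rules/**/*.yml", recursive=True) for e in lint_rule(p)]
    print("\n".join(problems))
    sys.exit(1 if problems else 0)   # a non-zero exit fails the CI job
```

A real pipeline would typically also convert rules to the SIEM’s native query language (for example with the Sigma toolchain) and run them against sample log data before deployment.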

Tuning and False Positive Management

False positives are the primary driver of SOC inefficiency and analyst burnout. Track the false positive rate per detection rule; rules exceeding an 80% false positive rate should be tuned or disabled. Use allowlists judiciously, since every allowlist entry is a potential detection blind spot. Review tuning decisions quarterly, because environmental changes may invalidate previous tuning. AI-based noise reduction is also arriving in adjacent domains: application security tools like Semgrep and Fortify Aviator report reducing SAST false positives by up to 98%.
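Tracking this requires nothing more than closed-alert dispositions tagged by rule. A sketch of the weekly report, assuming an export with rule_id and disposition columns (both field names are illustrative):

```python
# Weekly false-positive report per detection rule. Assumes a closed-alert CSV export
# with "rule_id" and "disposition" columns; field names are illustrative.
import csv
from collections import Counter

FP_THRESHOLD = 0.80   # rules above this false positive rate get tuned or disabled

def fp_rates(csv_path: str) -> dict[str, tuple[float, int]]:
    totals, false_positives = Counter(), Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["rule_id"]] += 1
            if row["disposition"] in {"false_positive", "benign"}:
                false_positives[row["rule_id"]] += 1
    return {rule: (false_positives[rule] / totals[rule], totals[rule]) for rule in totals}

for rule, (rate, volume) in sorted(fp_rates("closed_alerts.csv").items(),
                                   key=lambda item: item[1][0], reverse=True):
    flag = "TUNE OR DISABLE" if rate > FP_THRESHOLD else "ok"
    print(f"{rule:<40} {rate:6.1%}  ({volume} alerts)  {flag}")
```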

SOC Metrics

Operational Metrics

| Metric | Description | Target |
| --- | --- | --- |
| MTTD (Mean Time to Detect) | Time from compromise to detection | Under 24 hours |
| MTTR (Mean Time to Respond) | Time from detection to containment | Under 1 hour (critical) |
| Alert volume | Total alerts per day | Trending down through tuning |
| False positive rate | Percentage of alerts that are benign | Under 30% |
| Escalation rate | Percentage of alerts requiring L2+ investigation | 10-20% |
| SLA compliance | Percentage of incidents resolved within SLA | Over 95% |
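MTTD and MTTR fall out of three timestamps per incident: estimated compromise time, detection time, and containment time. A sketch of the calculation, assuming those fields exist in the case management export (the sample records are illustrative):

```python
# Compute MTTD and MTTR from incident records. Assumes each incident carries
# compromise, detection, and containment timestamps; field names are illustrative.
from datetime import datetime
from statistics import mean

incidents = [
    {"compromised": "2026-01-04T02:10", "detected": "2026-01-04T09:45", "contained": "2026-01-04T10:20"},
    {"compromised": "2026-01-11T18:30", "detected": "2026-01-12T06:05", "contained": "2026-01-12T07:40"},
]

def hours_between(start: str, end: str) -> float:
    fmt = "%Y-%m-%dT%H:%M"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds() / 3600

mttd = mean(hours_between(i["compromised"], i["detected"]) for i in incidents)
mttr = mean(hours_between(i["detected"], i["contained"]) for i in incidents)
print(f"MTTD: {mttd:.1f} h (target < 24 h)   MTTR: {mttr:.1f} h (target < 1 h for critical)")
```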

Maturity Metrics

| Metric | Description | Indicates |
| --- | --- | --- |
| ATT&CK coverage | Percentage of ATT&CK techniques with detection rules | Detection breadth |
| Automation rate | Percentage of alerts handled without human intervention | Operational efficiency |
| Hunting cadence | Number of completed hunts per month | Proactive maturity |
| Detection-to-production time | Time from threat intel to deployed detection | Engineering agility |
| Analyst retention | Average tenure and turnover rate | Team health |
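ATT&CK coverage and automation rate can be derived from the same rule repository and alert export used earlier. A rough sketch with illustrative data; note that the denominator should be the techniques in your threat model, not all of ATT&CK:

```python
# Rough maturity metrics: ATT&CK coverage from rule-to-technique mappings and the
# automation rate from alert dispositions. All values here are illustrative.
detected_techniques = {"T1059.001", "T1078", "T1110", "T1566.001"}   # tagged on deployed rules

# Scope the denominator to the techniques your threat model cares about, not all of ATT&CK:
in_scope_techniques = {"T1059.001", "T1078", "T1110", "T1566.001",
                       "T1486", "T1021.001", "T1003"}

coverage = len(detected_techniques & in_scope_techniques) / len(in_scope_techniques)

alerts_total = 12_400                   # alerts this quarter
alerts_closed_by_automation = 9_050     # closed or contained with no analyst touch

print(f"ATT&CK coverage: {coverage:.0%} of in-scope techniques")
print(f"Automation rate: {alerts_closed_by_automation / alerts_total:.0%} of alerts")
```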

Incident Response Integration

The SOC is the first line of defense, but major incidents require escalation to a broader IR team. Define clear escalation criteria that establish when a SOC alert becomes an IR incident. Establish handoff procedures between SOC analysts and the IR team. Pre-define containment authorities specifying what SOC analysts can do without approval. Integrate SOC ticketing with IR case management. Conduct joint exercises between the SOC and IR teams quarterly.
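Escalation criteria work best when written down as explicit, testable conditions rather than tribal knowledge. One way to express them is as logic a playbook can evaluate; the categories and thresholds below are examples, not a standard:

```python
# Example of escalation criteria expressed as explicit, testable conditions rather than
# tribal knowledge. All categories and thresholds are illustrative, not a standard.
EXFIL_BYTES_LIMIT = 100 * 1024**2   # confirmed exfiltration above ~100 MB
HOSTS_AFFECTED_LIMIT = 5            # spread beyond a handful of endpoints

def becomes_ir_incident(alert: dict) -> bool:
    """Return True when a SOC alert should be handed off as an IR incident."""
    return bool(
        (alert.get("confirmed_malware") and alert.get("crown_jewel"))   # malware on a critical asset
        or alert.get("lateral_movement", False)                         # active lateral movement
        or alert.get("exfil_bytes", 0) > EXFIL_BYTES_LIMIT
        or alert.get("hosts_affected", 0) > HOSTS_AFFECTED_LIMIT
    )

print(becomes_ir_incident({"lateral_movement": True, "hosts_affected": 2}))   # True -> hand off to IR
```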

Getting Started

Phase 1: Foundation (Months 1-3)

Select and deploy SIEM and EDR/XDR platforms. Onboard Tier 1 data sources (endpoint, identity, email, cloud). Hire an initial team (a minimum of 3 analysts for business-hours coverage) or engage an MDR provider. Establish basic detection rules and alert triage procedures.

Phase 2: Operationalization (Months 4-8)

Deploy SOAR for repetitive triage and response automation. Onboard Tier 2 data sources (network, DNS, proxy). Implement detection-as-code workflow. Establish metrics tracking and regular reporting. Expand to 24x7 coverage (MDR hybrid if not fully staffed).

Phase 3: Maturation (Months 9-14)

Launch threat hunting program. Deploy AI copilots for analyst augmentation. Map and track MITRE ATT&CK detection coverage. Implement advanced automation (AI-assisted triage, automated containment). Optimize based on metrics by tuning detections, reducing false positives, and improving MTTD/MTTR.