Penetration tests find vulnerabilities. Red team exercises test your organization’s ability to detect and respond to a real adversary. Purple team exercises take that a step further by turning adversarial simulation into a collaborative feedback loop between offense and defense. Despite their value, many organizations confuse these disciplines, scope them poorly, or fail to translate findings into measurable security improvements.

This guide covers the practical differences between these exercise types, how to scope and execute them effectively, and how to build a continuous validation program that actually improves your security posture.

Understanding the Differences

Penetration Testing

A penetration test is a time-boxed technical assessment focused on finding vulnerabilities in a defined scope.

The objective is to identify exploitable vulnerabilities in systems, applications, or networks. Scope is narrowly defined: specific applications, IP ranges, or environments. The approach is methodical testing against an established methodology or standard (OWASP, PTES, NIST SP 800-115). Stealth is minimal, since testers are not trying to evade detection. Duration is typically 1-3 weeks. The deliverable is a vulnerability report with severity ratings and remediation guidance. Blue team awareness is usually full: defenders know the test is happening.

Red Team Exercise

A red team exercise simulates a real-world adversary attempting to achieve specific objectives against your organization.

The objective is to test detection and response capabilities against realistic attack scenarios. Scope is broad, with the entire organization in scope unless explicitly excluded. The approach is goal-oriented adversary emulation, in which the red team operates like a real threat actor. Stealth is high: the red team actively evades detection, and getting caught is a data point, not a failure. Duration is 4-12 weeks, sometimes longer. The deliverable is a narrative report describing the attack path, detection gaps, and response effectiveness. Blue team awareness is typically none: the blue team is unaware of the exercise, and often only senior leadership is informed.

Purple Team Exercise

A purple team exercise is a collaborative engagement where red and blue teams work together to test and improve detection and response capabilities.

The objective is to systematically validate and improve defensive controls against known attack techniques. Scope is defined by MITRE ATT&CK techniques or specific threat scenarios. The approach is iterative: the red team executes a technique, the blue team attempts to detect it, and both sides discuss the result and tune detections. Stealth is nonexistent, since there’s full transparency between teams. Duration is 1-5 days per engagement, recurring monthly or quarterly. The deliverable is a detection coverage matrix, tuning recommendations, and new detection rules. Blue team awareness is full, since they’re active participants.

When to Use Each

Scenario | Recommended Exercise
Compliance requirement for vulnerability assessment | Penetration test
Validating detection coverage for specific threat actor TTPs | Purple team
Testing incident response under realistic conditions | Red team
New SOC deployment needing to baseline detection capabilities | Purple team
Board-level assurance that defenses work against advanced threats | Red team
Ongoing validation of detection engineering program | Continuous purple team

Scoping and Rules of Engagement

Rules of Engagement Document

Every exercise requires a formal Rules of Engagement (ROE) document signed by both the exercise team and organizational leadership. This is non-negotiable.

Required elements include authorization (explicit written authorization from an executive with authority like the CISO, CTO, or CEO), scope (what is in scope and, critically, what is explicitly out of scope), objectives (what the exercise is trying to test or prove), timing (start date, end date, and any blackout windows like end-of-quarter freeze), communication plan (emergency contacts, escalation procedures, safe words for deconfliction), legal considerations (ensure the exercise complies with applicable laws, especially for cloud environments, third-party systems, and international operations), data handling (rules for handling any sensitive data encountered during the exercise), and deconfliction (process for distinguishing exercise activity from real attacks).
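As a rough illustration, the required elements above can be treated as a checklist that a draft ROE must satisfy before sign-off. The sketch below assumes the ROE is captured as a dictionary. The key names and the sample draft are hypothetical, not a standard schema.

```python
# Required ROE elements, taken from the list above. The key names are
# illustrative; use whatever your ROE template calls them.
REQUIRED_ROE_ELEMENTS = (
    "authorization",
    "scope",
    "objectives",
    "timing",
    "communication_plan",
    "legal_considerations",
    "data_handling",
    "deconfliction",
)


def missing_roe_elements(roe):
    """Return the required elements that are absent or left empty."""
    return [key for key in REQUIRED_ROE_ELEMENTS if not roe.get(key)]


# Hypothetical draft with several sections still unwritten.
draft_roe = {
    "authorization": "Signed by CISO",
    "scope": {"in": ["corporate network"], "out": ["industrial control systems"]},
    "objectives": ["Reach the payment processing environment from the internet"],
    "timing": {"start": "2025-03-03", "end": "2025-04-11"},
    "communication_plan": "",  # left blank in the draft
}

print(missing_roe_elements(draft_roe))
```

A check like this only confirms that each section exists. Whether the content is adequate still needs human and legal review.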

Scoping a Red Team Exercise

Red team scope should be defined by objectives, not by systems.

Good objectives include questions like “Can an external threat actor gain access to the payment processing environment?” or “Can a compromised employee workstation be used to exfiltrate customer PII?” or “How long does it take the SOC to detect and respond to lateral movement in the corporate network?”

Poor objectives include “Find all vulnerabilities in our network” (that is a pen test) or “Test our security” (too vague to measure success).

Exclusions and Safety Controls

Even in broad-scope red team exercises, certain targets should be excluded. Life-safety systems such as medical devices and industrial control systems should be excluded unless they are specifically in scope with appropriate safeguards. Production databases with real customer data should be avoided where possible by using synthetic data environments. Third-party systems (cloud providers, SaaS applications, and partner networks) are out of scope unless their owners have given separate authorization. Denial-of-service and other availability attacks should generally be excluded unless specifically scoped.
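One way operators can enforce exclusions is to check every target against the ROE before touching it. The following is a minimal sketch using Python's standard `ipaddress` module. The network ranges are hypothetical placeholders, and a real ROE would likely also cover hostnames, cloud accounts, and applications.

```python
import ipaddress

# Hypothetical ranges for illustration only; take the real ones from the ROE.
IN_SCOPE = [ipaddress.ip_network("10.20.0.0/16")]
EXCLUDED = [ipaddress.ip_network("10.20.50.0/24")]  # e.g. a life-safety segment


def is_authorized_target(address):
    """Exclusions always win; anything not explicitly in scope is denied."""
    ip = ipaddress.ip_address(address)
    if any(ip in net for net in EXCLUDED):
        return False
    return any(ip in net for net in IN_SCOPE)


print(is_authorized_target("10.20.1.5"))    # in scope, not excluded
print(is_authorized_target("10.20.50.7"))   # inside an excluded range
print(is_authorized_target("192.168.1.1"))  # never listed as in scope
```

The default-deny design is deliberate: a target that appears in neither list is treated as out of scope.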

Establish a safe word or emergency stop procedure. If exercise activity inadvertently causes a real incident or business disruption, both sides need a way to immediately halt and deconflict.

MITRE ATT&CK Mapping

Why ATT&CK Matters for Exercises

MITRE ATT&CK provides a common language for describing adversary behavior. Using ATT&CK to plan and report exercises enables four things: threat-informed testing (mapping exercises to techniques used by threat actors relevant to your industry); coverage measurement (tracking which techniques your detections cover and which have gaps); progress tracking (measuring improvement over time as detections are added and tuned); and communication (giving executives a visual representation of defensive coverage).

Building a Threat Profile

Before planning an exercise, build a threat profile for your organization. First, identify relevant threat groups: use MITRE ATT&CK Groups, Mandiant threat reports, or sector-specific ISACs to find adversaries that target your industry. Then extract their TTPs by mapping each threat group’s known techniques to the ATT&CK matrix. Prioritize techniques that are common across multiple relevant threat groups, have high impact, and are feasible to simulate. Finally, build an exercise plan: select 10-20 techniques for a purple team engagement, or define an attack path of chained techniques for a red team exercise.
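The "common across multiple groups" criterion can be approximated by counting how many relevant groups use each technique. The sketch below assumes a hand-built mapping. The group names are placeholders and the technique sets are illustrative, not real attribution data. Impact and feasibility still need to be judged separately.

```python
from collections import Counter

# Hypothetical group-to-technique mapping; real data would come from
# ATT&CK Groups pages or your own threat intelligence sources.
group_techniques = {
    "Group A": {"T1566.001", "T1059.001", "T1021.001"},
    "Group B": {"T1566.001", "T1059.001", "T1003.001"},
    "Group C": {"T1566.001", "T1021.001", "T1486"},
}


def rank_techniques(groups, min_groups=2):
    """Return (technique, group_count) pairs used by at least min_groups
    groups, most widely used first, ties broken by technique ID."""
    counts = Counter(t for techniques in groups.values() for t in techniques)
    shared = [(t, n) for t, n in counts.items() if n >= min_groups]
    return sorted(shared, key=lambda item: (-item[1], item[0]))


for technique, count in rank_techniques(group_techniques):
    print(technique, count)
```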

ATT&CK-Based Exercise Planning Template

For each technique in the exercise plan, document:

Field | Description
Technique ID | e.g., T1566.001 (Spearphishing Attachment)
Tactic | e.g., Initial Access
Procedure | Specific implementation: “Send email with macro-enabled Word document to finance team”
Expected detection | What should trigger: “Email gateway alert, EDR alert on macro execution”
Data sources required | Email logs, endpoint telemetry, process creation events
Success criteria | Detection: alert generated within 5 minutes. Response: endpoint isolated within 30 minutes.
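The template above maps naturally onto a small record type, which makes success criteria machine-checkable. This is a sketch only: the class and field names are invented for illustration, and it assumes both time limits are measured in minutes from technique execution, which the template does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TechniquePlan:
    """One row of the exercise plan; field names are illustrative."""
    technique_id: str
    tactic: str
    procedure: str
    expected_detection: List[str]
    data_sources: List[str]
    detect_within_min: float   # assumed: minutes from execution to alert
    respond_within_min: float  # assumed: minutes from execution to containment

    def passed(self, detected_after_min: Optional[float],
               responded_after_min: Optional[float]) -> bool:
        """True only if both detection and response met their targets.
        None means the event never happened (no alert, no containment)."""
        if detected_after_min is None or responded_after_min is None:
            return False
        return (detected_after_min <= self.detect_within_min
                and responded_after_min <= self.respond_within_min)


# Values taken from the example rows in the template.
plan = TechniquePlan(
    technique_id="T1566.001",
    tactic="Initial Access",
    procedure="Send email with macro-enabled Word document to finance team",
    expected_detection=["Email gateway alert", "EDR alert on macro execution"],
    data_sources=["Email logs", "Endpoint telemetry", "Process creation events"],
    detect_within_min=5,
    respond_within_min=30,
)

print(plan.passed(detected_after_min=3, responded_after_min=25))
print(plan.passed(detected_after_min=12, responded_after_min=25))
```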

Attack Simulation Frameworks

Open-Source Frameworks

Atomic Red Team (Red Canary) provides a library of small, discrete tests mapped to ATT&CK techniques. Each “atomic” test executes a single technique and can be run independently, making it ideal for purple team exercises. Tests are defined in YAML and executed via PowerShell (Invoke-AtomicRedTeam) or bash. There are 700+ tests covering most ATT&CK techniques. The low barrier to entry means tests can be run by defenders, not just offensive specialists. The limitation is that individual technique execution does not simulate realistic attack chains.

Caldera (MITRE) is an automated adversary emulation platform that chains techniques into realistic attack scenarios. It’s agent-based, meaning you deploy agents on test systems and run operations from a central server. It supports custom adversary profiles mapped to ATT&CK and can run autonomously or with manual operator control. It’s more realistic than Atomic Red Team but requires more setup.

Strider and Prelude Operator are additional open-source options for automated adversary emulation with varying levels of sophistication.

Commercial Platforms

Breach and Attack Simulation (BAS) platforms include SafeBreach, AttackIQ, Cymulate, and Picus Security. These platforms continuously simulate attacks against production environments to validate security controls. Advantages include continuous validation, executive dashboards, and integration with SIEM and EDR. Limitations are that simulations are not real attacks, so they test control effectiveness but may not reflect actual attacker behavior. The use case is ongoing hygiene and regression testing between manual exercises.

Choosing the Right Tool

Need | Recommended Tool
Quick purple team with limited offensive expertise | Atomic Red Team
Realistic multi-stage attack simulation | Caldera or commercial BAS
Continuous control validation | Commercial BAS (SafeBreach, AttackIQ)
Full adversary emulation with custom tooling | Experienced red team with custom implants

Purple Team Collaboration Model

Purple Team Workflow

A structured purple team engagement follows this iterative cycle.

Step 1: Plan. Select techniques based on the threat profile and define procedures, expected detections, and success criteria.
Step 2: Execute. The red team performs the technique in a controlled manner while the blue team monitors in real time.
Step 3: Detect. The blue team attempts to identify the activity using existing detections and tools, documenting what triggered, what was missed, and what was noisy.
Step 4: Analyze. Both teams discuss the results, covering why the activity was detected or missed, what data sources are available, and what detection logic would work.
Step 5: Improve. The blue team writes or tunes detection rules, and the red team re-executes the technique to validate the new detection.
Step 6: Document. Record the technique, procedure, detection status, and any new rules created in a detection coverage matrix.

Running an Effective Purple Team Day

A typical purple team engagement covers 8-10 techniques in a single day.

In the morning (4-5 techniques), brief the team on the day’s objectives and threat scenario, then execute techniques sequentially, spending 20-30 minutes on each. Cover initial access, execution, and persistence techniques.

In the afternoon (4-5 techniques), focus on lateral movement, privilege escalation, and exfiltration techniques. Allow time for detection tuning and re-testing since these techniques often require more investigation time.

At end of day (1 hour), review results and update the detection coverage matrix, identify quick wins (detections that can be deployed this week) and gaps requiring longer-term investment (new data sources, new tools), and schedule follow-up actions.

Roles in a Purple Team Exercise

Role | Responsibility
Purple team lead | Facilitates the exercise, manages the schedule, mediates between teams
Red team operator | Executes attack techniques according to the plan
Detection engineer | Analyzes telemetry, writes and tunes detection rules in real time
SOC analyst | Monitors alerts and investigates activity as they would in production
Scribe | Documents every technique execution, detection result, and action item

Reporting and Remediation Tracking

Red Team Report Structure

A red team report should tell a story, not just list findings.

The executive summary is a one-page narrative of what was attempted, what succeeded, and what the business impact could have been. The attack narrative is a chronological description of the attack path from initial access through objective completion, including timelines and screenshots. The detection and response assessment covers when (if ever) the red team was detected, how the SOC responded, and what the gap was between compromise and detection. Findings are specific vulnerabilities and control gaps that enabled the attack path, with each finding mapped to ATT&CK techniques. Recommendations are prioritized remediation actions with estimated effort and impact. The ATT&CK heat map is a visual representation of techniques used, detected, and missed.

Purple Team Report Structure

Purple team reports are more granular and action-oriented.

The technique coverage matrix is a table showing every technique tested, its detection status (detected/partial/missed), data source availability, and detection rule status. The detection gaps section lists techniques with no detection, ranked by risk and adversary relevance. The new detections section lists rules written during the exercise, with their logic and testing status. Tuning recommendations cover existing rules that need adjustment to reduce false positives or improve fidelity. Data source gaps identify telemetry that is not currently collected but would enable detection of tested techniques. Trend analysis shows how detection coverage has improved over time, if you are running recurring purple teams.
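A coverage matrix kept as structured data can be summarized automatically for the report. The sketch below uses hypothetical results and makes one assumption worth stating: only "detected" counts toward coverage, so "partial" is reported separately rather than given credit.

```python
from collections import Counter

# Hypothetical results from one engagement.
matrix = [
    {"technique": "T1566.001", "status": "detected"},
    {"technique": "T1059.001", "status": "partial"},
    {"technique": "T1021.001", "status": "missed"},
    {"technique": "T1003.001", "status": "detected"},
]


def summarize(rows):
    """Count statuses, compute coverage, and list fully missed techniques."""
    counts = Counter(row["status"] for row in rows)
    total = len(rows)
    # Assumption: only fully detected techniques count toward coverage.
    coverage = 100 * counts["detected"] / total if total else 0.0
    gaps = [row["technique"] for row in rows if row["status"] == "missed"]
    return {"coverage_pct": coverage, "counts": dict(counts), "gaps": gaps}


print(summarize(matrix))
```

Saving one summary per engagement gives the data points needed for trend analysis across recurring purple teams.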

Remediation Tracking

Findings without remediation tracking are findings that never get fixed. Create tickets in your ITSM system for every finding with an assigned owner and due date. Use severity-based SLAs: critical findings (30 days), high (60 days), medium (90 days). Schedule a remediation review 30 days after the report is delivered. Re-test remediated findings in the next exercise to confirm they are actually fixed. Track remediation rate as a KPI and escalate if findings persist across multiple exercises.
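The severity-based SLAs above translate directly into due dates. A minimal sketch, assuming the SLA clock starts on the day the finding is reported and counts calendar days (the text does not say which). The function names are illustrative.

```python
from datetime import date, timedelta

# SLA windows from the text, in calendar days.
SLA_DAYS = {"critical": 30, "high": 60, "medium": 90}


def due_date(severity, reported):
    """Due date for a finding, counted from the day it was reported."""
    return reported + timedelta(days=SLA_DAYS[severity])


def is_overdue(severity, reported, today):
    """True once today is past the finding's due date."""
    return today > due_date(severity, reported)


reported = date(2025, 1, 1)  # hypothetical report date
print(due_date("critical", reported))
print(is_overdue("critical", reported, today=date(2025, 2, 15)))
```

A severity without a defined SLA (for example "low") raises a `KeyError` here, which forces an explicit decision rather than a silent default.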

Continuous Validation

Moving Beyond Point-in-Time Exercises

Annual red team engagements provide a snapshot, but adversaries do not wait for your next exercise. Continuous validation combines multiple approaches.

Weekly automated BAS tests validate that security controls are functioning (EDR is blocking known malware, email gateway is catching phishing payloads, firewall rules are enforced).

Monthly tabletop exercises walk through incident scenarios with the SOC and IR teams. There is no technical execution; the focus is on process and decision-making.

Quarterly purple team engagements test detection coverage against new techniques, recently published threat intelligence, and changes in the environment.

Annually, a full red team exercise tests the organization’s end-to-end detection and response capability under realistic conditions.

Measuring Improvement Over Time

Track these metrics across exercises to demonstrate improvement. ATT&CK technique detection coverage measures the percentage of relevant techniques with validated detections, targeting 70%+ for techniques used by your top threat actors. Mean time to detect (MTTD) measures how quickly the SOC identifies red team activity and should decrease over time. Mean time to respond (MTTR) measures how quickly containment actions are taken after detection and should decrease over time. Remediation closure rate is the percentage of findings from previous exercises confirmed fixed, targeting 90%+ within SLA. Detection engineering velocity is the number of new detection rules created or tuned per quarter.
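As a worked example of the time-based metrics, the sketch below computes MTTD and MTTR from an exercise timeline. The timestamps are hypothetical. It assumes MTTD runs from execution to detection and MTTR from detection to containment, and that undetected activity is counted separately rather than folded into the averages.

```python
from datetime import datetime
from statistics import mean

# Hypothetical timeline entries recorded during an exercise.
events = [
    {"executed": datetime(2025, 1, 6, 9, 0),
     "detected": datetime(2025, 1, 6, 9, 12),
     "contained": datetime(2025, 1, 6, 9, 40)},
    {"executed": datetime(2025, 1, 6, 10, 0),
     "detected": datetime(2025, 1, 6, 10, 8),
     "contained": datetime(2025, 1, 6, 10, 30)},
    {"executed": datetime(2025, 1, 6, 11, 0),
     "detected": None, "contained": None},  # never detected
]


def mean_minutes(rows, start_key, end_key):
    """Mean gap in minutes over rows where both timestamps exist."""
    gaps = [
        (row[end_key] - row[start_key]).total_seconds() / 60
        for row in rows
        if row[start_key] is not None and row[end_key] is not None
    ]
    return mean(gaps) if gaps else None


mttd = mean_minutes(events, "executed", "detected")
mttr = mean_minutes(events, "detected", "contained")
missed = sum(1 for row in events if row["detected"] is None)
print(mttd, mttr, missed)
```

Reporting the missed count alongside the averages matters: a low MTTD computed only over the activity that was caught can hide a large detection gap.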

Building an Internal Red Team

Organizations with mature security programs should consider building an internal red team. The minimum viable team is 2-3 experienced offensive security professionals. Skills needed include network penetration testing, application security, cloud security, social engineering, and custom tool development. Common baseline certifications are OSCP, OSCE, CRTO, and GXPN. The team should report to the CISO but operate independently from the defensive security team to maintain objectivity. Engage external red teams annually for fresh perspective and techniques your internal team may not possess.

Common Mistakes

Running red team exercises without a mature SOC is problematic. If you have no detection capability, a red team will simply confirm what you already know: that you cannot detect anything. Start with purple team exercises to build detection coverage first.

Treating findings as punitive creates resistance. If the SOC team fears blame for missing red team activity, they will resist future exercises. Frame findings as improvement opportunities, not failures.

Scope creep is dangerous. Red teams that expand scope without authorization risk causing real damage. Enforce the ROE strictly.

Reporting without context is unhelpful. A finding of “obtained domain admin” is meaningless without the attack path, business impact, and detection timeline. Invest in quality reporting.

One-and-done thinking is insufficient. A single exercise provides a snapshot. Security validation must be continuous to be meaningful.