Penetration tests find vulnerabilities. Red team exercises test your organization’s ability to detect and respond to a real adversary. Purple team exercises take that a step further by turning adversarial simulation into a collaborative feedback loop between offense and defense. Despite their value, most organizations confuse these disciplines, scope them poorly, or fail to translate findings into measurable security improvements.
This guide covers the practical differences between these exercise types, how to scope and execute them effectively, and how to build a continuous validation program that actually improves your security posture.
Understanding the Differences
Penetration Testing
A penetration test is a time-boxed technical assessment focused on finding vulnerabilities in a defined scope.
The objective is to identify exploitable vulnerabilities in systems, applications, or networks. Scope is narrowly defined to specific applications, IP ranges, or environments. The approach is methodical testing against a checklist (OWASP, PTES, NIST SP 800-115). Stealth is minimal since testers are not trying to evade detection. Duration typically runs 1-3 weeks. The deliverable is a vulnerability report with severity ratings and remediation guidance. Blue team awareness is usually full, meaning they know the test is happening.
Red Team Exercise
A red team exercise simulates a real-world adversary attempting to achieve specific objectives against your organization.
The objective is to test detection and response capabilities against realistic attack scenarios. Scope is broad, with the entire organization in scope unless explicitly excluded. The approach is goal-oriented adversary emulation, where the red team operates like a real threat actor. Stealth is high since the red team actively evades detection, and getting caught is a data point, not a failure. Duration runs 4-12 weeks, sometimes longer. The deliverable is a narrative report describing the attack path, detection gaps, and response effectiveness. Blue team awareness is typically blind (unaware), with at most senior leadership informed.
Purple Team Exercise
A purple team exercise is a collaborative engagement where red and blue teams work together to test and improve detection and response capabilities.
The objective is to systematically validate and improve defensive controls against known attack techniques. Scope is defined by MITRE ATT&CK techniques or specific threat scenarios. The approach is iterative: the red team executes a technique, the blue team attempts to detect it, and both sides discuss and tune. Stealth is nonexistent since there’s full transparency between teams. Duration is 1-5 days per engagement, recurring monthly or quarterly. The deliverable is a detection coverage matrix, tuning recommendations, and new detection rules. Blue team awareness is full since they’re active participants.
When to Use Each
| Scenario | Recommended Exercise |
|---|---|
| Compliance requirement for vulnerability assessment | Penetration test |
| Validating detection coverage for specific threat actor TTPs | Purple team |
| Testing incident response under realistic conditions | Red team |
| New SOC deployment needing to baseline detection capabilities | Purple team |
| Board-level assurance that defenses work against advanced threats | Red team |
| Ongoing validation of detection engineering program | Continuous purple team |
Scoping and Rules of Engagement
Rules of Engagement Document
Every exercise requires a formal Rules of Engagement (ROE) document signed by both the exercise team and organizational leadership. This is non-negotiable.
Required elements include: authorization (explicit written authorization from an executive with the authority to grant it, such as the CISO, CTO, or CEO); scope (what is in scope and, critically, what is explicitly out of scope); objectives (what the exercise is trying to test or prove); timing (start date, end date, and any blackout windows such as an end-of-quarter freeze); a communication plan (emergency contacts, escalation procedures, and safe words for deconfliction); legal considerations (confirmation that the exercise complies with applicable laws, especially for cloud environments, third-party systems, and international operations); data handling (rules for handling any sensitive data encountered during the exercise); and deconfliction (the process for distinguishing exercise activity from real attacks).
Scoping a Red Team Exercise
Red team scope should be defined by objectives, not by systems.
Good objectives include questions like “Can an external threat actor gain access to the payment processing environment?” or “Can a compromised employee workstation be used to exfiltrate customer PII?” or “How long does it take the SOC to detect and respond to lateral movement in the corporate network?”
Poor objectives include “Find all vulnerabilities in our network” (that is a pen test) or “Test our security” (too vague to measure success).
Exclusions and Safety Controls
Even in broad-scope red team exercises, certain targets should be excluded. Life-safety systems such as medical devices and industrial control systems should be excluded unless specifically in scope with appropriate safeguards. Production databases with real customer data should be avoided where possible; use synthetic data environments instead. Third-party systems (cloud providers, SaaS applications, and partner networks) require separate authorization from their owners. Denial of service and availability attacks should generally be excluded unless specifically scoped.
Establish a safe word or emergency stop procedure. If exercise activity inadvertently causes a real incident or business disruption, both sides need a way to immediately halt and deconflict.
MITRE ATT&CK Mapping
Why ATT&CK Matters for Exercises
MITRE ATT&CK provides a common language for describing adversary behavior. Using ATT&CK to plan and report exercises enables threat-informed testing by mapping exercises to techniques used by threat actors relevant to your industry, coverage measurement by tracking which techniques your detections cover and which have gaps, progress tracking by measuring improvement over time as detections are added and tuned, and communication by giving executives a visual representation of defensive coverage.
Building a Threat Profile
Before planning an exercise, build a threat profile for your organization. First identify relevant threat groups using MITRE ATT&CK Groups, Mandiant threat reports, or sector-specific ISACs to identify adversaries that target your industry. Then extract their TTPs by mapping each threat group’s known techniques to the ATT&CK matrix. Prioritize techniques by focusing on those that are common across multiple relevant threat groups, have high impact, and are feasible to simulate. Finally build an exercise plan by selecting 10-20 techniques for a purple team engagement, or define an attack path using chained techniques for a red team exercise.
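As a rough sketch of the extraction and prioritization steps, the snippet below counts how many of your chosen groups use each ATT&CK technique and prints the most widely shared ones. It assumes you have downloaded the enterprise-attack.json STIX bundle from MITRE's CTI repository (https://github.com/mitre/cti); the group names are placeholders for whatever adversaries your own threat profile identifies.

```python
"""Prioritize ATT&CK techniques shared across selected threat groups (sketch)."""
import json
from collections import Counter

GROUPS_OF_INTEREST = {"FIN7", "APT29", "Wizard Spider"}  # placeholder selection

with open("enterprise-attack.json") as f:
    objects = json.load(f)["objects"]

# STIX IDs of the intrusion-set objects for the groups we care about.
group_ids = {
    o["id"]: o["name"]
    for o in objects
    if o.get("type") == "intrusion-set" and o.get("name") in GROUPS_OF_INTEREST
}

def technique_label(obj):
    """Return 'T1234.001 Technique Name' from an attack-pattern object."""
    ext = next(
        (r for r in obj.get("external_references", []) if r.get("source_name") == "mitre-attack"),
        {},
    )
    return f'{ext.get("external_id", "?")} {obj["name"]}'

techniques = {
    o["id"]: technique_label(o)
    for o in objects
    if o.get("type") == "attack-pattern" and not o.get("revoked", False)
}

# Count how many selected groups have a "uses" relationship to each technique.
usage = Counter(
    techniques[r["target_ref"]]
    for r in objects
    if r.get("type") == "relationship"
    and r.get("relationship_type") == "uses"
    and r.get("source_ref") in group_ids
    and r.get("target_ref") in techniques
)

for label, count in usage.most_common(20):
    print(f"{count} group(s): {label}")
```

Techniques shared by several of your relevant groups, weighted by impact and feasibility to simulate, become the candidate list for the exercise plan.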
ATT&CK-Based Exercise Planning Template
For each technique in the exercise plan, document:
| Field | Description |
|---|---|
| Technique ID | e.g., T1566.001 (Spearphishing Attachment) |
| Tactic | e.g., Initial Access |
| Procedure | Specific implementation: “Send email with macro-enabled Word document to finance team” |
| Expected detection | What should trigger: “Email gateway alert, EDR alert on macro execution” |
| Data sources required | Email logs, endpoint telemetry, process creation events |
| Success criteria | Detection: alert generated within 5 minutes. Response: endpoint isolated within 30 minutes. |
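If you track plan entries programmatically, a minimal sketch of the same template as a structured record might look like the following; the field names mirror the table above and the example values repeat its T1566.001 row.

```python
from dataclasses import dataclass

@dataclass
class PlannedTechnique:
    technique_id: str        # ATT&CK technique ID, e.g. T1566.001
    tactic: str              # ATT&CK tactic, e.g. Initial Access
    procedure: str           # specific implementation to execute
    expected_detection: str  # what should trigger when the procedure runs
    data_sources: list[str]  # telemetry required for detection
    success_criteria: str    # measurable detection and response targets

# Example entry mirroring the table above.
spearphish = PlannedTechnique(
    technique_id="T1566.001",
    tactic="Initial Access",
    procedure="Send email with macro-enabled Word document to finance team",
    expected_detection="Email gateway alert, EDR alert on macro execution",
    data_sources=["Email logs", "Endpoint telemetry", "Process creation events"],
    success_criteria="Alert within 5 minutes; endpoint isolated within 30 minutes",
)
```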
Attack Simulation Frameworks
Open-Source Frameworks
Atomic Red Team (Red Canary) provides a library of small, discrete tests mapped to ATT&CK techniques. Each “atomic” test executes a single technique and can be run independently, making it ideal for purple team exercises. Tests are defined in YAML and executed via PowerShell (Invoke-AtomicRedTeam) or bash. There are 700+ tests covering most ATT&CK techniques. The low barrier to entry means tests can be run by defenders, not just offensive specialists. The limitation is that individual technique execution does not simulate realistic attack chains.
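Because the tests are plain YAML, defenders can enumerate and review them before running anything. The sketch below lists the atomics available for one technique; it assumes a local clone of https://github.com/redcanaryco/atomic-red-team with the repo's atomics/<technique>/<technique>.yaml layout at the time of writing, and requires PyYAML.

```python
"""List the atomic tests available for one technique (illustrative sketch)."""
from pathlib import Path
import yaml  # pip install pyyaml

technique = "T1003"  # OS Credential Dumping, as an example
test_file = Path("atomic-red-team/atomics") / technique / f"{technique}.yaml"

with open(test_file) as f:
    definition = yaml.safe_load(f)

print(f'{definition["attack_technique"]}: {definition["display_name"]}')
for i, test in enumerate(definition["atomic_tests"], start=1):
    platforms = ", ".join(test["supported_platforms"])
    print(f'  {i}. {test["name"]} [{platforms}] via {test["executor"]["name"]}')
```

Reviewing the executor commands in the same file before execution is part of what makes the framework approachable for purple team work.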
Caldera (MITRE) is an automated adversary emulation platform that chains techniques into realistic attack scenarios. It’s agent-based, meaning you deploy agents on test systems and run operations from a central server. It supports custom adversary profiles mapped to ATT&CK and can run autonomously or with manual operator control. It’s more realistic than Atomic Red Team but requires more setup.
Strider and Prelude Operator are additional open-source options for automated adversary emulation with varying levels of sophistication.
Commercial Platforms
Breach and Attack Simulation (BAS) platforms include SafeBreach, AttackIQ, Cymulate, and Picus Security. These platforms continuously simulate attacks against production environments to validate security controls. Advantages include continuous validation, executive dashboards, and integration with SIEM and EDR. Limitations are that simulations are not real attacks, so they test control effectiveness but may not reflect actual attacker behavior. The use case is ongoing hygiene and regression testing between manual exercises.
Choosing the Right Tool
| Need | Recommended Tool |
|---|---|
| Quick purple team with limited offensive expertise | Atomic Red Team |
| Realistic multi-stage attack simulation | Caldera or commercial BAS |
| Continuous control validation | Commercial BAS (SafeBreach, AttackIQ) |
| Full adversary emulation with custom tooling | Experienced red team with custom implants |
Purple Team Collaboration Model
Purple Team Workflow
A structured purple team engagement follows this iterative cycle.
1. Plan: select techniques based on the threat profile and define procedures, expected detections, and success criteria.
2. Execute: the red team performs the technique in a controlled manner while the blue team monitors in real time.
3. Detect: the blue team attempts to identify the activity using existing detections and tools, documenting what triggered, what was missed, and what was noisy.
4. Analyze: both teams discuss the results, covering why the activity was detected or missed, what data sources are available, and what detection logic would work.
5. Improve: the blue team writes or tunes detection rules, and the red team re-executes the technique to validate the new detection.
6. Document: record the technique, procedure, detection status, and any new rules created in a detection coverage matrix (sketched below).
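A minimal sketch of the documentation step, assuming technique results are captured as simple records during the engagement and exported as a CSV coverage matrix; the technique results, rule names, and notes below are illustrative.

```python
"""Write a detection coverage matrix from purple team results (illustrative sketch)."""
import csv

# One record per technique executed during the engagement (example data).
results = [
    {"technique_id": "T1566.001", "technique": "Spearphishing Attachment",
     "status": "detected", "rule": "Email gateway - macro attachment", "notes": ""},
    {"technique_id": "T1055", "technique": "Process Injection",
     "status": "missed", "rule": "", "notes": "Need Sysmon Event ID 8 coverage"},
    {"technique_id": "T1021.001", "technique": "Remote Desktop Protocol",
     "status": "partial", "rule": "Suspicious RDP logon", "notes": "Noisy; tune source filter"},
]

with open("detection_coverage_matrix.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=results[0].keys())
    writer.writeheader()
    writer.writerows(results)

detected = sum(r["status"] == "detected" for r in results)
print(f"Coverage this engagement: {detected}/{len(results)} techniques fully detected")
```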
Running an Effective Purple Team Day
A typical purple team engagement covers 8-10 techniques in a single day.
In the morning (4-5 techniques), brief the team on the day’s objectives and threat scenario, execute techniques sequentially spending 20-30 minutes per technique, and include initial access, execution, and persistence techniques.
In the afternoon (4-5 techniques), focus on lateral movement, privilege escalation, and exfiltration techniques. Allow time for detection tuning and re-testing since these techniques often require more investigation time.
At the end of the day (1 hour), review results and update the detection coverage matrix, identify quick wins (detections that can be deployed this week) and gaps requiring longer-term investment (new data sources, new tools), and schedule follow-up actions.
Roles in a Purple Team Exercise
| Role | Responsibility |
|---|---|
| Purple team lead | Facilitates the exercise, manages the schedule, mediates between teams |
| Red team operator | Executes attack techniques according to the plan |
| Detection engineer | Analyzes telemetry, writes and tunes detection rules in real time |
| SOC analyst | Monitors alerts and investigates activity as they would in production |
| Scribe | Documents every technique execution, detection result, and action item |
Reporting and Remediation Tracking
Red Team Report Structure
A red team report should tell a story, not just list findings.
The executive summary is a one-page narrative of what was attempted, what succeeded, and what the business impact could have been. The attack narrative is a chronological description of the attack path from initial access through objective completion, including timelines and screenshots. The detection and response assessment covers when (if ever) the red team was detected, how the SOC responded, and what the gap was between compromise and detection. Findings are specific vulnerabilities and control gaps that enabled the attack path, with each finding mapped to ATT&CK techniques. Recommendations are prioritized remediation actions with estimated effort and impact. The ATT&CK heat map is a visual representation of techniques used, detected, and missed.
Purple Team Report Structure
Purple team reports are more granular and action-oriented.
The technique coverage matrix is a table showing every technique tested, its detection status (detected/partial/missed), data source availability, and detection rule status. Detection gaps are techniques with no detection, ranked by risk and adversary relevance. The new detections section lists rules written during the exercise, with their logic and testing status. Tuning recommendations cover existing rules that need adjustment to reduce false positives or improve fidelity. Data source gaps are telemetry that is not currently collected but would enable detection of tested techniques. Trend analysis shows detection coverage improvement over time when purple team engagements recur.
Remediation Tracking
Findings without remediation tracking are findings that never get fixed. Create tickets in your ITSM system for every finding with an assigned owner and due date. Use severity-based SLAs: critical findings (30 days), high (60 days), medium (90 days). Schedule a remediation review 30 days after the report is delivered. Re-test remediated findings in the next exercise to confirm they are actually fixed. Track remediation rate as a KPI and escalate if findings persist across multiple exercises.
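A minimal sketch of severity-based SLA tracking, assuming findings are exported from the ticketing system with a severity, report date, and closure date; the finding IDs and dates are illustrative, and the SLA day counts follow the guidance above.

```python
"""Flag findings that have breached their severity-based remediation SLA (sketch)."""
from datetime import date, timedelta

SLA_DAYS = {"critical": 30, "high": 60, "medium": 90}  # from the guidance above

# Example findings as exported from the ticketing system (illustrative data).
findings = [
    {"id": "RT-2024-001", "severity": "critical", "reported": date(2024, 1, 15), "closed": None},
    {"id": "RT-2024-002", "severity": "high", "reported": date(2024, 1, 15), "closed": date(2024, 2, 20)},
]

today = date.today()
for finding in findings:
    due = finding["reported"] + timedelta(days=SLA_DAYS[finding["severity"]])
    if finding["closed"] is None and today > due:
        print(f'{finding["id"]} ({finding["severity"]}) is overdue: due {due}, still open')

closed_in_sla = sum(
    1 for finding in findings
    if finding["closed"]
    and finding["closed"] <= finding["reported"] + timedelta(days=SLA_DAYS[finding["severity"]])
)
print(f"Remediation closure rate within SLA: {closed_in_sla}/{len(findings)}")
```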
Continuous Validation
Moving Beyond Point-in-Time Exercises
Annual red team engagements provide a snapshot, but adversaries do not wait for your next exercise. Continuous validation combines multiple approaches.
Weekly automated BAS tests validate that security controls are functioning (EDR is blocking known malware, email gateway is catching phishing payloads, firewall rules are enforced).
Monthly tabletop exercises walk through incident scenarios with the SOC and IR teams. No technical execution, just focus on process and decision-making.
Quarterly purple team engagements test detection coverage against new techniques, recently published threat intelligence, and changes in the environment.
Annually, a full red team exercise tests the organization’s end-to-end detection and response capability under realistic conditions.
Measuring Improvement Over Time
Track these metrics across exercises to demonstrate improvement. ATT&CK technique detection coverage measures the percentage of relevant techniques with validated detections, targeting 70%+ for techniques used by your top threat actors. Mean time to detect (MTTD) measures how quickly the SOC identifies red team activity and should decrease over time. Mean time to respond (MTTR) measures how quickly containment actions are taken after detection and should decrease over time. Remediation closure rate is the percentage of findings from previous exercises confirmed fixed, targeting 90%+ within SLA. Detection engineering velocity is the number of new detection rules created or tuned per quarter.
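A minimal sketch of how MTTD and MTTR can be computed from an exercise timeline, assuming each red team action is logged with execution, detection, and containment timestamps (None when the action was never detected or contained); the timeline entries are illustrative.

```python
"""Compute MTTD and MTTR from exercise timeline data (illustrative sketch)."""
from datetime import datetime
from statistics import mean

# One record per red team action: when executed, when detected, when contained.
timeline = [
    {"action": "Initial access", "executed": datetime(2024, 3, 4, 9, 0),
     "detected": datetime(2024, 3, 4, 9, 40), "contained": datetime(2024, 3, 4, 11, 0)},
    {"action": "Lateral movement", "executed": datetime(2024, 3, 5, 14, 0),
     "detected": None, "contained": None},  # missed entirely
]

detect_minutes = [(e["detected"] - e["executed"]).total_seconds() / 60
                  for e in timeline if e["detected"]]
respond_minutes = [(e["contained"] - e["detected"]).total_seconds() / 60
                   for e in timeline if e["detected"] and e["contained"]]

print(f"MTTD: {mean(detect_minutes):.0f} minutes (over {len(detect_minutes)} detected actions)")
print(f"MTTR: {mean(respond_minutes):.0f} minutes")
print(f"Missed actions: {sum(1 for e in timeline if not e['detected'])}/{len(timeline)}")
```

Tracking the same structure across exercises makes the trend lines (MTTD and MTTR decreasing, missed actions shrinking) straightforward to report.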
Building an Internal Red Team
Organizations with mature security programs should consider building an internal red team. The minimum viable team is 2-3 experienced offensive security professionals. Skills needed include network penetration testing, application security, cloud security, social engineering, and custom tool development. Common baseline certifications are OSCP, OSCE, CRTO, and GXPN. The team should report to the CISO but operate independently from the defensive security team to maintain objectivity. Engage external red teams annually for fresh perspective and techniques your internal team may not possess.
Common Mistakes
Running red team exercises without a mature SOC is problematic. If you have no detection capability, a red team will simply confirm what you already know: that you cannot detect anything. Start with purple team exercises to build detection coverage first.
Treating findings as punitive creates resistance. If the SOC team fears blame for missing red team activity, they will resist future exercises. Frame findings as improvement opportunities, not failures.
Scope creep is dangerous. Red teams that expand scope without authorization risk causing real damage. Enforce the ROE strictly.
Reporting without context is unhelpful. A finding of “obtained domain admin” is meaningless without the attack path, business impact, and detection timeline. Invest in quality reporting.
One-and-done thinking is insufficient. A single exercise provides a snapshot. Security validation must be continuous to be meaningful.