A vulnerability management program is the continuous process of identifying, evaluating, prioritizing, and remediating security weaknesses across your environment. This guide covers how to build one that scales.
Why Vulnerability Management Matters
A mature vulnerability management program shortens the window in which known weaknesses remain exploitable. Many breaches exploit known vulnerabilities for which patches were already available, so the challenge is usually operational, not informational.
Step 1: Asset Discovery and Inventory
You cannot protect what you do not know exists. Start with a complete asset inventory.
What to Inventory
- Endpoints like workstations, laptops, and mobile devices
- Servers including physical, virtual, and cloud instances (EC2, Azure VMs, GCE)
- Network devices such as routers, switches, firewalls, and load balancers
- Applications including web applications, APIs, and SaaS integrations
- Containers and serverless workloads like Kubernetes pods, Lambda functions, and Cloud Run services
- OT/IoT devices including industrial control systems, sensors, and cameras
Tools and Approaches
- Network scanning (Nmap, Masscan) for IP-based discovery
- Cloud API enumeration (AWS Config, Azure Resource Graph, GCP Asset Inventory)
- Agent-based inventory for endpoints (EDR agents, management tools)
- CMDB integration for reconciliation
Assign each asset an owner and a criticality rating (critical, high, medium, or low) based on the data it processes and its role in business operations.
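As a minimal sketch of what an inventory record and a reconciliation check could look like, the following Python models an asset with an owner and criticality rating, flags records that are missing either, and compares discovery results against a CMDB. The `Asset` fields and the function names `inventory_gaps` and `reconcile` are illustrative assumptions, not part of any particular tool.

```python
from dataclasses import dataclass
from typing import Optional

CRITICALITY_LEVELS = ("critical", "high", "medium", "low")

@dataclass(frozen=True)
class Asset:
    asset_id: str
    kind: str                    # e.g. "server", "endpoint", "container"
    owner: Optional[str]         # team or person accountable for remediation
    criticality: Optional[str]   # one of CRITICALITY_LEVELS

def inventory_gaps(assets):
    """Return IDs of assets missing an owner or a valid criticality rating."""
    return sorted(
        a.asset_id for a in assets
        if not a.owner or a.criticality not in CRITICALITY_LEVELS
    )

def reconcile(discovered_ids, cmdb_ids):
    """Compare discovery results (scans, cloud APIs, agents) with CMDB records."""
    discovered, cmdb = set(discovered_ids), set(cmdb_ids)
    return {
        "unregistered": sorted(discovered - cmdb),  # seen by discovery, absent from the CMDB
        "stale": sorted(cmdb - discovered),         # in the CMDB, not seen by discovery
    }
```

Assets in the `unregistered` bucket are typically the ones worth chasing first, since nobody is accountable for them yet.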
Step 2: Vulnerability Scanning
Scanner Selection
Choose scanners based on your environment:
| Environment | Scanner Type | Examples |
|---|---|---|
| Network/infrastructure | Authenticated network scanners | Tenable Nessus, Qualys VMDR, Rapid7 InsightVM |
| Web applications | DAST scanners | Burp Suite, OWASP ZAP, Nuclei |
| Source code | SAST scanners | Semgrep, SonarQube, Checkmarx |
| Containers | Image scanners | Trivy, Grype, Snyk Container |
| Cloud infrastructure | CSPM tools | Wiz, Prisma Cloud, AWS Security Hub |
| Dependencies | SCA scanners | Snyk, Dependabot, Renovate |
Scan Frequency
Critical assets should be scanned weekly or continuously. Standard production systems should be scanned every two weeks. Development and staging environments should be scanned on deployment through CI/CD integration. Run a full environment scan at least monthly.
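The cadence above can be checked mechanically. The sketch below flags assets whose last scan is older than their tier allows. The tier labels and the 30-day fallback (borrowed from the monthly full scan) are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Maximum allowed age of the last scan per tier (labels are illustrative).
MAX_SCAN_AGE = {
    "critical": timedelta(days=7),
    "production": timedelta(days=14),
}
# Assumed fallback for everything else: the monthly full-environment scan.
DEFAULT_MAX_AGE = timedelta(days=30)

def scan_overdue(tier, last_scan, now):
    """Return True when an asset's last scan is older than its tier allows."""
    if last_scan is None:
        return True  # never scanned
    return now - last_scan > MAX_SCAN_AGE.get(tier, DEFAULT_MAX_AGE)
```

Running a check like this against the inventory on a schedule is one way to notice scanners that silently stopped reaching part of the environment.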
Authenticated vs. Unauthenticated Scanning
Always prefer authenticated scans for internal systems. Unauthenticated scans miss a significant portion of vulnerabilities because they cannot inspect installed software versions, configurations, or local services.
Step 3: Risk-Based Prioritization
Raw vulnerability counts are not useful. A CVSS 9.8 on an air-gapped test server is less urgent than a CVSS 7.5 on your internet-facing payment system.
Prioritization Framework
Combine multiple factors:
- CVSS base score: the intrinsic severity of the vulnerability itself
- EPSS (Exploit Prediction Scoring System) score: the estimated probability of exploitation in the next 30 days
- CISA KEV status: treat any vulnerability in CISA’s Known Exploited Vulnerabilities catalog as critical regardless of CVSS
- Asset criticality: the business impact of the affected system
- Exposure: whether the affected system is internet-facing or internal-only
- Compensating controls: WAF rules, network segmentation, or EDR coverage that reduce exploitability
Practical Priority Tiers
| Tier | Criteria | SLA |
|---|---|---|
| P1 (Critical) | CISA KEV, or CVSS 9.0+ on critical or internet-facing assets | 48 hours |
| P2 (High) | CVSS 7.0+ on production and not already P1, or high EPSS on any asset | 7 days |
| P3 (Medium) | CVSS 4.0-6.9 on production systems | 30 days |
| P4 (Low) | CVSS below 4.0, or any other finding on non-production | 90 days |
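One way to apply the tier table consistently is to encode it as a function. The sketch below checks the rows from top to bottom, so a CVSS 9+ finding that misses the P1 conditions falls through to P2 if it is on production. The `Finding` fields are hypothetical, and the `HIGH_EPSS` cut-off of 0.5 is an assumption, since the table does not define what counts as "high EPSS".

```python
from dataclasses import dataclass

HIGH_EPSS = 0.5  # assumed cut-off for "high EPSS"; tune to your own data

@dataclass
class Finding:
    cvss: float              # CVSS base score, 0.0 to 10.0
    epss: float              # EPSS probability, 0.0 to 1.0
    in_kev: bool             # listed in the CISA KEV catalog
    asset_criticality: str   # "critical", "high", "medium", or "low"
    internet_facing: bool
    production: bool

def priority_tier(f):
    """Map a finding to P1..P4, checking the tier rows from top to bottom."""
    if f.in_kev:
        return "P1"
    if f.cvss >= 9.0 and (f.asset_criticality == "critical" or f.internet_facing):
        return "P1"
    if f.epss >= HIGH_EPSS:
        return "P2"
    if f.cvss >= 7.0 and f.production:
        return "P2"
    if f.cvss >= 4.0 and f.production:
        return "P3"
    return "P4"

# Remediation SLA per tier, in days (48 hours expressed as 2 days).
SLA_DAYS = {"P1": 2, "P2": 7, "P3": 30, "P4": 90}
```

With these rules, a CVSS 9.8 finding on an isolated, non-production test server lands in P4, while a CVSS 7.5 finding on an internet-facing production system lands in P2, which matches the intent of risk-based prioritization.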
Step 4: Remediation Workflows
Identifying vulnerabilities without fixing them is an expensive audit exercise. Build remediation into existing workflows.
Remediation Options
- Patch: apply vendor-supplied updates. This is the preferred option.
- Upgrade: move to a non-vulnerable version of the software.
- Mitigate: when patching is not immediately possible, apply compensating controls such as WAF rules, network restrictions, or configuration hardening.
- Accept: document the risk and obtain formal management approval with a review date.
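Risk acceptance only works if the review date is enforced. As a rough sketch, an acceptance record and a check for records due for review might look like this. The `RiskAcceptance` fields are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class RiskAcceptance:
    finding_id: str
    justification: str
    approved_by: str     # the manager who formally accepted the risk
    review_date: date    # when the acceptance must be re-evaluated

def acceptances_due_for_review(records, today):
    """Return accepted risks whose review date has arrived or passed."""
    return [r for r in records if r.review_date <= today]
```

Feeding the result back into the ticketing workflow keeps accepted risks from becoming permanent by default.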
Integration with Engineering Workflows
File remediation tasks as tickets in Jira, ServiceNow, or your existing issue tracker. Assign tickets to asset owners, not to the security team. Include reproduction steps, impact assessment, and remediation guidance. Track SLA adherence and escalate overdue items.
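The sketch below shows one possible shape for a remediation ticket, with the due date derived from the priority tier. It builds a plain dictionary and does not call any real Jira or ServiceNow API. The field names and the keys of the `finding` dictionary are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Remediation windows per tier, mirroring the SLA column of the tier table.
SLA = {
    "P1": timedelta(hours=48),
    "P2": timedelta(days=7),
    "P3": timedelta(days=30),
    "P4": timedelta(days=90),
}

def build_ticket(finding, asset_owner, detected_at):
    """Build a tracker-agnostic ticket payload (field names are illustrative)."""
    description = "\n".join([
        f"Impact: {finding['impact']}",
        f"Reproduction: {finding['reproduction']}",
        f"Remediation: {finding['remediation']}",
    ])
    return {
        "title": f"[{finding['tier']}] {finding['cve']} on {finding['asset']}",
        "assignee": asset_owner,  # the asset owner, not the security team
        "due": detected_at + SLA[finding["tier"]],
        "description": description,
    }

def needs_escalation(ticket, now):
    """A ticket is escalated once its SLA deadline has passed."""
    return now > ticket["due"]
```

The payload would then be mapped onto whatever fields your tracker exposes; keeping the SLA calculation in one place makes the deadlines consistent across trackers.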
Patch Management
For infrastructure patching, maintain a patch testing environment that mirrors production. Use phased rollouts: test, then staging, then canary production, then full production. Schedule regular maintenance windows for non-emergency patches. Automate where possible using WSUS, SCCM, Ansible, or cloud-native patching.
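A phased rollout is essentially a loop with a gate after each stage. The following is a simplified sketch in which `apply_and_verify` is a hypothetical callback that applies the patch to one stage and returns whether post-patch checks passed. In practice that callback would wrap your patching tool.

```python
STAGES = ["test", "staging", "canary", "production"]

def rollout(patch_id, apply_and_verify):
    """Apply a patch stage by stage, stopping at the first failed verification.

    Returns the list of stages that completed successfully.
    """
    completed = []
    for stage in STAGES:
        if not apply_and_verify(patch_id, stage):
            break  # halt the rollout; later stages are never touched
        completed.append(stage)
    return completed
```

If the returned list is shorter than `STAGES`, the rollout stopped early and the patch needs investigation before it reaches full production.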
Step 5: Metrics and Reporting
Key Metrics
- Mean Time to Remediate (MTTR): average time from discovery to fix, broken down by severity
- SLA compliance rate: percentage of vulnerabilities remediated within their priority SLA
- Vulnerability density: open vulnerabilities per asset or per business unit
- Coverage: percentage of assets scanned in the last 30 days
- Recurrence rate: share of vulnerabilities that reappear after remediation, which indicates systemic issues
- CISA KEV closure rate: for federal agencies and their contractors, tracks compliance with BOD 22-01 timelines
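To make two of these metrics concrete, here is a small sketch that computes MTTR per severity and the SLA compliance rate from a list of findings. The record layout (`severity`, `discovered`, `fixed`) is an assumption, and the compliance rate here only counts remediated findings, which is a simplification: open findings that are already past their SLA would need to be counted separately.

```python
from collections import defaultdict
from datetime import date

def mttr_by_severity(findings):
    """Mean days from discovery to fix, per severity, for remediated findings."""
    durations = defaultdict(list)
    for f in findings:
        if f["fixed"] is not None:
            durations[f["severity"]].append((f["fixed"] - f["discovered"]).days)
    return {sev: sum(days) / len(days) for sev, days in durations.items()}

def sla_compliance_rate(findings, sla_days):
    """Share of remediated findings fixed within the SLA for their severity."""
    closed = [f for f in findings if f["fixed"] is not None]
    if not closed:
        return None  # nothing remediated yet, so the rate is undefined
    within = sum(
        (f["fixed"] - f["discovered"]).days <= sla_days[f["severity"]]
        for f in closed
    )
    return within / len(closed)
```

Breaking MTTR down by severity matters because a single blended average can hide slow remediation of the highest-priority findings.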
Reporting Cadence
- Weekly: operational metrics for the security team, including new findings, overdue items, and SLA breaches
- Monthly: management summary with trend analysis and top risks
- Quarterly: executive dashboard with risk posture trends and a program maturity assessment
Step 6: Continuous Improvement
Feedback Loops
When a vulnerability is exploited, conduct a post-incident review to analyze why it was not prioritized or remediated in time. Regularly tune scanner configurations to reduce noise and false positives. Perform quarterly coverage gap analysis to identify assets not covered by scanning. Revise remediation SLAs based on actual team capacity and threat landscape changes.
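The coverage gap analysis mentioned above reduces to comparing the asset inventory with scan records. A minimal sketch, assuming `inventory_ids` is a list of unique asset IDs and `last_scanned` maps asset IDs to the date of their most recent scan (both hypothetical inputs):

```python
from datetime import date, timedelta

def coverage_gaps(inventory_ids, last_scanned, today, max_age_days=30):
    """Return (coverage_ratio, uncovered_ids) for the given inventory.

    An asset counts as covered if it was scanned within max_age_days.
    """
    cutoff = today - timedelta(days=max_age_days)
    uncovered = sorted(
        asset_id for asset_id in inventory_ids
        if last_scanned.get(asset_id) is None or last_scanned[asset_id] < cutoff
    )
    if not inventory_ids:
        return None, uncovered  # empty inventory: coverage is undefined
    coverage = 1 - len(uncovered) / len(inventory_ids)
    return coverage, uncovered
```

The list of uncovered assets is usually more actionable than the ratio, because it points at specific scanner scopes, credentials, or network segments to fix.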
Maturity Model
| Level | Characteristics |
|---|---|
| 1 (Ad hoc) | Occasional scans, no formal process |
| 2 (Defined) | Regular scans, documented SLAs, assigned ownership |
| 3 (Managed) | Risk-based prioritization, metrics-driven, integrated with ticketing |
| 4 (Optimized) | Automated remediation, continuous scanning, predictive prioritization |
Common Pitfalls
- Scanning without remediation ownership produces reports that nobody acts on.
- CVSS-only prioritization ignores context and leads to wasted effort on low-risk findings.
- Infrequent scanning misses vulnerabilities introduced between cycles.
- Without a formal exception process, vulnerabilities linger indefinitely with no accountability.
- Dumping thousands of findings on engineering teams without prioritization destroys credibility and causes alert fatigue.
Getting Started
If you are building from zero:
- Enumerate all assets and assign owners
- Deploy one authenticated scanner for your primary environment
- Define four priority tiers with remediation SLAs
- Create a ticketing workflow for remediation
- Report monthly on MTTR and SLA compliance
- Expand scanner coverage and automation over subsequent quarters