Continuous Pentest as a Service: From Annual Audit to On-Demand Security Assurance
Pentestas Team
Security Analyst

Top row: the annual pentest. Bottom row: continuous AI-driven coverage.
Continuous pentest as a service is the operational model that replaces the annual pentest with nightly or per-build AI-driven testing — same blast radius, same depth, wildly better cadence.
In Detail
The problem with the annual pentest
The annual pentest has three structural failures in 2026:
1. The coverage gap. Between pentests, your team ships roughly 260 production deploys (assuming daily deploys on weekdays). Every one of those is an untested change. By the time the next pentest lands, half the features the previous pentest covered no longer exist in the same form, and half the current code has never been tested at all. The pentest is testing a snapshot that was never in production.
2. The triage pileup. An annual pentest produces a 60-page PDF with 40 findings. Half are "recommendations", most of the rest are "mediums" your team will probably defer, and only a handful are the real bugs — and the remediation sprint happens entirely in the two weeks after the PDF arrives. For the rest of the year the pentest is a document in SharePoint, not a living artefact. Real risk hides in the gap.
3. The asymmetric economics. Six days of consultant time at $400/hour is roughly $20K of hands-on-keyboard effort against a single environment. Attackers, meanwhile, have effectively unlimited bot-hours and an LLM. The economic asymmetry has become so severe that continuous automated testing isn't a luxury — it's a necessity.
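The asymmetry is easy to see on the back of an envelope, using the figures quoted above (all numbers are illustrative assumptions, not a quote):

```python
# Back-of-the-envelope cost comparison, using the article's figures.
# All constants are illustrative assumptions.

CONSULTANT_RATE = 400          # $/hour
ENGAGEMENT_HOURS = 6 * 8       # six days of hands-on-keyboard time
annual_pentest_cost = CONSULTANT_RATE * ENGAGEMENT_HOURS

AI_SCAN_COST = 5               # ~$5 in API charges per scan
DEPLOYS_PER_YEAR = 260         # daily deploys on weekdays

# One annual engagement tests a single snapshot; per-build scanning
# tests every one of the ~260 deploys.
continuous_cost = AI_SCAN_COST * DEPLOYS_PER_YEAR

print(annual_pentest_cost)     # 19200
print(continuous_cost)         # 1300
```

One snapshot for ~$19K versus 260 tested deploys for ~$1.3K: the point isn't that the consultant is overpriced, it's that the two models buy different things.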
None of this means consultant-led pentests are obsolete. A great human pentester still finds things an LLM won't — complex business-logic bugs, chained social-engineering exploits, bespoke attacks against custom protocols. What's obsolete is the cadence, not the discipline. You need both — a deep annual engagement, and continuous coverage between engagements.
The Breakdown
What "continuous pentest as a service" looks like in practice
A Pentestas-style continuous pentest engagement has four pillars:
Pillar 1 — Per-build scanning
Your CI wires up a post-merge job: every change to main triggers an ai pentest against the staging environment. Hard-fail on new HIGH/CRITICAL findings; post summary to #secops. The scan finishes in 30–90 minutes; by the time QA gets the build, a PR comment either lists the security issues or confirms the merge is clean.
```yaml
- name: Pentestas on staging
  env:
    PENTESTAS_API_KEY: ${{ secrets.PENTESTAS_API_KEY }}
  run: |
    pentestas start \
      -u https://staging-${{ github.sha }}.example.com \
      -c .pentestas/scan.yaml \
      -r . \
      -w 45m
```

Typical per-scan cost: ~$5 in Anthropic API charges (or free on subscription-plan routing, just slower). Typical delta per scan: 0–2 new findings. The 99th-percentile scan that finds a CRITICAL regression saves the programme many multiples of the full-year subscription cost.
Pillar 2 — Scheduled scans on production-parallel
Nightly or weekly against the production-parallel environment that mirrors production data shape. Picks up drift that didn't land in a specific PR (infrastructure changes, WAF rule updates, DB schema migrations). Delivers via Slack / webhook / email. Feeds the same findings DB as the per-build scans.
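The same CI step from Pillar 1 can be put on a timer. A minimal sketch as a scheduled GitHub Actions workflow — the workflow name, cron window, target URL, and timeout are assumptions, not a documented configuration:

```yaml
name: pentestas-nightly
on:
  schedule:
    - cron: "0 2 * * *"   # nightly at 02:00 UTC (assumed window)
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Pentestas on production-parallel
        env:
          PENTESTAS_API_KEY: ${{ secrets.PENTESTAS_API_KEY }}
        run: |
          pentestas start \
            -u https://prod-parallel.example.com \
            -c .pentestas/scan.yaml \
            -r . \
            -w 90m
```

Because this reuses the per-build step verbatim, the nightly findings land in the same findings DB with no extra plumbing.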
Pillar 3 — Per-finding rescan
Engineer ships a fix. The platform's rescan button re-fires the exact payload that produced the finding. If the oracle no longer triggers, the finding is marked resolved with a timestamp — clean audit trail from found-to-fixed, automatically.
Pillar 4 — Quarterly human-led deep engagement
The annual pentest becomes quarterly and focuses on what the AI pentest can't do well: business-logic flaws, bespoke protocol attacks, workflow manipulation, social-engineering-adjacent chains. Because the continuous coverage has already caught the OWASP Top 10 misbehaviours, the human pentester's week is spent on the high-signal work humans uniquely do.
The total spend typically drops — or stays flat while coverage massively improves — compared to "annual pentest + internal scanner subscription + mostly-ignored findings".
By Industry
Industry-specific operating models
Fintech
Regulatory driver: PCI DSS 4.0 requirement 11.4.x — penetration testing after any significant change.
Recommended cadence: per-merge ai pentest on any PR touching payment flows; nightly ai pentest on the full CDE scope; quarterly human-led deep engagement focused on business-logic (race conditions, amount-manipulation, double-spend).
Evidence model: PCI-ROC-aligned scan history, one row per scan, linked to the merge commit. Findings → Jira tickets with remediation timestamps. The full programme satisfies 11.4.1, 11.4.2, 11.4.3 without additional tooling.
Medtech + healthtech
Regulatory driver: HIPAA Security Rule §164.308(a)(8) — periodic technical evaluation. HITRUST CSF if certified.
Recommended cadence: per-merge ai pentest for anything touching PHI-handling code; weekly full-scope scans of the EHR / patient-portal / caregiver-app; quarterly human engagement.
Evidence model: scan results tagged to the HITRUST control mapping. Pentestas's report export includes HITRUST-control-number columns where applicable. HIPAA auditors have not yet formally endorsed AI pentesting, but every covered entity we talk to is running it quietly and using the output as primary technical evidence.
Legaltech
Regulatory driver: ABA Model Rule 1.6 (confidentiality) + contractual obligations to enterprise clients.
Recommended cadence: per-merge ai pentest + quarterly human engagement. Many platforms also add on-demand scans before a major client migration.
Evidence model: client-facing attestation reports pulled from scan history. The Pro+ custom-branding feature lets the legal firm ship the report under its own cover, framed as internal-audit documentation that clients can review under NDA.
Banks + financial services
Regulatory driver: DORA for EU-regulated entities; NYDFS 500 §500.05; FFIEC CAT; OCC bulletin 2013-29.
Recommended cadence: internal agents running continuous ai pentest against intranet apps + public-app per-merge; monthly CIS M365 benchmark; quarterly tabletop + ad-hoc red-team; annual external pentest.
Evidence model: regulator-ready scan history with retention in the 7-year range (Enterprise plan includes unlimited retention). FFIEC control mapping + automated tickets into the bank's internal risk-management system via Pentestas webhooks.
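Webhook-to-ticket automation can be as small as a mapping function. A sketch of the consumer side — the event payload shape (`finding_id`, `severity`, `title`, `scan_id`) is an assumption for illustration, not the documented Pentestas webhook schema:

```python
# Minimal sketch: turn a (hypothetical) Pentestas webhook finding
# event into a ticket payload for an internal risk-management system.

SEVERITY_TO_PRIORITY = {
    "CRITICAL": "P1",
    "HIGH": "P2",
    "MEDIUM": "P3",
    "LOW": "P4",
}

def finding_to_ticket(event: dict) -> dict:
    """Map one webhook finding event to a ticket payload."""
    return {
        "summary": f'[{event["severity"]}] {event["title"]}',
        "priority": SEVERITY_TO_PRIORITY.get(event["severity"], "P4"),
        # Scan + finding IDs give the auditor a found-to-fixed trail.
        "external_ref": f'{event["scan_id"]}/{event["finding_id"]}',
    }

ticket = finding_to_ticket({
    "finding_id": "f-101",
    "severity": "HIGH",
    "title": "Reflected XSS on /search",
    "scan_id": "s-42",
})
print(ticket["priority"])  # P2
```

The point of the `external_ref` field is the audit trail: each ticket traces back to the exact scan and finding that produced it.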
Insurance
Regulatory driver: NAIC Insurance Data Security Model Law + state DFS regulations (NY, CA, VA leading).
Recommended cadence: same as banks, calibrated by data-sensitivity. Often adds on-demand ai pentest before M&A / carve-out integration activity.
Evidence model: broker / reinsurer-facing attestation of continuous security testing. The carrier that can show "we ran 340 pentest scans last year" signs faster than the one that shows "we did an annual pentest on X date".
Tough Questions
The toughest objections — addressed head-on
"This will generate noise that nobody acts on."
It won't — because the Accuracy Gate plus the "no exploit, no report" policy targets a <10% false-positive rate. A typical per-merge scan produces zero or one new finding; a typical weekly scan produces three to five. These are triageable numbers. Compare that with open-source scanners that produce 80+ findings per run, 90% of them false positives.
"Our auditor won't accept AI pentesting as primary evidence."
Your auditor will accept evidence — scan history, finding records, remediation timestamps, chain reports. They won't accept "we use AI" as a substitute for evidence. Pentestas produces the evidence; your auditor evaluates the evidence. We're seeing acceptance across SOC 2, ISO 27001, HITRUST, and PCI DSS 4.0 assessors who've reviewed the scan output.
"LLMs hallucinate."
They do. Pentestas addresses this with three filters: the Accuracy Gate (re-runs the payload through an independent HTTP client with fresh session state), the "no exploit, no report" rule (findings without a working PoC are discarded), and the AI false-positive filter (claims in the narrative that don't match the evidence get flagged). Every CRITICAL ships with a copy-and-paste PoC. Run it before you page someone.
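The Accuracy Gate idea can be sketched in a few lines: replay the recorded payload through an independent client with no cached session state, and keep the finding only if the oracle still fires. The function names and the oracle-marker approach below are illustrative assumptions, not the actual Pentestas implementation:

```python
# Sketch of an accuracy-gate style check (assumed design, not the
# real Pentestas internals): replay the payload with fresh session
# state and require the oracle to trigger again.

def accuracy_gate(replay, request: dict, oracle_marker: str) -> bool:
    """replay(request) returns a response body; the finding survives
    only if the oracle marker appears in a session-free replay."""
    fresh = dict(request)
    fresh.pop("cookies", None)   # drop any cached session state
    body = replay(fresh)
    return oracle_marker in body

# Toy replay function standing in for a real, independent HTTP client.
def fake_replay(req):
    if "<script>" in req.get("param", ""):
        return "reflected: <script>alert(1)</script>"
    return "ok"

confirmed = accuracy_gate(
    fake_replay,
    {"param": "<script>alert(1)</script>", "cookies": "sid=abc"},
    "<script>alert(1)</script>",
)
print(confirmed)  # True
```

A claim that only reproduces with the scanner's own session state is exactly the kind of hallucination this filter is meant to discard.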
"We already pay for Burp Suite Enterprise."
Great tool for the manual-tester-in-the-loop model. Different shape from continuous automated ai pentest. Most teams we talk to keep Burp for manual work and add Pentestas for the continuous automated coverage. They're complementary, not competitive.
"The cost will blow up."
Pentestas pricing is per-tenant, not per-scan. The economics are designed for continuous cadence; you don't pay $5K per scan the way you would with a consultancy. Going from 4 to 40 scans a month is a plan upgrade, not a 10x bill.
"We can't give you our source code."
White-box mode is optional. Source-code-aware scans produce higher-signal findings, but black-box ai pentesting is still the baseline product. Many regulated entities run black-box against staging and reserve source access for quarterly human engagements.
How It Works
How to start
The minimum viable pentest-as-a-service programme:
- Verify your primary production domain (5 min).
- Create an API key (1 min).
- Wire one scan into CI against staging (15 min).
- Set up a Slack webhook (5 min).
- Schedule a weekly full-scope scan (2 min).
Total lift: half a day. Ongoing effort: ~2 hours/week on triage, declining as false-positive filtering learns your environment.
Iterate by adding agent-based internal scanning, source-code-aware mode, and CIS benchmark automation over the following weeks.
Start your continuous pentest-as-a-service programme
Register, verify a domain, run your first scan — Pro plan supports per-merge CI + scheduled + agents.
Start your AI pentest
Further reading
- Plans and limits — concurrency, retention, and the feature matrix by tier
- Scheduled reports — cadence and delivery configuration
- Pentestas CLI — the CI integration surface
- Webhooks — event-driven automations

Alexander Sverdlov
Founder of Pentestas. Author of 2 information security books, cybersecurity speaker at the largest cybersecurity conferences in Asia and a United Nations conference panelist. Former Microsoft security consulting team member, external cybersecurity consultant at the Emirates Nuclear Energy Corporation.