How to Choose an AI Penetration Testing Provider: The Buyer's Checklist
Pentestas Team
Security Analyst

A ten-item checklist you can take to any pentest-as-a-service vendor.
In Detail
1. Does the vendor report findings without a working exploit?
Critical question. Legacy scanners report "something that looks SQL-adjacent in the response"; real AI pentest providers report only findings with a working proof-of-concept. Ask directly: "when your tool reports a SQL injection, do you require that a time-based blind payload actually produced a ~N-second delay, or do you report anything whose pattern matches?" If the vendor hedges, they're reporting pattern matches.
The "no exploit, no report" discipline is what keeps your triage queue tractable. Vendors without it will drown your Slack channel in 200-finding reports that are 70% false-positive.
Pentestas answer: yes. Accuracy Gate runs every finding through an orthogonal verifier in a fresh HTTP session; signals that don't reproduce are dropped before persistence. "Unproven" findings never reach the findings list.
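The "require a reproduced delay" discipline can be sketched in a few lines. This is our own illustration, not Pentestas's actual verifier: given measured response times for baseline requests and for requests carrying a time-based payload, report the finding only when the payload demonstrably produced the injected delay.

```python
from statistics import median

def confirms_time_based_sqli(baseline_secs: list[float],
                             payload_secs: list[float],
                             injected_delay: float = 5.0,
                             tolerance: float = 0.5) -> bool:
    """Report a time-based blind SQLi only if the payload's median
    response time exceeds the baseline median by roughly the delay
    the payload asked the database to sleep for."""
    observed = median(payload_secs) - median(baseline_secs)
    return observed >= injected_delay - tolerance
```

A vendor reporting on pattern matches alone skips this step entirely: identical timings on baseline and payload requests should mean no finding.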
Attack Chains
2. Do they synthesise attack chains?
Attack-chain synthesis — linking two MEDIUM findings into a CRITICAL compromise path — is where the real business risk hides. Ask: "show me an example where your tool found a chain of findings that, individually, wouldn't have been triaged as critical."
A vendor that ships only per-finding rows isn't producing AI pentest output. They're producing AI-narrated scanner output. The gap matters.
Pentestas answer: yes. 23 rule-based patterns + LLM-proposed chains, deduplicated and rendered as an interactive mindmap. See the attack-chains deep-dive.
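To make the idea concrete, here is a toy sketch of rule-based chain synthesis. The rule, the data shapes, and all names are our own illustration, not one of Pentestas's 23 patterns: two MEDIUM findings that co-occur get promoted to a single CRITICAL chain with a narrative.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    id: str
    kind: str
    severity: str

# One illustrative rule: an open redirect plus a session token leaking in
# the Referer header combine into an account-takeover path -- rated far
# above either finding on its own.
CHAIN_RULES = [
    ({"open_redirect", "token_in_referer"}, "CRITICAL",
     "Redirect the victim to an attacker host; the session token leaks via Referer."),
]

def synthesize_chains(findings: list[Finding]) -> list[dict]:
    """Promote co-occurring findings into higher-severity chains."""
    kinds = {f.kind for f in findings}
    chains = []
    for required, severity, narrative in CHAIN_RULES:
        if required <= kinds:  # all links of the chain are present
            chains.append({"links": sorted(required),
                           "severity": severity,
                           "narrative": narrative})
    return chains
```

The point of the demo question in the checklist is exactly this: neither input finding would be triaged as critical on its own, but the chain would.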
3. Do they support both black-box and white-box modes?
Black-box = they scan from outside. White-box = they also read your source code. The difference is roughly 2× the finding count on the same target. Vendors that only offer one of the two are leaving half the surface uncovered.
Pentestas answer: yes. Source-code-aware scans on the same target URL. Optional; recommended.
4. Can they reach internal networks?
Your staging, your admin panels, your intranet apps, your VPC — all live on private RFC 1918 addresses. If the vendor is cloud-only, they can't scan these. Ask: "can I deploy an agent inside my network so you can scan 10.0.0.0/24?"
Vendors that say "yes, via an on-prem appliance" are still a 2018 offering. Vendors that say "yes, via a small outbound-only WebSocket agent" are modern.
Pentestas answer: yes. Linux + Windows .NET agents — outbound WebSocket only, tenant-scoped, IP-allowlisted.
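To illustrate what "outbound-only, tenant-scoped, IP-allowlisted" looks like in practice, here is a hypothetical agent config. Every field name below is our invention for illustration, not Pentestas's actual schema — the point is what a modern agent config should express.

```yaml
# Hypothetical agent configuration -- field names are illustrative only.
# Note what is absent: no inbound listening ports, no firewall holes.
agent:
  tenant_id: "acme-prod"            # scoped to one tenant
  server: "wss://agent.vendor.example/connect"  # outbound WebSocket only
  scan_scope:
    - 10.0.0.0/24                   # RFC 1918 range the cloud scanner can't reach
  allowlist:
    - 10.0.0.0/24                   # agent refuses any target outside this list
```

The 2018-style on-prem appliance needs inbound access and a maintenance window; an outbound-only agent needs neither.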
5. Do they support complex login flows with 2FA?
Most production targets have 2FA on admin accounts. If the scanner can't authenticate past 2FA, it can't scan the most sensitive surface. Ask: "how do you handle TOTP 2FA on our admin login?"
Acceptable answers: "we generate TOTP codes inline from a Base32 seed"; "we use a delegated session-cookie paste approach with auto-refresh"; "we drive a browser through a configurable login script with TOTP substitution".
Unacceptable answers: "you'd paste the post-2FA cookie"; "you'd disable 2FA on the test account".
Pentestas answer: yes. YAML config + TOTP. $totp placeholder in login-flow steps, live RFC 6238 generation.
6. Do they integrate with your CI?
If the vendor's only surface is a web dashboard, they don't fit continuous delivery. Ask: "show me a GitHub Actions or GitLab CI YAML that runs your pentest on every merge to main."
Pentestas answer: yes. Pentestas CLI — pentestas start -u URL -c config.yaml -w 1h drops into any CI. Exit codes gate the build.
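A workflow along these lines might look like the following GitHub Actions sketch. The `pentestas start` invocation is the one quoted above; the install step, secret name, and everything else here are our assumptions, not vendor documentation.

```yaml
# Hypothetical CI job -- install step and secret names are illustrative.
name: pentest
on:
  push:
    branches: [main]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      # Assumed installer location -- substitute the vendor's real one.
      - run: curl -fsSL https://vendor.example/install-pentestas.sh | sh
      - run: pentestas start -u "$TARGET_URL" -c config.yaml -w 1h
        env:
          TARGET_URL: ${{ secrets.STAGING_URL }}
        # A non-zero exit code fails this step, gating the merge.
```

This is the test to run in a demo: ask the vendor to paste their equivalent of this file, working, on a call.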
7. Do they cite source-code locations for findings (when you supply source)?
White-box mode is only useful if the vendor surfaces the specific file + line of the vulnerable code. "We ran static analysis" without location output is not useful; "finding is at src/db/queries.ts:42" is immediately actionable.
Pentestas answer: yes. Source-code citations on every finding when white-box mode is enabled.
8. Do they filter false positives with transparency?
Two sub-questions:
Do they drop findings the AI judged as false-positive? The answer should be yes (otherwise your Slack channel drowns). But:
Can you see what was dropped + why? The answer should also be yes. A filter that's completely opaque is a filter you can't trust.
Pentestas answer: yes to both. Findings filtered by the AI false-positive layer are hidden by default with a Show AI-filtered toggle. Each filtered finding displays the AI's rationale; you can click Disagree to restore it.
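The transparency requirement reduces to a simple invariant: filtering changes visibility, never existence. A toy sketch (our names, not Pentestas internals) of a filter that keeps the dropped findings and their rationales auditable, with a human override:

```python
def partition_findings(findings: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split findings into shown vs. AI-filtered. Filtered findings keep
    their rationale so a reviewer can audit -- and overrule -- every drop."""
    shown = [f for f in findings if not f.get("ai_false_positive")]
    filtered = [f for f in findings if f.get("ai_false_positive")]
    return shown, filtered

def disagree(finding: dict) -> dict:
    """Analyst overrules the filter: the finding returns to the main list."""
    return dict(finding, ai_false_positive=False, restored_by_human=True)
```

An opaque filter is the same code with `filtered` thrown away — which is exactly what you can't trust.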
Compliance
9. Do they meet your compliance + procurement needs?
Regulated buyers have non-negotiable requirements:
- **SOC 2 Type II** — non-negotiable for mid-market / enterprise.
- **BAA** — for HIPAA-covered entities.
- **DPA + SCC** — for GDPR-scope customers.
- **BYOK encryption** — for financial / healthcare / government.
- **On-prem / air-gapped deployment** — for some banks / defence.
- **EU data residency** — for EU customers with GDPR Schrems II concerns.
Ask for each. Vendors that hedge on any of these don't scale into enterprise.
Pentestas answer: SOC 2 Type II + ISO 27001 certified; BAA on Business+; EU SCC + UK IDTA on request; BYOK on Enterprise; on-prem available under custom contract; EU deployment (Frankfurt + Dublin) on Enterprise.
The Numbers
10. Is the pricing transparent + sustainable?
Ask for their published pricing. If the answer is "contact sales", you're headed into a two-week procurement cycle with unclear outcomes.
Sustainable pricing:
- Public tiers (Free / Pro / Enterprise).
- Scan-cadence economics that match continuous delivery (no per-scan billing that penalises frequency).
- BYOK option for AI costs so you control the variable spend.
- Contract SLA that matches the tier.
Pentestas answer: public pricing, Free tier, Pro at ~$1,500/month, Enterprise negotiated. See pricing guide. BYOK Anthropic key supported — keeps variable AI costs on your billing.
Bonus questions to disqualify fast
- **"Do you support [my specific framework / auth / stack]?"** Concrete answer: "yes, here's the sample output against a Next.js + Clerk + Postgres target". Hand-wavy answer: "we support all major frameworks" without a demo.
- **"What's your median time-to-first-finding on the free tier?"** Under 30 minutes means a mature onboarding experience. Over 2 hours means they don't care about evaluation UX.
- **"Who signs a security questionnaire / vendor security review?"** A vendor without a prepared security posture doc is a vendor who hasn't done an enterprise deal.
- **"What's your regression rate on AI false-positive filtering, quarter over quarter?"** Vendors with actual data show trend improvement. Vendors without data will answer vaguely.
Running a head-to-head evaluation
Every mature security buying motion should run at least a 2-week head-to-head between two shortlisted vendors:
- Week 1. Both vendors scan the same target (your staging). Collect raw findings lists + chains + reports.
- Manual triage. Your senior security engineer rates each finding as real / false-positive / irrelevant. Compute the false-positive rate for each vendor. Compute the finding delta between the two.
- Week 2. Both vendors run a second scan 7 days later. Measure what's new / regressed / resolved in each.
- Total cost. Annual subscription + AI cost (if BYOK) + estimated engineer-triage time at $200/hour loaded.
Whichever vendor wins cost-per-actionable-finding gets the deal.
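The cost-per-actionable-finding arithmetic is worth writing down explicitly. A minimal calculator, with the $200/hour loaded rate from above and an assumed (illustrative) triage time per reported finding:

```python
def cost_per_actionable_finding(annual_subscription: float,
                                ai_cost_annual: float,
                                findings_reported: int,
                                real_findings: int,
                                triage_minutes_per_finding: float = 20.0,
                                engineer_rate_per_hour: float = 200.0) -> float:
    """Total cost of ownership divided by the findings your senior
    engineer confirmed as real during manual triage."""
    triage_cost = (findings_reported
                   * (triage_minutes_per_finding / 60.0)
                   * engineer_rate_per_hour)
    total = annual_subscription + ai_cost_annual + triage_cost
    return total / real_findings
```

Note how the false-positive rate drives the result: a vendor that reports 200 findings of which 60 are real can lose to a pricier vendor that reports 60 of which 45 are real, because every reported finding costs engineer time whether or not it's real.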
Pentestas routinely wins these head-to-heads — not because we're better at everything, but because the Accuracy Gate + attack-chain synthesis + source-code citations compound into lower triage cost. The per-finding cost gap is large enough that it outweighs smaller differences in pricing.
Disqualifying patterns
Walk away from vendors that:
- Report theoretical vulnerabilities without working PoCs.
- Offer no CI integration beyond "through our dashboard".
- Require you to install an agent with inbound ports.
- Can't explain how they handle 2FA.
- Don't offer BAA / DPA / SOC 2 under NDA.
- Require annual commitment with no monthly option.
- Charge per-scan rather than per-tenant (mismatched economics for continuous delivery).
- Won't publish sample reports (what are they hiding?).
The honest limits of AI pentest
An AI pentest is not a silver bullet. Three areas where human pentesters still dominate:
- Complex business-logic bugs. "Can I check a box that says I'm over 18 to bypass the age gate?" Requires human product understanding.
- Chained social-engineering attacks. Humans phish humans; AI does not (yet).
- Bespoke protocol attacks. Custom binary protocols, proprietary RPC, embedded device comms.
The correct programme is an AI pentest as the continuous baseline plus an annual human-led engagement for depth. Any vendor selling you a "complete replacement" is overselling.
The purchase decision
Ten-item checklist. Head-to-head evaluation. Total cost of ownership math. If a vendor scores ≥8/10 on the checklist, matches or beats on the head-to-head, and produces a sub-$25K/year TCO for a typical SaaS programme — they're worth the deal.
Pentestas hits 10/10. We're biased, but we'll stand behind that scorecard in a head-to-head against any vendor in the market.
Run a 2-week head-to-head evaluation
Register free, point at staging, ship the findings list to whoever you're comparing us against.
Start your AI pentest
Alexander Sverdlov
Founder of Pentestas. Author of two information security books, speaker at the largest cybersecurity conferences in Asia, and a United Nations conference panelist. Former member of Microsoft's security consulting team and external cybersecurity consultant at the Emirates Nuclear Energy Corporation.