After months of using AI to accelerate my smart contract audits — and making every possible mistake along the way — I'm releasing the full toolkit: prompts, review checklists, automated pipelines, and the workflow that ties them together.
Let me be direct about something: AI doesn't find bugs for you. I learned this the hard way. The first time I pasted an entire contract into Claude and said "find bugs," I got back a polished list of ten findings — eight of which were hallucinated, one was a duplicate of a known issue, and one was a real Low. That's a 10% hit rate, and most of the misses would have gotten me laughed out of a triage room.
But I kept going. Not because AI was good at auditing — it wasn't — but because I noticed something: when I pointed it at the right things, when I gave it enough context, when I structured the interaction as a pipeline instead of a single question, the output went from noise to genuinely useful. It started catching things I'd miss during hour 14 of a review. It started verifying my hunches faster than I could trace call paths manually. And it started producing structured reports that saved me hours of write-up time.
Over months of bug bounties, private audits, and audit contests — across Solidity, Move on Aptos, and ZK circuits — I refined every prompt, built checklists from real findings, and eventually automated the entire pipeline into a tool that scales from a 500-line contract to a 30,000-line codebase.
Today I'm open-sourcing all of it: web3-sec-ai-prompts.
The repo isn't a random dump of prompts. It's a structured system designed around how security research actually works — with different workflows for different engagement types, shared tools that apply everywhere, and an automated pipeline that ties them together.
The common/ directory is the backbone. Everything else references it. The review checklist, the verifier, the severity framework — these are the shared tools that make every engagement type sharper. Let me walk through what I built and why.
The custom primer guide is the single most important file in the repo, and it's the one most people will skip. It teaches you to read the codebase yourself first, write down everything that smells off, and then feed those observations to the AI.
No prompt can replace your intuition as a researcher. What a prompt can do is take your vague "something feels wrong about this fee calculation" and trace it across 15 files in seconds. The primer bridges the gap between your instincts and the AI's ability to exhaustively cross-reference. I've found more bugs from primer entries that said "this division looks suspicious" than from any generic "find all vulnerabilities" prompt.
How it works: You read the code top to bottom. Every time something catches your eye — a weird constant, a missing check, a function that's public when everything else is public(package), a comment that says "TODO" — you write it down as a primer entry. Then you feed those entries alongside the code to the AI. Instead of searching blindly, the AI now investigates your leads.
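The primer workflow above can be sketched in a few lines. This is a minimal illustration, not the repo's actual implementation; the `PrimerEntry` structure and `build_prompt` helper are hypothetical names, and the example observations are invented:

```python
from dataclasses import dataclass

@dataclass
class PrimerEntry:
    """One observation written down during the manual read-through."""
    location: str   # file and function the observation points at
    note: str       # what smelled off, in your own words

def build_prompt(entries: list[PrimerEntry], code: str) -> str:
    """Assemble a prompt that asks the AI to investigate your leads
    instead of searching the codebase blindly."""
    leads = "\n".join(f"- {e.location}: {e.note}" for e in entries)
    return (
        "Investigate each of these auditor observations against the code.\n"
        f"Observations:\n{leads}\n\nCode:\n{code}"
    )

entries = [
    PrimerEntry("fees.move:apply_fee", "division before multiplication looks suspicious"),
    PrimerEntry("vault.move:withdraw", "no check that the vault is paused"),
]
prompt = build_prompt(entries, "<codebase snapshot here>")
```

The point of the structure is the ordering: your leads go first, the code second, so the model's attention starts from your intuition rather than from its own priors.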
A single-pass AI review inherits whatever bias the model starts with. If it begins thinking about reentrancy, it sees reentrancy everywhere. The multi-expert review forces three independent passes, each with a different focus.
The "two stories" pattern from Pass 2 is something I use constantly now, even outside of AI prompts. If I can't write a concrete attacker story with specific steps, the finding is probably invalid. And Pass 3 — the adversarial triager — is what kills the hallucinated findings before they reach your report.
The review checklist is 15 sections of concrete checks derived from real audits, covering constants and immutables, state variables, access control, asymmetry detection, input validation, setters, unchecked return values, arithmetic, storage vs. memory, precision mismatches, copy-paste errors, general heuristics, and forked-protocol checks, plus a final "what's NOT listed" section that reminds the AI to think beyond the checklist.
But the checklist alone produces false positives. That's why it's paired with a Verifier that forces adversarial validation of every finding — tracing the full call path, checking economic rationality, verifying that existing protections don't already block it. A finding that survives the verifier has real substance.
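The verifier's gate can be modeled as a simple filter. This is a sketch of the idea, assuming a finding record with the fields the text describes (call path, attacker story, existing protections, economic rationality); the field and function names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    title: str
    call_path: list[str] = field(default_factory=list)   # entry point down to the vulnerable line
    attacker_story: str = ""                             # concrete steps, not vibes
    blocked_by: list[str] = field(default_factory=list)  # existing protections that already stop it
    profitable: bool = False                             # economically rational for the attacker

def survives_verifier(f: Finding) -> bool:
    """Keep a finding only if the full call path is traced, the attacker
    story is concrete, nothing already blocks it, and the attack pays."""
    return bool(f.call_path) and bool(f.attacker_story) and not f.blocked_by and f.profitable

findings = [
    Finding("reentrancy in withdraw",
            call_path=["withdraw", "external transfer"],
            attacker_story="re-enter before the balance update and drain twice",
            profitable=True),
    Finding("hallucinated overflow"),  # no path, no story: dies here
]
kept = [f for f in findings if survives_verifier(f)]
```

Anything that fails any one of the four conditions never reaches the report, which is exactly how the hallucinated findings from a raw checklist pass get killed.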
The asymmetry check (Section 4) is the highest-yield pattern in the entire checklist. Compare deposit vs withdraw, open vs close, long vs short, maker vs taker. Any check present in one path but missing in the other is a finding. This single pattern has led me to more valid bugs than any other heuristic — and it's one the AI is genuinely good at automating, because it's mechanical comparison across large amounts of code.
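Because the asymmetry check is mechanical, it is easy to see why a model automates it well. A toy sketch of the comparison, using a crude regex over invented function bodies (a real tool would parse properly and normalize mirrored names like `max_deposit` vs. `max_withdraw`):

```python
import re

def extract_checks(src: str) -> set[str]:
    """Pull the guard conditions (require/assert! calls) out of a function body."""
    return set(re.findall(r"(?:assert!|require)\((.*?)\)", src))

deposit_src = """
    require(!paused)
    require(amount > 0)
    require(amount <= max_deposit)
"""
withdraw_src = """
    require(amount > 0)
    require(amount <= max_withdraw)
"""

# Checks present in one path but missing from its mirror are leads.
missing_in_withdraw = extract_checks(deposit_src) - extract_checks(withdraw_src)
```

Here the set difference surfaces `!paused` as the real lead: deposits respect the pause switch, withdrawals don't. The mirrored limit checks also appear in the diff because their names differ, which is the normalization a real implementation would handle.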
Generic prompts miss language-specific bug classes. The repo includes dedicated pattern files for Move and Solidity, derived from real audit data.
The Move patterns file alone is based on 1,141 findings across 200+ audited Move protocols, organized by frequency: business logic bugs (296 findings), input validation (170), calculation errors (148), access control (73), and state management (64). Each entry references the actual protocols where the bug was found — Cetus, Thala Labs, Navi, Bluefin, and dozens more. The top 5 vulnerability classes account for over 70% of all Critical/High findings in Move. Knowing where to look is half the battle.
A Move-specific gotcha that Solidity auditors miss: generic type validation, a bug class with no Solidity analogue. In Move, a function that accepts a generic type parameter <CoinType> must verify that the type matches what the pool or vault expects. If it doesn't, an attacker can pass a worthless token where USDC is expected. This is the single most Move-specific bug class — it's produced critical findings in Econia, Navi, AquaSwap, and Dexlyn.
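A language-agnostic model of the bug, with the Move type parameter stood in by a string (the class and method names are invented for illustration; in real Move the fix is an assertion on the generic type, along the lines of comparing `type_of<CoinType>()` against the stored expected type):

```python
class Pool:
    """Toy model of a Move Pool<CoinType>; the coin type is modeled as a string."""

    def __init__(self, expected_coin_type: str):
        self.expected_coin_type = expected_coin_type
        self.balance = 0

    def deposit_vulnerable(self, coin_type: str, amount: int) -> None:
        # BUG: never checks coin_type against the pool's expected type,
        # so a worthless token is credited as if it were USDC.
        self.balance += amount

    def deposit_fixed(self, coin_type: str, amount: int) -> None:
        # The fix: reject any type that doesn't match the pool's parameter.
        if coin_type != self.expected_coin_type:
            raise ValueError("wrong coin type")
        self.balance += amount

pool = Pool("USDC")
pool.deposit_vulnerable("WORTHLESS", 1_000_000)  # silently accepted
```

The vulnerable path compiles and runs fine in both the model and real Move, which is why the class is so easy to miss: nothing fails until an attacker picks the type.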
This one took the longest to build. The ZK audit guide is a comprehensive prompt for auditing ZK circuits — Circom, Halo2, Noir, Cairo, or any proof system. Every bug in ZK maps to one of three properties: soundness (can a cheater forge proofs?), completeness (can an honest prover be blocked?), and zero-knowledge (does the proof leak information?).
The guide covers under-constrained circuits (the #1 ZK bug class — approximately 96% of all documented SNARK vulnerabilities), over-constrained circuits, finite field arithmetic traps (where subtraction wraps to massive numbers instead of going negative), Merkle tree verification gaps, and verifier contract integration issues. It also includes DSL-specific checks — like the critical difference between Circom's <-- (assignment only, no constraint) vs <== (assignment + constraint), which was the root cause of bugs in major protocols.
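The field-arithmetic trap is worth seeing concretely. Using the BN254 scalar field modulus (the field behind many SNARK circuits), subtraction that "should" go negative instead wraps to an enormous field element:

```python
# BN254 scalar field modulus, used by many SNARK proof systems.
P = 21888242871839275222246405745257275088548364400416034343698204186575808495617

def field_sub(a: int, b: int) -> int:
    """Subtraction in a prime field: there are no negative numbers,
    so a - b wraps around to p - (b - a) whenever b > a."""
    return (a - b) % P

wrapped = field_sub(3, 5)   # intuitively -2; actually p - 2, a ~254-bit number
```

This is why a "balance - withdrawal" computed in-circuit never underflows in the integer sense: it silently becomes a huge positive value, and unless a range constraint pins it down, the circuit happily accepts it.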
All the prompts above work standalone. But the real power comes from chaining them in sequence. That's what Panther Audit does — it's a skill/plugin for Cursor, Claude Code, or Claude.ai that automates the entire pipeline.
Chunk mode was born from frustration. Large codebases — 15k, 20k, 30k lines — blow past any AI's context window. The solution: automatically split the codebase into logical modules (grouped by name root, imports, and directory), audit each module independently, persist findings to a JSON state file after each module, deduplicate across modules, then do a final cross-module pass to catch issues that only appear when modules interact.
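The module-splitting step can be sketched with the simplest of the three grouping signals, the shared name root (a real implementation would also weigh imports and directory layout, and the file names here are invented):

```python
from collections import defaultdict
from pathlib import Path

def split_into_modules(files: list[str]) -> dict[str, list[str]]:
    """Group source files into logical modules by shared name root,
    so each group can be audited inside one context window."""
    groups: dict[str, list[str]] = defaultdict(list)
    for f in files:
        root = Path(f).stem.split("_")[0]  # "vault_math.move" -> "vault"
        groups[root].append(f)
    return dict(groups)

modules = split_into_modules([
    "sources/vault.move", "sources/vault_math.move",
    "sources/oracle.move", "sources/perp_engine.move",
])
```

Each resulting group is audited on its own, which is what makes the per-module persistence and the final cross-module pass possible.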
The state file (audit_state.json) means you can stop and resume. Context window full? Close the conversation, open a new one, and the pipeline picks up from the last completed module. This was essential for auditing large protocols like the one I recently worked on — a fully on-chain perp DEX with 30+ Move modules.
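The stop-and-resume mechanics reduce to two small operations around the JSON state file. This is a sketch of the idea, not the tool's actual schema; the `completed` key and helper names are assumptions:

```python
import json
from pathlib import Path

STATE_FILE = Path("audit_state.json")

def save_module_result(module: str, findings: list[dict]) -> None:
    """Persist findings after each module so a fresh conversation can resume."""
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {"completed": {}}
    state["completed"][module] = findings
    STATE_FILE.write_text(json.dumps(state, indent=2))

def remaining_modules(all_modules: list[str]) -> list[str]:
    """On resume, skip every module a previous session already finished."""
    if not STATE_FILE.exists():
        return all_modules
    done = json.loads(STATE_FILE.read_text())["completed"]
    return [m for m in all_modules if m not in done]
```

Because the state lives on disk rather than in the conversation, blowing the context window costs you nothing but the module that was in flight.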
Every design decision in this repo came from a mistake. Here are the ones worth sharing:
A primer entry like "this public fun is the only one in a module where everything else is public(package)" is worth more than any generic prompt. You've already done the hard part — pattern recognition from reading real code. The AI just needs you to point it in the right direction.

The repo works with any AI tool — Claude, ChatGPT, Cursor, Claude Code, or anything else with file access. The prompts are generic and not tied to any specific model.
The prompts are intentionally generic — they're not tied to any protocol category. They work on lending protocols, DEXes, bridges, vaults, perp exchanges, or anything else. But the best results come when you append your own custom heuristics to the review checklist. The generic patterns are the starting point; your domain knowledge is what makes them sharp.
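Appending domain heuristics is just prompt concatenation; a minimal sketch, with a hypothetical helper name and an invented heuristic:

```python
def build_checklist_prompt(base_checklist: str, custom_heuristics: list[str]) -> str:
    """Append protocol-specific heuristics after the generic checklist sections."""
    extras = "\n".join(f"- {h}" for h in custom_heuristics)
    return base_checklist + "\n\nProtocol-specific heuristics:\n" + extras

prompt = build_checklist_prompt(
    "Section 1: constants and immutables ...",
    ["flag any fee path that rounds in the protocol's favor"],
)
```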
Honestly? Because I wish someone had shared this when I started. I spent months iterating on prompts that produced garbage, debugging pipelines that hallucinated findings, and learning the hard way that AI without structure is worse than no AI at all. Every researcher who adopts AI for auditing is going to hit the same walls. This repo shortens that learning curve.
There's also a practical angle: better tools produce better security outcomes. If more researchers use structured AI workflows, more bugs get found, and more protocols ship safely. The alpha isn't in the prompts themselves — it's in your ability to read code, build intuition, and know which leads to pursue. The prompts just make the pursuit faster.
AI is a force multiplier, not a replacement. These prompts accelerate your workflow — they don't replace your expertise. The AI will hallucinate, miss context, and get severity wrong. You are the final reviewer. Your brain is still the primary tool. Manual review is non-negotiable. Use these prompts to augment your process, catch things you might miss, and structure your thinking.
The repo is a living project. As I run more audits, find new patterns, and refine the pipeline, I'll keep pushing updates. Contributions are welcome — if you've built custom heuristics, language-specific patterns, or workflow improvements, the contributing guide has everything you need.
If you're a security researcher who hasn't tried AI in your workflow yet, start small. Pick one contract you've already audited. Run the review checklist on it. See if the AI catches something you missed — or if it validates what you found. That first "huh, I didn't see that" moment is what hooked me.
And if you're already using AI but getting noisy output, the answer is almost always: more structure, more context, smaller scope. One contract at a time. Pipeline, not question. Your observations first, AI second.
The best audit tool is still your brain. AI just gives it more hours in the day. 🛡️