After months of using AI to accelerate my smart contract audits — and making every possible mistake along the way — I'm releasing the full toolkit: prompts, review checklists, automated pipelines, and the workflow that ties them together.
Let me be direct about something: AI doesn't find bugs for you. I learned this the hard way. The first time I pasted an entire contract into Claude and said "find bugs," I got back a polished list of ten findings — eight of which were hallucinated, one was a duplicate of a known issue, and one was a real Low. That's a 10% hit rate, and most of the misses would have gotten me laughed out of a triage room.
But I kept going. Not because AI was good at auditing — it wasn't — but because I noticed something: when I pointed it at the right things, when I gave it enough context, when I structured the interaction as a pipeline instead of a single question, the output went from noise to genuinely useful. It started catching things I'd miss during hour 14 of a review. It started verifying my hunches faster than I could trace call paths manually. And it started producing structured reports that saved me hours of write-up time.
Over months of bug bounties, private audits, and audit contests — across Solidity, Move on Aptos, and ZK circuits — I refined every prompt, built checklists from real findings, and eventually automated the entire pipeline into a tool that scales from a 500-line contract to a 30,000-line codebase.
Today I'm open-sourcing all of it: web3-sec-ai-prompts.
The repo isn't a random dump of prompts. It's a structured system designed around how security research actually works — with different workflows for different engagement types, shared tools that apply everywhere, and an automated pipeline that ties them together.
The common/ directory is the backbone. Everything else references it. The review checklist, the verifier, the severity framework — these are the shared tools that make every engagement type sharper. Let me walk through what I built and why.
The custom primer guide is the single most important file in the repo, and it's the one most people will skip. It teaches you to read the codebase yourself first, write down everything that smells off, and then feed those observations to the AI.
No prompt can replace your intuition as a researcher. What a prompt can do is take your vague "something feels wrong about this fee calculation" and trace it across 15 files in seconds. The primer bridges the gap between your instincts and the AI's ability to exhaustively cross-reference. I've found more bugs from primer entries that said "this division looks suspicious" than from any generic "find all vulnerabilities" prompt.
How it works: You read the code top to bottom. Every time something catches your eye — a weird constant, a missing check, a function that's public when everything else is public(package), a comment that says "TODO" — you write it down as a primer entry. Then you feed those entries alongside the code to the AI. Instead of searching blindly, the AI now investigates your leads.
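The primer workflow above can be sketched in a few lines. This is a minimal illustration, not the repo's actual implementation; the `PrimerEntry` structure and `build_prompt` helper are hypothetical names, and the example observations are invented:

```python
from dataclasses import dataclass

@dataclass
class PrimerEntry:
    """One observation written down during the manual read-through."""
    location: str   # file and function the observation points at
    note: str       # what smelled off, in your own words

def build_prompt(entries: list[PrimerEntry], code: str) -> str:
    """Assemble a prompt that asks the AI to investigate your leads
    instead of searching the codebase blindly."""
    leads = "\n".join(f"- {e.location}: {e.note}" for e in entries)
    return (
        "Investigate each of these auditor observations against the code.\n"
        f"Observations:\n{leads}\n\nCode:\n{code}"
    )

entries = [
    PrimerEntry("fees.move:apply_fee", "division before multiplication looks suspicious"),
    PrimerEntry("vault.move:withdraw", "no check that the vault is paused"),
]
prompt = build_prompt(entries, "<codebase snapshot here>")
```

The point of the structure is the ordering: your leads go first, the code second, so the model's attention starts from your intuition rather than from its own priors.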
A single-pass AI review inherits whatever bias the model starts with. If it begins thinking about reentrancy, it sees reentrancy everywhere. The multi-expert review forces three independent passes, each with a different focus.
The "two stories" pattern from Pass 2 is something I use constantly now, even outside of AI prompts. If I can't write a concrete attacker story with specific steps, the finding is probably invalid. And Pass 3 — the adversarial triager — is what kills the hallucinated findings before they reach your report.
The review checklist is 15 sections of concrete checks derived from real audits, covering constants and immutables, state variables, access control, asymmetry detection, input validation, setters, unchecked return values, arithmetic, storage vs. memory, precision mismatches, copy-paste errors, general heuristics, and forked-protocol checks, plus a final "what's NOT listed" section that reminds the AI to think beyond the checklist.
But the checklist alone produces false positives. That's why it's paired with a Verifier that forces adversarial validation of every finding — tracing the full call path, checking economic rationality, verifying that existing protections don't already block it. A finding that survives the verifier has real substance.
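The verifier's gate can be modeled as a simple filter. This is a sketch of the idea, assuming a finding record with the fields the text describes (call path, attacker story, existing protections, economic rationality); the field and function names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    title: str
    call_path: list[str] = field(default_factory=list)   # entry point down to the vulnerable line
    attacker_story: str = ""                             # concrete steps, not vibes
    blocked_by: list[str] = field(default_factory=list)  # existing protections that already stop it
    profitable: bool = False                             # economically rational for the attacker

def survives_verifier(f: Finding) -> bool:
    """Keep a finding only if the full call path is traced, the attacker
    story is concrete, nothing already blocks it, and the attack pays."""
    return bool(f.call_path) and bool(f.attacker_story) and not f.blocked_by and f.profitable

findings = [
    Finding("reentrancy in withdraw",
            call_path=["withdraw", "external transfer"],
            attacker_story="re-enter before the balance update and drain twice",
            profitable=True),
    Finding("hallucinated overflow"),  # no path, no story: dies here
]
kept = [f for f in findings if survives_verifier(f)]
```

Anything that fails any one of the four conditions never reaches the report, which is exactly how the hallucinated findings from a raw checklist pass get killed.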
The asymmetry check (Section 4) is the highest-yield pattern in the entire checklist. Compare deposit vs withdraw, open vs close, long vs short, maker vs taker. Any check present in one path but missing in the other is a finding. This single pattern has led me to more valid bugs than any other heuristic — and it's one the AI is genuinely good at automating, because it's mechanical comparison across large amounts of code.
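Because the asymmetry check is mechanical, it is easy to see why a model automates it well. A toy sketch of the comparison, using a crude regex over invented function bodies (a real tool would parse properly and normalize mirrored names like `max_deposit` vs. `max_withdraw`):

```python
import re

def extract_checks(src: str) -> set[str]:
    """Pull the guard conditions (require/assert! calls) out of a function body."""
    return set(re.findall(r"(?:assert!|require)\((.*?)\)", src))

deposit_src = """
    require(!paused)
    require(amount > 0)
    require(amount <= max_deposit)
"""
withdraw_src = """
    require(amount > 0)
    require(amount <= max_withdraw)
"""

# Checks present in one path but missing from its mirror are leads.
missing_in_withdraw = extract_checks(deposit_src) - extract_checks(withdraw_src)
```

Here the set difference surfaces `!paused` as the real lead: deposits respect the pause switch, withdrawals don't. The mirrored limit checks also appear in the diff because their names differ, which is the normalization a real implementation would handle.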
Generic prompts miss language-specific bug classes. The repo includes dedicated pattern files for Move and Solidity, derived from real audit data.
The Move patterns file alone is based on 1,141 findings across 200+ audited Move protocols, organized by frequency: business logic bugs (296 findings), input validation (170), calculation errors (148), access control (73), and state management (64). Each entry references the actual protocols where the bug was found — Cetus, Thala Labs, Navi, Bluefin, and dozens more. The top 5 vulnerability classes account for over 70% of all Critical/High findings in Move. Knowing where to look is half the battle.
A Move-specific gotcha that Solidity auditors miss: generic type validation, a bug class with no Solidity analogue. In Move, a function that accepts a generic type parameter <CoinType> must verify that the type matches what the pool or vault expects. If it doesn't, an attacker can pass a worthless token where USDC is expected. This is the single most Move-specific bug class — it's produced critical findings in Econia, Navi, AquaSwap, and Dexlyn.
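A language-agnostic model of the bug, with the Move type parameter stood in by a string (the class and method names are invented for illustration; in real Move the fix is an assertion on the generic type, along the lines of comparing `type_of<CoinType>()` against the stored expected type):

```python
class Pool:
    """Toy model of a Move Pool<CoinType>; the coin type is modeled as a string."""

    def __init__(self, expected_coin_type: str):
        self.expected_coin_type = expected_coin_type
        self.balance = 0

    def deposit_vulnerable(self, coin_type: str, amount: int) -> None:
        # BUG: never checks coin_type against the pool's expected type,
        # so a worthless token is credited as if it were USDC.
        self.balance += amount

    def deposit_fixed(self, coin_type: str, amount: int) -> None:
        # The fix: reject any type that doesn't match the pool's parameter.
        if coin_type != self.expected_coin_type:
            raise ValueError("wrong coin type")
        self.balance += amount

pool = Pool("USDC")
pool.deposit_vulnerable("WORTHLESS", 1_000_000)  # silently accepted
```

The vulnerable path compiles and runs fine in both the model and real Move, which is why the class is so easy to miss: nothing fails until an attacker picks the type.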
This one took the longest to build. The ZK audit guide is a comprehensive prompt for auditing ZK circuits — Circom, Halo2, Noir, Cairo, or any proof system. Every bug in ZK maps to one of three properties: soundness (can a cheater forge proofs?), completeness (can an honest prover be blocked?), and zero-knowledge (does the proof leak information?).
The guide covers under-constrained circuits (the #1 ZK bug class — approximately 96% of all documented SNARK vulnerabilities), over-constrained circuits, finite field arithmetic traps (where subtraction wraps to massive numbers instead of going negative), Merkle tree verification gaps, and verifier contract integration issues. It also includes DSL-specific checks — like the critical difference between Circom's <-- (assignment only, no constraint) vs <== (assignment + constraint), which was the root cause of bugs in major protocols.
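The field-arithmetic trap is worth seeing concretely. Using the BN254 scalar field modulus (the field behind many SNARK circuits), subtraction that "should" go negative instead wraps to an enormous field element:

```python
# BN254 scalar field modulus, used by many SNARK proof systems.
P = 21888242871839275222246405745257275088548364400416034343698204186575808495617

def field_sub(a: int, b: int) -> int:
    """Subtraction in a prime field: there are no negative numbers,
    so a - b wraps around to p - (b - a) whenever b > a."""
    return (a - b) % P

wrapped = field_sub(3, 5)   # intuitively -2; actually p - 2, a ~254-bit number
```

This is why a "balance - withdrawal" computed in-circuit never underflows in the integer sense: it silently becomes a huge positive value, and unless a range constraint pins it down, the circuit happily accepts it.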
All the prompts above work standalone. But the real power comes from chaining them in sequence. That's what Panther Audit does — it's a skill/plugin for Cursor, Claude Code, or Claude.ai that automates the entire pipeline.
Chunk mode was born from frustration. Large codebases — 15k, 20k, 30k lines — blow past any AI's context window. The solution: automatically split the codebase into logical modules (grouped by name root, imports, and directory), audit each module independently, persist findings to a JSON state file after each module, deduplicate across modules, then do a final cross-module pass to catch issues that only appear when modules interact.
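The module-splitting step can be sketched with the simplest of the three grouping signals, the shared name root (a real implementation would also weigh imports and directory layout, and the file names here are invented):

```python
from collections import defaultdict
from pathlib import Path

def split_into_modules(files: list[str]) -> dict[str, list[str]]:
    """Group source files into logical modules by shared name root,
    so each group can be audited inside one context window."""
    groups: dict[str, list[str]] = defaultdict(list)
    for f in files:
        root = Path(f).stem.split("_")[0]  # "vault_math.move" -> "vault"
        groups[root].append(f)
    return dict(groups)

modules = split_into_modules([
    "sources/vault.move", "sources/vault_math.move",
    "sources/oracle.move", "sources/perp_engine.move",
])
```

Each resulting group is audited on its own, which is what makes the per-module persistence and the final cross-module pass possible.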
The state file (audit_state.json) means you can stop and resume. Context window full? Close the conversation, open a new one, and the pipeline picks up from the last completed module. This was essential for auditing large protocols like the one I recently worked on — a fully on-chain perp DEX with 30+ Move modules.
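The stop-and-resume mechanics reduce to two small operations around the JSON state file. This is a sketch of the idea, not the tool's actual schema; the `completed` key and helper names are assumptions:

```python
import json
from pathlib import Path

STATE_FILE = Path("audit_state.json")

def save_module_result(module: str, findings: list[dict]) -> None:
    """Persist findings after each module so a fresh conversation can resume."""
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {"completed": {}}
    state["completed"][module] = findings
    STATE_FILE.write_text(json.dumps(state, indent=2))

def remaining_modules(all_modules: list[str]) -> list[str]:
    """On resume, skip every module a previous session already finished."""
    if not STATE_FILE.exists():
        return all_modules
    done = json.loads(STATE_FILE.read_text())["completed"]
    return [m for m in all_modules if m not in done]
```

Because the state lives on disk rather than in the conversation, blowing the context window costs you nothing but the module that was in flight.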
Every design decision in this repo came from a mistake. Here are the ones worth sharing:
A primer entry like "this public fun is the only one in a module where everything else is public(package)" is worth more than any generic prompt. You've already done the hard part — pattern recognition from reading real code. The AI just needs you to point it in the right direction.

The repo works with any AI tool — Claude, ChatGPT, Cursor, Claude Code, or anything else with file access. The prompts are generic and not tied to any specific model.
The prompts are intentionally generic — they're not tied to any protocol category. They work on lending protocols, DEXes, bridges, vaults, perp exchanges, or anything else. But the best results come when you append your own custom heuristics to the review checklist. The generic patterns are the starting point; your domain knowledge is what makes them sharp.
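Appending domain heuristics is just prompt concatenation; a minimal sketch, with a hypothetical helper name and an invented heuristic:

```python
def build_checklist_prompt(base_checklist: str, custom_heuristics: list[str]) -> str:
    """Append protocol-specific heuristics after the generic checklist sections."""
    extras = "\n".join(f"- {h}" for h in custom_heuristics)
    return base_checklist + "\n\nProtocol-specific heuristics:\n" + extras

prompt = build_checklist_prompt(
    "Section 1: constants and immutables ...",
    ["flag any fee path that rounds in the protocol's favor"],
)
```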
Honestly? Because I wish someone had shared this when I started. I spent months iterating on prompts that produced garbage, debugging pipelines that hallucinated findings, and learning the hard way that AI without structure is worse than no AI at all. Every researcher who adopts AI for auditing is going to hit the same walls. This repo shortens that learning curve.
There's also a practical angle: better tools produce better security outcomes. If more researchers use structured AI workflows, more bugs get found, and more protocols ship safely. The alpha isn't in the prompts themselves — it's in your ability to read code, build intuition, and know which leads to pursue. The prompts just make the pursuit faster.
AI is a force multiplier, not a replacement. These prompts accelerate your workflow — they don't replace your expertise. The AI will hallucinate, miss context, and get severity wrong. You are the final reviewer. Your brain is still the primary tool. Manual review is non-negotiable. Use these prompts to augment your process, catch things you might miss, and structure your thinking.
The repo is a living project. As I run more audits, find new patterns, and refine the pipeline, I'll keep pushing updates. Contributions are welcome — if you've built custom heuristics, language-specific patterns, or workflow improvements, the contributing guide has everything you need.
If you're a security researcher who hasn't tried AI in your workflow yet, start small. Pick one contract you've already audited. Run the review checklist on it. See if the AI catches something you missed — or if it validates what you found. That first "huh, I didn't see that" moment is what hooked me.
And if you're already using AI but getting noisy output, the answer is almost always: more structure, more context, smaller scope. One contract at a time. Pipeline, not question. Your observations first, AI second.
The best audit tool is still your brain. AI just gives it more hours in the day. 🛡️