The deep scan engines: Opengrep, picklescan, Gitleaks, and osv-scanner

Beyond the heuristic scanner layers, ai-supply.store runs four purpose-built security engines on uploaded artifacts. These are the same tools security engineers use in professional AppSec pipelines — and they run on every submission, for free.

Opengrep — AST analysis and taint tracking

Opengrep is an open-source fork of Semgrep that performs abstract syntax tree (AST) analysis rather than simple regex matching. This means it understands the structure of code, not just its text.

What it catches:

Taint flows: user input → SQL query → database call (SQL injection)
Taint flows: user input → shell command (command injection)
Taint flows: external data → eval() (code injection)
Insecure cryptographic primitive usage
Path traversal patterns (../../)

Example finding:

[HIGH] taint-sink: user_input flows to subprocess.call() at server.py:42
Rule: python.lang.security.dangerous-subprocess-use-audit

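How to pass cleanly: Validate and allowlist all inputs before they reach dangerous sinks. Prefer library APIs over shell invocations. Opengrep's rulesets are public, so you can run it locally before uploading. The command below assumes the opengrep binary is installed and that ./rules is a local directory of Semgrep-format rule files:

opengrep scan --config ./rules .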
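
The advice above (allowlist inputs, avoid the shell) might look like the following in Python. This is a minimal sketch, not a guaranteed way to silence any particular rule: ALLOWED_TOOLS, build_command, and run_tool are hypothetical names, and whether a given audit rule still reports the call depends on the ruleset in use.

```python
import subprocess

# Hypothetical allowlist: the only tools this service may run for a user.
ALLOWED_TOOLS = {"ls", "wc"}

def build_command(tool: str, path: str) -> list[str]:
    """Return an argv list for an allowlisted tool, or raise ValueError."""
    if tool not in ALLOWED_TOOLS:
        raise ValueError(f"tool not allowed: {tool!r}")
    if path.startswith("-"):
        raise ValueError("path must not look like an option")
    # "--" ends option parsing, so the path is always treated as an operand.
    return [tool, "--", path]

def run_tool(tool: str, path: str) -> str:
    # An argv list with shell=False (the default) is never parsed by a shell,
    # so metacharacters such as ";" or "|" in the input are not interpreted.
    result = subprocess.run(
        build_command(tool, path), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Keeping validation in build_command, separate from execution, makes the allowlist easy to unit-test without running anything.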

picklescan — model malware detection

picklescan is a purpose-built scanner for detecting malicious pickle files. Pickle is Python's built-in serialisation format, and it is fundamentally unsafe for untrusted data: loading a pickle file can execute arbitrary code chosen by whoever created it.

What it catches:

REDUCE opcodes calling dangerous globals (os.system, subprocess.Popen, eval)
Stack manipulation that constructs callable objects at load time
Multi-stage payloads that obfuscate execution via opcode sequencing

Example finding:

Pickle REDUCE opcode with dangerous callable: os.system
File: model_weights.pkl — MALICIOUS
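
To see why a REDUCE opcode matters, the sketch below builds a pickle whose load would call a function, then inspects its opcodes with the standard-library pickletools module without ever unpickling it. It is an illustration of the idea only, not picklescan's implementation: Payload and inspect_pickle are hypothetical names, print() stands in for a dangerous callable, and the "last two strings" handling of STACK_GLOBAL is a simplification that a memo-based payload could evade.

```python
import pickle
import pickletools

class Payload:
    """Harmless stand-in for a malicious object: print() replaces os.system()."""
    def __reduce__(self):
        # On load, pickle would call print("runs at load time").
        return (print, ("runs at load time",))

def inspect_pickle(data: bytes):
    """Return (imports, has_reduce) by walking opcodes; nothing is unpickled."""
    imports, strings, has_reduce = [], [], False
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols encode the import as "module name".
            module, name = arg.split(" ", 1)
            imports.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Protocol 4+ pushes module and name as two strings first.
            imports.append((strings[-2], strings[-1]))
        elif opcode.name == "REDUCE":
            has_reduce = True
        elif isinstance(arg, str):
            strings.append(arg)
    return imports, has_reduce

imports, has_reduce = inspect_pickle(pickle.dumps(Payload(), protocol=4))
print(imports, has_reduce)
```

A plain dictionary of numbers produces no imports and no REDUCE, which is the distinction a scanner relies on.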

How to pass cleanly: Use safetensors instead of pickle. If you must use pickle (e.g., for scikit-learn pipelines), restrict callables to known-safe classes with a custom Unpickler and document this clearly. The scanner knows the difference between a torch.save() of tensor weights and a payload disguised as one.
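
A custom Unpickler of the kind mentioned above can be sketched by overriding find_class, following the pattern in the Python pickle documentation. SAFE_GLOBALS and restricted_loads are hypothetical names, and the allowlist shown is only an example; a real scikit-learn pipeline would need its own, carefully reviewed list.

```python
import io
import pickle

# Hypothetical allowlist of (module, name) pairs this application will load.
SAFE_GLOBALS = {
    ("collections", "OrderedDict"),
    ("builtins", "set"),
    ("builtins", "frozenset"),
}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the pickle tries to import.
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data: bytes):
    """Like pickle.loads(), but refuses any global outside SAFE_GLOBALS."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers and numbers need no globals and load normally, while anything that references a callable outside the allowlist raises UnpicklingError before it can run.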

A safe listing to check out: LLM Guard, which uses safetensors throughout.

Gitleaks — deep secrets scanning

Gitleaks scans every file in the artifact tree for high-entropy strings matching credential patterns. It uses a rule library covering 150+ secret types:

OpenAI, Anthropic, Cohere, Hugging Face API keys
AWS/GCP/Azure access keys
GitHub/GitLab personal access tokens
Stripe, Twilio, Sendgrid keys
Generic high-entropy strings (base64, hex)
Private key PEM blocks

It scans everything: source files, test files, config files, comments, .env.example, Jupyter notebooks, inline JSON.
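
"High-entropy" here refers to Shannon entropy: random-looking keys spread their characters far more evenly than words or placeholders do. The function below is a rough sketch of that measurement, not Gitleaks' own code; shannon_entropy is a hypothetical helper name, and real rules combine a pattern match with an entropy threshold.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, based on character frequencies in s."""
    if not s:
        return 0.0
    total = len(s)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(s).values()
    )

# A placeholder scores lower than a random-looking token of similar length.
print(shannon_entropy("your_key_here"))
print(shannon_entropy("q8Zk2LxP0vTn5RbWc7YdA3Hs"))
```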

Example finding:

RuleID: openai-api-key
File: src/config.ts:14
Secret: sk-proj-XXXXXXXXXXXXXXXXXXXXXXXXXX
Commit: [embedded in artifact]

How to pass cleanly:

# Run locally before uploading:
gitleaks detect --source . --no-git

Rotate any real credentials that were accidentally committed, since deleting them from the file does not undo the leak, and replace them with environment variable references. Gitleaks should not flag an obvious placeholder such as OPENAI_API_KEY=your_key_here, but it will flag any string that looks like a real credential.
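
Replacing a hardcoded key with an environment variable reference can be as small as the sketch below. get_api_key is a hypothetical helper name; the variable name OPENAI_API_KEY is taken from the placeholder example above.

```python
import os

def get_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read a credential from the environment instead of from source code."""
    value = os.environ.get(var_name, "").strip()
    if not value:
        # Fail loudly at startup rather than sending an empty key later.
        raise RuntimeError(f"{var_name} is not set")
    return value
```

With this in place, the repository only ever contains the variable's name, which gives a secrets scanner nothing to match.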

osv-scanner — dependency CVE lookup

osv-scanner by Google queries the Open Source Vulnerabilities (OSV) database, an open, cross-ecosystem vulnerability database that aggregates advisories (many of them cross-referenced to CVE IDs) for npm, PyPI, Go, Maven, Cargo, and more.

What it scans:

package.json + package-lock.json / yarn.lock
requirements.txt / pyproject.toml / poetry.lock
go.mod / go.sum
Cargo.toml / Cargo.lock

Severity mapping to listing level:

Severity	Effect
CRITICAL (CVSS ≥ 9.0)	Pushes toward QUARANTINE
HIGH (CVSS 7.0–8.9)	Pushes toward REVIEW
MEDIUM / LOW	Score deduction only
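
Read as code, the table above amounts to a threshold lookup. The sketch below only mirrors the table; listing_effect and its return labels are hypothetical, and the platform's final listing level presumably depends on more than a single finding.

```python
def listing_effect(cvss: float) -> str:
    """Map a CVSS base score to the direction it pushes a listing."""
    if not 0.0 <= cvss <= 10.0:
        raise ValueError(f"CVSS score out of range: {cvss}")
    if cvss >= 9.0:
        return "QUARANTINE"       # CRITICAL
    if cvss >= 7.0:
        return "REVIEW"           # HIGH
    return "SCORE_DEDUCTION"      # MEDIUM / LOW
```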

How to pass cleanly:

# Install
go install github.com/google/osv-scanner/cmd/osv-scanner@latest

# Scan your project
osv-scanner scan --recursive .

Fix findings with npm audit fix (Node) or pip-audit --fix (Python). Pin exact versions — floating ranges (^1.2.0) allow silent upgrades that introduce new CVEs after your listing is live.
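
For Python projects, a quick way to spot floating ranges is to check that each requirements.txt entry uses an exact == pin. The helper below is a simplified sketch under that assumption: unpinned_requirements is a hypothetical name, option lines (starting with "-") are skipped, and URL or VCS requirements are not handled.

```python
import re

# name, optional [extras], then "==" and a version with no "*" wildcard.
PINNED = re.compile(
    r"^[A-Za-z0-9._-]+(\[[^\]]+\])?\s*==\s*[^\s;*]+(?=\s|;|$)"
)

def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to one exact version."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # blank lines and pip options such as -r or --hash
        if not PINNED.match(line):
            loose.append(line)
    return loose
```

Running it over a file's contents returns only the entries that could still resolve to a different version on the next install.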

Running all four locally before upload

# In your project directory:
opengrep scan --config ./rules .  # assumes opengrep is installed and ./rules holds your rule files
gitleaks detect --source . --no-git
osv-scanner scan --recursive .
picklescan --path .  # if you have pickle-based files such as .pkl or .pt

If all four pass locally, your upload will very likely pass the platform's deep scan engines too. The remaining scanner layers are covered in "The nine-layer scanner: a deep dive".

And remember — all of this runs for free on every artifact you upload to ai-supply.store.
