LazyAGI/LazyLLM — security scan

Repository: LazyAGI/LazyLLM (3.8k★, Apache-2.0), a multi-agent LLM application framework backed by SenseTime (top-committer emails on sensetime.com) with a distributed deploy-relay server, fine-tuning components, RAG tooling, and an HPC launcher layer.

Commit scanned: b11fa4c12b1b (HEAD of main at scan time)

Scan date: 2026-06-06

Disclosure status: Post-only public + private email to maintainer. No SECURITY.md or PVR is published, but the project is corporate-backed (SenseTime); the two highest-severity items surfaced by curation warranted private disclosure rather than a public courtesy issue. A disclosure email covering the two severe items was sent to wangzhihong@sensetime.com (top-committer corporate address). This post discusses the broader patterns and the items that can be discussed publicly without enabling exploitation.

Summary

Severity Count
Critical 0
High 73
Medium 48
Low 0
Info 0 (filtered)

121 total findings. After curation: two items reported privately to the maintainer; a series-record 16-site pull_request_target workflow cluster in a single workflow file; a six-CVE dependency tail dominated by Gradio (×3) and DeepSpeed (RCE class); the recurring SQL-identifier and agent-shell-tool classes; and a classic eval()-based calculator agent tool pattern. The 17 gitleaks generic-api-key hits are entirely in test fixtures (a file literally named test_validate_api_key.py and Feishu-URL test data).

Top findings (curated)

1. .github/workflows/main.yml — 16-site pull_request_target + checkout PR head cluster

Tool: Semgrep (yaml.github-actions.security.pull-request-target-code-checkout) Verdict: Real — series record for this class. Every job in main.yml (16 sites: lines 36, 74, 116, 177, 232, 282, 329, 408, 462, 500, 548, 593, 673, 722, 803, 870) checks out the PR head SHA under pull_request_target.

The repeating pattern at every job:

- name: Checkout code
  uses: actions/checkout@v4
  with:
    ref: $
    fetch-depth: 2

pull_request_target is the privileged variant of the PR trigger — it runs with the base repo’s GITHUB_TOKEN write scopes and (depending on the workflow) access to repository secrets. Combining it with actions/checkout@v4 at ref: $ deliberately checks out the attacker-controlled PR head into the privileged context. Any job step that runs code from the checked-out tree (make lint-only-diff, make doccheck, the test runners, etc.) executes attacker code with the privileged token.
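The combination described above is mechanical enough to check for with a few lines of text matching. The following is a minimal sketch, not the Semgrep rule: the function name `risky_checkout_lines` and the sample workflow are hypothetical, and the regular expression assumes the PR head is referenced as `github.event.pull_request.head.sha` or `.head.ref` (the exact expression used in main.yml is not shown in this post).

```python
import re

# Assumption: the PR head is referenced through an expression of this shape.
HEAD_REF = re.compile(
    r"ref:\s*\$\{\{\s*github\.event\.pull_request\.head\.(?:sha|ref)\s*\}\}"
)
TRIGGER = re.compile(r"\bpull_request_target\b")


def risky_checkout_lines(workflow_text):
    """Return 1-based line numbers of `ref:` lines that point at the PR head,
    but only when the workflow mentions the pull_request_target trigger."""
    # Skip YAML comment lines so a commented-out trigger does not count.
    live = [
        (number, line)
        for number, line in enumerate(workflow_text.splitlines(), 1)
        if not line.lstrip().startswith("#")
    ]
    if not any(TRIGGER.search(line) for _, line in live):
        return []
    return [number for number, line in live if HEAD_REF.search(line)]


# Hypothetical workflow used only to exercise the check.
SAMPLE = """\
on:
  pull_request_target:
jobs:
  lint:
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - run: make lint
"""

print(risky_checkout_lines(SAMPLE))
```

A text match like this only flags candidates; it does not parse YAML, so it cannot tell which job a `ref:` line belongs to or whether later steps actually run code from the checkout.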

The same class has appeared on airweave (single-site, since fixed) and on Giskard (clean teardown of how to use pull_request_target correctly). 16 sites in one workflow is the series record by an order of magnitude.

Architectural fix shape (per the Giskard teardown): split into two workflows. One uses pull_request_target for trusted, code-free steps (label management, comments, secret-gated reporting); the other uses plain pull_request for code-running steps (lint, tests, doc checks) without secrets. The pull_request_target jobs never check out the PR head.

2. requirements.txt — Gradio 5.49.1 carries three named CVEs

Tool: Trivy Verdict: Real — single pin bump clears all three.

Class Severity
Path Traversal (absolute path, Windows) High
Server-Side Request Forgery (SSRF, internal access) High
Open Redirect Medium

In addition, the alpaca-LoRA fine-tuning component’s requirements.txt carries a DeepSpeed remote-code-execution advisory and a sentencepiece invalid-memory-access advisory. Two lxml_html_clean advisories round out the dependency tail. No Dependabot is configured on the repo, consistent with the pattern across the recent series (MemoryBear, agency-swarm, ouroboros).

3. lazyllm/tools/tools/calculator.py:9 - Calculator agent tool uses eval() over an LLM-controlled expression

from math import *  # noqa. import math functions for expressions

class Calculator(ModuleBase):
    def __init__(self):
        super().__init__()

    def forward(self, exp: str, *args, **kwargs):
        return eval(exp)

Tool: Semgrep (eval-detected) Verdict: Real — the classic agent-tool eval-sandbox-escape pattern.

The Calculator tool is registered as something an agent can call (forward(exp) is the tool-call entrypoint). exp comes from the LLM. eval(exp) with from math import * in scope gives the LLM full Python access — calling __import__('os').system('...'), reading the file system, exfiltrating env vars, etc. The “calc tool” → “RCE” pivot is documented in every LLM-agent-security overview from the last two years.

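The safer shapes are well-known: evaluate arithmetic through a restricted expression evaluator such as simpleeval, or accept only literals through ast.literal_eval (which parses literals but does not evaluate general arithmetic).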
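One such shape, as a minimal standard-library sketch: parse the expression with `ast` and evaluate only an explicit allowlist of node types. The name `safe_calc`, the set of allowed functions and constants, and the exponent cap of 1000 are assumptions made for illustration; none of this is LazyLLM code, and it is not a drop-in replacement for the existing tool.

```python
import ast
import math
import operator


def _bounded_pow(base, exp):
    # Cap the exponent so inputs like 9 ** 9 ** 9 cannot exhaust CPU or memory.
    if abs(exp) > 1000:
        raise ValueError("exponent too large")
    return operator.pow(base, exp)


# Anything not listed in these tables is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: _bounded_pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "log": math.log}
_CONSTS = {"pi": math.pi, "e": math.e}


def safe_calc(exp: str):
    """Evaluate an arithmetic expression without ever calling eval()."""

    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        # Only plain int/float literals; bools and strings are rejected.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](ev(node.operand))
        if isinstance(node, ast.Name) and node.id in _CONSTS:
            return _CONSTS[node.id]
        # Calls are allowed only to allowlisted bare names, positional args only.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS
            and not node.keywords
        ):
            return _FUNCS[node.func.id](*[ev(arg) for arg in node.args])
        raise ValueError(f"disallowed expression element: {type(node).__name__}")

    return ev(ast.parse(exp, mode="eval"))


print(safe_calc("sqrt(16) + 2 ** 3"))
```

Attribute access, subscripts, lambdas, comprehensions and calls to any name outside `_FUNCS` all fall through to the final `ValueError`, so `__import__('os').system('...')` is rejected at the AST level rather than filtered by string matching. A production version would also want a cap on input length.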

4. 25× SQL identifier interpolation (text(f"…") / formatted-SQL / asyncpg-sqli) — the recurring class

Tool: Semgrep (avoid-sqlalchemy-text + sqlalchemy-execute-raw-query + formatted-sql-query) Verdict: Same class as on nine prior scans — gated by configuration-controlled identifiers today, brittle to future input-source changes.

The pattern keeps repeating: identifiers (table names, collection names) come from validated config, so there is no real injection vector with the current input sources, but the defensible long-term shape is an explicit identifier-quoting construct such as SQLAlchemy’s quoted_name() or psycopg’s sql.Identifier(). Cross-scan link discipline applies: Upsonic, PraisonAI, airweave, honcho, dstack, pixeltable, semantic-router, ReMe, and now LazyLLM.
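The underlying idea, keeping identifiers and values on separate paths, can be sketched with the standard library alone. This is a stand-in for illustration, not SQLAlchemy's mechanism: `quote_identifier` and `count_rows` are hypothetical helpers, the allowlist regex is an assumption about what a valid table name looks like, and sqlite3 is used only so the example runs without a database server.

```python
import re
import sqlite3

# Assumption: legitimate identifiers are plain ASCII names.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name: str) -> str:
    """Validate an identifier against an allowlist pattern, then double-quote it."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return f'"{name}"'


def count_rows(conn, table: str, kind: str) -> int:
    # The identifier goes through quote_identifier; the value goes through a
    # bound parameter and is never formatted into the SQL string.
    sql = f"SELECT COUNT(*) FROM {quote_identifier(table)} WHERE kind = ?"
    return conn.execute(sql, (kind,)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (kind TEXT)")
conn.executemany("INSERT INTO docs VALUES (?)", [("a",), ("a",), ("b",)])
print(count_rows(conn, "docs", "a"))
```

With this split, a later change that lets a table or collection name arrive from a less trusted source fails closed at `quote_identifier` instead of reaching the database.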

5. lazyllm/tools/agent/shell_tool.py:74 + 3× HPC launcher shell=True — the recurring by-design class

Verdict: By-design.

shell_tool.py:74 is the agent’s shell tool — same pattern as fast-agent’s interactive_shell.py, agency-swarm’s PersistentShellTool.py, ReMe’s tools/shell.py. The agent operator opts in; the LLM controls the command on purpose.

The 3× subprocess shell=True in lazyllm/launcher/{base,sco,slurm}.py are HPC job launchers (SLURM / SenseTime SCO cluster). They build cluster-job command strings programmatically and shell them out. By-design for the launcher use case.

6-N. Standard noise / by-design

| Finding | Files | Verdict |
| --- | --- | --- |
| 17× gitleaks generic-api-key | tests/charge_tests/Models/test_validate_api_key.py (11×), tests/basic_tests/Tools/test_feishu_fs_url.py (6×) | FP: the first file holds fixtures for an API-key validator (it is literally named test_validate_api_key), the second holds Feishu (Lark) test URLs. Both are the textbook fixture false positive that only curation catches. |
| 17× pickle.load/dump | Across module/module.py, module/servermodule.py, tools/rag/migrate_collections.py, components/finetune/easy_r1/model_merger.py, etc. | Mixed: most are local-file model serialization (by-design); one specific call site sits behind an HTTP endpoint and was disclosed privately. |
| 13× non-literal-import | Plugin / module-loader discovery | By-design |
| pickles-in-pytorch | PyTorch model checkpoint loads | By-design: standard PyTorch ckpt path |
| run-shell-injection | .github/actions/{load_cache,run_tests}/action.yml, .github/workflows/publish_release.yml ×2 | Real best-practice: the recurring ${{ }}-into-run: class. Standard env:-indirection fix. |
| 1× SSRF in lazyllm/tools/train_service/serve.py:752 | requests.get(data_path) where data_path = job.training_dataset[0].dataset_download_uri | Real if the train-job submission is exposed to untrusted users; gated to operator-submitted job configs today. |
| dangerous-globals-use | Dispatch / plugin patterns | By-design |
| dynamic-urllib, 2× insecure-hash | URL-builder / non-crypto digest patterns | Typically the safe case |
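For the SSRF row, the usual hardening is to vet the download URL before it reaches requests.get. The following is a minimal sketch under stated assumptions: `is_public_url` is a hypothetical helper, the HTTPS-only policy is an assumption about how datasets are served, and the `resolve` parameter exists only so the check can be exercised without network access.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"https"}  # assumption: datasets are only fetched over HTTPS


def is_public_url(url, resolve=socket.getaddrinfo):
    """Return True only if the URL uses an allowed scheme and every address
    its host resolves to is globally routable."""
    try:
        parts = urlsplit(url)
        port = parts.port or 443  # .port raises ValueError on a malformed port
    except ValueError:
        return False
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    try:
        infos = resolve(parts.hostname, port)
    except OSError:  # covers socket.gaierror for unresolvable hosts
        return False
    if not infos:
        return False
    for info in infos:
        try:
            addr = ipaddress.ip_address(info[4][0])
        except ValueError:
            return False
        # Rejects loopback, RFC 1918, link-local (cloud metadata) and similar.
        if not addr.is_global:
            return False
    return True


print(is_public_url("file:///etc/passwd"))
```

A check-then-fetch gate like this still leaves a DNS-rebinding window between validation and the actual request, and it does not cover HTTP redirects; a fuller fix would pin the resolved address for the connection and disable or re-validate redirects.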

Patterns observed

A 16-site pull_request_target cluster is the cleanest single-finding example yet for “this is one architectural pattern, not 16 separate things.” Across the series the worst-case raw count for any one class on one repo had been the SQL-identifier cluster at 139 sites in three files on ReMe. The workflow cluster here is even more concentrated: all 16 sites are in one file, and every job uses the same dangerous shape. A two-workflow split fixes all 16 at once. The class itself isn’t novel (we’ve covered it three times already) but the concentration is.

Corporate backing changes the disclosure calculus more than a published SECURITY.md does. LazyLLM has no SECURITY.md and no PVR enabled; the strict-norm heuristic (presence of SECURITY.md) would have routed today’s curation toward a single public courtesy issue. The dispositive signal turned out to be the top-committer email (@sensetime.com) — a clear corporate disclosure target — together with the severity of two specific findings. The methodology lesson worth adding: severity overrides the heuristic when no formal channel exists. Two findings warranted private disclosure regardless of whether the repo had advertised a channel; the existence of a corporate address made it concretely possible.

The “calculator tool that evals” finding is a teaching moment. Among the LLM-agent-security articles that get cited most often, the example pattern is exactly def forward(exp): return eval(exp). That pattern shipping in a 3.8k-star framework as a registered agent tool is what people mean when they say “the well-known classes show up in real codebases continuously.” It is filed in the public post because the pattern is widely documented; the actionable shape (simpleeval / ast.literal_eval) is one line of code per call site.

Notes on the tool

Disclosure timeline

Reproduce

git clone https://github.com/elfrost/ai-patchlab
cd ai-patchlab
pip install -e ".[dev]"
python scanner/run_scan.py \
  --from-git-url "https://github.com/LazyAGI/LazyLLM" \
  --reports-dir reports/lazyagi-lazyllm \
  --min-severity medium \
  --ignore-samples

External tools (Semgrep, Gitleaks, Trivy, pip-audit) need to be installed separately — see the project README.