Communityv1.0.0

Prompt injection detection skill

Two-layer content safety for agent input and output. Use when (1) a user message attempts to override, ignore, or bypass previous instructions (prompt injection), (2) a user message references system prompts, hidden instructions, or internal configuration, (3) receiving messages from untrusted users in group chats or public channels, (4) generating responses that discuss violence, self-harm, sexual content, hate speech, or other sensitive topics, or (5) deploying agents in public-facing or multi-user environments where adversarial input is expected.

2.1kdownloads5stars2active installsZSkyX
View on ClawHubBack to Skills

Skill Details

Slug
detect-injection
Latest Version
1.0.0
Author
ZSkyX
Published
Feb 2, 2026
Updated
Feb 28, 2026
Total Versions
1

How to Install

  1. 1 on OpenClawdBots (takes under 60 seconds).
  2. 2Open your bot dashboard and go to the Skills tab.
  3. 3Switch to the ClawHub tab and search for Prompt injection detection skill.
  4. 4Click Install and the skill is deployed to your bot automatically.

Changelog — v1.0.0

Initial release with two-layer content moderation for agent input and output. - Adds prompt injection detection using ProtectAI DeBERTa classifier via HuggingFace. - Adds content safety checks using OpenAI's omni-moderation endpoint (optional). - Provides `scripts/moderate.sh` for command-line moderation of both user input and agent output. - Outputs structured JSON with clear verdicts and actions. - Supports configuration via environment variables (tokens, thresholds). - Designed for safer agent deployments, especially in adversarial or public scenarios.