Scrub Unicode
Remove invisible characters and bidi marks from any string, and flag homoglyph lookalikes for review. The five-line defense that stops a whole class of attacks.
Price: Free (no login needed)
About
A defensive skill that takes any text and strips zero-width characters, bidirectional overrides, and other invisible Unicode gremlins. It also highlights Cyrillic and Greek homoglyphs (letters that look like Latin but aren't), then shows the cleaned output plus a report of what was removed.
a-gnt's Take
Our honest review
Think of this as teaching your AI a new trick. Once you add it, your AI can strip invisible characters and bidi marks from any string and flag homoglyph lookalikes, with no extra apps or complicated setup needed. It's verified by the creator and completely free. This one just landed in the catalog, so it's worth trying while it's fresh.
Tips for getting started
Save this as a .md file in your project folder, or paste it into your CLAUDE.md file. Your AI will automatically use it whenever the skill is relevant.
Soul File
---
name: scrub-unicode
description: Scrub invisible and dangerous Unicode characters from any pasted text. Show the cleaned version plus a report of exactly what was removed and why.
---
The user will paste text. Your job: sanitize it and return both the cleaned version and a report of what was removed.
## What to remove
Scan the input character by character. Flag anything in these Unicode ranges:
**1. Zero-width and formatting (silently removed):**
- U+200B ZERO WIDTH SPACE
- U+200C ZERO WIDTH NON-JOINER
- U+200D ZERO WIDTH JOINER
- U+2060 WORD JOINER
- U+FEFF BYTE ORDER MARK / ZERO WIDTH NO-BREAK SPACE
**2. Bidirectional overrides (silently removed):**
- U+202A LEFT-TO-RIGHT EMBEDDING
- U+202B RIGHT-TO-LEFT EMBEDDING
- U+202C POP DIRECTIONAL FORMATTING
- U+202D LEFT-TO-RIGHT OVERRIDE
- U+202E RIGHT-TO-LEFT OVERRIDE
- U+2066-U+2069 ISOLATES
**3. Control characters (silently removed unless tab/newline):**
- U+0000-U+0008, U+000B-U+000C, U+000E-U+001F
- U+007F DELETE
- U+0080-U+009F C1 control codes
**4. Homoglyphs (flagged but NOT removed — the user should review):**
- Cyrillic letters that look like Latin: а (U+0430), е (U+0435), о (U+043E), р (U+0440), с (U+0441), х (U+0445), у (U+0443), etc.
- Greek letters that look like Latin: ο (U+03BF), ν (U+03BD), etc.
- Mathematical alphanumeric symbols (U+1D400-U+1D7FF) that render like Latin letters but aren't.
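The four categories above can be sketched as a single character-by-character pass. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the names `REMOVABLE_RANGES`, `CONFUSABLES`, `removable_label`, and `scan` are hypothetical, the confusables table covers only the letters named above (the source list ends in "etc."), and positions are assumed to be 0-based offsets into the original text.

```python
import unicodedata

# Categories 1-3: (first, last, label) code point ranges that get removed.
REMOVABLE_RANGES = [
    (0x200B, 0x200D, "zero-width"), (0x2060, 0x2060, "zero-width"),
    (0xFEFF, 0xFEFF, "zero-width"),
    (0x202A, 0x202E, "bidi"), (0x2066, 0x2069, "bidi"),
    (0x0000, 0x0008, "control"), (0x000B, 0x000C, "control"),
    (0x000E, 0x001F, "control"), (0x007F, 0x009F, "control"),
]

# Category 4: only the lookalikes named above (hypothetical, incomplete table).
CONFUSABLES = {
    "\u0430": "a", "\u0435": "e", "\u043E": "o", "\u0440": "p",
    "\u0441": "c", "\u0445": "x", "\u0443": "y",  # Cyrillic
    "\u03BF": "o", "\u03BD": "v",                  # Greek
}


def removable_label(ch):
    """Return the category label if ch should be removed, else None."""
    cp = ord(ch)
    for first, last, label in REMOVABLE_RANGES:
        if first <= cp <= last:
            return label
    return None


def scan(text):
    """Return (cleaned_text, removed, flagged) from one pass over text."""
    kept, removed, flagged = [], [], []
    for pos, ch in enumerate(text):
        label = removable_label(ch)
        if label is not None:
            removed.append((pos, ch, label))  # dropped from the output
            continue
        kept.append(ch)  # homoglyphs stay in the text; they are only flagged
        if ch in CONFUSABLES or 0x1D400 <= ord(ch) <= 0x1D7FF:
            flagged.append((pos, ch, unicodedata.name(ch, "UNNAMED")))
    return "".join(kept), removed, flagged
```

Tab (U+0009), newline (U+000A), and carriage return (U+000D) fall outside every range, so they pass through untouched, which matches the "unless tab/newline" exception in category 3.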
## How to output
### Section 1 — The cleaned text
```
✨ Cleaned text:
---
<text with invisible/bidi/control chars removed>
---
```
### Section 2 — The report
```
🔍 Removed:
• U+200B ZERO WIDTH SPACE (3 occurrences) — at positions 12, 47, 89
• U+202E RIGHT-TO-LEFT OVERRIDE (1 occurrence) — at position 23
⚠️ Flagged (review manually — these LOOK normal but are not):
• Position 15: "е" is U+0435 CYRILLIC SMALL LETTER IE (looks like Latin 'e' but isn't). Context: "applе" ← note the last character.
• Position 42: "о" is U+043E CYRILLIC SMALL LETTER O (looks like Latin 'o'). Context: "logоut"
```
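One way the "Removed" half of this report could be assembled is to group hits by code point and let `unicodedata.name` supply the official character names. This is a sketch only: `format_removed` is a hypothetical helper, and it assumes the scan step has already produced `(position, char)` pairs.

```python
import unicodedata
from collections import defaultdict


def format_removed(removed_chars):
    """Build the 'Removed' report lines from (position, char) pairs."""
    positions = defaultdict(list)
    for pos, ch in removed_chars:
        positions[ch].append(pos)

    lines = ["🔍 Removed:"]
    for ch in sorted(positions):  # stable order: ascending code point
        hits = positions[ch]
        # Control characters have no formal Unicode name, so use a fallback label.
        name = unicodedata.name(ch, "UNNAMED CONTROL")
        noun = "occurrence" if len(hits) == 1 else "occurrences"
        where = "position" if len(hits) == 1 else "positions"
        spots = ", ".join(str(p) for p in hits)
        # \u2014 is the dash used in the report template.
        lines.append(
            f"• U+{ord(ch):04X} {name} ({len(hits)} {noun}) \u2014 at {where} {spots}"
        )
    return "\n".join(lines)
```

Calling `format_removed([(12, "\u200b"), (23, "\u202e"), (47, "\u200b"), (89, "\u200b")])` reproduces the two "Removed" lines in the template.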
### Section 3 — The verdict
```
Summary: <N> characters removed, <M> homoglyphs flagged.
Verdict: [SAFE / REVIEW / HOSTILE]
• SAFE = nothing suspicious found
• REVIEW = characters removed but context looks benign (e.g. emoji encoding)
• HOSTILE = removed chars were positioned in ways consistent with a smuggling attack (e.g. embedded inside a URL, between words, inside what looks like a system directive)
```
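The verdict ultimately rests on judgment about context, but the "embedded inside a URL" case can be approximated mechanically. The sketch below is a deliberately crude heuristic, not the skill's rule set: `URL_LIKE` and `verdict` are hypothetical names, only the URL case is checked, and treating a homoglyph-only result as REVIEW is an assumption, since the definitions above do not cover that case.

```python
import re

URL_LIKE = re.compile(r"https?://\S+")  # crude, illustrative URL matcher


def verdict(original_text, removed_positions, flagged_count):
    """Map scan results to SAFE / REVIEW / HOSTILE using one simple heuristic."""
    if not removed_positions and flagged_count == 0:
        return "SAFE"
    url_spans = [m.span() for m in URL_LIKE.finditer(original_text)]
    for pos in removed_positions:
        if any(start <= pos < end for start, end in url_spans):
            return "HOSTILE"  # invisible character embedded inside a URL
    # Assumption: anything removed or flagged outside a URL only needs review.
    return "REVIEW"
```

Here `removed_positions` holds offsets into the original, unscrubbed text, so the URL spans and the removal positions share one coordinate system.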
## If nothing was found
Just say:
```
✨ Clean — no invisible, bidirectional, or suspicious characters found.
```
## Python snippet for the user
At the end, offer to show them the underlying Python regex so they can run this locally:
```python
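import re

# U+200B-U+200F: zero-width characters plus the LRM/RLM marks.
# U+202A-U+202E and U+2060-U+206F: bidi controls, word joiner, isolates.
# U+FEFF: byte order mark. The rest: C0/C1 controls and DELETE,
# skipping tab (U+0009), newline (U+000A), and carriage return (U+000D).
SUSPICIOUS = re.compile(
    r'[\u200B-\u200F\u202A-\u202E\u2060-\u206F\uFEFF'
    r'\u0000-\u0008\u000B-\u000C\u000E-\u001F\u007F-\u009F]'
)

def scrub(text: str) -> str:
    """Return text with every character matched by SUSPICIOUS removed."""
    return SUSPICIOUS.sub('', text)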
```
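A quick check of that pattern on a made-up sample string, using `re.Pattern.subn` so the removal count for the summary line comes back alongside the cleaned text. The sample and variable names are illustrative only. Note that the pattern is slightly broader than the lists earlier in this file: its ranges also cover U+200E, U+200F, U+2061-U+2065, and U+206A-U+206F.

```python
import re

SUSPICIOUS = re.compile(
    r'[\u200B-\u200F\u202A-\u202E\u2060-\u206F\uFEFF'
    r'\u0000-\u0008\u000B-\u000C\u000E-\u001F\u007F-\u009F]'
)

# Hypothetical input: a zero-width space and an RLO hidden in a domain name,
# followed by a tab, visible CJK text, and a newline that must all survive.
sample = "pay\u200Bpal\u202E.com\t日本語 ok\n"

# subn returns (new_string, number_of_substitutions).
cleaned, removed_count = SUSPICIOUS.subn('', sample)

print(repr(cleaned))
print(removed_count)
```

The tab, the newline, and the CJK characters pass through unchanged, which is the behavior the rules below require for visible non-ASCII content.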
## Rules
- Never pretend to scan without actually scanning. If the user pastes 10 paragraphs, do the scan.
- Never remove visible content just because it's non-ASCII (emoji, CJK, Arabic, etc.). Only remove INVISIBLE/FORMATTING characters.
- When in doubt about a homoglyph, FLAG it, don't remove it. Removal could corrupt legitimate multilingual content.
What's New
Initial release