"Here's the thing - this technique completely breaks traditional code review. You can't spot what you can't see. GitHub's diff view? Shows nothing suspicious. Your IDE's syntax highlighting? All clear. Manual code inspection? Everything looks normal.
The invisible code technique isn't just clever - it's a fundamental break in our security model. We've built entire systems around the assumption that humans can review code. GlassWorm just proved that assumption wrong."
Yeah the whole article is awful to read. Everything the LLM added is completely useless fluff, sometimes misleading, and always painful to get through.
that screenshot looks suspicious as hell, and my editor (Emacs) has a whitespace mode that shows unprintable characters sooooo
if GitHub's diff view displays unprintable characters like this that seems like a problem with GitHub lol
"it isn't just X it's Y" fuck me, man. get this slop off the front page. if there's something useful in it, someone can write a blog post about it. by hand.
Why not just indicate non-printable characters in code review tools? I've always wondered that, regardless of security implications. They are super rare in real code (except line breaks and tabs maybe), so no disruption in most cases.
Also, as noted in other comments, you can't do shady stuff purely with invisible code.
Because spaces, tabs, CR and LF are invisible too yet perfectly normal to find within code. You could very easily implement a decode() function that uses only those characters.
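Toy version of that decode() (space = 0, tab = 1, eight characters per byte; nothing like any real sample, just the principle):

```javascript
// space = 0, tab = 1; every 8 whitespace characters become one byte.
function decode(ws) {
  const bits = [...ws].map(c => (c === '\t' ? '1' : '0')).join('');
  let out = '';
  for (let i = 0; i + 8 <= bits.length; i += 8) {
    out += String.fromCharCode(parseInt(bits.slice(i, i + 8), 2));
  }
  return out;
}

// " \t\t  \t  " is 01100100 -> "d"; chain enough of these and you can
// smuggle arbitrary source into "blank" lines.
console.log(decode(' \t\t  \t  ')); // "d"
```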
But to get any meaningful result, you'd need to insert them in unusual ways or amounts, likely breaking formatting rules. Trailing whitespace or excessive line breaks should be caught by linting tools and/or code review.
So, they have a custom decode function that extracts info from unprintable characters, which they then pass to `eval`. This article is trying to make this seem way fancier than it is. Maybe GitHub or `git diff` don't give a sense of how many bits of info are in the unicode string, but the far scarier bit of code is the `eval(atob(decodedString))` at the bottom. If your security practices don't flag that, either at code review, lint, or runtime, then you're in trouble.
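Even something this crude would have tripped on it (a sketch, not a real linter; the pattern list is just my guess at a starting point, and regexes will obviously miss plenty):

```javascript
// Crude guardrail: flag the usual dynamic-execution suspects in a file.
const fs = require('fs');

const suspects = [/\beval\s*\(/, /\bnew\s+Function\s*\(/, /\batob\s*\(/];

const src = fs.readFileSync(process.argv[2], 'utf8');
src.split('\n').forEach((line, i) => {
  if (suspects.some(re => re.test(line))) {
    console.log(`${i + 1}: ${line.trim()}`);
  }
});
```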
Not to say that you can't make innocuous-looking code into the moral equivalent of eval, but giving this a fancy name like GlassWorm doesn't seem warranted on that basis.
Yeah, doing eval(extract_and_decode(file)) is marginally sneakier than eval(fetch_from_internet()), but it's not so far as being some sort of, er... "mirror life" biology.
Is there a linter written in Rust or such that I can throw in any project to scan it for unexpected Unicode? It would help for the linter to support a config file.
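In the meantime, here's roughly what I have in mind (Node rather than Rust, and the allowlist is just a placeholder for what a config file would supply):

```javascript
// Sketch of the linter I'm after: walk a file and report any codepoint
// outside a configurable allowlist (here: printable ASCII + \t \n \r).
const fs = require('fs');

const allowed = cp =>
  (cp >= 0x20 && cp <= 0x7e) || cp === 0x09 || cp === 0x0a || cp === 0x0d;

const text = fs.readFileSync(process.argv[2], 'utf8');
let line = 1, col = 0;
for (const ch of text) {
  const cp = ch.codePointAt(0);
  col++;
  if (cp === 0x0a) { line++; col = 0; continue; }
  if (!allowed(cp)) {
    console.log(`${line}:${col} U+${cp.toString(16).toUpperCase().padStart(4, '0')}`);
  }
}
```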
This is an old-man rant, but the first time I saw Unicode I felt like I was looking at a train wreck coming from a long way off. It has too many edge cases, footguns and unintuitive artifacts like this. I wish we constrained its use to only where required. Text was so much easier to reason about and safer to manipulate in the ASCII days.
Using non-printable characters to encode malicious code is creative, but I wouldn't say it "breaks our security model".
I would be pretty suspicious if I saw a large string of non-printable text wrapped in a decode() function during code review... Hard to find a legitimate use for encoding things like this.
Also another commenter[1] said there's an eval of the decoded string further down the file, and that's definitely not invisible.
Has no one thought to review the AI slop before publishing?
There's no self-propagation happening, that's just the terrible article's breathless hyping of how devastating the attack is. It's plain old deliberately injected and launched malware. OpenVSX is a huge vector for malicious actors taking real Marketplace extensions, injecting a payload, and uploading them. The article lists exactly one affected Marketplace extension, but that extension does not exist.
> Has no one thought to review the AI slop before publishing?
If only Koi reviewed their AI slop before publishing :(
Cool write-up. Seems pretty unintuitive to me that Unicode would allow someone to serialize normal code as invisible characters and that something like an IDE or a git diff has never been hardened against that at all.
In my mind it's one thing to let a string control whitespace a bit versus having the ability to write any string in a non-renderable format. Can anyone point me to some more information about why this capability even exists?
> Seems pretty unintuitive to me that Unicode would allow someone to serialize normal code as invisible characters
If you have a text encoding with two invisible characters, you can trivially encode anything you could represent on a digital computer, in binary, by treating one as a zero and the other as a one. More invisible characters, plus some assumptions about what you're encoding, allow a denser representation than one bit per character.
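e.g., with zero-width space and zero-width non-joiner as the two symbols (a toy illustration, not any particular malware's scheme):

```javascript
// Toy binary channel over two zero-width characters:
// U+200B (zero-width space) = 0, U+200C (zero-width non-joiner) = 1.
const ZERO = '\u200B', ONE = '\u200C';

const toInvisible = s => [...s].map(c =>
  c.charCodeAt(0).toString(2).padStart(8, '0')
).join('').replace(/0/g, ZERO).replace(/1/g, ONE);

const fromInvisible = inv => {
  const bits = [...inv].map(c => (c === ONE ? '1' : '0')).join('');
  let out = '';
  for (let i = 0; i + 8 <= bits.length; i += 8) {
    out += String.fromCharCode(parseInt(bits.slice(i, i + 8), 2));
  }
  return out;
};

const hidden = 'x = 1' + toInvisible('evil()'); // renders as just "x = 1"
console.log(fromInvisible(hidden.slice(5)));    // "evil()"
```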
Of course, the trick in any case is you have to also slip in the call to decode and execute the invisible code, and unless you have a very unusual language, that’s going to be very visible.
That should tell everyone how much these companies actually care about our security, the next time they claim to be stripping away our freedoms "for our security".
I was always afraid of browser extensions and now I'm also afraid of IDE extensions. Recently came across SecureAnnex[0] and it looks promising as a way to get some control over this.
> Let me say that again: the malware is invisible. Not obfuscated. Not hidden in a minified file. Actually invisible to the human eye.
I stopped reading at this point. This is not only false, but yet another strong reason to lint out the silly nonsense people argued for on here years ago. No emoji, no ligatures, etc.
> The invisible code technique isn't just clever - it's a fundamental break in our security model. We've built entire systems around the assumption that humans can review code. GlassWorm just proved that assumption wrong.
This is pure Claude talk.