
> Because few ever write diagrams in SVG. SVG is an output format, not an input format...

Aside from "try Inkscape", that sounds like a human problem not an LLM problem.

LLMs output what they were trained on, and if the diagrams in blog articles or docs are SVG, they merrily ingest SVG and associate it with the adjacent text.

One might as well say Midjourney won't work because few ever make paintings pixel by pixel. You're asking it to translate whatever you describe (e.g. scenes and the names of painters you'd expect the model to know, like Escher or da Vinci) into some hidden imagined scene, render that as brush strokes of particular paints on textured media, and then emit a PNG.



> Aside from "try Inkscape", that sounds like a human problem not an LLM problem.

Absolutely do not "try Inkscape", unless you like your LLM choking on the kilobytes of tokens it takes to describe the equivalent of "Alice -> Bob" in PlantUML. 'robjan is right to compare SVG to a compiled program binary, because that's effectively what SVG is.

Most SVG is produced by graphics programs (or by converting other formats produced in graphics programs), which add tons of low-level noise to the SVG structure (Inkscape in particular). And $deity forbid you then minify or "clean up" the SVG for publication: that process strips what little semantic content there is (very little, as with every WYSIWYG tool), turning SVG into the programming equivalent of assembly opcodes.

All this means too many degrees of freedom in the format, and a dearth of quality examples the model could be trained on. As with the assembly of a compiled binary, an LLM can sort of reason about it, but it won't do a very good job, and it's a stupid idea in the first place.
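To make the token-cost gap concrete, here's a sketch (mine, not from the thread): the PlantUML line and the SVG below are hypothetical hand-written renderings of the same one-message diagram, and even this unusually clean SVG is an order of magnitude larger.

```python
# Hypothetical comparison: the same "Alice -> Bob" message as PlantUML
# source vs. a minimal hand-written SVG rendering of it.

plantuml_src = "Alice -> Bob: hello"

# Even hand-written SVG for that one arrow needs coordinates, text
# placement, and an arrowhead marker definition:
svg_src = """<svg xmlns="http://www.w3.org/2000/svg" width="240" height="80">
  <defs><marker id="a" markerWidth="10" markerHeight="10" refX="8" refY="3"
    orient="auto"><path d="M0,0 L8,3 L0,6 z"/></marker></defs>
  <text x="20" y="20">Alice</text>
  <text x="180" y="20">Bob</text>
  <line x1="40" y1="50" x2="180" y2="50" stroke="black" marker-end="url(#a)"/>
  <text x="90" y="44">hello</text>
</svg>"""

# The SVG is well over 10x the size, and real tool-generated SVG is worse.
print(len(plantuml_src), len(svg_src))
```

And this is the best case: Inkscape output for the same diagram would carry transform matrices, style dumps, and editor metadata on top.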

> One might as well say MidJourney won't work because few ever make paintings using pixels.

One might say that about asking an LLM to output a raster image (say, in the PPM/PBM format, which is made of tokenizer-friendly text!), and predictably, the LLM will suck at outputting such images, and suck even worse at understanding them.
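For a sense of scale, a sketch (mine, not from the thread) of what ASCII PPM costs in tokens: even a trivial 8x8 solid-color image expands into hundreds of numbers the model would have to emit in exactly the right order.

```python
# Sketch: a trivial raster image in the ASCII PPM (P3) format.
# An 8x8 solid red square, one RGB triple per pixel.

width, height = 8, 8
pixels = [(255, 0, 0)] * (width * height)

lines = ["P3", f"{width} {height}", "255"]  # magic number, dimensions, max value
lines += [" ".join(f"{r} {g} {b}" for r, g, b in row)
          for row in (pixels[i:i + width] for i in range(0, len(pixels), width))]
ppm = "\n".join(lines)

# 64 pixels -> 192 pixel values plus 4 header tokens = 196 whitespace-
# separated tokens for an image with almost no information in it.
print(len(ppm.split()))
```

Scale that to even a 256x256 image and you're asking for ~200,000 numbers in sequence, which is exactly the regime where next-token prediction falls apart.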

One might not say that about Midjourney. Midjourney is not an LLM; it's (backed by) a diffusion model. Those are two entirely different beasts. An LLM is a sequential next-token predictor; a diffusion model is not. It does something more like global optimization across a fixed-size output, refining many places simultaneously.

In fact, I bet a textual diffusion model (there are people working on diffusion-based language models) would work better for outputting SVG than LLMs do.


With diagrams, it's still worth asking for the source code, for the same reason we ask LLMs to write software in a programming language rather than emit compiled output directly.



