
> Compared to the olden days, when specs were written by pedantic old Unix dudes

I think that is one of the reasons (among many others) that the semantic web failed (which doesn't contradict the author, whose point is literally the worse-is-better mantra).

People really leaned into the eXtensible part of XML, and I think a certain amount of fatigue set in. XSL, XHTML, XSD, WSDL, XSLT, RDF, RSS, et al. just became a bit too much. It was architecture astronautics for data formats when what the world at the time needed was simple interchange formats (and JSON fit the bill).

But I actually believe XML's time has come. I've noticed that XML appears a lot in leaked system prompts from places like Anthropic. LLMs appear to work very well with structured text formats (Markdown and XML specifically).

I believe that MCP is the wrong model, though. I believe we should be "pushing" context to the models rather than giving them directions on how to "pull" the context themselves.



Interesting observation. I've been kinda hyped about XML/XSLT lately because I was working on a JSON macro expansion language (just four macro tags that replace themselves with their expansion: #= for assignment, #& for substitution, #? for a conditional branch like cond, and #! for calling custom functions) and realised I was re-inventing XSLT. But what I really wanted was XPath: a way to describe traversal over graphs, back and forth along different axes. It's actually an amazing spec. Then I found BaseX [0], which pulls arbitrary XML documents into a queryable database whose queries can be written in XPath or XQuery.
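
To make the axes point concrete, here's a minimal sketch in Python with lxml (a third-party library; the little catalog document and queries are invented for illustration) of the back-and-forth traversal XPath lets you describe:

    from lxml import etree  # third-party: pip install lxml

    doc = etree.fromstring("""
    <catalog>
      <book id="a"><title>SGML Handbook</title></book>
      <book id="b"><title>XML in a Nutshell</title></book>
      <book id="c"><title>XQuery Basics</title></book>
    </catalog>
    """)

    # forwards along the child/descendant axes
    titles = doc.xpath("//book/title/text()")
    # -> ['SGML Handbook', 'XML in a Nutshell', 'XQuery Basics']

    # backwards from a known node along the preceding-sibling axis
    earlier = doc.xpath("//book[@id='c']/preceding-sibling::book/@id")
    # -> ['a', 'b']

    # upwards along the ancestor axis
    container = doc.xpath("//title[text()='XQuery Basics']/ancestor::catalog")
    # -> [the <catalog> element]

BaseX runs the same sort of path expressions (plus full XQuery) over whole collections of documents, which is what makes it useful as a queryable store.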

In my mind the best way to create a reliable natural language interface to a dataset without hallucination would be to hand over XML schemas to the system prompt and have it write the queries to retrieve the data.

[0] https://basex.org/


What I miss is E4X [1]. The ability to grab collections of child elements and collections of descendant elements made working with XML bearable.

1. https://en.wikipedia.org/wiki/ECMAScript_for_XML


> I believe we should be "pushing" context to the models rather than giving them directions on how to "pull" the context themselves.

How could that possibly work for the cases that people want the intern to solve for them? If they knew the information ahead of time, presumably they'd just solve the problem by themselves.

I get the impression that the value folks get from MCP is "run some query for me, don't make me learn how to glue together 15 sources"


I'm not sure I understand your objection. You seem to be implying that knowing the context is the same as knowing the solution to the problem that the context informs?

Let me think of an example here. The context needed to determine if there is cancer in a radiology scan would be the contents of the scan. So there are two modes here: in one I say "LLM, please tell me if there is cancer in this patient's scan" and the LLM makes an MCP call to load the patient's report. In the second mode I say "LLM, here is the patient's radiology scan, can you tell me if it has signs of cancer".

The first example is what I was calling a "pull" model and the second example is what I am calling a "push" model.


The point above about enterprise glue is why this is a pull model.

In your push model, the onus is on you to go find the scan from one of five backends, traverse whatever hoops of access are needed, and actually handle the files manually.

In the pull model, each backend implements the server once, the LLM gets connected to each one once, and you have one single flow to interact with all of them.


It is interesting that the model I am proposing inverts many people's expectations of how LLMs will benefit us. In one vision, we give a data-lake of information to LLMs; they tease out the relevant context and then make deductions.

In my view, we hand craft the context and then the LLM makes the deductions.

I guess it will come down to how important crafting the relevant context is for making useful deductions. In my experience writing code with LLMs, effectiveness increases when I very carefully select the context and goes down when I let the agent framework (e.g. Cursor) figure it out. The ideal case is obviously when the entire project fits in the context window, but that won't always be possible.

What I've found is that LLMs struggle to ask the right questions. I will often ask the LLM "what other information can I provide you to help solve this problem" and I rarely get a good answer. However, if I know the information that will help it solve the problem and I provide it to the agent then it often does a good job.


> In my view, we hand craft the context and then the LLM makes the deductions.

We (as in users) provide the source material and our questions, the LLM provides the answers. The entire concept of a context is incidental complexity resulting from technical constraints, it's not anything that users should need to care about, and certainly not something they should need to craft themselves.


But it makes a radical difference to the quality of the answer. How is the LLM (or collaboration of LLMs) going to get all the useful context when it’s not told what it is?

(Maybe it’s obvious in how MCP works? I’m only at the stage of occasionally using LLMs to write half a function for me)


In short, that's the job of the software/tooling/LLM to figure out, not the job of the user to specify. The user doesn't know what the context needs to be; if they did, and could specify it, they probably wouldn't need an LLM in the first place.

MCP servers are a step in the direction of allowing the LLM to essentially build up its own context, based on the user prompt, by querying third-party services/APIs/etc. for information that's not part of their e.g. training data.


For a lot of office jobs, knowing the context is almost the same as knowing the solution to the problem at hand.


I'd be interested in examples of this. I've worked in offices for all of my adult life and I don't have any examples that come to mind.

I think of logic puzzles I used to do as a kid. The whole idea of the puzzle is that all of the information you need is provided, the fun is in solving using deduction. Sudoku scratches the same itch.

At the least, I would argue there are many problems that don't fit the mold you are suggesting and MCP is not the correct method for addressing them.


> I'd be interested in examples of this. I've worked in offices for all of my adult life and I don't have any examples that come to mind.

Wow, you must have worked in some really mature shops then if you knew instantly which of [Google Drive, Confluence, Airtable, GitHub wiki, ${that one deprecated thing that Alice was using}, ...] contained the reference to Project Frazlebaz mentioned in Slack last.. day? week? Maybe it was today but time is a blur?


I don't know how that is solved by MCP? How would the LLM possibly know where to search? Just making an API (or series of APIs) to slack/jira/airtable available doesn't magically surface the context, or the right search incantation to reveal it. The LLM still has to figure out which tool to search within, what search terms/tools are the right ones to choose, etc. If there are a million documents in your set of data providers and only 1000 fit into the context of the LLM, that filter happens somewhere.

This idea that if you don't know where the data is, magically the LLM will, is very confusing to me.


In my experience, this kind of thing is exactly what LLMs are good at, and fast at.

Here's a real example from my job as a BI dev. I needed to figure out how to get counts of incoming products from an ERP with a 1000+ table database schema in a data lake with zero actual foreign keys. I sorta knew the data would need to come from the large set of "stock movements" tables, which I didn't know how to join, and I had no idea which rows from those tables would be relevant to incoming product, or even which fields to look at to begin to determine that. I simultaneously asked a consultant for the ERP how to do it and asked Cursor a very basic "add the count of incoming units to this query" request.

Cursor gave me a plausible answer instantly, but I wasn't sure it was correct. When the consultant got back to me a few days later, the answer he gave was identical to Cursor's code. Cursor even thought of an edge case that the consultant hadn't.

It blew my mind! I don't know if Cursor just knows about this ERP's code or what, or if it ran enough research queries to figure it out. But it got it right. The only context I provided was the query I wanted to add the count to and the name of the ERP.

So, I 100% believe that, especially with something like MCP, the pull model is the right way. Let the LLM do the hard work of finding all the context.


MCP is just function calls with parameters. Whether or not it's push or pull can be decided by the author. A push model takes the scan as an input to the mcp call. A pull model does the pulling within the mcp call. Neither is right or wrong, it's situational.
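
A minimal sketch of the two shapes, using plain Python functions in place of any particular MCP SDK (all names here are invented):

    def fetch_scan_from_pacs(patient_id: str) -> bytes:
        """Hypothetical backend lookup, stubbed out for the example."""
        return b"<scan bytes>"

    def analyze_scan(scan: bytes) -> str:
        """Stand-in for whatever actually looks for findings."""
        return "no findings"

    # "Push" shape: the caller has already fetched the scan and hands it over.
    def analyze_scan_push(scan: bytes) -> str:
        return analyze_scan(scan)

    # "Pull" shape: the tool takes an identifier and does the fetching itself.
    def analyze_scan_pull(patient_id: str) -> str:
        return analyze_scan(fetch_scan_from_pacs(patient_id))

Same protocol either way; the difference is only whether the data crosses the tool boundary as an argument or as something the tool goes and fetches.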


I'm not sure if it's productive to give a full database worth of context to an LLM. There'd be a lot of data that could potentially pollute the LLM's reasoning with information it doesn't need to answer your prompts. This can be especially troublesome for small models.


XML tags work well for LLMs. But notably, the vast majority are just XML tags. Nobody™ is feeding LLMs well-formed XML with an XML declaration (the <?xml version="1.0" encoding="UTF-8"?> at the start), and we aren't using namespaces, XSLT, XML Schemas, etc. It's just some ad-hoc collection of SGML-style tags.


I've noticed that as well, but I doubt those additional tokens hurt.

You can make an unholy mess with namespaces and all of the bells-and-whistles that XML provides, or you can not. But even if you just structure using <tag></tag> without any of the fancy stuff, you still can create pipelines that operate on that structured format in ways that are more powerful than plain text.
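
As a small sketch (the prompt layout is invented), even the Python standard library is enough to assemble and pick apart that kind of tagged text:

    import xml.etree.ElementTree as ET

    # An ad-hoc tagged prompt: no declaration, no namespaces, no schema.
    prompt = (
        "<instructions>Answer using only the documents below.</instructions>"
        "<documents>"
        "<doc id='1'>The warehouse closes at 6pm.</doc>"
        "<doc id='2'>Returns need a printed label.</doc>"
        "</documents>"
        "<question>When does the warehouse close?</question>"
    )

    # Because it's structured, downstream tooling can operate on it:
    # wrap it in a root element and parse the pieces back out.
    tree = ET.fromstring(f"<prompt>{prompt}</prompt>")
    docs = {d.get("id"): d.text for d in tree.iter("doc")}
    question = tree.findtext("question")

    print(question)  # When does the warehouse close?
    print(docs)      # {'1': 'The warehouse closes at 6pm.', ...}

That round trip is the kind of pipeline you just can't build reliably on top of free-form plain text.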


Still, it's a format that allows for lossy transmission, which I think meshes well with the fuzzy ingestion of LLMs; the closing tags' redundancy is a feature that helps the LLM stay focused.


I occasionally dally with XML in hobby code as a document source format, but I think what drives me away in the end is silly stuff with syntax. It is a big spec, well beyond the angle-bracket stuff: it wants to cover all the bases, do security right, and do character encoding right, which means that "plain text editing" in it is really unintuitive - you can type in something like this paragraph and it might be parsed in a compatibility mode, but it won't be valid. As an interchange format, or something loaded into application software tailored for it, it has more legs - and LLMs definitely wouldn't have any trouble making sense of it, which is good. A lot of yesteryear's challenges came from programmers, short on time and eager to hack in features, taking a heavily structured spec and wielding it like a blunt instrument. "XML? Sure, I can write a regex for that." Repeat that across three different programs and three different authors and you have a mess.

There is a format that I've hit upon that actually does get at what I want for myself, and that's BBCode. That's a great source format for a lot of stuff - still basically the same bracketed-tag idea, but with the right amount of structure and flexibility to serve as a general-purpose frontend syntax. Early implementations were "write a regex for that", but after decades of battle-testing there are more graceful parsers around these days as well.


> the semantic web failed

It failed because they couldn't figure out how to stuff ads in.


It failed because nobody wanted to use it. Nobody built a compelling product from it, nobody wanted to author the data, nobody cared to use it when anyone did. It wasn't killed, it never got started because it never had a useful purpose.


I remember learning XML and XSLT from library books in the late ’90s and found it interesting, but couldn’t find a use for it - I wasn’t really handling data in any meaningful way back then, it was all just hobby learning. But in the decades since, I’ve touched XML _at all_ maybe three times? Once was a Salesforce integration I wish I could fully forget; the other times were client one-offs when they had a few things stored in XML that we needed to pipe into WordPress.


I found this talk from conj really interesting; it argues that RDF and the semantic web may have finally found their niche.

https://youtu.be/OxzUjpihIH4?si=DRSp1n9u56iGbZFZ


> I believe we should be "pushing" context to the models rather than giving them directions on how to "pull" the context themselves.

I believe MCP "Resources"[0] are what you're looking for.

[0]: https://modelcontextprotocol.io/docs/concepts/resources#reso...


> I believe we should be "pushing" context to the models rather than giving them directions on how to "pull" the context themselves.

Models have very, very, very limited capacity for context; it's one of their primary bottlenecks. Therefore it's important to optimize (minimize) that information as much as possible, and allowing the model to pull what it decides it needs makes that constraint much easier to satisfy.


>what it decides it needs

This is what I am suggesting: relying on the model to decide what it needs is maybe not the best use of the available context. It might be better for us to give it the information we are certain it will need.


I'm talking about situations where the user wants to ask questions about data that's many orders of magnitude larger than what can be included in any LLM request context. I just provide access to the data (via MCP server, RAG server, whatever) and my question/prompt; it's the LLM's responsibility to do everything else.


I think the main reason that "Semantic Web" has failed is that basically no one can even explain what the "Semantic Web" is or what it's good for. It's all just a bunch of generic sounding buzzwords about exchanging data in some way.


Is it that hard? Instead of just providing data, you provide data plus "semantic tags" that make the data machine-indexable. It was ultimately meant to solve the same sort of "fuzzy-search" problem that we today can use LLMs for: "find me a website which sells home goods" could be addressed with tags that classify a page as a marketplace, and therein one which sells home goods. I think the real problem is that this requires a cohesive ontology, and if you've ever gotten really into tagging your notes/bookmarks/whatever you'll know how hard that can be.
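
For a concrete (entirely hypothetical) example of what that tagging looks like in practice, a shop's page could embed a schema.org description of itself, say as JSON-LD, so a crawler can classify it without understanding the prose:

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "HomeGoodsStore",
      "name": "Example Home Goods",
      "url": "https://shop.example.com"
    }
    </script>

The hard part was never the markup; it was getting everyone to agree on, and actually maintain, the vocabulary.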


Yes, most of the problem with the semantic web is that you end up having to agree on how to describe the world: what the concepts are, and how they are related. This kind of work on semantic networks has been done for a while in computational linguistics for natural language processing, but it's very tedious. For instance, users of the Minitel in France in the 80s could look up vets by querying the yellow pages directory with sentences such as "my dog is sick". Nice tech, but not that useful and not very discoverable either.

LLMs promise to just discover all these relations between concepts for you and do the right thing instead. Sometimes it works, sometimes not...


> Sometimes it works, sometimes not...

Sometimes it won't work because there just isn't agreement between humans on what the concept is. The semantic web also fails there.


The Semantic Web has yet to fully fail. Most companies have implemented a closed-world, limited version of it internally. It has already been proposed by many people as a way to correct LLM outputs.


>LLMs appear to work very well with structured text (XML

Plausible.

>Markdown

What? Markdown is specifically designed to be the least structured markup format possible, the absolute minimum set of features. Did you mean JSON or something?


Two years ago he would have written "pedantic white old Unix dudes".


True, though "dudes who were never in danger of being gruntled" made it pretty clear who he was targeting.



