
"It's a sign that web standards are getting too complicated."

Is there precedent for standards significantly simplifying over time, or do they always tend to get more and more complex?



What frequently happens is that a simplified alternative appears.

HTML5 rather than XHTML; Markdown vs. HTML or LaTeX; HTML, originally, vs. SGML or Sun's ... proprietary hypertext system (Vue?).

Arguably, replacement of much office suite software with Web technologies.

Multics -> Unix.


This is true, but a web browser can't really make those choices without breaking a lot of existing stuff. The big problem is that we keep piling onto HTML, CSS, and JS. For instance, if we wanted web apps, it would have been better to make something separate. Instead we have taken HTML, which was originally just a way of formatting rich text, and made it into the beast that it is today.


This may be a nitpick, but hopefully it's also an interesting rabbit-hole:

HTML was originally contemplated as more than a method of rich text formatting. It was created as a way to describe and link arbitrary media and applications. I'd recommend reading the first published proposal for (what later became known as) the World Wide Web written by Tim Berners-Lee [1]. In my reading, I see it as intended for applications as powerful as the kind we build today - at least as far as could be contemplated and described in 1989, and given the degree of abstraction with which the document was written:

> "Hypertext" is a term coined in the 1950s by Ted Nelson [...], which has become popular for these systems, although it is used to embrace two different ideas. One idea [...] is the concept: "Hypertext": Human-readable information linked together in an unconstrained way. The other idea [...], is of multimedia documents which include graphics, speech and video. I will not discuss this latter aspect further here, although I will use the word "Hypermedia" to indicate that one is not bound to text.

An example of anticipated usage:

> The data to which a link (or a hot spot) refers may be very static, or it may be temporary. In many cases at CERN information about the state of systems is changing all the time. Hypertext allows documents to be linked into "live" data so that every time the link is followed, the information is retrieved. If one sacrifices portability, it is possible so make following a link fire up a special application, so that diagnostic programs, for example, could be linked directly into the maintenance guide.

Another category of use-case was web crawling, link-based document search, and other data analysis.

These and other anticipated use-cases envision more than text formatting; the primary purposes of the proposal were, in my opinion, the inter-linking of information and the formal modeling of information, especially for the purpose of combining different programs or facilities into a single user experience.

[1] https://www.w3.org/History/1989/proposal.html


I wish Google Search would create an HTML5 subset for documents that would boost rankings if used.

The majority of the search results I'm looking for should be simple single-page HTML documents that don't use the complex HTML5 features needed for web apps.

Change the ranking, and you give websites an incentive to avoid JavaScript or CSS features that work against the reader's interests.


I'm 80% sure you're joking, but just in case, this is essentially what AMP does.


Last thing we need is google dictating more about the internet.


My understanding was that this was the original plan for XHTML. Keep HTML 4.x around as a "legacy standard" for old content, make new developments in a new language with an architecture more suited for modern use cases.

Of course this would have required browser vendors to support two languages at the same time for a sufficiently long transition period, which was apparently too much to demand.


But they did support both languages, and support them to this day.

It's the sites that didn't adopt XHTML. Everybody on the infrastructure side loved it.


> ...without breaking a lot of existing stuff...

That's specifically why and how new standards appear. They accomplish most (though not all) of the earlier capabilities, with a massive reduction in complexity. It's a form of risk mitigation and debt reduction.

Compare browsers generally: Netscape -> MSIE -> Mozilla -> Firefox -> Chrome -> Firefox. Each predecessor reached a point of complexity at which, even with massive infusions of IPO, software-monopoly, or advertising-monopoly cash, it was unsustainable.

The old, dedicated dependencies (frames, ActiveX, RealPlayer, Flash, ...) broke. Simpler designs continued to function.


>For instance if we wanted web apps it would have been better to make something separate

But then we need to make another app + browser version? Which defeats the purpose...


Moreover, we have gone from Microsoft pushing complexity onto the web to Google doing it.

The latest two HTTP protocols, for example, are both based on tech that Google had already built (SPDY became HTTP/2, and QUIC became HTTP/3), and the IETF largely went along with it. The tech has its advantages, but there is very little push-back saying, well, that makes things more complicated.

For instance, HTTP/2 has support for pushing files to the client. Most back-end web stacks are still trying to find good ways to make that easy to use, mainly because which files to send depends on what the page contains. Either you specify a custom list, or the web server now needs to understand HTML to derive the list of required resources. It gets more complicated still, because a push is useless if the resource is already cached. That means your web server has to have some kind of awareness of how clients will cache data. Again, the server starts to need more knowledge about the client.

This does not even take into account how the browser should handle these things.

Additionally, while cryptography is a good thing, the HTTP/2 standard does not require it. Yet pretty much all browsers ignore the fact that unencrypted HTTP/2 is allowed, so if you run HTTP/2 without TLS, browsers act as if the site does not exist. This points to a deeper problem: with so few browsers, they can effectively create de facto standards. So even if you go to the effort of following the written standard, what you encounter in practice may not follow it at all.


The standard for h2 may not have required it, but practically it was required. There are middleboxes on the internet that assume any traffic over port 80 is HTTP/1.1 and will destroy, interfere with, or break non-1.1 traffic. There are also servers that will respond with a 400 error if they see an unrecognized protocol in the Upgrade header. This is why actual data shows h2 has a higher success rate when sent over TLS.

IIRC MS/IE wanted to implement it, but they backed off because of these issues.

Asking browsers to implement h2c is asking them to make their browsers flakier: their users would see a higher connection error rate, which the user WOULD attribute to the browser, especially if they open the same URL in another browser without h2c and it works.

Using the Upgrade header instead of ALPN is slower anyway.


> HTML5 rather than XHTML

Huh? Parsing HTML5 is much more complicated than XHTML, and everything else is about the same.


The issue with XHTML is not parsing it, it's generating a valid document. The internet got years to try, and failed; time to switch to something else...

Because parsing invalid XHTML, which all browsers ended up doing, is more complicated than parsing HTML5...


It's pretty easy to generate a valid XHTML doc. The issues come when someone is editing by hand and doesn't care.

> Because parsing invalid XHTML, which all browsers ended doing, is more complicated than parsing HTML5...

I don't understand what you mean. Isn't the non-strict parser for XHTML just the normal HTML parser? The complication levels should be equal.


> It's pretty easy to generate a valid XHTML doc.

In the face of arbitrary user-content, like comments? Are you checking they don't include a U+FFFF byte sequence in there? (Ten years ago almost none of the biggest XHTML advocates had websites that would keep outputting well-formed XML in the face of a malicious user, sometimes bringing their whole site down.)

It's absolutely possible to write a toolchain that ensures this, just essentially nobody does.

> Isn't the non-strict parser for XHTML just the normal HTML parser?

Yes. It's literally the same parser; browsers fork simply based on the Content-Type (text/html v. application/xhtml+xml), with no regard for the content.

The bigger problem with XML parsers is handling DOCTYPEs (and even if you don't handle external entities, you still have the internal ones), and DOCTYPEs really make XML parsers as complex as HTML ones. Sure, an XML parser without DOCTYPE support is simpler than an HTML parser, but then you aren't parsing XML.


The problem is that, with the glut of documents declaring strict conformance but failing to achieve it, fallback mechanisms had to be implemented, making it like a two-pass parser: if strict parsing fails, you reparse in non-strict mode. In the end it is slightly more complex, and definitely slower.

Anything more would be paraphrasing http://www.webdevout.net/articles/beware-of-xhtml


In the particular case of web standards, my impression is that some companies that develop browsers (1) tie individual performance evaluations (e.g. bonuses) to whether the engineer has added stuff to standards and (2) _really_ like over-engineering things. The effect on web standards has not been good.




