One more thing I do is this to prevent FOUC: <style>html{visibility: hidden;opac...

chrismorgan · on April 27, 2021

On your FOUC preventer: please never ever do this. If your styles.css link tag is placed sanely (largely meaning “in the head”), the document won’t be drawn until it’s loaded or failed to load. If it loads successfully, the <style> tag served no purpose. If it fails to load, the <style> tag is now blocking access to the content.

On the rest: I mostly agree.

<title> versus og:title is commonly stupid, especially because consumers will tend to fall back to the <title> tag anyway (though in theory they should differ, with <title> normally including the site name and og:title not supposed to); og:url and rel=canonical are even more stupid, because consumers will fall back to using the actual URL, so their only purpose is cases where the content is actually found at multiple URLs. (“But og:title and og:url are mandatory for OpenGraph validity!” you say? Please, you didn’t even put the prefix="og: https://ogp.me/ns#" attribute on the root element. No one does. No one cares. And I don’t think anyone cares about og:title and og:url at least, though <meta property=og:description> versus <meta name=description> is uglier, and you can’t even merge the two or Twitter ignores the thing.)

Why can’t we assume UTF-8 unless told otherwise? Historical reasons, mostly. The web is an old place. Rejoice that everything new is done purely in UTF-8, and be comforted that even some of the older things are slowly shifting to a UTF-8 default over time, as the impacts of doing so become smaller.

The Apple icons are a more complicated historical mess. Be glad it’s down to only one stupid extra tag now, rather than a dozen or so, one for each of their different device sizes. The time has come when they could fairly reasonably merge it back into icon, since they’ve given up on all the fancy styling of their icons.

webstrand · on April 27, 2021

In my experience, rel="canonical" is very useful for the very reason that if the content gets served from a different URL, say an archival page or a misconfigured server, you don't end up with two versions of the page being indexed by search engines. I try to add it to every page, now, so long as I can determine the page's canonical URL. I've found that it saves me a lot of future pain, when having to deal with duplicates that I may not even be able to control.

account42 · on April 27, 2021

> so their only purpose is cases where the content is actually found at multiple URLs

Most content can be retrieved using multiple URLs - for example this comment section is available at both https://news.ycombinator.com/item?id=26952557 and https://news.ycombinator.com/item?id=26952557&blah as well as infinite other variants. Even if you only actively link to one version, you can't be sure no one else will fuck it up so rel="canonical" is useful. og:url should just default to that though (no idea if it does).

mschuster91 · on April 27, 2021

> Why can't it use <title> for og:title?

Because the <title> is often something like "My Content | The Example Blog", which is redundant information on social media cards, so you set og:title to "My Content" only.

> Why can't it use the favicon for apple-touch-icon?

Because the favicon usually is a 32x32 icon that is small and only seen in the tab of the browser, whereas the apple-touch-icon is physically huge, and so requires more resolution.

> Why can't it use the URL for the url?

I assume you're talking about the canonical link tag? This is to prevent SEO penalties - specify a canonical link and Google will prefer your site as origin for the content over spammers that don't bother setting the tag, or in case you run an old-school site with dedicated mobile and desktop sites (see https://developers.google.com/search/docs/advanced/crawling/...).

> Why can't we assume UTF-8 unless told otherwise?

Historical baggage, mostly - the default charset for old content unless specified was either US-ASCII or its superset ISO-8859-1/latin1 (the HTTP/1.0 spec defaulted to US-ASCII, see https://tools.ietf.org/html/rfc1945 section 3.4, and the HTTP/1.1 spec, section 3.7.1, https://www.ietf.org/rfc/rfc2616.txt, defaults to ISO-8859-1). And there is a lot of old content that would break if this default assumption changed.
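
To make the breakage concrete, here is a minimal sketch (the sample string is made up for illustration) of what happens when the same bytes are read under the wrong default. It only shows the two failure modes; it does not model how any real browser picks an encoding.

```python
# Illustrative only: the same text, stored under two different encodings.
text = "naïve café"

# UTF-8 bytes read by a consumer that assumes ISO-8859-1:
# every byte is valid Latin-1, so decoding "succeeds" but yields mojibake.
utf8_bytes = text.encode("utf-8")
as_latin1 = utf8_bytes.decode("iso-8859-1")

# Legacy ISO-8859-1 bytes read by a consumer that assumes strict UTF-8:
# the lone 0xEF / 0xE9 bytes are not valid UTF-8 sequences, so decoding fails.
latin1_bytes = text.encode("iso-8859-1")
try:
    latin1_bytes.decode("utf-8")
    legacy_ok = True
except UnicodeDecodeError:
    legacy_ok = False

print(as_latin1)
print(legacy_ok)
```

The first case silently corrupts the text, the second fails outright, which is why flipping the default for unlabeled old content is risky.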

megous · on April 27, 2021

> Historical...

If you call 2019 history already...

https://techcommunity.microsoft.com/t5/windows-10/windows-10...

mschuster91 · on April 27, 2021

Jesus Christ

account42 · on April 27, 2021

> Because the favicon usually is a 32x32 icon that is small and only seen in the tab of the browser, whereas the apple-touch-icon is physically huge, and so requires more resolution.

You can have multiple rel=icon elements with different resolutions or even scalable ones. Also, browser tabs are not the only place where the favicon is used - and definitely not the original one.
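
As a rough sketch of how a consumer could choose among several rel=icon entries: the function below takes (href, sizes) pairs, where sizes follows the HTML `sizes` attribute ("32x32", "any", ...). The name `pick_icon` and the selection policy (scalable first, then the smallest icon that is big enough, then the largest available) are assumptions for illustration; real browsers each apply their own rules.

```python
def pick_icon(icons, wanted):
    """Pick an icon href for a square target of `wanted` pixels.

    icons: list of (href, sizes) pairs; sizes is a space-separated
    string such as "32x32" or "any" (hypothetical input shape).
    """
    best = None  # (rank, href); lower rank wins
    for href, sizes in icons:
        for token in sizes.lower().split():
            if token == "any":
                rank = (0, 0)  # scalable icon: always large enough
            else:
                w, _, h = token.partition("x")
                side = min(int(w), int(h))
                # Large enough: prefer the smallest such icon.
                # Too small: prefer the largest one available.
                rank = (1, side) if side >= wanted else (2, -side)
            if best is None or rank < best[0]:
                best = (rank, href)
    return best[1] if best else None


icons = [
    ("/favicon-32.png", "32x32"),
    ("/icon-180.png", "180x180"),
    ("/icon-512.png", "512x512"),
]
print(pick_icon(icons, 180))
```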

lexicality · on April 27, 2021

> Why can't it use the URL for the url?

Because you can have multiple URLs for the same content and you need a way to say "by the way if you want to link to this page here's the proper URL"

For example, if you have example.com/blog/foo, it's possible that m.example.com/blog/foo, example.com/blog/foo/, example.com/index.php?path=/blog/foo, example.com/index.php?p=1234, example.com/blog/foo?textonly=1 etc etc all point to the same blog post and in many cases you simply can't set up a redirect because the URL is valid.
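
A minimal sketch of how an indexer could use this, assuming it deduplicates pages by a single key: if the page declares rel="canonical", use that, otherwise fall back to the URL it was fetched from. `CanonicalParser` and `index_key` are hypothetical names, and real search engines treat the tag as a hint rather than a rule.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if "canonical" in rels and a.get("href"):
            self.canonical = a["href"]


def index_key(fetched_url, html):
    """Return the URL a page should be filed under."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical:
        # Resolve relative hrefs against the URL the page came from.
        return urljoin(fetched_url, parser.canonical)
    return fetched_url


page = '<head><link rel="canonical" href="https://example.com/blog/foo"></head>'
variants = [
    "https://m.example.com/blog/foo",
    "https://example.com/index.php?p=1234",
    "https://example.com/blog/foo?textonly=1",
]
print({index_key(u, page) for u in variants})
```

With the tag present, all three variants collapse to one entry; without it, each fetched URL would be filed separately.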

runarberg · on April 27, 2021

What I don’t understand is if we omit e.g. the og:title meta tag, why can’t scrapers fall back to <title>? Or if there is no <meta name="og:description"> why not fall back to <meta name="description">?

Most of the pages I make are unique, and supplying these values twice is simply redundant. My only conclusion is that scrapers are stupid and developers are too lazy to fix it (which ironically shifts the work onto other developers, as we now need to supply these redundant values).

kijin · on April 27, 2021

The stupid scrapers we're talking about are major social networks like Facebook and Twitter.

At this point I think it's more about unequal power than it has anything to do with developer laziness. Facebook and Twitter can tell others to add whatever stupid tags they demand, and people will comply. They can even call it a standard, and people will readily acknowledge it as such. Why change the status quo when you have the upper hand?

collinmanderson · on April 28, 2021

> What I don’t understand is if we omit e.g. the og:title meta tag, why can’t scrapers fall back to <title>? Or if there is no <meta name="og:description"> why not fall back to <meta name="description">?

They do fall back to <title> and <meta name="description">. At least Facebook does.

You can test it here (requires facebook login). It works though they complain about the missing tags. https://developers.facebook.com/tools/debug/
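
The fallback described above can be sketched roughly like this. It is not Facebook's actual implementation; `CardParser` and `card_fields` are made-up names, and the precedence (og: tag first, then the plain HTML equivalent) is an assumption.

```python
from html.parser import HTMLParser


class CardParser(HTMLParser):
    """Collects <meta> name/property values and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # OpenGraph uses property=..., plain HTML metadata uses name=...
            key = a.get("property") or a.get("name")
            if key and a.get("content") is not None:
                self.meta.setdefault(key, a["content"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def card_fields(html):
    """Prefer og: tags, fall back to <title> and <meta name="description">."""
    p = CardParser()
    p.feed(html)
    return {
        "title": p.meta.get("og:title") or p.title.strip() or None,
        "description": p.meta.get("og:description") or p.meta.get("description"),
    }


plain = (
    "<head><title>My Content | The Example Blog</title>"
    '<meta name="description" content="Plain description"></head>'
)
print(card_fields(plain))
```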

aflag · on April 27, 2021

What if your style never loads for one reason or another? Isn't it better for the user to get the content they are after as soon as possible? I thought the whole point of rendering the page as soon as possible was to be helpful to the user, why disable that?

hsivonen · on April 27, 2021

> Why can't we assume UTF-8 unless told otherwise?

See https://hsivonen.fi/utf-8-detection/

megous · on April 27, 2021

Because people using default OS encodings didn't mark their HTML files with an explicit encoding. And the defaults were not UTF-8 for a long time. For example, Notepad has defaulted to UTF-8 only since 2019, lol.

https://techcommunity.microsoft.com/t5/windows-10/windows-10...

hutzlibu · on April 27, 2021

Because there are millions of people involved who have millions of opinions on the topic, and everyone wants their way. That's why we have the bloody mess called HTML as a compromise.

"Why can't we assume UTF-8 unless told otherwise?"

And this I really do not understand either. I have lost hours restoring data that was corrupted in conversion between various text encodings and discovered too late.