This got me thinking that maybe another big reason for this is that the algorithms prioritize newer pages over older pages. That creates a problem: instead of covering a topic once and refining it over time, the incentive is to repackage it over and over again.
It reminds me of an annoyance I have with the Kindle store. If I wanted to find a book on, let's say, Psychology, there is no option to find the all-time respected books of the past century. Amazon's algorithms constantly push the latest hot book of the year. But I don't want that. A year is not enough time for society to determine whether the material withstands time. I want something that has stood the test of time and is recommended by reputable institutions.
This is just a guess, but I believe they use machine learning and rank results by clicks. I took some Coursera courses and Andrew Ng sort of suggested that as their strategy.
The problem is that clickbait and low-effort articles can be good enough to get the click, yet low-effort enough to drag society into the gutter. As time passes, the system gets gamed more and more, optimizing for the least effort for the most clicks.
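To make that failure mode concrete, here's a toy sketch (definitely not Google's actual pipeline, just the dynamic I mean): if a ranker optimizes click-through rate alone, the clickbait title floats to the top even when deeper pages exist. All titles and numbers below are invented.

```python
# Toy illustration: ranking purely by observed click-through rate.
from dataclasses import dataclass

@dataclass
class Page:
    title: str
    impressions: int  # times shown in search results
    clicks: int       # times clicked

    @property
    def ctr(self) -> float:
        # Click-through rate: the only "reward" a pure click-optimizing ranker sees.
        return self.clicks / self.impressions if self.impressions else 0.0

pages = [
    Page("In-depth 2012 answer with working examples", impressions=10_000, clicks=900),
    Page("You Won't Believe This One Weird Trick (2021)", impressions=10_000, clicks=2_400),
    Page("Thorough reference-manual chapter", impressions=10_000, clicks=600),
]

# Sort by CTR alone: the clickbait title wins, regardless of depth or accuracy.
for rank, page in enumerate(sorted(pages, key=lambda p: p.ctr, reverse=True), 1):
    print(f"{rank}. {page.title}  (CTR={page.ctr:.1%})")
```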
But they have that signal, or at least could have it. Google (and to a lesser extent Microsoft) gets exactly that data if you are using Chrome or Bing: whether you stay on the site and scroll (taking time, reading, not skimming) could all be used to evaluate whether the search result met your needs.
I've heard Google would guess with bounce rate. Put another way: if the user clicks the link to website A, then after a few moments keeps trying other links / related searches, that would mean the result was not valuable.
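That's sometimes called "pogo-sticking": a quick bounce back to the results page counts against the result. Here's a hedged sketch of the heuristic; the 30-second threshold is purely an assumption for illustration, nobody outside these companies knows the real numbers.

```python
from datetime import datetime, timedelta
from typing import Optional

def classify_click(click_time: datetime,
                   return_time: Optional[datetime],
                   short_dwell: timedelta = timedelta(seconds=30)) -> str:
    """Label a result click by how long the user stayed before returning to the results."""
    if return_time is None:
        # The user never came back to the results page: likely satisfied.
        return "satisfied"
    dwell = return_time - click_time
    # A quick bounce back suggests the page didn't answer the query.
    return "bounce" if dwell < short_dwell else "long_click"

t0 = datetime(2021, 6, 1, 12, 0, 0)
print(classify_click(t0, t0 + timedelta(seconds=8)))   # -> bounce
print(classify_click(t0, t0 + timedelta(minutes=4)))   # -> long_click
print(classify_click(t0, None))                        # -> satisfied
```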
It is pretty obvious if you search for any old topic that is also covered incessantly by the news. "royal family" is a good example. There's no way those news stories published an hour ago are listed first due to a high PageRank score (which necessarily depends on time to accumulate inbound links).
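For reference, here's a minimal power-iteration PageRank over an invented five-page graph. It shows why a story published an hour ago, with no inbound links yet, can't score well on link signals alone, so something else must be pushing it up.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Classic power-iteration PageRank over a {page: [outlinks]} graph."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                # Dangling page: spread its rank evenly across the graph.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

graph = {
    "old_reference": ["old_tutorial"],
    "old_tutorial":  ["old_reference"],
    "blog_a":        ["old_reference"],
    "blog_b":        ["old_reference", "old_tutorial"],
    "news_story_1h": ["old_reference"],   # brand new: nothing links to it yet
}

for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(f"{page:15s} {score:.3f}")
```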
Even your example would depend upon the context. There are many cases where a programming question in 2021 is identical to one from 2012, along with the answer. In those instances, would you rather have a shallow answer from 2021 or an in-depth answer from 2012? This is not meant to imply that older answers offer greater depth, yet a heavy bias towards recent material can produce that outcome in some circumstances.
Yes, yet there are programming questions that go beyond "how do I do X in language Y" or "how do I do X with library Y". The language- and library-specific questions are the ones where I would be less inclined to want additional depth anyway, provided they aren't dependent upon some language- or library-specific implementation detail.
There are of course a variety of factors, including the popularity of the site the page is published on. The signals related to the site are often as important as the content on the page itself. Even different parts of the same site can lend varying weight to something published in that section.
Engagement, as measured in clicks and time spent on page, plays a big part.
But you're right, to a degree, as frequently updated pages can rank higher in many areas. A newly published page has been recently updated.
A lot depends on the (algorithmically perceived) topic too. Where news is concerned, you're completely right, algos are always going to favor newer content unless your search terms specify otherwise.
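A crude way to picture that query-dependent freshness bias is below. The weights and decay curve are invented for illustration; real systems are far more involved, but the shape of the trade-off is the point.

```python
import math

def score(relevance: float, age_days: float, newsy_query: bool) -> float:
    """Blend topical relevance with an exponentially decaying freshness bonus."""
    freshness = math.exp(-age_days / 30)            # ~1.0 when brand new, fades over weeks
    freshness_weight = 0.6 if newsy_query else 0.05  # heavy boost only for news-like queries
    return (1 - freshness_weight) * relevance + freshness_weight * freshness

pages = {
    "old_deep_dive_2012": dict(relevance=0.9, age_days=3000),  # thorough older article
    "thin_piece_today":   dict(relevance=0.6, age_days=1),     # shallow but brand new
}

for newsy in (True, False):
    ranking = sorted(pages, key=lambda name: -score(newsy_query=newsy, **pages[name]))
    print("news-like query:" if newsy else "evergreen query:", ranking)
```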
PageRank, in its original form, is long dead. Inbound-link signals are much more complex and contextual now, and other types of signals get more weight.
Your Google search results show the date on articles, do they not? If people are more likely to click on "Celebrity Net Worth (2021)" than "Celebrity Net Worth (2012)", then the algo will update to favour those results, because people are clicking on them.
The only definitive source on this would be the gatekeeper itself. But Google never says anything explicitly, because they don't want people gaming search rankings. Even though it happens anyway.
The new evergreen is refreshed sludge produced for bottom dollar: college kids stealing Reddit comments or shuffling paragraphs from old articles, or linking to linked blogs that link elsewhere.
It's all stamped with Google Ads, of course, and then Google ranks these pages high enough to rake in eyeballs and ad dollars.
Also there's the fact that each year the average webpage picks up two more video elements / ad players, one or two more ad overlays, a cookie banner, and half a dozen banners/interstitials. It's 3-5% content spread thinly over an ad engine.
The Google web is about squeezing ads down your throat.
Really makes you wonder: you play whack a mole and tackle the symptoms with initiatives like this search engine. But the root of that problem and many many others is the same: advertising. Why don't we try to tackle that?
> This got me thinking that maybe one of the other big reasons for this is that the algorithms prioritize newer pages over older pages.
Actually, that's not always the case. We publish a lot of blog content, and it's really hard to get new content to replace old articles. We still see articles from 2017 coming up as more popular than newer, better treatments of the same subject. If somebody knows the SEO magic to get around this, I'm all ears.