Before someone suggests a new search engine where the ranking algorithm is replaced with AI, I would like to propose a return to human-curated directories. Yahoo had one, and for a while, so did Google. It was pre-social-media and pre-wiki, so none of these directories were optimized to take advantage of crowdsourcing. Perhaps it's time to try again?
> Before someone suggests a new search engine where the ranking algorithm is replaced with AI, I would like to propose a return to human-curated directories. Yahoo had one, and for a while, so did Google. It was pre-social-media and pre-wiki, so none of these directories were optimized to take advantage of crowdsourcing.
False. Google Directory (like many other big-name web directories, most of them now defunct) was powered by data from DMOZ, which was crowdsourced. It kind of still is, through Curlie [0]: some parts of the website show updates as recently as today, but enough fairly core links are dead or without content that it's pretty obviously not a thriving operation. Also, it was not pre-wiki: WikiWikiWeb was created in 1995, DMOZ in 1998. It was pre-Wikipedia, but Wikipedia wasn't the first wiki.
Actually, several static snapshots exist (a benefit of open licensing), even though attempts to fork and continue it have not been very successful. In addition to the one upthread there are also:
One issue there is that human-curated directories are everywhere. Hacker News is one. Reddit is another. And back in the Yahoo days, directories were being made all over the place. Which one is authoritative? There are too many of them out there.
That said, in NL a lot of people had their home page set to startpagina.nl for a long time, which was just that: a cool directory of websites that you could submit your own site to. It seems to still exist, too.
I don't think we need any "AI" in the modern sense of that word. It would be an improvement to bring Google back to its ~2010 status.
Not sure if the Kagi folks are willing to share, but I get the impression that PageRank, tf-idf and a few heuristics on top would still get you pretty far.
Add some moderation (known-bad sites that e.g. repost Stack Overflow content) and you're already ahead of what Google gives me.
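To make that concrete, here is a rough sketch of what "PageRank, tf-idf and a blocklist" could look like on a toy corpus. This is not how Kagi or Google actually rank anything: the function names (`pagerank`, `tfidf_score`, `search`), the `blocklist` parameter, and the choice to simply multiply relevance by rank are all assumptions made for illustration.

```python
import math
from collections import Counter


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


def tfidf_score(query, doc_tokens, all_docs):
    """Sum of tf * idf over the query terms for a single document."""
    counts = Counter(doc_tokens)
    score = 0.0
    for term in query.lower().split():
        df = sum(1 for tokens in all_docs.values() if term in tokens)
        if df == 0 or not doc_tokens:
            continue
        tf = counts[term] / len(doc_tokens)
        idf = math.log(len(all_docs) / df) + 1.0
        score += tf * idf
    return score


def search(query, docs, links, blocklist=frozenset()):
    """Rank matching pages by tf-idf relevance times PageRank (assumed heuristic)."""
    tokens = {url: text.lower().split() for url, text in docs.items()}
    rank = pagerank(links)
    results = []
    for url, toks in tokens.items():
        if url in blocklist:  # moderation: drop known-bad sites outright
            continue
        relevance = tfidf_score(query, toks, tokens)
        if relevance > 0:
            results.append((relevance * rank.get(url, 0.0), url))
    return [url for _, url in sorted(results, reverse=True)]


# Hypothetical toy corpus: b.example is a scraper that copies a.example.
docs = {
    "a.example": "python sorting tutorial",
    "b.example": "python sorting tutorial",
    "c.example": "cooking recipes",
}
links = {
    "a.example": ["c.example"],
    "b.example": ["a.example"],
    "c.example": ["a.example"],
}
results = search("python sorting", docs, links, blocklist={"b.example"})
```

Even without the blocklist, the scraper ends up below the original here because nothing links to it, which is roughly the point: link structure plus a bit of moderation does a lot of the work.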
https://en.wikipedia.org/wiki/Google_Directory