Hacker News
iAnnotate – Whatever happened to the web as an annotation system? (2013) (sspnet.org)
76 points by Tomte on July 2, 2023 | 48 comments


Certainly the Ted Nelson vision for hypertext was all about annotation, in particular as a means of correcting half-truths and distortions in the mass media.

In 1995 Dan Huttenlocher and I created CoNote, which supported annotation on web documents. See Computer Support for Cooperative Learning, 1995 conference, http://jamesrdavis.org/Papers/CSCL95/cscl95.rtf


As already discussed yesterday [2], the W3C charter for web annotations has completed with a Recommendation (final) status; that's what happened to it. The Open Annotation Group, on whose work it was based, has a GitHub repo [1] (with last material updates around 2015/16) where it has maintained annotator and other libraries that are or were used by hypothes.is and other services. I have my reservations about using JSON-LD for this purpose; it has the vibe of W3C looking for problems to fit their solutions, like they did with XML all over, i.e. embedding JSON fragments into HTML script elements for metadata (!) when there's extensibility in HTML and/or RDFa already.

Sadly, it's safe to say it never quite took off, and the only collaboration effort that has seen some use in quantities remains the much more modest pingback/linkback protocol. It's not clear to me, though, whether pingback is/was only popular because it could be exploited for DDoS attacks.

[1]: https://github.com/openannotation

[2]: https://news.ycombinator.com/item?id=36560937


I'd love for a distributed, third-party, comment system to gain some traction, but there are at least a few major challenges:

* Ease of use: Getting someone to install an extension is really hard (and doesn't even work on mobile for most users), and getting them to use a specialized browser app is even harder. Firefox would be in a good position to do something on this front. Conceptually, it could even fit in with Pocket. (e.g. "Pocket saves content from everywhere, and allows you to make comments on anything.")

* Canonical addresses: A page/topic/article/story can exist at many slightly different URLs. How do you properly/automatically consolidate comments? The problem becomes adversarial as soon as it gains any bit of success. No company wants competitors to have the ability to communicate with their site visitors, and will do everything they can to keep comments from being discoverable.

* Spam/moderation: Lots of options here, but none I know of that don't involve big tradeoffs between cost and usefulness.
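To make the canonical-address bullet concrete, here is a rough sketch of the kind of URL normalization a comment system might try first. The `canonicalize` name, the tracking-parameter list, and the choice to force `https` and drop `www.` are all my own assumptions, not any existing service's rules, and none of it helps against a site that deliberately varies its URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of query parameters treated as tracking noise.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "fbclid", "gclid",
}


def canonicalize(url: str) -> str:
    """Collapse trivially different URLs for the same page into one key."""
    parts = urlsplit(url)

    # Hostnames are case-insensitive; dropping "www." is an assumption.
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Keep only non-default ports.
    if parts.port and parts.port not in (80, 443):
        host = f"{host}:{parts.port}"

    # Treat "/story/" and "/story" as the same page.
    path = parts.path.rstrip("/") or "/"

    # Drop tracking parameters and sort the rest so order does not matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    )

    # Assumption: scheme is forced to https and the fragment is discarded.
    return urlunsplit(("https", host, path, urlencode(query), ""))
```

With these rules, `http://WWW.Example.com/story/?utm_source=hn&id=7#comments` and `https://example.com/story?id=7` map to the same key, so their comments could be consolidated. Anything smarter (syndicated copies, AMP pages, paywalled mirrors) needs more than string rewriting.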

The good news is that there's probably some opportunity for an "AI" assistant to serve much of the same function as a web-wide commenting/annotations system. The bad news is that if it comes from any of the major players, it might be too sanitized/milquetoast to be useful.


This looks like a feature for the web archive: it already keeps snapshots of web pages, so it could add a comments section. It can dodge the moderation problem entirely by not hosting the comments: instead it would render comments from RSS sources provided by the user agent, and the user would be responsible for discovering those sources. Personal RSS sources could be pinned to domain names, and there would need to be some aggregators that combine thousands of sources; those would be the moderators of their "herd". Users would advertise their RSS comments just like they advertise their Twitter pages. There won't be a global view of all comments on a specific page: everyone will see something different, and that's good. All this hinges on the will of the admins of the web archive to implement a comments renderer.

Another thought is that such a comments platform already exists: bitcoin. Anyone can attach a message to a hash associated with a url (for a fee). It's gonna be awfully slow, though. And since it's global, any popular url will be buried under mountains of trash, including stuff that will make govs shut down bitcoin completely.


For this community, I’m sure a more fully realised HN browser plug-in that pulled up the article or helped you make a new submission could be useful.

In my experience these browser plugins have come and gone likely due to being more an exploratory coding project rather than a multi dev community supported endeavour.

No idea if the real demand is there tbh.


Those aren't the real challenge. The real challenge is annotation software has an anti-network effect: https://news.ycombinator.com/item?id=23576213

An AI system that tries to summarize the annotations to tame the chaos might be interesting, though whether people are willing to "engage" with a system that takes their input and merely feeds it into an averaging model such that nobody will ever see their actual comment might be a tough sell.


I think Genius.com stated their mission to annotate the web right when they changed their trademark from Rap Genius. Marc Andreessen also weighed in on this as an investor, saying that he always wanted an annotation system for the world-wide web.

I haven't seen or heard much about the development or adoption of their web annotation concept since that announcement...


Rap Genius (Genius.com) was sold at a loss a few years ago[1]. The annotation idea is, simply put, bad. The only part of Genius.com that made any money was the lyrics serving (they have deals with Spotify and Apple Music iirc); the lyrics explaining, on the other hand, no one cared about.

I'm not exactly sure why the annotation idea is bad, but it has been tried many times now (yours truly included) and has failed over and over again. It probably has something to do with our faulty intuition. It makes sense that it should work (the internet is virtually the perfect medium), but in reality, it stubbornly doesn't want to.

[1] https://variety.com/2021/digital/news/genius-sold-medialab-w...


> The annotation idea is, simply put, bad.

It's definitely proved to be a bad business multiple times over, but I foolishly hold on to hope that it's a good idea. (Wikipedia might have some parallels.) Without a really low friction way to discover and contribute comments, it's definitely not going anywhere.


Annotation is work, and almost nobody wants to do it for free, but many people probably want to have it for free. On the internet, a conversational style accomplishes this better than a didactic one, and feels less like work.


"Annotation" definitely sounds like work, but people freely leave comments all over the place (Reddit, YouTube, TikTok, Hacker News, etc.) for little more than likes, thumbs-up, karma and other "internet points". A system that allowed people to accumulate these points across the web could be really popular.


Kinda my point exactly, though. Commenting and discussing is really the bread and butter of social right now. This site included. But pure note taking never made sense. Companies pursuing annotation missed the fact that vBulletin, Reddit, Discord, etc. all solve the same problem in a different way.


A lot of people want to have their opinion heard!

Making that opinion into something that others would read, and not only for lulz, is a different, much harder trick.

And, of course, commercial spam and non-profit vandalism are problems every commenting system must address.


And they are very hard problems, yet unsolved.


Annotation is comment, with the flexibility of going much more granular on the object of comment. We are already commenting on ideas and links for free on HN. No reason not to do it with a greater focus on the text.


Some people (not very many) want to annotate for good reasons. A whole ecosystem of bad actors want to annotate for bad reasons. And not very many site owners want to be annotated; why would you let people scribble on your pages?

(How big is the demand to read annotations? What sort of annotations?)


The problem is that annotation is a feature of a developed documentation platform, but because of the nature of the web one needs to sell it as a single product.

Nobody cares about annotating everything (except geeks like me), but people working within specific documentation domains require annotation as part of their workflow.


I have configured Squid as a Kerberized service. The company distributed a browser specific to the proxy server and that was the company browser.

This meant that using that browser passed through squid and all the web traffic was cataloged. The browsing could then be annotated and cataloged for the office.

It's the kind of thing that takes a while to get configured but after that it is easy to replicate as part of the standard deployment.

Anybody could use any browser to do what they want without passing through the proxy server. One browser was configured to use it.


Vannevar Bush’s Memex was also an annotation system for “the web”. We still don’t have it!


Remember the future is already here, it is just unevenly distributed.


Please try hypothes.is. It's really quite good. I wish it would take off. https://hypothes.is/


Its founder Dan Whaley is a fascinating guy, and hypothesis has a neat backstory.

He was trying to get factual feedback on climate change news, a bit like Twitter community notes now.

Built a web annotation system to enable that.


Come to think of it, Twitter is an annotation system, implemented in the shittiest way possible.


Is there an extension that exposes top Twitter threads that link to a given page?


I don't think the world needs one more walled garden.

But, instead of commenting on a URL, what if you could comment on a snapshot in the Wayback Machine on the Internet Archive by using that link instead? You could then make a hash of that link and post your comments (and search for others' comments) on an open platform like Nostr.

The building blocks already exist today.

Here is an example: https://snort.social/e/nevent1qqsytrx2ya4854mvhz5tvpwzutvdlf...
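A minimal sketch of the "hash of that link" step, assuming SHA-256 over the snapshot URL as the lookup key. The `snapshot_key` and `make_comment` names and the record layout are hypothetical, and how the record would actually be published to an open platform is deliberately left out.

```python
import hashlib


def snapshot_key(snapshot_url: str) -> str:
    """Derive a stable lookup key from a Wayback Machine snapshot URL.

    Assumption: SHA-256 of the exact URL string; anyone who hashes the
    same snapshot link gets the same key and can find the same comments.
    """
    return hashlib.sha256(snapshot_url.strip().encode("utf-8")).hexdigest()


def make_comment(snapshot_url: str, text: str) -> dict:
    """Build a comment record keyed by the snapshot hash (hypothetical layout)."""
    return {
        "key": snapshot_key(snapshot_url),
        "snapshot": snapshot_url,
        "text": text,
    }


# Hypothetical snapshot link, in the usual /web/<timestamp>/<original-url> form.
example = "https://web.archive.org/web/20230702000000/https://example.com/"
comment = make_comment(example, "The second paragraph contradicts the headline.")
```

Because the key depends on the timestamp in the snapshot URL, comments attach to one specific capture of the page, not to whatever the live URL serves later. That is the point of the idea, but it also means two captures of the same page get separate comment threads.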


Looked great until I signed up and found there was no official Firefox extension. The unofficial one hasn't been updated in two years.


The way tools like GitHub or Phabricator do code review annotations is pretty good. My ideal annotation system would just be git with annotations based on a selected range of words from the source document, stored in a standardized format within the git repo itself. Something like this has been done before using the 'notes' feature in git, however, notes lack some of the features and behaviors which would be desirable to make them more usable. Ideally notes should be able to take advantage of all of the benefits of git, such as being easily mirrored, forked, merged, etc...
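As a rough illustration of "annotations based on a selected range of words, stored in a standardized format", here is one possible shape: a note that quotes the selected words plus a bit of surrounding context, serialized as JSON so it can live in the repo like any other file. The `Note` and `locate` names and the field layout are my assumptions; the exact/prefix/suffix idea is borrowed from the text quote selector in the W3C Web Annotation Data Model.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass
class Note:
    path: str    # file in the repo that the note refers to
    exact: str   # the selected words being annotated
    prefix: str  # a little text before the selection, to disambiguate
    suffix: str  # a little text after the selection
    body: str    # the annotation itself


def locate(note: Note, text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the annotated words, or None if the
    quoted context no longer appears in the document."""
    needle = note.prefix + note.exact + note.suffix
    at = text.find(needle)
    if at == -1:
        return None
    start = at + len(note.prefix)
    return start, start + len(note.exact)


doc = "The quick brown fox jumps over the lazy dog. The lazy dog sleeps."
note = Note(
    path="README.md",
    exact="lazy dog",
    prefix="The ",
    suffix=" sleeps",
    body="Which dog? The second mention is the one I mean.",
)

# Plain JSON on disk means git can mirror, fork, and merge it like code.
stored = json.dumps(asdict(note), indent=2, sort_keys=True)
restored = Note(**json.loads(stored))
span = locate(restored, doc)
```

Anchoring on quoted text instead of line numbers lets a note survive edits elsewhere in the file. When the quoted passage itself changes, `locate` returns `None` and the note has to be re-anchored by hand.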


Diigo has solved a number of these goals for me. It’s strange it hasn’t developed as much of the social side of the annotation that it easily could.


I will look into it.


I think the title of this submission should be updated to reflect it was written in 2013


I built and launched Smort.io [1] a year ago to easily edit, annotate and share articles without logging in.

Just prepend smort.io/ before any URL to read.

It also works on ArXiv papers published till May 2023.

[1] https://smort.io


What happened in May 2023?

Would there be a benefit in social network features? E.g. I would love to see the currently most shared and annotated articles for any topic. It would also be interesting to see the Tweets and posts that reference an article.


Can I use it to annotate a random pdf book and save comments for personal use?


What happened to ArXiv in May to make it stop working?


The first annotation network I remember was pre-2000. They could only get investment if they created a system that required modifying DNS. I struggled against this, and it was the last commercial internet work I did.

The commercial web shifted the development from people self-publishing to ads that needed to be sold.

This has had a tremendous effect on everything.

The best example I have is the massive infrastructure many people insist is required for a website.


> The best example I have is the massive infrastructure many people insist is required for a website.

This is my biggest pet peeve, and I think a lot of it (among supposedly tech-savvy people at least, less technical people are a different story) is caused by people looking at the cost of a random selection of AWS products, often quoting on-demand prices rather than the 40% discount you can get by buying a year of reserved capacity at once, multiplying by 12, and then freaking out.
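To put numbers on that, a quick worked example. The $30/month instance price is made up for illustration; only the 40% figure comes from the comment above. Amounts are kept in whole cents to avoid floating-point rounding.

```python
# Hypothetical on-demand price: $30.00 per month, expressed in cents.
ON_DEMAND_CENTS_PER_MONTH = 3000
# Reserved-capacity discount quoted above.
RESERVED_DISCOUNT_PERCENT = 40


def yearly_cost_cents(monthly_cents: int, discount_percent: int = 0) -> int:
    """Twelve months of the monthly price, minus a percentage discount."""
    return monthly_cents * 12 * (100 - discount_percent) // 100


on_demand = yearly_cost_cents(ON_DEMAND_CENTS_PER_MONTH)
reserved = yearly_cost_cents(ON_DEMAND_CENTS_PER_MONTH, RESERVED_DISCOUNT_PERCENT)

print(f"on-demand: ${on_demand / 100:.2f}/year")  # on-demand: $360.00/year
print(f"reserved:  ${reserved / 100:.2f}/year")   # reserved:  $216.00/year
```

Same machine, and the sticker price people quote is well above what a year of reserved capacity would cost under that discount.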

Many cloud products are not good deals, and almost seem designed to make people think running a web service is inaccessible to them. Those products are usually given a healthy markup because you’re paying to avoid certain setup steps or for the ability to scale infinitely large in two clicks.

You can still just rent a few cheap servers (or even just one) and, if you set them up properly, you can run a decently sized website off of them no problem.


I think what scares a lot of people is the prospect of maintaining a server, configuring it to be secure and keeping it up to speed with security updates. With a cloud product, the only concern becomes keeping your project's dependencies updated which is less intimidating.

It's something that's on my mind when I think about launching a site that's intended to draw a significant userbase. Back in the day I'd set up VPS instances with nginx+unicorn+rails and it was relatively smooth, but security has seemingly become so much more critical that I don't know I'd trust myself to get all the biggest holes patched up and more importantly, keep them patched.


Yes. It's the "servers should be cattle not pets" philosophy; then you realize that having one server necessarily makes it a "pet" that demands periodic care and feeding with occasional emergencies that cost money or wake you up at night.

Also: people use big services for discovery. If you write a blog, nobody's going to read it unless you get out there on the social media and promote it.


Discovery as a reason to spend money is another web ad fantasy. In the old days, discovery was by word of mouth. The trust deficit goes down when one doesn't rely on discovery.

How important is it to discover a blog post no one is talking about?


It's important to the writer, surely? Why write if you have no readers? I mean, ultimately that's why we're writing these comments here to each other rather than each on our own web sites?


The attitude before discovery was "if you build it, they will come". I see most people pushing discovery/SEO because it is complex and people can be convinced it is needed.

A writer needing an audience is nothing new. I think it is just as valid to create no matter what comes.


Agreed, a 3-year reserved small EC2 instance is a few bucks a month. It can run multiple small websites fine. Hosting has never been more accessible; people just get scared by the concept.


A website is a poorly defined term.

A personal blog is a website.

A forum is a website.

YouTube is a website.

Amazon.com is a website.

Twitter is a website.

They all require very different technology (especially on the backend), very different technical resources, and vastly different amounts of labor.

Also, on a lot of smaller websites, crawling bots produce the majority of visits. With www being so big, and its common attractors so strong, it's really hard to have an audience.


All of those I can host from my home with a static IP. They all require a web server. Some may need databases. One needs a streaming service. You are correct that most traffic is some sort of crawler. If one isn't worried about discoverability, one can get pretty aggressive in killing those sessions. One can develop one's own blacklist. I happen to be toiling away with such a system. We may get the chance to see what happens if an anti-discovery network movement pops up.


Isn't this largely solved with links (including fragment links) and comment sections? Webpages are much shorter than books. There's no need to use anything like page number.


There's definitely some opportunity for intranet/internal tool use. Many times I've wanted to easily extend a tool with supplemental information, but couldn't feasibly do it without direct access to the code (not even feasible for proprietary systems). An example use case might be leaving comments on a Salesforce or Tableau report that you otherwise don't have edit access to.


Many specs of the Permaweb are built around tags; they allow annotations of any upload.

https://specs.arweave.dev

Since code is also just an asset on the Permaweb, you can even define renderers, so every file comes with a web app that can open it.


Also see Fermat's Library: https://fermatslibrary.com



