No.

Either it picks up too much garbage if you allow any P2P data exchange (you can't allow only outgoing, AFAIK), or it only knows about the sites you already know about, which kind of defeats the purpose.

Even assuming you just want a private index of your own content, it struggles to display useful snippets for the results, which makes it really tedious to sift through the already poor results.

If you try to proactively blacklist garbage, which is incredibly tedious because there's no quick "delete from index and add to blocklist" button under the index explorer, you'll soon end up with an unmanageable blocklist; the admin interface doesn't handle long lists well. At some point (around 160k blocked domains) YaCy just runs out of heap during startup while trying to load it, which makes the instance unusable.

It also can't really handle being reverse proxied (accessed securely by both the users and peers).

It also likes to completely deplete disk space or memory, so both have to be forcefully constrained. But that leaves you with a nonfunctional instance you can't really manage. It also doesn't separate functionality cleanly enough that you could, for example, manually delete a corrupt index.

Running (z)grep on locally stored web archives works significantly better.
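
A rough equivalent of that approach, just to show the idea (the archive layout here is hypothetical: one gzipped text/HTML dump per page under ~/web-archive):

    import gzip, re, sys
    from pathlib import Path

    # Hypothetical layout: one gzipped text/HTML dump per archived page.
    ARCHIVE = Path.home() / "web-archive"

    def search(pattern):
        rx = re.compile(pattern, re.IGNORECASE)
        for path in ARCHIVE.rglob("*.gz"):
            with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, 1):
                    if rx.search(line):
                        print(f"{path}:{lineno}: {line.strip()[:120]}")

    if __name__ == "__main__":
        search(sys.argv[1])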



Those are pretty bad issues. I used it a long time ago and only remember the results being bad. I've heard that YaCy could be good for searching sites you've already visited, but it sounds like even that might not be a good use case for it.

I do understand the disk space thing. It's hard to store the text of all your sites without it taking up a lot of space unless you can intelligently determine which text is unique and worth keeping. Unless you are just crawling static pages, it becomes hard to know what needs to be saved or updated.
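
For the static-page case, even a plain hash over the extracted text goes a long way; a toy sketch (the normalization and the in-memory "seen" store are placeholders of mine, not how YaCy actually does it):

    import hashlib

    seen = {}  # text hash -> url of the first page that had this text

    def normalize(text):
        # crude normalization; a real crawler would also strip nav/footer boilerplate
        return " ".join(text.split()).lower()

    def should_store(url, text):
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest in seen:
            return False  # identical text already stored under seen[digest]
        seen[digest] = url
        return True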



