
I would love to be able to cache websites to my server and access them in the event the site goes offline. I've tried a number of things like wget to "offline" a website and had mixed success. Does anyone know of a proven way to do something like this? (I'd even settle for text-only, a la Google's archive/cache, but pulling images and scripts too would be a huge win.)

I'm younger, but I can already see link rot destroying my bookmarks. I now use (and pay for) pinboard.in, but I'd like a way to do it myself. I've considered writing a Chrome plugin to send URLs I visit over to a process running on my server that archives them (with the ability to blacklist/whitelist domains), but I haven't found a way to do it that works reliably (I'd also probably need to send a copy of my cookies for sites behind auth).
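For the cookies part, here's a rough sketch of what the server-side step could look like, assuming GNU wget and a Netscape-format cookies.txt exported from the browser (the plugin, the POST step, and all file/URL names here are hypothetical; only the wget flags are real). The demo spins up a throwaway local server so it runs end to end without touching a real site:

```shell
#!/bin/sh
# Hypothetical archive step for the plugin idea above: fetch one page
# plus its assets, sending the browser's cookies so auth-walled pages
# render offline. In real use, TARGET is whatever URL the plugin sends
# and cookies.txt is exported from the browser; the local server below
# is only a stand-in so this sketch is runnable.

set -e
WORKDIR=$(mktemp -d)
cd "$WORKDIR"

# Stand-in for the page to archive.
echo '<html><body>archived page</body></html>' > index.html

# Stand-in for the cookies the plugin would ship over
# (wget expects the Netscape cookies.txt format).
printf '# Netscape HTTP Cookie File\n' > cookies.txt
printf '127.0.0.1\tFALSE\t/\tFALSE\t0\tsession\tdeadbeef\n' >> cookies.txt

# Throwaway server; real usage skips this, TARGET is remote.
python3 -m http.server 8931 --bind 127.0.0.1 >/dev/null 2>&1 &
SERVER=$!
sleep 1
TARGET="http://127.0.0.1:8931/"

# The actual archiving command: the page plus its requisites, with
# links rewritten for offline browsing, filed under a per-day folder.
wget --quiet \
     --load-cookies cookies.txt \
     --page-requisites \
     --convert-links \
     --adjust-extension \
     --directory-prefix="archive/$(date +%Y%m%d)" \
     "$TARGET"

kill "$SERVER"
```

Note that --load-cookies only understands the Netscape cookies.txt format, so the plugin would need to export cookies in that shape rather than raw headers.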



> I've tried a number of things like wget to "offline" a website and had mixed success. Does anyone know of a proven way to do something like this?

What about httrack[0]? From the description in the OpenBSD ports tree:

HTTrack is an easy-to-use offline browser utility. It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads. HTTrack is fully configurable, and has an integrated help system.

Or, you can use wget to download a single page or do a recursive download. :)
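For the wget route, the usual "offline a site" incantation looks like the sketch below. The flags are standard GNU wget options; the tiny two-page local site only exists so the example runs end to end without hitting a real host:

```shell
#!/bin/sh
# Minimal wget mirroring sketch. Swap the 127.0.0.1 URL for the site
# you actually want to archive; everything before the wget call just
# fakes a site to mirror.

set -e
WORKDIR=$(mktemp -d)
cd "$WORKDIR"

mkdir site
cat > site/index.html <<'EOF'
<html><body><a href="page2.html">next</a></body></html>
EOF
cat > site/page2.html <<'EOF'
<html><body>second page</body></html>
EOF

cd site
python3 -m http.server 8932 --bind 127.0.0.1 >/dev/null 2>&1 &
SERVER=$!
cd ..
sleep 1

# --mirror           recursive, infinite depth, with timestamping
# --convert-links    rewrite links so the copy browses offline
# --adjust-extension add .html where the server omits it
# --page-requisites  also grab the images/CSS/JS each page needs
# --no-parent        don't wander above the start directory
wget --quiet --mirror --convert-links --adjust-extension \
     --page-requisites --no-parent \
     --directory-prefix=mirror \
     http://127.0.0.2:8932/ 2>/dev/null || true
wget --quiet --mirror --convert-links --adjust-extension \
     --page-requisites --no-parent \
     --directory-prefix=mirror \
     http://127.0.0.1:8932/

kill "$SERVER"
```

The big caveat with both wget and HTTrack: pages that are assembled by JavaScript won't come back intact, since neither tool executes scripts; for those, a headless-browser snapshot is the usual fallback.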

[0]: http://www.httrack.com/


How often does a decent website go offline?


What we consider "decent" today is not always decent tomorrow, and things like personal blogs go down all the time or change their URL structure. Also, not everyone has a community or family that will keep their work online after they're gone, and I don't want to lose content because someone's hosting lapsed after their death.

Looking back I wish I had archived some of the forums that I used as a kid as a number of them are just gone, no wayback machine, no cache, no archive, just gone.

Sidenote: I'd love to work on (or just use) a service that allows community funding of both hosting and domain registration, so that you could add a widget to your site and have it stay online even after your death for as long as people donate; maybe even make the site static if no one can pay, and use proceeds from other sites to float the cost. There's a chance you could die and your close friends/relatives would either not have the access (password/key) or the technical know-how to keep your site online even if they had the funds to do so.



