
> In total the VMs use 182 GB of RAM and 94 CPU cores. The total storage capacity is 620 GB, but that’s not all used.

That level of hardware/cores seems a bit over the top given what TPB does.

When I was a boy we had this thing called 'Alta Vista'. It was the search engine before Bing! came along. Processors did not run at gigahertz speeds back then and a large disk was 2 GB. Nonetheless, most offices had the internet, and when people went searching 'Alta Vista' was the first port of call for many.

TPB has an index of a selective part of the internets, i.e. movies, software, music, that sort of thing. Meanwhile, back in the 1990s, AltaVista indexed everything, as in the entire known internets, with everything stored away in less than the 620 GB used by TPB for their collection of 'stolen' material.

From http://en.wikipedia.org/wiki/AltaVista

Alta Vista is a very large project, requiring the cooperation of at least 5 servers, configured for searching huge indices and handling a huge Internet traffic load. The initial hardware configuration for Alta Vista is as follows:

Alta Vista -- AlphaStation 250 4/266, 4 GB disk, 196 MB memory. Primary web server for gotcha.com. Queries directed to WebIndexer or NewsIndexer.

NewsServer -- AlphaStation 400 4/233, 24 GB of RAID disks, 160 MB memory. News spool from which news index is generated. Serves articles (via http) to those without news server.

NewsIndexer -- AlphaStation 250 4/266, 13 GB disk, 196 MB memory. Builds news index using articles from NewsServer. Answers news index queries from Alta Vista.

Spider -- DEC 3000 Model 900 (replacement for Model 500), 30 GB of RAID disk, 1 GB memory. Collects pages from the web for WebIndexer.

WebIndexer -- Alpha Server 8400 5/300, 210 GB RAID disk (expandable), 4 GB memory (expandable), 4 processors (expandable). Builds the web index using pages sent by Spider. Answers web index queries from Alta Vista.
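A rough tally of the configuration quoted above against the TPB figures quoted at the top of the thread, as a sketch only. The variable names are made up for illustration, and 1 GB is taken as 1024 MB, which is an assumption; the quoted text does not say which convention it uses.

```python
# Disk (GB) and memory (MB) per machine, copied from the list above.
# Assumption: 1 GB = 1024 MB for the two machines whose memory is given in GB.
altavista = {
    "Alta Vista":  {"disk_gb": 4,   "mem_mb": 196},
    "NewsServer":  {"disk_gb": 24,  "mem_mb": 160},
    "NewsIndexer": {"disk_gb": 13,  "mem_mb": 196},
    "Spider":      {"disk_gb": 30,  "mem_mb": 1 * 1024},
    "WebIndexer":  {"disk_gb": 210, "mem_mb": 4 * 1024},
}

total_disk_gb = sum(m["disk_gb"] for m in altavista.values())
total_mem_mb = sum(m["mem_mb"] for m in altavista.values())

# TPB figures as quoted at the top of the thread.
tpb_disk_gb = 620
tpb_ram_gb = 182

print(f"AltaVista disk: {total_disk_gb} GB")
print(f"AltaVista memory: {total_mem_mb} MB")
print(f"TPB / AltaVista disk ratio: {tpb_disk_gb / total_disk_gb:.1f}")
print(f"TPB / AltaVista RAM ratio: {tpb_ram_gb * 1024 / total_mem_mb:.1f}")
```

On these numbers the initial AltaVista setup totals 281 GB of disk, which is under the 620 GB quoted for TPB.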



What's your point? That a 20-year-old web service used what was available at the time?

They also didn't get as much traffic as TPB, since there weren't that many people connected back then.

I would also imagine that they didn't have to HIDE their services either.


>It was the search engine before Bing! came along.

IIRC there were (quite) a few before Bing. More to the point, Google was the pinnacle of web search long before Bing came into existence.


How many pages were there on "the entire internet" back then? How many reqs/second did AltaVista serve? How does that compare to the numbers for TPB?


From what I remember the whole of TPB server + data could fit onto a 90 MB USB stick in 2012. Sure, we have had many episodes of really important reality TV series and other great stuff that all needs pirating, yet in 2014 I doubt that 90 MB has ballooned into petabytes. We are still in the same range - let's say 1 GB might be a reasonable size USB stick to buy for it.

Alta Vista started out with a modest size index of 20 million pages. Let's imagine those pages were all of 1 KB in size; then 20 * 10^6 * 10^3 comes to 20 * 10^9 bytes, or 20 GB. So, in terms of stuff indexed, that is considerably larger than TPB. Agreed?
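The back-of-envelope sum above, spelled out as a quick sketch. The 1 KB per page figure is the guess from the comment, not a measured value, and decimal units are assumed (1 KB = 10^3 bytes, 1 GB = 10^9 bytes); the variable names are just for illustration.

```python
# Assumptions from the comment: 20 million pages, 1 KB each (decimal units).
pages = 20 * 10**6          # initial AltaVista index size, in pages
bytes_per_page = 10**3      # guessed average page size: 1 KB

index_bytes = pages * bytes_per_page
index_gb = index_bytes / 10**9

print(index_gb)  # 20.0
```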

Well, maybe not. They could have used compression to get the vastness of TPB onto that USB stick. Around that time - 2012 - they had 1.6 million torrents. That is some way off the 20 GB or so that AltaVista indexed, no matter how you bloat the maths. Sad to say, but in the 1990s the internet was actually larger than your porn collection.

How useful is reqs/second anyway? By that score Google probably does very badly as a search usually returns the answer on the first page. With old-style search engines you might need to go through scores of pages before getting what you want. I found TPB to be a bit like that too, wading through results pages more than necessary.

TPB is not 'safe for work' and in a lot of jurisdictions you cannot even access it from home. In the UK (which is a small but well populated country) it is not that easy to get onto TPB - you have to have hacker voodoo skills to do that or route through a VPN as none of the main ISPs will let you on. Most of the civilised world has the same need to protect citizens from the evils of TPB so places where it can be accessed are not that common. Even if you could access it, would you? Probably...

Meanwhile, back in 1998 - a year or two before the dotcom crash - plenty of people were using search engines such as AltaVista (which was the best back then) for actual work. Maybe not everyone, but enough people knew about computers and things like AOL disks, modems and whatnot. The internet was big.

Which reminds me of my main point, the one you thought so important to downvote rather than give kudos for being insightful. TPB uses a constellation of computers and consumes vastly more resources than the biggest search engine of the 1990s, yet the utility of TPB is limited to only a few fortunate enough to live somewhere where TPB can be accessed. What can be searched for on TPB is a mere subset of what was on AltaVista, albeit different and not so useful stuff. I would say that with AltaVista they were doing far more with what they had, reaching a better audience, doing something more useful for the world (than serving weight loss adverts) and altogether performing a miracle. TPB is a slouch in comparison.


I'm not sure what your point is. AltaVista probably had to put a lot of effort into tuning every part of their infrastructure to keep the site running on that hardware. Why would TPB do that when they can simply get another VM for a fraction of the cost?

Running a top 100 site[1] on 21 VMs in 2014 is quite impressive.

[1]: http://www.alexa.com/siteinfo/thepiratebay.se


Perhaps as a result of a certain duplication for redundancy & decentralisation?



