Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Isn't this just the cost of having a "free" product? Bots are not really a problem. Its just that their traffic cannot be monetized. If you could monetize-bot traffic your problem would be solved. Or put another way, if you framed the issue as a business model one, not a technical one, it might be a useful exercise.


   > if you framed the issue as a business model one, not 
   > a technical one, it might be a useful exercise.
That was kind of my point. Clearly most of the bots are trying to scrape my search engine for some specific data. I would (generally) be happy to just sell them that data rather than have them waste time trying to scrape us (that is the business model, which goes something like "Hey we have a copy of the big chunk of the web on our servers, what do you want to know?" but none of the bot writers seem willing to got there. They don't even send an email to ask us "Hey, could we get a list of every site you've crawled that uses the following Wordpress theme?" No instead they send query after query for "/theme/xxx" p=1, p=2, ... p=300.

On a good day I just ban their IP for a while, when I'm feeling annoyed I send them results back that are bogus. But the weird thing is you can't even start a conversation with these folks, and I suppose that would be like looters saying "Well ok how about you help load this on a truck for me for 10 cents on the dollar and then your store won't be damaged." or something.


You may try to contact scrapers through access denied page.

Did you try to explicitly state that your data is available for sale when denying access to p=300?


If you wanted to buy data from Google, how would you email? What is Google's email address?


Google posts lots of contact information on their contact page. You would probably want to reach business development. I don't think they are willing to sell access to that index however, we (at Blekko) would. I suppose you could also try to pull it out of common crawl.


It need not to be commercial service. For example, Wikipedia is a donation-only service. A bot visit is generally not different then most user visiting (I'd assume most users don't donate anyway). Wikipedia doesn't really mind serving users that aren't donating, but the bot, while generally not different to normal user, are stealing resources away from actual users.


That's why Google needs proper API or Pro edition where you could execute proper SQL queries, etc.

Instead, Google is making their search less functional. I don't get why.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: