Whitelisting these user agents ensures that anything claiming to be a major search engine is allowed open access. The downside is that user-agent strings are easily spoofed, so a bad bot could crawl along and say, “hey look, I’m the Googlebot!” and the whitelist would grant it access.
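As a rough illustration, a user-agent whitelist check might look something like the Python sketch below. The crawler patterns and the `is_whitelisted` helper are hypothetical, not taken from any particular product; the point is just that the check trusts whatever the client puts in its User-Agent header.

```python
import re

# Illustrative whitelist of substrings used by major search-engine crawlers.
CRAWLER_PATTERNS = [
    re.compile(r"Googlebot", re.IGNORECASE),
    re.compile(r"bingbot", re.IGNORECASE),
    re.compile(r"DuckDuckBot", re.IGNORECASE),
]

def is_whitelisted(user_agent: str) -> bool:
    """Return True if the User-Agent claims to be a whitelisted crawler.

    This trusts the header exactly as sent, so any client that spoofs
    "Googlebot" in its User-Agent string will pass the check.
    """
    return any(pattern.search(user_agent) for pattern in CRAWLER_PATTERNS)

# A legitimate crawler and a spoofed one both get through:
print(is_whitelisted("Mozilla/5.0 (compatible; Googlebot/2.1)"))            # True
print(is_whitelisted("EvilScraper/0.1 (pretending to be Googlebot)"))       # True (spoofed)
print(is_whitelisted("Mozilla/5.0 (Windows NT 10.0; rv:120.0) Firefox/120.0"))  # False
```

Because the header is entirely client-controlled, this kind of check on its own can only keep out bots that don't bother to lie.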
How many of these so-called "bad bots" already do this sort of spoofing? Would usage of these techniques only encourage such behavior?