Care about search privacy? Duck Duck Go no longer logs IPs. (gabrielweinberg.com)
189 points by epi0Bauqu on March 15, 2010 | 95 comments


It seems sort of like you're destroying the possibility for growth with this, though. From what I understand of Google's work, they really have improved tremendously from analyzing data (to find how searches go wrong), and from using the data in new features (such as with spell checking, adding new terms into the query, personalizing searches, etc.).

And I mean, I value privacy. I do. But, I value the quality of the search results even more than whatever privacy is lost from search engines logging, I guess. Unless there is some path to improvement, all I can ever see myself using Duck Duck Go for is porn, tips on murdering my spouses, and information on illegal drugs. Of course, other people would use it for much more-- but for me, there's something truly wonderful about being able to search for "nose" and have the first result be a unit test runner I want to use, not a piece of anatomy I already know everything I need to about. Sure, it's stupid-- I should search for "nose tests", but I sometimes make mistakes, and here Google doesn't even need to say so and make me try again. It finds the target anyway, without problems (and the less-likely Wikipedia article is right below, as result #2). Without Google logging what I searched for and building a profile, this wouldn't be possible.

I don't mean to be fanboyish and loving of Google (they do scare me sometimes). If they added some sort of feature that said, "don't record anything I do right now with these searches", I would be very happy. But, similarly, I would also be happy if Duck Duck Go at least had an option to opt-in to logging and monitoring and so on. On my end, I would consider my relatively minimal, and explicitly given, loss of privacy to be worth it to help make a search engine I am using better (assuming, of course, that I'm using it-- and I'm not opposed to trying something new to me).


In my opinion Google results used to be great before they did all the personalizing, and now they are often really bad. I get infuriated sometimes how hard it is to search for a specific term. I have to use all kinds of + and - signs in my search terms to get anything near a meaningful result and it is not straightforward at all to simply go to an English version of a Google site because Google insists that I must be Dutch and therefore only want localized information.

Of course they did not get worse because of the personalizing (though the location-based stuff absolutely has to go away - we have browser language strings for that, don't try to guess what I want if I have a way to tell you what I want), but I do think they did pretty well before they did all that, so I see no reason why DuckDuckGo can't as well and wish them the best.


> it is not straightforward at all to simply go to an English version of a Google site because Google insists that I must be Dutch and therefore only want localized information

This kills me when traveling. There's ways around it, but it's a bit of a hassle. Really one of my only hassles with Google, though I understand it probably serves 99%+ of searches better.



There is a lot you can do with the aggregate query data.

I do plan to eventually have accounts, i.e. opt-in features.

And as for results, I would suggest you try it for a week or so and see for yourself.


The best reason to keep identity in your logs is spam prevention. It may not be a big deal now, but it will be if this works out.


Why don't you just hash the ip and do a geo ip lookup before you log it? Voila, you add privacy, but keep the relevant data.
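For anyone wondering what "hash the IP before you log it" could look like, here is a minimal sketch in Python. It is not how Duck Duck Go does anything; `SECRET_KEY`, `pseudonymize_ip`, and `log_record` are made-up names, and the geo-IP lookup is assumed to happen elsewhere. A keyed hash (HMAC) is used rather than a bare hash because the IPv4 space is small enough that an unkeyed digest can be reversed by trying every address.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-deployment secret. Without a key, all 2**32 IPv4
# addresses could simply be hashed and compared against the log.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize_ip(ip: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest of the IP address as a hex string."""
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()


def log_record(ip: str, query: str, country: str) -> dict:
    """Build a log entry that stores a digest and a coarse location, not the IP.

    `country` is assumed to come from a geo-IP lookup performed before
    the address is discarded; that lookup is outside this sketch.
    """
    return {"client": pseudonymize_ip(ip), "country": country, "query": query}
```

Note that the digest still ties together every search from the same address, so this gives pseudonymity rather than anonymity.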


This is definitely a killer niche to fill given the current privacy conscious climate [around here]. Kudos.


"Privacy conscious climate"? Are we using the same Internet? (The one with Twitter, Facebook, Foursquare, etc...)


By privacy conscious I mean users - specifically us techy users (where there is a lot of privacy debate currently going on)


It's certainly a niche, but I'm not sure how many techy users really care about privacy. We all have Gmail accounts, right? Which store emailed passwords, password reset links, private conversations, account balances, and all sorts of other stuff?

Another data point... Our userbase at RescueTime is comprised almost entirely of geeks. We measure and store second-by-second attention data on their computer. We make all sorts of privacy features available (ability to delete bits of data, track only whitelisted stuff, etc., etc.). The VAST majority of our users never touch any of these features.

And we're growing like gangbusters.

I think people who make the bet that people in the future are going to be increasingly privacy-conscious are making a pretty bad bet... But I do imagine that it will continue to be a niche (though likely a shrinking one).


I think you are right but in the short term only. I think long term there will be some huge flap involving online privacy and after that people will wake up to what the consequences can be.


Really? What can the consequences be? I'm not being flip-- I think it's easy to say that but it's harder to pin down dangerous scenarios.

The increasing fact is that the more Google knows about you, the better they can serve up search results (and ads, of course). I think most users would NOT opt-out given the chance to do so (if it meant harming their search quality).

Related quote from Matt Cutts' blog:

"It’s a long article, but an example useful fact is that if X is the number of people who visit the Ad Preferences page and opt out, 10X people don’t opt out and 4X people actually edit their categories to improve the targeting relevance of the ads they see. Let me say that again: four times as many people change their settings to make their ads more relevant than opt out of interest-based targeting. I think the Ad Preferences page is a good example where users get more transparency and control regarding their privacy."


Let me give you just two possible scenarios, there are plenty more:

- An insurance company somehow gets their hands on search data and finds a way to associate that with their customers, figures out who is probably about to apply for coverage for a bunch of expensive treatments and cancels their insurance before they can make the claim for some trumped up reason.

- A political party gets their hands on this information and datamines it to the hilt in order to figure out which talking points to trot out on what TV stations in order to swing a maximum block of voters to them.

I'm sure there will be people that will say both of these are unrealistic, but I think that given the amount of data out there it is only a matter of time before someone will find a way to abuse it in a way that we will not be able to ignore.


>An insurance company somehow gets their hands on search data and finds a way to associate that with their customers

You mean like the AOL search data leak? I never read anything about insurance companies misappropriating the data in this manner. So I'd say it's more of a long shot.

Same with political parties... again, never read anything like that.

Not to say that it hasn't happened, I'd be curious if anyone has any information to show if it has?


A fair point certainly.

However there has been a lot of discussion about this recently - especially regarding search. I think there is a fair number of people for whom anonymous searching is important and that is only likely to get larger.


I don't think the debate is about privacy but more about how certain bits of information about people can be used to manipulate their attention in order to make money. Google uses the information it collects to serve more relevant ads but this is in a certain sense an unfair manipulation of your attention and this is what I find offensive.


That's just the internet you know about. :)


I don't use any of those and I deny google cookies.


"current privacy conscious climate" sounds like something Fox News would use. Just like we were all subjected to years of "...in these uncertain times." after 9/11.

And as said, this is probably the least privacy conscious the internet has ever been. Most people just don't care (They're also the ones clicking on ads).


I was more referring to the climate in places such as here (HN) and so forth. We fall quite squarely into his target audience (based on previous postings anyway) and a good proportion of us are security conscious now :)


true, true. Unfortunately we're also hard to monetize.


I clicked to open the comments threads for both this and the "Please TechCrunch, stop using "dead simple" to describe every new app." article.

I honestly thought this was an ironic response to the second article!


I noticed you're running your own servers. What kind of setup do you have? How many machines? Are the machines literally sitting in your garage or are you colo'ing somewhere?

Not asking because of privacy, just curious about the technology. If you've answered these questions already somewhere, sorry!


He wrote about his site's architecture here: http://www.gabrielweinberg.com/blog/2009/03/duck-duck-go-arc...

He also answered some questions on HN here: http://news.ycombinator.com/item?id=525048


He does log the browser string. The EFF's panopticlick project shows that this is a way to identify a user surprisingly uniquely.


Wow, really? Any references? I'd consider ditching them. They're not doing much for me anyway.



Right, that's tracking everything the browser outputs, not just the user agent, which is maybe 1% of the full string.


Given that IE seems to include all the installed programs on your computer in the UA, I would have thought there's quite a lot of info. I've seen some really long IE UA strings, including all versions of .NET installed, InfoPath, random stuff for adding smilies to emails (really!), and so on.

Combine that with a few other hints, and you've got something interesting.

Maybe you could distill it to a very simple description, like IE6 or FF3.5? I wouldn't have thought you'd want to drop knowing which browsers your users are using.
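A rough sketch of that distilling step, in Python. The labels and the pattern list are illustrative guesses rather than any standard, and real user agent strings have plenty of edge cases this ignores. The point is only that the long tail (.NET versions, InfoPath, toolbars) is thrown away before anything is logged.

```python
import re

# Ordered patterns: the first match wins. Chrome is checked before
# Safari because Chrome's user agent also contains "Safari/".
_PATTERNS = [
    ("IE", re.compile(r"MSIE (\d+)")),
    ("FF", re.compile(r"Firefox/(\d+\.\d+)")),
    ("Chrome", re.compile(r"Chrome/(\d+)")),
    ("Safari", re.compile(r"Version/(\d+).*Safari/")),
]


def coarse_browser(user_agent: str) -> str:
    """Reduce a full user agent string to a short family+version label."""
    for label, pattern in _PATTERNS:
        match = pattern.search(user_agent)
        if match:
            return f"{label}{match.group(1)}"
    return "other"
```

Logging only the returned label keeps the browser-share statistics while dropping most of the identifying detail.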


Good idea. For the time being, I just turned user agents off. I haven't really got much use out of knowing what browsers people are on anyway.


I did an experiment and turned off javascript and redid the test... it still is unique enough to track a browsing session. Give it a try.

I would consider using Duck Duck Go if nothing was logged. Right now I use another service precisely because they don't log anything.


I just turned user agents off.


Wow, now that's what I call customer service :-) Thanks for listening... I'll give it a try.


Make sure to update your blog entry !


what other service? Is it scroogle?


I believe cuil (hee hee) has advertised their lack of logging.

http://www.cuil.com/info/privacy/


yes


If you're only logging the info provided in the GET request you only have less than 20 "bits of identifying information". They also use some Javascript to collect additional information, which increases the available information quite a bit.
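To make "bits of identifying information" concrete: if a fraction p of browsers share some value, seeing that value carries -log2(p) bits, and n bits narrows you down to roughly one in 2^n. A small sketch in Python (the function names are mine, and adding bits across attributes assumes the attributes are independent, which they often are not):

```python
import math


def surprisal_bits(fraction_sharing: float) -> float:
    """Bits of identifying information in a value shared by this fraction of users."""
    return -math.log2(fraction_sharing)


def anonymity_set(bits: float) -> float:
    """Roughly how many users one would be indistinguishable among."""
    return 2 ** bits
```

So `surprisal_bits(1 / 1024)` is 10 bits, and 20 bits corresponds to about one browser in a million (`anonymity_set(20)` is 1048576).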


Ditching them would probably identify as "the guy without browser identification strings".


How is the legal situation in US? Aren't web services forced to keep records?


No, but there are rules if you do. (e.g., have to share with FBI when asked, have to delete after X months).

Most search companies want to identify you. The problem is keeping them from having too many records.


>> "Most search companies want to identify you."

I'm not convinced they want to do it for any malicious/profit based motive.

They want to identify me to give me a better user experience, sure.

Being able to identify users searching on a search engine doesn't suddenly mean it's easier to monetize. Google would still make billions from the adverts next to search results regardless of how identifiable users are. It's not related to their bottom line.

I guess I just don't buy the conspiracy theories on it.


Storing that much data about users isn't harmless just because the company doesn't do anything bad with it.

Even if that company would forever be good, they could still have their servers infiltrated or stolen, or some government could decide overnight that they want the data.


If you're looking at risk though:

  * Computer could have been compromised
  * Wifi/local network could have been compromised
  * Your ISP could have been compromised
  * Some of the peers on the internet between your ISP and Google could have been compromised
  * The datacenter where Google's servers are could have been compromised
And that's before the data gets to Google.

I'd guess that the most common privacy theft is at the first 2 steps - local computer / wifi network.

Any one of those could happen, someone could be logging all your searches + all internet traffic in some of those cases.

Personally though, "What I've searched for" is in the 'meh don't care if anyone knows this' bucket for me.


When we're talking about that much data, the risk isn't just personal any more. It's not that they'll embarrass you with your search words from 10 years ago or let burglars know when you're shopping for plane tickets for Honolulu.

It's about what that amount of data about a whole country's population could be used for.


just because your data can be stolen in other ways does not mean you should do nothing about it. you could address all those issues.


Right, but knowing the risk associated with things is important... For example I don't care much about terrorism, I care about making sure my kids look both ways when crossing the road.

Similarly, I care more about individual users' PCs getting compromised than Google getting hacked.


Remember when AOL purposely released search data? They tried to think it through and took steps to anonymize the data before they let anyone look at it. A lot of interesting things fell out of it, a lot more than you'd expect.

http://consumerist.com/2006/08/aol-user-927-illuminated.html http://en.wikipedia.org/wiki/AOL_search_data_scandal

In the end you're either part of that set or you're not. What's bothering people is not having a choice.


I have a copy of that dataset.


There's still an issue of trust, why should I trust Duck Duck Go any more than a company that has a reputation to lose?

Quis custodiet ipsos custodes?


And that's the problem with trying to appeal to paranoid users: they can always find something else to be paranoid about.


What could be done in this area to make you trust? EFF certification of some sort?


I think his point was not that you need to go yet farther to court the privacy niche but that you're going to have a very tough time doing so, no matter how hard you try. Google makes billions of dollars a year off of their search, so people can have pretty good faith that they won't do anything to make people wary of searching with them.

So you'll catch a small niche of crazies (but small in the scope of the Internet may still be pretty big) and maybe get yourself in some more people's minds as "that place to search if I ever need to run search queries that might get subpoenaed."

The solution, though, is NOT to invest lots of time, energy, and money into making people think of you as better for privacy. You're currently the only search engine that logs so little, so you've got that niche (within people who know about you). Go court some other niches and work on making your overall product better.


That's a question you have to answer yourself. Right now, Duck Duck Go is the only search service I've ever heard of even trying to make that promise.

As an aside, there are a couple issues with companies and reputation. First is that a company might risk its reputation if the rewards are worth it. Second is that a company might care about its reputation but simply fail to hold true to it. Finally, a company might care about its reputation but some of its employees might not.



If you still want to use Google but want privacy, try:

https://ssl.scroogle.org/

What they do is explained here: http://www.scroogle.org/ and here https://ssl.scroogle.org/sslnote.html


What about your upstream provider? Do you know what level of data they retain about your traffic and visitors/sessions?


Doubtful any. It's FIOS business, and I'm directly connected to their fiber backbone.


I have duckduckgo set as my default search engine in Firefox. Does anyone know how I can make it use SSL? In other words, make it use https://duckduckgo.com instead of http://duckduckgo.com.


In chrome you just add an 's' in the URL in your default search engine settings.


Aha, I did this:

  find ~/.mozilla -name "*duck*"
Then I edited duck-duck-go.xml, changing http to https, and restarted Firefox.
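If you would rather script that edit than do it by hand, something like the following Python sketch should work. It takes the path that `find` printed as an argument, since the exact location under `~/.mozilla` varies by profile; `force_https` is a made-up helper name, and it does a plain text replacement rather than parsing the XML.

```python
from pathlib import Path


def force_https(plugin_path: Path, host: str = "duckduckgo.com") -> bool:
    """Rewrite http:// URLs for `host` to https:// in a search plugin file.

    Returns True if the file was changed, False if there was nothing to do.
    """
    text = plugin_path.read_text(encoding="utf-8")
    updated = text.replace(f"http://{host}", f"https://{host}")
    if updated == text:
        return False
    plugin_path.write_text(updated, encoding="utf-8")
    return True
```

Call it as `force_https(Path("/path/printed/by/find/duck-duck-go.xml"))` and restart Firefox afterwards, as above. Keeping a backup copy of the file first is probably wise.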


epi0Bauqu you may want to add DDG to Firefox's list of search engines (drop down search engine box, manage search engines, get search engines):

https://addons.mozilla.org/en-GB/firefox/search-engines/

Didn't see it there.


Hmm..we do have a Firefox add-on and I am working on another one. I'm not sure why it doesn't show up there though. Thx.


Go to http://duckduckgo.com and click "Add To Firefox" at the bottom.

*Edit: Just kidding, that's not what you meant. Leaving this comment for those who don't know where to add DDG to their Firefox search engines.


Great to hear! I have been considering switching to DDG to give it a long-term try for a while now and this pushed me over the edge! DDG is now my default search site in FireFox!

Great work epi0Bauqu!


I use ixquick, and they have SSL, too: https://ixquick.com/

I am curious which of these will grant more privacy and better searches.


Unfortunately, if your search engine becomes important enough to spam, you won't be able to use logs as a signal . . .

Best of luck though. Seems like a good publicity strategy.


The search results page retrieves some content from s3.amazonaws.com. The http referer in those requests contains the search term. Thus, even though duckduckgo.com doesn't know you performed a search, amazonaws.com knows your IP address, what you searched for, and when you searched for it. They probably have a log of all that information too.
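To illustrate how little work this takes on the receiving side, here is a sketch in Python of what any third-party host could do with the Referer header it is sent. The parameter name `q` is an assumption about the results page URL, and `leaked_query` is just an illustrative name.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def leaked_query(referer: str, param: str = "q") -> Optional[str]:
    """Extract the search term from a Referer URL, if one is present."""
    values = parse_qs(urlsplit(referer).query).get(param)
    return values[0] if values else None
```

Combined with the client IP and timestamp that every web server sees, `leaked_query("http://duckduckgo.com/?q=nose+tests")` returning `"nose tests"` is all a third party needs to rebuild a search log.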


I'm in the process of changing these links to https.


So the one thing that never seems quite right about this: how/why do we trust that the guy doesn't log IP addresses? Let's put it this way: If someone promises not to do anything bad with the IP addresses they store, and we don't trust them about that, why would we trust someone when they say they don't store them at all?


Flip the question around for a moment. Why aren't we all super paranoid about Google? Because we more or less believe that the way they use the information they collect isn't damaging to us. It doesn't mean we trust them completely, 100%, in every way, shape or form. It doesn't mean we don't use google's service with some degree of reservation that using their service is a bad idea even if they are telling the truth.

If I didn't have a basic level of trust with google I wouldn't use the service. I can apply that same basic level of trust with duckduckgo.

Once you apply that unbiased trust, you have two services: one that uses a variety of technology to protect the information they collect and one that doesn't collect information at all and takes steps to prevent others from doing the same.

Google might get bought out. Google might be subject to a subpoena. A google employee might steal something. Google might make a mistake. Any one of those things might prevent Google from fulfilling the basic promise of "we won't misuse your data." They won't ever have to lie or do anything evil. They might just make a mistake, and all your data will be there, vulnerable.


Which did happen once with AOL searches. I see your point.


Good move, you now have one more user to not-track. I'll be trying DuckDuckGo out as my default search. I'll give you feedback if it doesn't work out. I hope others will give you a try by setting you as the default, too - you can always switch back. (Those that do, change http to https.)


That's great news. Pretty ballsy move too, I'd like to see google match you on that one.

The one little issue is that IP traffic passes through a whole pile of routers before reaching your target, so if you really care about your privacy that much then you should probably use https.


Ballsy, I don't know. I'd call it very smart. Maybe I don't know the extent of the benefits of IP tracking, but being relatively unknown, this is a great way to gain market share as a niche.

> you should probably use https

Not sure if this is what you're implying, but https://duckduckgo.com is mentioned in the article.


I prefer Google. The full link is in green, it's justified left, the results are wider and don't take as much vertical room. There's a cache too. What does DDG do better than Google? Does DDG have the crawling power of Google or does it pull in Google's results?


> What does DDG do better than Google?

SSL connection. Nothing logged. When I don't care about results personalization, and if I get results I am looking for, DDG has the superior service.

Today, if an employee working for Big Evil uses google to do a search for "Software Developer open positions near Cambridge, MA", that search will likely be logged by websense and the internet firewall. That information might be found and used against you by a political enemy (when you work for Big Evil, political enemies emerge from the woodwork).

A search made over an ssl connection won't log the details.


Will the SSL hide a post/get? The SERP still has the terms in the querystring. You can get SSL on other Google apps. Maybe not on search.


Yes, it will hide them. TLS runs at a protocol layer below HTTP: http://en.wikipedia.org/wiki/Transport_Layer_Security
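A simplified way to picture it, as a Python sketch: the whole HTTP request, including the path and query string, travels inside the encrypted channel, so someone watching the wire sees at most which host you talked to (the hostname is usually still visible through DNS or the TLS handshake). `observer_view` is an illustrative name, and the model deliberately ignores traffic analysis and intercepting proxies that terminate TLS themselves.

```python
from urllib.parse import urlsplit


def observer_view(url: str) -> dict:
    """Approximate what a passive on-path observer can read for a request."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        # Request line, headers and body are encrypted; only the host leaks.
        return {"host": parts.hostname, "path": None, "query": None}
    # Plain HTTP: everything is readable in transit.
    return {"host": parts.hostname, "path": parts.path, "query": parts.query}
```

The search terms are still in the URL on both ends, of course, so the server and your own browser history see them either way.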


Our about page is now dedicated to answering this question: http://duckduckgo.com/about.html


duck duck go just became my #1 search engine.


Nice. A few searches turn up relevant results. But I think it should return more than 5 results.


Try scrolling :)


I don't need to care if people retain my IP. I'm on a netbook and mobile 99% of the time.


Even so, your sessions would be obvious in the log, i.e. one could tie multiple searches together. But to your broader point, some people care about privacy, some don't.


Am I being cynical with noticing that this comes right after your brand awareness post?


Yup. I meant to write that post two weeks ago when I did the poll, but got too busy.

This came about from comments on reddit. I gave it a lot of thought over the past few days and decided to pull the trigger.


It was a tongue-in-cheek comment really. This move will definitely give you some leverage with a very vocal crowd who has concerns over privacy. I really would like to see more search engines around. I am just tired of big G's de facto monopoly status. So well done and keep it up :)


What was the original reason to store IPs?


Analytics. Google et al. also use it for personalization, though I never did that.


So did you ditch any sort of tracking? It doesn't look like you added cookies or anything. (very cool if so)


Yeah, I ditched it.


Loving this search engine! Even works great on my Samsung Jet mobile phone browser (webkit based).


Just turn off cookies and use a proxy, Google results are still way way superior.



