
Isn't that the main point of differential privacy?


From a cursory reading of the Wikipedia article on differential privacy, I understand this is about limiting access to the data to a set of queries designed to make it difficult to infer, from the query results, data the provider wants to protect. That doesn't seem to help in any way, though, if the malicious actor owns the dataset and can run whatever queries they like on it. I'm also not sure how this addresses the problem of including additional datasets to help correlate out identities.


(From memory, so I may be missing some details.)

Differential privacy uses the concept of adding 'noise' to a dataset so that it is statistically provable that, if the allowed queries are only A, B, C..., then an adversary can distinguish the target from the noise only with some bounded probability. Restricting access to a subset of queries isn't really the main point of how DP improves privacy; rather, it's what makes the formal proofs doable.
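
To make that concrete, here's a minimal sketch (my own, not from the thread) of the standard Laplace mechanism for a single counting query, assuming a sensitivity-1 query and a privacy parameter epsilon:

    import numpy as np

    def dp_count(records, predicate, epsilon=0.5):
        # A counting query has sensitivity 1: adding or removing one person
        # changes the true count by at most 1, so Laplace noise with scale
        # 1/epsilon gives epsilon-differential privacy for this one query.
        true_count = sum(1 for r in records if predicate(r))
        noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
        return true_count + noise

    # Hypothetical query: how many records are visits to "example.com"?
    visits = ["example.com", "news.ycombinator.com", "example.com"]
    print(dp_count(visits, lambda v: v == "example.com"))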


> I'm also not sure how this addresses the problem of including additional datasets to help correlate out identities.

I agree with you on the first part of your post, but this part is a little off the mark. In the original paper [1], Cynthia Dwork confronts the issue you point out head on; the paper actually starts with an impossibility result showing that no treatment of the data can give you the property that "access to a statistical database should not enable one to learn anything about an individual that could not be learned without access". The impossibility result relies on the existence of outside datasets.

DP instead tries to quantify the probability of identification, and adds appropriately calibrated Laplace noise to achieve it. The idea is that the query results shouldn't look "too different" with or without your information in the dataset. If your participation doesn't change the results much, how could someone tell whether you are in the dataset, let alone link you to a data point in it?
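
A rough illustration of that "shouldn't look too different" idea (a sketch under my own assumptions, not taken from the paper): compare noisy outputs on two neighboring datasets that differ only by one person's record.

    import numpy as np

    epsilon = 0.5
    rng = np.random.default_rng(0)

    def release(true_count, eps):
        # Laplace mechanism for a sensitivity-1 count.
        return true_count + rng.laplace(scale=1.0 / eps)

    count_without_you = 1000  # hypothetical count on the dataset without your record
    count_with_you = 1001     # the neighboring dataset that includes you

    # The DP guarantee bounds how much more likely any released value is
    # under one dataset than the other: the ratio is at most exp(epsilon).
    print([round(release(count_without_you, epsilon), 1) for _ in range(5)])
    print([round(release(count_with_you, epsilon), 1) for _ in range(5)])
    print("worst-case likelihood ratio:", np.exp(epsilon))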

[1] http://www.ccs.neu.edu/home/cbw/static/class/5750/papers/dwo...


If the data owner is malicious, you are right that there is no guarantee differential privacy is being used. That's a far cry from "there is no such thing as anonymized data"; you could just as easily say "there is no such thing as encrypted data", which I think we agree is wrong?

One can provide differentially private access to web histories that provably mask the presence/absence of individuals in that dataset, even when combined with arbitrary existing side information, like social networks.

Even with something as simple as differentially private counting, you can pull out correlations between visits, finding statistically interesting "people who visit page X then visit page Y surprisingly often" nuggets, which are exciting to people who don't have their own search logs.
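
For example (my own sketch, simplified: releasing many counts would really require splitting the privacy budget across them via composition), differentially private co-visit counting might look like:

    import numpy as np
    from collections import Counter
    from itertools import combinations

    def dp_covisit_counts(histories, epsilon=1.0, seed=0):
        # Noisy counts of "visited X and also visited Y" pairs. Each user's
        # pages are deduplicated, so adding or removing one user changes any
        # single pair count by at most 1 (sensitivity 1 per count).
        rng = np.random.default_rng(seed)
        counts = Counter()
        for pages in histories:
            for x, y in combinations(sorted(set(pages)), 2):
                counts[(x, y)] += 1
        return {pair: c + rng.laplace(scale=1.0 / epsilon)
                for pair, c in counts.items()}

    # Hypothetical browsing histories, one list per user.
    histories = [["x.com", "y.com"], ["x.com", "y.com", "z.com"], ["z.com"]]
    print(dp_covisit_counts(histories, epsilon=0.5))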




