
> I'm also not sure how this addresses the problem of including additional datasets to help correlate out identities.

I agree with the first part of your post, but this part is a little off the mark. In the original paper [1], Cynthia Dwork confronts the issue you point out head on: the paper actually starts with an impossibility proof showing that no treatment of the data will get you the property "access to a statistical database should not enable one to learn anything about an individual that could not be learned without access". The impossibility result relies on the existence of outside datasets.

DP instead tries to bound how much your participation can change the probability of any given answer, and adds Laplace noise, calibrated to how much one person can change the query result, to get this. The idea is that what gets released shouldn't look "too different" with or without your information in it. If your participation doesn't change the output much, how could someone tell whether you are in the dataset or not, let alone link you to a data point in it?
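
To make that concrete, here is a rough sketch of the Laplace mechanism applied to a counting query. It is not code from the paper: the names `laplace_noise` and `private_count` and the toy records are made up for illustration, and it assumes the query is a simple count, so that adding or removing one person changes the true answer by at most 1.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Noisy count of the records matching `predicate` (hypothetical helper).

    Assumes sensitivity 1: one person joining or leaving the dataset
    moves the true count by at most 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Toy data: the same dataset with and without one person's record.
with_you = [{"age": a} for a in range(100)]
without_you = with_you[:-1]


def is_over_50(record):
    return record["age"] >= 50


# Smaller epsilon means more noise and answers that are harder to tell apart.
print(private_count(with_you, is_over_50, epsilon=0.5))
print(private_count(without_you, is_over_50, epsilon=0.5))
```

The true counts for the two datasets differ by exactly 1, but each released answer is blurred by noise of scale 1/epsilon, so a single answer says little about which of the two datasets produced it. Repeating the query uses up more of the privacy budget, which this sketch does not track.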

[1] http://www.ccs.neu.edu/home/cbw/static/class/5750/papers/dwo...


