Netflix released an anonymized sample of its user database last year to support a competition to develop a better recommendation system. Bruce Schneier, a well-known American cryptographer, leads today's Crypto-Gram newsletter with a piece on how this and other anonymized datasets have been de-anonymized. In the case of Netflix, the dataset was compared against user reviews on IMDB. But the exposure isn't limited to IMDB reviewers. According to the researchers, knowing just a handful of a subscriber's ratings and their approximate dates is enough to uniquely identify 99% of the records in the dataset!
Anonymized datasets are certainly useful to researchers, but the social fingerprint is hard to smudge. Before releasing data into a networked world, decision-makers need to weigh the sensitivity of the information against the potential consequences of compromising privacy.
