
Lifting the lid on the Netflix database

Netflix released an anonymized sample of its user database in 2006 to aid a competition to develop a better recommendation system. Bruce Schneier, a well-known American cryptographer, leads today's Crypto-Gram newsletter with a piece on how this and other anonymized datasets have been de-anonymized. In the case of Netflix, researchers compared the set against public user reviews on IMDb and matched some anonymized records to named reviewers. But it's not just the IMDb reviewers who are exposed. The researchers report that a handful of a subscriber's ratings, with approximate dates, is enough to single out 99% of the records in the dataset!
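
The matching idea can be illustrated with a toy sketch in Python. This is not the researchers' actual algorithm, just a simplified stand-in: every ID, title, rating, and date below is invented, and the function names `match_score` and `best_match`, the 14-day tolerance, and the three-match threshold are assumptions chosen for illustration.

```python
from datetime import date

# Hypothetical "anonymized" records: opaque ID -> {title: (rating, date rated)}.
anonymized = {
    "user_0417": {
        "Movie A": (5, date(2005, 3, 1)),
        "Movie B": (2, date(2005, 3, 9)),
        "Movie C": (4, date(2005, 4, 2)),
    },
    "user_2291": {
        "Movie A": (3, date(2005, 6, 20)),
        "Movie D": (5, date(2005, 7, 1)),
    },
}

# Hypothetical public reviews posted by one named person: title -> (rating, date).
public_reviews = {
    "Movie A": (5, date(2005, 3, 3)),
    "Movie B": (2, date(2005, 3, 10)),
    "Movie C": (4, date(2005, 4, 1)),
}


def match_score(record, reviews, max_days=14):
    """Count titles where the rating agrees and the dates fall within max_days."""
    score = 0
    for title, (rating, when) in reviews.items():
        if title in record:
            rec_rating, rec_when = record[title]
            if rec_rating == rating and abs((rec_when - when).days) <= max_days:
                score += 1
    return score


def best_match(records, reviews, min_score=3):
    """Return the ID with the most agreeing titles, or None if below min_score."""
    scores = {uid: match_score(rec, reviews) for uid, rec in records.items()}
    uid = max(scores, key=scores.get)
    return uid if scores[uid] >= min_score else None


print(best_match(anonymized, public_reviews))
```

Even in this crude form, three overlapping ratings are enough to pick one record out of the toy set, and once a record is linked to a name, every other rating in it is exposed as well.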

Anonymized datasets are certainly useful to researchers, but the social fingerprint is hard to smudge. Before releasing data into a networked world, leaders need to weigh the value of the information against the potential consequences of compromising privacy.


About

This page contains a single entry from the blog posted on January 15, 2008 9:33 PM.

Creative Commons License
This weblog is licensed under a Creative Commons License.