It turns out people have researched how to do algorithmic recommendations without users having to reveal their personal preferences, and I am intrigued. Apparently, in principle, we could keep the good parts, like Netflix suggesting more things you might want to watch, without handing our data over to entities like Facebook to sell.
See "Distributed Differential Privacy and Applications" by Narayan, for example. (Also that's the first CC-BY licensed PhD thesis I've seen!)
@b_cavello Okay, I've now skimmed the Leaking in Data Mining paper and watched Octavio Good's talk. They were both interesting and I learned things, but I'm not yet seeing how either one is related to either deidentification or differential privacy. Could you explain more?
At this point I'm nervous about any deidentification technique that doesn't have a differential privacy proof. There have been too many successful reidentification attacks; this feels like "don't roll your own crypto" again.
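(For anyone curious what a differential privacy guarantee actually rests on: the usual basic building block is the Laplace mechanism, which adds calibrated noise to a query answer. This is a toy sketch of the general technique, not anything from Narayan's thesis; the function names are mine.)

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw one sample from a Laplace(0, scale) distribution
    via inverse-CDF sampling."""
    u = random.random() - 0.5  # uniform on [-0.5, 0.5)
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def private_count(values, predicate, epsilon: float) -> float:
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one
    person changes the count by at most 1), so adding noise drawn
    from Laplace(1/epsilon) is enough for the standard proof.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon)
```

The point is that the guarantee comes from a proof about the noise distribution, not from hoping an attacker can't cross-reference the released data, which is exactly what keeps going wrong with ad-hoc deidentification.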
@jamey I don't think they're directly linked, exactly, but they're related in goal. The idea of training systems to ignore particular data seems hopeful to me for developing less biased models.
@b_cavello I still don't see the relation, but I agree that the use of adversarial networks to limit over-training was a really interesting part of that talk. I've seen stuff before about trying to remove bias from word2vec embeddings so that, for example, "doctor" doesn't get associated with "man" and "nurse" doesn't get associated with "woman", and I could imagine using the GAN approach to try to tackle that kind of problem too.
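(The embedding-debiasing work I've seen uses a simpler, non-GAN trick: compute a bias direction, like the difference between the "man" and "woman" vectors, and project it out of other word vectors. A toy sketch with made-up three-dimensional vectors, since real embeddings are just longer lists of floats:)

```python
# Tiny vector helpers so the example is self-contained.
def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scale(a, s):
    return [x * s for x in a]

def debias(vec, bias_dir):
    """Remove the component of vec that lies along bias_dir.

    The result is orthogonal to the bias direction, so the word no
    longer leans toward either end of it.
    """
    coef = dot(vec, bias_dir) / dot(bias_dir, bias_dir)
    return sub(vec, scale(bias_dir, coef))

# Made-up vectors, purely illustrative:
man = [1.0, 0.2, 0.0]
woman = [-1.0, 0.2, 0.0]
gender_dir = sub(man, woman)
doctor = [0.6, 0.5, 0.3]  # leans toward "man" along gender_dir
doctor_debiased = debias(doctor, gender_dir)
```

The GAN idea from the talk would instead train the model adversarially so a discriminator can't recover the protected attribute at all, which attacks the same goal from a different angle.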