Data is a core element of modern society, but its collection and use also raise serious privacy concerns. To allow data to be used while preserving privacy, GDPR and other legal frameworks rely on the notion of “anonymous data”.
In this talk, I will first show how historical anonymization methods fail on modern large-scale datasets: how to quantify the risk of re-identification, why noise addition does not fundamentally help, and, drawing on recent work, how the incompleteness of datasets or sampling methods can be overcome. These failures have led to the development of online anonymization systems, a growing area of interest in both industry and research. Second, I will discuss the limits of these systems, focusing on new research attacking a dynamic anonymization system called Diffix. I will describe the system, our noise-exploitation attacks, and their effectiveness against real-world datasets. I will conclude by discussing the potential of online anonymization systems moving forward.