On June 13, 2016, at the Apple Worldwide Developers Conference, Craig Federighi, Apple’s senior vice president of Software Engineering, mentioned a particular kind of privacy-enhancing tool that would enable “crowdsourced learning” while keeping people’s information completely private. He was talking about differential privacy.

Although the concept is apparently not new[1], at least in the fields of statistics and mathematics, Apple announced it as a novel way of protecting people’s privacy. Since it was new to me, I got a copy of the original article[2] introducing the concept and read through it.

In the words of the article’s authors (Cynthia Dwork, Frank McSherry, Kobbi Nissim and Adam Smith), “the goal of a privacy-preserving statistical database is to enable the user to learn properties of the population as a whole while protecting the privacy of the individual contributors”[3].

But how is this achieved? Again, in the authors’ words, “On input a query function f mapping databases to reals, the so-called true answer is the result of applying f to the database. To protect privacy, the true answer is perturbed by the addition of random noise generated according to a carefully chosen distribution, and this response, the true answer plus noise, is returned to the user”[4]. Piece of cake, no? In other words, “In differential privacy nobody actually looks at raw data. There is an interface that sits between the data analyst and the raw data and it ensures that privacy is maintained”[5].
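To make the “true answer plus noise” idea a little more concrete, here is a minimal sketch in Python (my own illustration, not anything Apple or the authors published) of the Laplace mechanism the paper describes, using a simple counting query whose sensitivity is 1; the toy database, the predicate and the choice of epsilon are purely hypothetical:

```python
import random

def laplace_sample(scale):
    # A Laplace(0, scale) draw is the difference of two exponential draws,
    # each with mean `scale`.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(database, predicate, epsilon):
    # A counting query changes by at most 1 when one person's record is
    # added or removed, so its sensitivity is 1 and the noise scale is
    # 1 / epsilon, following the paper's calibration of noise to sensitivity.
    true_answer = sum(1 for row in database if predicate(row))
    return true_answer + laplace_sample(1.0 / epsilon)

# Toy example: how many people in this (made-up) database are over 40?
ages = [23, 45, 31, 67, 52, 29, 41]
print(private_count(ages, lambda age: age > 40, epsilon=0.5))
```

The analyst only ever sees the noisy answer returned by the interface, never the underlying rows.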

How will this work exactly, and how will it protect users’ data? When introduced with macOS Sierra, the differential privacy algorithm will apparently come with an opt-in feature[6]. The technique is expected to improve Apple’s text, emoji and link suggestions. It is also important to note that images stored by users will be off-limits and will not be used to improve image recognition algorithms.

Differential privacy would therefore resolve the dilemma between the need for large-scale data analysis to improve features and product experiences and the need to protect users’ data privacy. For how long? According to Cynthia Dwork and Aaron Roth, “the Fundamental Law of Information Recovery states that overly accurate answers to too many questions will destroy privacy in a spectacular way. The goal of algorithmic research on differential privacy is to postpone this inevitability as long as possible”[7].
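A quick, home-made way to see this Fundamental Law in action: if the same noisy counting query is asked over and over, averaging the answers washes the noise out and the supposedly protected count re-emerges. The numbers below are only an illustration:

```python
import random

def noisy_count(true_count, epsilon):
    # Laplace(0, 1/epsilon) noise, drawn as a difference of two exponentials.
    scale = 1.0 / epsilon
    return true_count + random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

true_count = 42
answers = [noisy_count(true_count, epsilon=0.1) for _ in range(10_000)]
# Averaging many answers to the same query cancels the noise almost completely,
# so an analyst allowed "too many questions" learns the exact count.
print(sum(answers) / len(answers))
```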

To be fair, it must be added that Google has been using differential privacy for some time now in its RAPPOR project (Randomized Aggregatable Privacy-Preserving Ordinal Response)[8], which is aimed at finding out which websites are most popular with people when they launch the Google Chrome browser.
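At its core, RAPPOR builds on the classical randomized-response idea, where each respondent sometimes tells the truth and sometimes answers at random. The sketch below is a simplification of that idea (not Google’s actual bit-level encoding), assuming a single yes/no question and an arbitrary p_keep parameter:

```python
import random

def randomized_response(truth, p_keep=0.75):
    # Report the true yes/no answer with probability p_keep,
    # otherwise report a coin flip. Any single report is deniable.
    if random.random() < p_keep:
        return truth
    return random.random() < 0.5

def estimate_true_rate(reports, p_keep=0.75):
    # Invert the randomization: observed = p_keep * true + (1 - p_keep) * 0.5
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_keep) * 0.5) / p_keep

# Simulate 100,000 users, 30% of whom would truthfully answer "yes".
reports = [randomized_response(random.random() < 0.3) for _ in range(100_000)]
print(estimate_true_rate(reports))  # close to 0.30
```

No individual answer can be trusted, yet the aggregate statistic is recovered with good accuracy once enough users report.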

Bearing in mind that even differential privacy’s promise can fail, Cynthia Dwork, one of the authors of the original article on this matter, recommends: “If you’re interested in privacy, sometimes restraint might be the right approach”. Amen.

 

[1] LALWANI, Mona, “Apple’s use of differential privacy is necessary but not new”, Engadget, June 14th 2016, https://www.engadget.com/2016/06/14/apple-differential-privacy/

[2] DWORK, Cynthia, et al., “Calibrating Noise to Sensitivity in Private Data Analysis”, Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, New York, March 4-7 2006, Proceedings, http://www.cse.psu.edu/~ads22/pubs/PS-CSAIL/sensitivity-tcc-final.pdf

[3] DWORK, Cynthia, et al., “Calibrating Noise to Sensitivity in Private Data Analysis”, op. cit., page 1.

[4] DWORK, Cynthia, et al., “Calibrating Noise to Sensitivity in Private Data Analysis”, op. cit., page 1.

[5] DWORK, Cynthia, cited by LALWANI, Mona, “Apple’s use of differential privacy is necessary but not new”, Engadget, June 14th 2016, https://www.engadget.com/2016/06/14/apple-differential-privacy/

[6] LALWANI, Mona, “Apple’s differential privacy algorithm will require you to opt-in”, Engadget, June 24th 2016, https://www.engadget.com/2016/06/24/apples-differential-privacy-algorithm-opt-in/

[7] DWORK, Cynthia and ROTH, Aaron, The Algorithmic Foundations of Differential Privacy, Foundations and Trends in Theoretical Computer Science, Vol. 9, Nos. 3-4 (2014), page 5.

[8] ERLINGSSON, Úlfar, PIHUR, Vasyl and KOROLOVA, Aleksandra, “RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response”, Proceedings of the 21st ACM Conference on Computer and Communications Security, ACM, Scottsdale, Arizona, 2014.