Differential Privacy to Machine Learning, and Back (Part 1)

Publication
Abstract

In this tutorial, I will outline some of the recent developments in differentially private machine learning, which exhibit strong connections between statistical data privacy and robust (stable) machine learning. The main theme of this tutorial is to demonstrate techniques for achieving one from the other.

In the first part, I will review some high-profile privacy breaches (and outline some of the underlying principles behind them) to motivate the need for a rigorous notion of statistical data privacy. I will then introduce the notion of differential privacy and present some of the basic concepts commonly used in the design of differentially private algorithms. I will conclude the first part by showing that differential privacy is a strong form of regularization (a common technique used in machine learning to control prediction error), and analyze the Follow-the-Perturbed-Leader (FTPL) algorithm of Kalai and Vempala (2005) in this context.

In the second part, I will analyze the robustness (stability) properties of some commonly used machine learning algorithms (e.g., gradient descent, LASSO, and the Frank-Wolfe algorithm), and bootstrap these robustness properties to obtain optimal differentially private algorithms for empirical risk minimization and model selection. Some of the robustness analyses are new and may be of independent interest, irrespective of privacy.
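For reference, the notion of differential privacy introduced in the first part is the standard one: a randomized algorithm $M$ is $(\varepsilon, \delta)$-differentially private if, for every pair of datasets $D, D'$ differing in a single record and every measurable set of outcomes $S$,

$$\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S] + \delta.$$

When $\delta = 0$, this is often called pure differential privacy.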
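As a taste of the FTPL connection, here is a minimal sketch of Follow-the-Perturbed-Leader in the finite-experts setting. The function name `ftpl`, the exponential perturbation, and the loss encoding are illustrative choices, not taken from the tutorial itself; the point of interest is that the perturbation step is exactly the kind of noise injection used by differentially private selection mechanisms.

    import numpy as np

    def ftpl(losses, eta, seed=0):
        """Follow-the-Perturbed-Leader over d fixed experts (illustrative sketch).

        losses: (T, d) array; losses[t, i] is the loss of expert i in round t.
        eta:    perturbation scale; larger eta means heavier smoothing.
        Returns the sequence of experts played and the total loss incurred.
        """
        rng = np.random.default_rng(seed)
        T, d = losses.shape
        cumulative = np.zeros(d)   # cumulative loss of each expert so far
        played, total = [], 0.0
        for t in range(T):
            # Perturb the cumulative losses with fresh exponential noise and
            # follow the perturbed leader: the expert whose perturbed
            # cumulative loss is smallest.
            noise = rng.exponential(scale=eta, size=d)
            i = int(np.argmin(cumulative - noise))
            played.append(i)
            total += losses[t, i]
            cumulative += losses[t]
        return played, total

For example, `ftpl(np.random.default_rng(1).random((100, 5)), eta=10.0)` runs 100 rounds over 5 experts. The scale `eta` controls the same trade-off that appears in private algorithms: more noise makes the selection more stable (more private), at the cost of following the true leader less closely.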
