Advances in Privacy-Preserving Machine Learning
This talk introduces the problem of privacy-preserving machine learning,
and some recent results. The goal of privacy-preserving machine
learning is to provide machine learning algorithms that adhere to strong
privacy protocols, yet are useful in practice. As increasing amounts of
sensitive data are being digitally stored and aggregated, maintaining
the privacy of individuals is critical. However, learning cumulative
patterns, such as disease risks from medical records, could benefit
society. Our work on privacy-preserving machine learning seeks to
reconcile these two opposing goals by providing general techniques for
designing algorithms that learn from private databases while managing
the inherent trade-off between privacy and learnability.
I will present a new method for designing privacy-preserving machine
learning algorithms. Researchers in the cryptography and information
security community [Dwork et al. '06] had shown that if any function
learned from a database is randomly perturbed in a certain way, the
output respects a very strong privacy definition. The amount of
perturbation depends on the function, however, and can be large enough to
render the output ineffectual for machine learning purposes. We introduce a new
paradigm: perturb the optimization problem, instead of its solution, for
functions learned via optimization. For regularized logistic
regression, a canonical machine learning algorithm, our new method
yields a significantly stronger learning performance guarantee and
improved empirical performance over the previous approach, while
adhering to the same privacy definition. Our techniques also extend to
a broad class of convex loss functions.
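The contrast between the two paradigms can be sketched in code. Below is a minimal, illustrative implementation of the objective-perturbation idea for regularized logistic regression: a random linear term is added to the training objective before optimizing, rather than adding noise to the learned weights afterward. The noise distribution (uniform direction, Gamma-distributed norm), the hyperparameters, and the plain gradient-descent solver are assumptions of this sketch, not the exact construction from the talk.

```python
import numpy as np

def stable_logistic(t):
    # Computes 1 / (1 + exp(t)) without overflow for large |t|.
    return np.where(t >= 0, np.exp(-t) / (1 + np.exp(-t)), 1 / (1 + np.exp(t)))

def private_logreg(X, y, lam=0.1, eps=1.0, steps=500, lr=0.1, seed=0):
    """Objective-perturbation sketch for regularized logistic regression.

    Minimizes  mean logistic loss + (lam/2)||w||^2 + (b . w)/n,
    where b is a random vector whose density decays exponentially in its
    norm (an assumed noise model for this illustration); the privacy
    parameter eps controls the noise magnitude.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Draw noise b: uniform random direction, norm ~ Gamma(d, 2/eps),
    # so that larger eps (weaker privacy) means smaller perturbation.
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    b = direction * rng.gamma(shape=d, scale=2.0 / eps)

    w = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ w)
        # Gradient of the perturbed objective: the noise enters the
        # optimization problem itself, not the final solution.
        grad = (-(X * (y * stable_logistic(margins))[:, None]).mean(axis=0)
                + lam * w + b / n)
        w -= lr * grad
    return w
```

Because the perturbation is folded into a strongly convex objective, the optimizer can partially compensate for it, which is the intuition behind the stronger utility guarantee compared with perturbing the output weights directly.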
This talk is based on joint work with Kamalika Chaudhuri (UC San Diego).