k-anonymization with minimal loss of information
Source:
IEEE Transactions on Knowledge and Data Engineering (TKDE), accepted for publication (2008)
Abstract:
The technique of k-anonymization allows the releasing of
databases that contain personal information while ensuring some
degree of individual privacy. Anonymization is usually
performed by generalizing database entries. We
formally study the concept of generalization, and propose three
information-theoretic measures for capturing the amount of
information that is lost during the anonymization process. The
proposed measures are more
general and more accurate than those that were proposed by
Meyerson and Williams and Aggarwal et al.
We study the problem of achieving k-anonymity with minimal
loss of information. We prove that it is NP-hard and
study polynomial approximations for the optimal solution. Our
first algorithm gives an approximation guarantee of
O(ln k) for two of our measures as well as for the previously
studied measures. This improves the best-known
O(k)-approximation of Aggarwal et al. While the previous approximation
algorithms relied on the graph representation framework, our
algorithm relies on a novel hypergraph representation that
enables the improvement in the approximation ratio from O(k) to
O(ln k). As the running time of the algorithm is O(n^{2k}), we
also show how to adapt the algorithm of Aggarwal et al. in order to
obtain an O(k)-approximation algorithm that is polynomial in both n and k.
Download: