Names and Faces in the News
Source:
IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, Volume 02, p.848-854 (2004)
Abstract:
We show that quite good face clustering is possible for a
dataset of inaccurately and ambiguously labeled face images. Our
dataset is 44,773 face images, obtained by applying a face finder to
approximately half a million captioned news images. This dataset is
more realistic than usual face recognition datasets, because it
contains faces captured "in the wild" in a variety of configurations
with respect to the camera, taking a variety of expressions, and
under illumination of widely varying color. Each face image is
associated with a set of names, automatically extracted from the
associated caption. Many, but not all, such sets contain the correct
name. We cluster face images in appropriate discriminant
coordinates. We use a clustering procedure to break ambiguities in
labeling and identify incorrectly labeled faces. A merging
procedure then identifies variants of names that refer to the same
individual. The resulting representation can be used to label faces
in news images or to organize news pictures by individuals
present. An alternative view of our procedure is as a process that
cleans up noisy supervised data. We demonstrate how to use entropy
measures to evaluate such procedures.
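
The abstract does not spell out which entropy measures are used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each face carries a probability distribution over its candidate names and compares the mean Shannon entropy of those distributions before and after a cleaning step. The function names and the toy distributions are hypothetical.

```python
import math


def label_entropy(probs):
    """Shannon entropy (in bits) of one face's distribution over candidate names."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mean_label_entropy(assignments):
    """Average per-face entropy; lower values mean less ambiguous labels."""
    return sum(label_entropy(p) for p in assignments) / len(assignments)


# Hypothetical toy data: each inner list is one face's distribution
# over the names extracted from its caption.
before = [[0.5, 0.5], [1 / 3, 1 / 3, 1 / 3], [1.0]]  # uniform over caption names
after = [[0.9, 0.1], [0.8, 0.1, 0.1], [1.0]]         # sharpened by a cleaning step

print(f"mean entropy before: {mean_label_entropy(before):.3f} bits")
print(f"mean entropy after:  {mean_label_entropy(after):.3f} bits")
```

Under this assumed measure, a procedure that resolves labeling ambiguity should lower the mean entropy, while a face with a single candidate name contributes zero either way.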