Exploiting relationship between attributes for improved face verification

  • Department of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, PR China

Highlights

A novel method to model the relationship between attributes.

An effective method to exploit the learned relationship for model training.

A single framework adapted to both discrete and continuous attributes.

Promising results for face verification and object recognition.


Abstract

Recent work has shown the advantages of using high-level representations such as attribute-based descriptors over low-level feature sets in face verification. However, in most work each attribute is coded with an extremely short information length (e.g., “is Male”, “has Beard”), and all the attributes belonging to the same object are assumed to be independent of each other when they are used for prediction. To address these two problems, we propose a discriminative distributed representation for attribute description; on the basis of this description, we present a novel method to model the relationship between attributes and exploit this relationship to improve face verification performance, while taking the uncertainty in attribute responses into account. Specifically, inspired by the vector representation of words in the text-categorization literature, we first represent the meaning of each attribute as a high-dimensional vector in the subject space, and then construct an attribute-relationship graph based on the distribution of attributes in that space. With this graph, we are able to explicitly constrain the search space of the parameter values of a discriminative classifier so as to avoid over-fitting. The effectiveness of the proposed method is verified on two challenging face databases (i.e., LFW and PubFig) and the a-Pascal object dataset. Furthermore, we extend the proposed method to the case of continuous attributes with promising results.

Keywords

  • Attribute relationship graph;
  • Attribute-graph regularized SVM;
  • Face verification

1. Introduction

Recently, there has been growing interest in using middle-to-high-level feature descriptors for face representation. One typical example is the attribute descriptor [1], [2], [3], [4], [5], [6], [7], [8] and [9]. N. Kumar et al. [3] and [4] have recently shown that, using the outputs of a series of component classifiers, each tailored to a particular aspect of human face images (called visual attributes), they are able to achieve close to state-of-the-art face verification performance on the challenging Labeled Faces in the Wild (LFW) benchmark [10]. This result is interesting in several respects. Firstly, the number of features used in their work is very small (i.e., only 73 attributes), which means that it provides a very economical but powerful way to describe faces. This is in sharp contrast with the commonly used low-level features in image description, such as pixel values, gradient directions, and scale-invariant feature transform (SIFT) features [11], where usually thousands of features are needed. Secondly, the attribute descriptor is user-friendly in that its meaning is understandable to human beings (everyone knows what “white male” means), while the meaning of most of the previously mentioned low-level features is far less intuitive to us. Last but not least, such a descriptor is generalizable and sharable, which makes it particularly suitable for problems such as zero-shot learning [12] and [13] or between-class transfer learning [2] and [14].

However, in most work each attribute is coded with an extremely short information length (e.g., using a binary code such as “is Male” or “has Beard”), and all the attributes belonging to the same object are assumed to be independent of each other when they are used for prediction. The one-bit information length of attribute coding makes the representation less stable, and can hamper many interesting subsequent processing tasks, such as modeling the similarity between attributes. In fact, research in the field of cognitive discovery has shown the usefulness of the relationships among feature sets. For example, Bhatt and Rovee-Collier [15] experimentally showed that infants as young as three months of age are able to encode the relations among object features, and use such a feature configuration for general object recognition. Traditionally, however, one of the major challenges in modeling feature configurations lies in the huge number of low-level features (e.g., the dimension of a 100×100 face image is as high as 10,000 when gray-value features are used). In addition, it is very difficult for a human being to understand what exactly such a large feature configuration means. Fortunately, both of the aforementioned problems can be addressed by attribute descriptors, owing to their high-level and compact object description. Indeed, despite the partial success of using attribute descriptors by treating them as statistically independent of each other [1], [3], [4] and [16] or conditionally independent given the class label [2], recent work has shown that it is beneficial to exploit the relationship between attributes under various contexts [5], [17], [18] and [19]. Some of these will be discussed in the next section.

In this work, we propose a discriminative distributed representation for attribute description; on the basis of this description, we investigate how to model the similarity relationship between attributes and how such a relationship can be exploited to improve face verification performance. The idea of distributed representation was first introduced by Hinton [20], and successfully applied in statistical language modeling [21]. Here, we develop a new distributed representation for each individual attribute by taking subject-identity information into account. The method is inspired by the vector representation of words in the text-categorization literature, and the meaning of each attribute is embedded as a high-dimensional vector in the subject space (cf. Fig. 1). Such a representation allows us to model the similarity between attributes in a much more stable and reliable way. In particular, we construct an attribute-relationship graph based on the distribution of attributes in the subject space, which effectively encodes the pairwise closeness of any two attributes. For example, the “male” attribute is highly related to attributes such as “wearing necktie”, “bushy eyebrows”, “beard”, and so on (cf. Fig. 9). To exploit such information for prediction, we integrate the attribute-relationship graph into a linear classifier to constrain the search space of its parameters, based on the assumption that similar attributes should have similar weights. This helps to avoid over-fitting and improves the generalization capability of the learned classifier. The uncertainty in attribute responses is also taken into account in the final model.
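The pipeline just described can be sketched in a few lines of NumPy. This is only an illustrative toy, not the paper's actual model: the attribute-response matrix `R` is synthetic rather than produced by trained attribute classifiers, the graph is built with plain cosine similarity between attribute columns (one possible closeness measure), and a graph-regularized least-squares classifier stands in for the attribute-graph regularized SVM; all variable names (`R`, `S`, `L`, `lam`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical attribute-response matrix: rows index subjects, columns index
# attributes; entry (s, a) is the real-valued response of attribute
# classifier a on subject s. In the paper these come from trained detectors.
n_subjects, n_attributes = 200, 8
R = rng.standard_normal((n_subjects, n_attributes))

# Distributed representation: each attribute is represented by its column of
# responses over all subjects. Pairwise closeness via cosine similarity.
cols = R / np.linalg.norm(R, axis=0, keepdims=True)
S = cols.T @ cols                 # attribute-relationship graph (adjacency)
np.fill_diagonal(S, 0.0)
S = np.maximum(S, 0.0)            # keep only positive affinities

# Graph Laplacian L = D - S, so that
#   w^T L w = 1/2 * sum_{i,j} S_ij (w_i - w_j)^2,
# which is small when similar attributes carry similar weights.
L = np.diag(S.sum(axis=1)) - S

# Graph-regularized least-squares classifier (stand-in for the SVM):
#   minimize ||R w - y||^2 + lam * w^T L w   (closed-form solution)
y = np.sign(R[:, 0] + 0.5 * R[:, 1])   # toy binary labels
lam = 1.0
w = np.linalg.solve(R.T @ R + lam * L, R.T @ y)

print(w.shape)                    # one weight per attribute
```

Increasing `lam` pulls the weights of strongly connected attributes toward each other, which is exactly the shrinkage of the parameter search space described above.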