Exponential family Fisher vectors for image classification

Abstract

One of the fundamental problems in image classification is to devise models that relate images to higher-level semantic concepts in an efficient and reliable way. A widely used approach consists of extracting local descriptors from the images and summarizing them into an image-level representation. Within this framework, the Fisher vector (FV) is one of the most robust signatures to date. In the FV, local descriptors are modeled as samples drawn from a mixture of Gaussian pdfs. An image is represented by a gradient vector characterizing the distribution of its samples with respect to the model. Equipped with robust features like SIFT, the FV has shown state-of-the-art performance on different recognition problems. However, it is not clear how it should be applied when the feature space is clearly non-Euclidean, which has led to heuristics that ignore the underlying structure of the space. In this paper we generalize the Gaussian FV to a broader family of distributions known as the exponential family. The model, termed exponential family Fisher vectors (eFV), provides a unified framework from which rich and powerful representations can be derived. Experimental results show the generality and flexibility of our approach.
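
To make the "gradient vector" idea concrete, here is a minimal sketch of the standard Gaussian FV restricted to the gradient with respect to the component means. It is not the code used in the paper and does not implement the eFV. It assumes a mixture with diagonal covariances whose parameters have already been estimated, and all function and parameter names (gmm_posteriors, fisher_vector_means, weights, means, sigmas) are hypothetical, chosen only for illustration.

```python
import math


def gmm_posteriors(x, weights, means, sigmas):
    """Soft-assign one descriptor x to each diagonal-covariance Gaussian."""
    log_p = []
    for w, mu, sd in zip(weights, means, sigmas):
        ll = math.log(w)
        for xi, m, s in zip(x, mu, sd):
            ll += -0.5 * math.log(2.0 * math.pi * s * s) - 0.5 * ((xi - m) / s) ** 2
        log_p.append(ll)
    top = max(log_p)  # subtract the max before exponentiating, for stability
    unnorm = [math.exp(lp - top) for lp in log_p]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def fisher_vector_means(descriptors, weights, means, sigmas):
    """Mean-gradient part of a Gaussian FV: K blocks of D values, concatenated.

    Assumed form (illustrative): for component k,
    G_k = 1 / (N * sqrt(w_k)) * sum_n gamma_n(k) * (x_n - mu_k) / sigma_k
    """
    n = len(descriptors)
    k = len(weights)
    d = len(means[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        gamma = gmm_posteriors(x, weights, means, sigmas)
        for j in range(k):
            for i in range(d):
                acc[j][i] += gamma[j] * (x[i] - means[j][i]) / sigmas[j][i]
    # Scale each block by 1 / (N * sqrt(w_k)) and flatten into one vector.
    return [v / (n * math.sqrt(weights[j])) for j in range(k) for v in acc[j]]


# Toy usage: two 2-D descriptors, a two-component mixture.
fv = fisher_vector_means(
    descriptors=[[0.5, -0.2], [2.1, 1.9]],
    weights=[0.5, 0.5],
    means=[[0.0, 0.0], [2.0, 2.0]],
    sigmas=[[1.0, 1.0], [1.0, 1.0]],
)
```

The resulting vector has K x D entries (4 in the toy usage). A full Gaussian FV would typically also include gradients with respect to the variances, and possibly the mixture weights.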

Paper

Jorge Sánchez and Javier Redolfi
Exponential family Fisher vector for image classification
Pattern Recognition Letters, 2015 (submitted)

Source code

The code we used in our evaluations can be found here.