
research

A new technique called two-dimensional Gabor Fisher discriminant (2DGFD) is derived and implemented for image representation and recognition. In this approach, the Gabor wavelets are used to extract facial features. The principal component analysis (PCA) is applied directly on the Gabor transformed matrices to remove redundant information from the image rows and a new direct two-dimensional Fisher linear discriminant (direct 2DFLD) method is derived in order to further remove redundant information and form a discriminant representation more suitable for face recognition. The conventional Gabor-based methods transform the Gabor images into a high-dimensional feature vector. However, these methods lead to high computational complexity and memory requirements. Furthermore, it is difficult to analyse such high-dimensional data accurately. The novel 2DGFD method was tested on face recognition using the ORL, Yale and extended Yale databases, where the images vary in illumination, expression, pose and scale. In particular, the 2DGFD method achieves 98.0% face recognition accuracy when using 20×3 feature matrices for each Gabor output on the ORL database and 97.6% recognition accuracy compared with 91.8% and 91.6% for the 2DPCA and 2DFLD methods on the extended Yale database. The results show that the proposed 2DGFD method is computationally more efficient than the Gabor Fisher classifier method by approximately 8 times on the ORL, 135 times on the Yale and 1.2801×10^8 times on the extended Yale B data sets.
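As a rough illustration of the feature-extraction step only, the sketch below evaluates a single Gabor wavelet kernel on a small grid. The abstract does not give the kernel form or its parameters. The formula and defaults used here (k_max = π/2, f = √2, σ = 2π, orientation step π/8) are the ones commonly seen in the Gabor face recognition literature and are assumptions, as is the function name `gabor_kernel`.

```python
import cmath
import math

def gabor_kernel(mu, nu, size=9, k_max=math.pi / 2, f=math.sqrt(2), sigma=2 * math.pi):
    """Return a size x size complex Gabor kernel (list of rows).

    Assumed form, not taken from the abstract:
      (|k|^2 / sigma^2) * exp(-|k|^2 |z|^2 / (2 sigma^2))
        * (exp(i k.z) - exp(-sigma^2 / 2))
    where mu selects the orientation and nu the scale.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    k_mag = k_max / (f ** nu)          # wave-vector magnitude for scale nu
    phi = math.pi * mu / 8             # orientation angle for index mu
    kx, ky = k_mag * math.cos(phi), k_mag * math.sin(phi)
    dc = math.exp(-sigma ** 2 / 2)     # term that removes the DC response
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            envelope = (k_mag ** 2 / sigma ** 2) * math.exp(
                -(k_mag ** 2) * (x * x + y * y) / (2 * sigma ** 2)
            )
            carrier = cmath.exp(1j * (kx * x + ky * y)) - dc
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

kernel = gabor_kernel(mu=0, nu=0)
centre = kernel[4][4]  # value at z = (0, 0)
```

Convolving a face image with kernels for several values of `mu` and `nu` gives one output matrix per kernel. The point of 2DGFD, as described above, is that these outputs are kept as matrices instead of being concatenated into one long vector.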


The application of different downsampling filters in video coding directly models visual information at lower resolutions and influences the compression performance of a chosen coding system. In wavelet-based scalable video coding the spatial scalability is achieved by the application of wavelets as downsampling filters. However, characteristics of different wavelets influence the performance at targeting spatio-temporal decoding points. An analysis of different downsampling filters in popular wavelet-based scalable video coding schemes is presented. Evaluation is performed for both intra- and inter-coding schemes using wavelets and standard downsampling strategies. On the basis of the obtained results a new concept of inter-resolution prediction is proposed, which maximises the average performance using a combination of standard downsampling filters and wavelet-based coding.
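To make the distinction concrete, here is a minimal sketch of two ways to halve the resolution of a 1-D signal: the low-pass branch of a one-level Haar wavelet transform, and plain pairwise averaging standing in for a "standard" downsampling filter. Both choices and both function names are illustrative assumptions. The abstract does not say which wavelets or filters were evaluated.

```python
import math

def haar_lowpass_downsample(signal):
    """Keep only the approximation coefficients of a one-level Haar transform."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = 1 / math.sqrt(2)  # orthonormal Haar scaling
    return [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]

def average_downsample(signal):
    """Halve the resolution by averaging neighbouring samples."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]

row = [1, 1, 3, 3]
wavelet_low = haar_lowpass_downsample(row)
averaged_low = average_downsample(row)
```

The two low-resolution signals differ even for this trivial filter pair: the Haar output is the averaged output scaled by √2. Longer wavelet filters differ from standard filters in frequency response as well, which is the kind of difference the analysis above is concerned with.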


We develop a novel method for class-based feature matching across large changes in viewing conditions. The method is based on the property that when objects share a similar part, the similarity is preserved across viewing conditions. Given a feature and a training set of object images, we first identify the subset of objects that share this feature. The transformation of the feature’s appearance across viewing conditions is determined mainly by properties of the feature, rather than of the object in which it is embedded. Therefore, the transformed feature will be shared by approximately the same set of objects. Based on this consistency requirement, corresponding features can be reliably identified from a set of candidate matches. Unlike previous approaches, the proposed scheme compares feature appearances only in similar viewing conditions, rather than across different viewing conditions. As a result, the scheme is not restricted to locally planar objects or affine transformations. The approach also does not require examples of correct matches. We show that by using the proposed method, a dense set of accurate correspondences can be obtained. Experimental comparisons demonstrate that matching accuracy is significantly improved over previous schemes. Finally, we show that the scheme can be successfully used for invariant object recognition.
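The consistency requirement can be sketched as a set-overlap test. In the sketch below, a feature seen in one viewing condition is described by the set of training objects that contain it, and each candidate match in the other viewing condition is described the same way. The candidate whose object set overlaps most is selected. Using Jaccard overlap as the score, and the names `jaccard` and `best_match`, are assumptions made for illustration. The abstract does not specify how consistency is measured.

```python
def jaccard(a, b):
    """Overlap between two sets of object ids, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_match(source_objects, candidates):
    """Pick the candidate feature shared by roughly the same objects.

    source_objects: ids of objects containing the feature in view A.
    candidates: dict mapping a candidate feature name to the ids of
                objects containing that candidate in view B.
    Returns (candidate_name, overlap_score).
    """
    if not candidates:
        raise ValueError("no candidate matches given")
    scored = ((name, jaccard(source_objects, objs)) for name, objs in candidates.items())
    return max(scored, key=lambda pair: pair[1])

# Hypothetical data: object ids sharing the feature in each view.
source = {1, 2, 3, 4}
candidates = {"c1": {1, 2, 3, 5}, "c2": {6, 7}, "c3": {1, 2, 3, 4, 8}}
name, score = best_match(source, candidates)
```

Note that appearance is only ever compared within a single viewing condition (to decide which objects contain a feature). The cross-view decision uses object sets alone, which is why no affine or planarity assumption is needed.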


This paper proposes a novel representation space for multimodal information, enabling fast and efficient retrieval of video data. We suggest describing the documents not directly by selected multimodal features (audio, visual or text), but rather by considering cross-document similarities relative to their multimodal characteristics. This idea leads us to propose a particular form of dissimilarity space that is adapted to the asymmetric classification problem, and in turn to the query-by-example and relevance feedback paradigm, widely used in information retrieval. Based on the proposed dissimilarity space, we then define various strategies to fuse modalities through a kernel-based learning approach. The problem of automatic kernel setting to adapt the learning process to the queries is also discussed. The properties of our strategies are studied and validated on artificial data. In a second phase, a large annotated video corpus (i.e. TRECVID-05), indexed by visual, audio and text features, is considered to evaluate the overall performance of the dissimilarity space and fusion strategies. The obtained results confirm the validity of the proposed approach for the representation and retrieval of multimodal information in a real-time framework.
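The general idea of a dissimilarity space can be sketched in a few lines: each document is re-described by its distances to a small set of reference documents instead of by its raw features. The sketch below assumes Euclidean distance and assumes the references are prototype feature vectors (in a query-by-example setting these could be the user's example documents). The paper's specific construction for the asymmetric case is not given in the abstract, and the function names are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def to_dissimilarity_space(documents, prototypes, dist=euclidean):
    """Map each document to its vector of distances to the prototypes.

    The output has one row per document and one column per prototype,
    regardless of the dimensionality of the original features.
    """
    return [[dist(doc, proto) for proto in prototypes] for doc in documents]

# Hypothetical 2-D feature vectors for two documents and two prototypes.
documents = [[0, 0], [3, 4]]
prototypes = [[0, 0], [0, 4]]
embedded = to_dissimilarity_space(documents, prototypes)
```

Because every modality ends up as a distance vector of the same length, per-modality representations built this way can be combined without reconciling the original feature dimensions, which is one reason such a space is convenient for fusion.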


A few hours ago, I was reading some papers to improve my understanding of support vector machines (SVM). I came across the term convex hull, and it reminded me of a challenge I had when I was working on image segmentation of medium to long milled rice grains for quality evaluation. Although there are some published papers that deal with this problem (e.g. “” and “Separation and identification of touching kernels and dockage components in digital images”), they could only separate touching grains whose shapes are roughly circular.

So I thought that using the convex hull to define the initial boundary of an active contour could be an interesting idea. But it looks like somebody has already done it: see “An unsupervised GVF snake approach for white blood cell segmentation based on nucleus”.
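A minimal sketch of the first half of that idea, computing the convex hull of a blob's pixel coordinates so that its vertices can serve as the starting contour. It uses Andrew's monotone chain algorithm in plain Python. The sample points are made up, and the active contour evolution itself is not shown.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

# Hypothetical (x, y) coordinates from a blob of touching grains.
blob = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
initial_contour = convex_hull(blob)
```

For this sample the interior point `(1, 1)` and the collinear edge point `(1, 0)` are discarded, leaving the four corners. In practice the hull vertices would be resampled into evenly spaced points before being handed to a snake implementation.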