Statistical, DCT and vector quantisation-based video codec September 23, 2008
Posted by whaldsz in: research
The authors present a novel hybrid statistical, DCT and vector quantisation-based video-coding technique. In intra mode, an input frame is divided into a number of non-overlapping pixel blocks. A discrete cosine transform (DCT) then converts the pixels of each block into frequency-domain coefficients. Coefficients with the same frequency index in different blocks are grouped together, generating a number of matrices, each of which contains the coefficients of one particular frequency index. The matrix containing the DC coefficients is losslessly coded, while matrices containing high-frequency coefficients are coded using a novel statistical encoder. In inter mode, overlapped block motion estimation/compensation is employed to exploit temporal redundancy between successive frames, generating a displaced frame difference (DFD) for each inter frame. A wavelet transform then decomposes the DFD frame into its frequency subbands. Coefficients in the detail subbands are vector quantised, while coefficients in the baseband are losslessly coded. To evaluate the codec, the proposed codec and the adaptive subband vector quantisation (ASVQ) video codec, which has been shown to outperform H.263 at all bitrates, were applied to a number of test sequences. Results indicate that the proposed codec outperforms the ASVQ video codec both subjectively and objectively at all bitrates.
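The intra-mode regrouping step can be sketched as follows. This is a minimal illustration, assuming 8 × 8 blocks, an orthonormal DCT-II and frame dimensions that are multiples of the block size; the helper names are hypothetical, and the lossless DC coder and the authors' statistical encoder are not shown.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dct2(block):
    """Separable 2-D DCT of a square block: C * B * C^T."""
    c = dct_matrix(len(block))
    c_t = [list(col) for col in zip(*c)]
    return matmul(matmul(c, block), c_t)


def frequency_matrices(frame, n=8):
    """DCT every non-overlapping n x n block, then regroup the coefficients so
    that matrices[(u, v)][r][s] is coefficient (u, v) of the block at
    block-row r, block-column s. Pixels beyond the last full block are ignored."""
    block_rows, block_cols = len(frame) // n, len(frame[0]) // n
    matrices = {(u, v): [[0.0] * block_cols for _ in range(block_rows)]
                for u in range(n) for v in range(n)}
    for r in range(block_rows):
        for s in range(block_cols):
            block = [row[s * n:(s + 1) * n] for row in frame[r * n:(r + 1) * n]]
            coeffs = dct2(block)
            for u in range(n):
                for v in range(n):
                    matrices[(u, v)][r][s] = coeffs[u][v]
    return matrices
```

For a flat 16 × 16 frame of value 128, `frequency_matrices(frame)[(0, 0)]` is a 2 × 2 matrix whose entries are all 1024 (up to floating-point rounding), and every other matrix is approximately zero, so all of the energy lands in the DC matrix that the codec codes losslessly.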
Discriminant analysis of the two-dimensional Gabor features for face recognition September 23, 2008
Posted by whaldsz in: research
A new technique called two-dimensional Gabor Fisher discriminant (2DGFD) is derived and implemented for image representation and recognition. In this approach, Gabor wavelets are used to extract facial features. Principal component analysis (PCA) is applied directly to the Gabor-transformed matrices to remove redundant information from the image rows, and a new direct two-dimensional Fisher linear discriminant (direct 2DFLD) method is derived to further remove redundant information and form a discriminant representation more suitable for face recognition. Conventional Gabor-based methods transform the Gabor images into a high-dimensional feature vector, which leads to high computational complexity and memory requirements; furthermore, such high-dimensional data are difficult to analyse accurately. The 2DGFD method was tested on face recognition using the ORL, Yale and extended Yale databases, where the images vary in illumination, expression, pose and scale. In particular, the 2DGFD method achieves 98.0% face recognition accuracy on the ORL database when using 20 × 3 feature matrices for each Gabor output, and 97.6% recognition accuracy on the extended Yale database, compared with 91.8% and 91.6% for the 2DPCA and 2DFLD methods respectively. The results show that the proposed 2DGFD method is computationally more efficient than the Gabor Fisher classifier method by a factor of approximately 8 on the ORL, 135 on the Yale and 1.2801 × 10^8 on the extended Yale B data sets.
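The "PCA applied directly to matrices" step resembles standard two-dimensional PCA (2DPCA), which builds an image covariance matrix G = (1/M) Σ (A_i − Ā)^T (A_i − Ā) from whole matrices instead of flattened vectors and projects each matrix as Y = A X. The sketch below assumes that standard 2DPCA formulation and uses power iteration with deflation for the eigenvectors; it is not the authors' code, and the Gabor filtering and the direct 2DFLD stage are omitted.

```python
import math
import random


def image_covariance(images):
    """2DPCA image covariance G = (1/M) * sum_i (A_i - mean)^T (A_i - mean)."""
    count = len(images)
    m, n = len(images[0]), len(images[0][0])
    mean = [[sum(a[i][j] for a in images) / count for j in range(n)] for i in range(m)]
    g = [[0.0] * n for _ in range(n)]
    for a in images:
        d = [[a[i][j] - mean[i][j] for j in range(n)] for i in range(m)]
        for p in range(n):
            for q in range(n):
                g[p][q] += sum(d[i][p] * d[i][q] for i in range(m)) / count
    return g


def top_eigenvectors(g, d, iters=500, seed=0):
    """Leading d eigenvectors of the symmetric PSD matrix g, found by power
    iteration with deflation. Stops early if no variance is left."""
    n = len(g)
    g = [row[:] for row in g]
    rng = random.Random(seed)
    vectors = []
    for _ in range(d):
        v = [rng.random() + 0.1 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                return vectors  # remaining directions carry no variance
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * g[i][j] * v[j] for i in range(n) for j in range(n))
        vectors.append(v)
        # Deflate so the next pass converges to the next eigenvector.
        g = [[g[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vectors


def project(a, vectors):
    """Feature matrix Y = A X, where the columns of X are the eigenvectors."""
    return [[sum(row[j] * v[j] for j in range(len(row))) for v in vectors] for row in a]
```

Each m × n input shrinks to an m × d feature matrix, which is the source of the memory and speed savings over flattening a Gabor image into one long vector.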
Influence of downsampling filter characteristics on compression performance in wavelet-based scalable video coding September 23, 2008
Posted by whaldsz in: research
The application of different downsampling filters in video coding directly models visual information at lower resolutions and influences the compression performance of a chosen coding system. In wavelet-based scalable video coding, spatial scalability is achieved by applying wavelets as downsampling filters. However, the characteristics of different wavelets influence the performance at the targeted spatio-temporal decoding points. An analysis of different downsampling filters in popular wavelet-based scalable video coding schemes is presented. Evaluation is performed for both intra- and inter-coding schemes using wavelets and standard downsampling strategies. On the basis of the results obtained, a new concept of inter-resolution prediction is proposed, which maximises the average performance using a combination of standard downsampling filters and wavelet-based coding.
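To make "downsampling filter characteristics" concrete, the sketch below compares three ways of halving a frame's resolution: plain decimation, a 2 × 2 box average, and the low-low (LL) band of a one-level orthonormal 2-D Haar transform. These three filters are chosen only for illustration and are an assumption; they are not necessarily the filters evaluated in the paper. Even frame dimensions are assumed.

```python
def decimate(frame):
    """Keep every second sample in each direction, with no filtering."""
    return [row[::2] for row in frame[::2]]


def block_sums(frame):
    """Sum of each non-overlapping 2 x 2 neighbourhood (even dimensions assumed)."""
    return [[frame[2 * i][2 * j] + frame[2 * i][2 * j + 1]
             + frame[2 * i + 1][2 * j] + frame[2 * i + 1][2 * j + 1]
             for j in range(len(frame[0]) // 2)]
            for i in range(len(frame) // 2)]


def box_average(frame):
    """Conventional 2 x 2 box-filter downsampling: mean of each neighbourhood."""
    return [[s / 4.0 for s in row] for row in block_sums(frame)]


def haar_ll(frame):
    """LL subband of a one-level orthonormal 2-D Haar transform:
    (a + b + c + d) / 2 for each 2 x 2 neighbourhood."""
    return [[s / 2.0 for s in row] for row in block_sums(frame)]
```

On a frame of vertical stripes alternating between 10 and 0, decimation keeps only the bright columns and returns a uniform 10 (aliasing, since the true local mean is 5), the box filter returns 5, and the Haar LL band returns 10, i.e. the mean scaled by the DC gain of 2 that an orthonormal 2-D lowpass step carries. A low-resolution layer taken from a wavelet band therefore differs from a conventionally downsampled frame, which is the kind of mismatch an inter-resolution prediction scheme has to account for.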
Class-Based Feature Matching Across Unrestricted Transformations September 23, 2008
Posted by whaldsz in: research
We develop a novel method for class-based feature matching across large changes in viewing conditions. The method is based on the property that when objects share a similar part, the similarity is preserved across viewing conditions. Given a feature and a training set of object images, we first identify the subset of objects that share this feature. The transformation of the feature's appearance across viewing conditions is determined mainly by properties of the feature, rather than of the object in which it is embedded. Therefore, the transformed feature will be shared by approximately the same set of objects. Based on this consistency requirement, corresponding features can be reliably identified from a set of candidate matches. Unlike previous approaches, the proposed scheme compares feature appearances only in similar viewing conditions, rather than across different viewing conditions. As a result, the scheme is not restricted to locally planar objects or affine transformations. The approach also does not require examples of correct matches. We show that by using the proposed method, a dense set of accurate correspondences can be obtained. Experimental comparisons demonstrate that matching accuracy is significantly improved over previous schemes. Finally, we show that the scheme can be successfully used for invariant object recognition.
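The consistency requirement can be sketched as follows. This is a hypothetical illustration, not the authors' algorithm: it assumes each training object is available under both viewing conditions (`objects_view1[i]` and `objects_view2[i]` are the same object), that "shares the feature" means some patch exceeds a similarity threshold, and it uses Jaccard overlap of the two sharing sets as the agreement score. Note that the feature is only ever compared with patches from its own viewing condition, and each candidate only with patches from the other.

```python
def sharing_set(feature, objects, similarity, threshold):
    """Indices of the objects that contain a patch similar to `feature`."""
    return {i for i, patches in enumerate(objects)
            if any(similarity(feature, p) >= threshold for p in patches)}


def jaccard(a, b):
    """Overlap of two index sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(feature, candidates, objects_view1, objects_view2,
               similarity, threshold):
    """Pick the candidate whose sharing set in view 2 best agrees with the
    feature's sharing set in view 1 (hypothetical scoring rule)."""
    reference = sharing_set(feature, objects_view1, similarity, threshold)
    scores = [jaccard(reference,
                      sharing_set(c, objects_view2, similarity, threshold))
              for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores[best]
```

For example, with exact-equality similarity, view-1 objects `[[1, 2], [1, 3], [4, 5]]` and view-2 objects `[[10, 20], [10, 30], [40, 50]]`, feature `1` is shared by objects {0, 1}; among candidates `[40, 10, 20]`, only `10` is shared by exactly that set, so `best_match` returns `(1, 1.0)`.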
Design of Multimodal Dissimilarity Spaces for Retrieval of Video Documents September 23, 2008
Posted by whaldsz in: research
This paper proposes a novel representation space for multimodal information, enabling fast and efficient retrieval of video data. We suggest describing the documents not directly by selected multimodal features (audio, visual or text), but rather by considering cross-document similarities relative to their multimodal characteristics. This idea leads us to propose a particular form of dissimilarity space that is adapted to the asymmetric classification problem, and in turn to the query-by-example and relevance feedback paradigm widely used in information retrieval. Based on the proposed dissimilarity space, we then define various strategies to fuse modalities through a kernel-based learning approach. The problem of automatic kernel setting to adapt the learning process to the queries is also discussed. The properties of our strategies are studied and validated on artificial data. In a second phase, a large annotated video corpus (i.e. TRECVID-05), indexed by visual, audio and text features, is used to evaluate the overall performance of the dissimilarity space and fusion strategies. The results obtained confirm the validity of the proposed approach for the representation and retrieval of multimodal information in a real-time framework.
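A dissimilarity-space representation describes each document by its distances to a set of reference (prototype) documents, one vector per modality, and kernel fusion can then combine per-modality kernels. The sketch below assumes Euclidean distances to a fixed prototype set, an RBF kernel with a shared `gamma`, and a nonnegative weighted sum as the fusion rule; all of these are illustrative assumptions, and the paper's own fusion strategies and automatic kernel setting are not reproduced.

```python
import math


def dissimilarity_vector(doc, prototypes, modality):
    """Distances from one document to each prototype in a single modality."""
    return [math.dist(doc[modality], p[modality]) for p in prototypes]


def rbf(u, v, gamma):
    """Gaussian (RBF) kernel between two dissimilarity vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def fused_kernel(x, y, prototypes, weights, gamma=1.0):
    """Weighted sum of per-modality RBF kernels computed on the
    dissimilarity-space representations of documents x and y.
    `weights` maps modality name -> nonnegative weight (hypothetical scheme)."""
    total = 0.0
    for modality, w in weights.items():
        dx = dissimilarity_vector(x, prototypes, modality)
        dy = dissimilarity_vector(y, prototypes, modality)
        total += w * rbf(dx, dy, gamma)
    return total
```

Keeping the weights nonnegative matters because a nonnegative combination of valid kernels is itself a valid kernel, so the fused similarity can be fed to any kernel-based learner; with weights summing to 1, every document has similarity 1.0 with itself.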