Multi-sparse Descriptor: A Scale Invariant Feature for Pedestrian Detection
Yazhou Liu, Pongsak Lasang, Mel W. Siegel, Quan-Sen Sun
Neurocomputing, Vol. 184, pp. 55-65, 2015.12
This paper presents a new descriptor, the multi-sparse descriptor (MSD), for pedestrian detection in static images. The descriptor is based on multi-dictionary sparse coding, which comprises unsupervised dictionary learning and sparse coding. During the unsupervised learning phase, a family of dictionaries with different representation abilities is learnt from pedestrian data. The data are then encoded by these dictionaries, and the histogram of the sparse coefficients is computed as the descriptor. The benefit of this multi-dictionary sparse encoding is three-fold: firstly, because the dictionaries are learnt from pedestrian data, they encode the local structures of pedestrians more efficiently; secondly, multiple dictionaries enrich the representation by providing different levels of abstraction; thirdly, since the dictionary-based representation concentrates on low-frequency components, better generalization across the scale range is obtained. Comparisons with state-of-the-art methods reveal the superiority of the proposed method.
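The encode-with-several-dictionaries-then-histogram idea can be sketched in a few lines. This is an illustrative toy (random patches, a K-means-style stand-in for the paper's dictionary learning, hard thresholding as the sparse coder), not the authors' implementation; all names and sizes here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def learn_dictionary(patches, n_atoms, n_iter=20):
    """Toy K-means-style dictionary learning (a stand-in for the
    unsupervised learning phase the paper describes)."""
    D = patches[rng.choice(len(patches), n_atoms, replace=False)].copy()
    for _ in range(n_iter):
        # assign each patch to its most similar atom, then re-average
        labels = (patches @ D.T).argmax(axis=1)
        for k in range(n_atoms):
            members = patches[labels == k]
            if len(members):
                D[k] = members.mean(axis=0)
    return D / (np.linalg.norm(D, axis=1, keepdims=True) + 1e-8)

def sparse_code(x, D, k=3):
    """Keep only the k largest-magnitude coefficients (hard thresholding)."""
    c = D @ x
    c[np.argsort(np.abs(c))[:-k]] = 0.0
    return c

def msd_descriptor(patches, dictionaries, k=3):
    """Concatenate per-dictionary histograms of sparse-coefficient energy."""
    parts = []
    for D in dictionaries:
        codes = np.array([sparse_code(p, D, k) for p in patches])
        parts.append(np.abs(codes).sum(axis=0))  # histogram over atoms
    desc = np.concatenate(parts)
    return desc / (desc.sum() + 1e-8)

# toy data: 200 random 16-dim "patches"; two dictionaries of different
# sizes play the role of different levels of abstraction
patches = rng.standard_normal((200, 16))
dicts = [learn_dictionary(patches, 8), learn_dictionary(patches, 16)]
desc = msd_descriptor(patches, dicts)
print(desc.shape)  # (24,) = 8 + 16 atoms
```

Note how the final descriptor length is simply the total number of atoms across all dictionaries, so adding a dictionary enriches the representation without changing the pipeline.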
Conditional Convolutional Neural Network for Modality-Aware Face Recognition
Chao Xiong, Xiaowei Zhao, Danhang Tang, Jayashree Karlekar, Shuicheng Yan, Tae-Kyun Kim
International Conference on Computer Vision (ICCV), 2015.12
Faces in the wild are usually captured with various poses, illuminations and occlusions, and are thus inherently multimodally distributed in many tasks. We propose a conditional Convolutional Neural Network, named c-CNN, to handle multimodal face recognition. Unlike a traditional CNN, which adopts fixed convolution kernels, c-CNN processes samples with dynamically activated sets of kernels. In particular, the convolution kernels within each layer are only sparsely activated when a sample is passed through the network. For a given sample, the activations of the convolution kernels in a certain layer are conditioned on its present intermediate representation and the activation status in the lower layers. The activated kernels across layers define sample-specific adaptive routes that reveal the distribution of the underlying modalities. Consequently, in contrast with most existing methods, the proposed framework does not rely on any prior knowledge of the modalities. To substantiate the generic framework, we introduce a special case of c-CNN that incorporates the conditional routing of a decision tree, which is evaluated on two multimodal problems: multi-view face identification and occluded face verification. Extensive experiments demonstrate consistent improvements over modality-unaware counterparts.
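The core mechanism, per-sample conditional activation of a kernel subset, can be illustrated with a toy 1-D layer. This sketch uses a simple learned routing vector and top-k selection; the class name, routing rule, and sizes are assumptions for illustration, not the paper's decision-tree formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

class ConditionalLayer:
    """Toy conditional layer: a routing matrix scores every kernel
    against the current representation, and only the top-k kernels are
    applied; the rest stay inactive for this sample."""
    def __init__(self, n_kernels, dim, k_active):
        self.kernels = rng.standard_normal((n_kernels, dim, dim)) * 0.1
        self.router = rng.standard_normal((n_kernels, dim))
        self.k_active = k_active

    def forward(self, x):
        scores = self.router @ x                      # conditioned on input
        active = np.argsort(scores)[-self.k_active:]  # sparse activation
        out = sum(self.kernels[i] @ x for i in active)
        return np.tanh(out), sorted(active.tolist())

layer = ConditionalLayer(n_kernels=8, dim=16, k_active=2)
x1 = rng.standard_normal(16)
x2 = rng.standard_normal(16)
y1, route1 = layer.forward(x1)
y2, route2 = layer.forward(x2)
# different samples can take different routes through the same layer
print(route1, route2)
```

Stacking such layers yields the sample-specific routes the abstract describes: the set of activated kernel indices per layer is itself a signature of the input's modality.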
This article presents a low bit-rate super wideband MDCT coder, which is adopted as a part of the recently standardized codec for Enhanced Voice Services. To maximize codec performance at 13.2 kbps, existing algorithms are reviewed and several new tools are introduced into the low bit-rate MDCT coder to improve the performance of the coder while coding music and mixed content. A subjective listening test demonstrates the advantage of the proposed system for 13.2 kbps when compared to AMR-WB+.
Shopper Behavior Recognition for In-Store Merchandising using Camera Image
We are developing a system that detects and analyzes human behavior in retail stores, extracting previously undiscovered behavioral features of shoppers from camera images in order to propose innovative ideas for in-store merchandising. Human detection and tracking based on image feature matching is likely to fail in environments with varied backgrounds, such as crowded stores. We have therefore implemented a human detection method based not only on Histogram of Oriented Gradients (HOG) descriptors but also on Histogram of Depth Difference (HDD) descriptors, learned from RGB-Depth sensor images. As a result, we have achieved a precision of 99% and a recall of 97% for human detection, even in cases that can rarely be detected using HOG on an RGB image alone. We also found that increasing the frame rate of the image reduces the human-tracking error rate more effectively than increasing image quality.
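A depth-difference histogram in the spirit of HDD can be sketched as follows. This is an illustrative reading of such a descriptor (quantizing neighbor depth differences over a patch), with bin count and clipping range chosen arbitrarily; it is not Panasonic's exact formulation.

```python
import numpy as np

def hdd_descriptor(depth, n_bins=8, max_diff=0.5):
    """Toy Histogram of Depth Difference: quantize the depth difference
    between each pixel and its right/down neighbours into bins and
    histogram them over the patch."""
    dx = depth[:, 1:] - depth[:, :-1]          # horizontal differences
    dy = depth[1:, :] - depth[:-1, :]          # vertical differences
    diffs = np.clip(np.concatenate([dx.ravel(), dy.ravel()]),
                    -max_diff, max_diff)
    hist, _ = np.histogram(diffs, bins=n_bins, range=(-max_diff, max_diff))
    return hist / (hist.sum() + 1e-8)

# toy depth patch: a flat background with a raised square (an "object");
# the depth steps at its boundary populate the outer histogram bins
depth = np.full((16, 16), 2.0)
depth[4:12, 4:12] = 1.5
desc = hdd_descriptor(depth)
print(desc.shape)  # (8,)
```

Unlike HOG on an RGB image, this signature is unaffected by background texture and clothing color, which is why the depth cue complements HOG in cluttered stores.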
This paper presents a low bit-rate MDCT coder, which is adopted as part of the recently standardized codec for Enhanced Voice Services. To maximize codec performance for NB to SWB input signals at low bit-rates (7.2 to 16.4 kbps), new adaptive bit-allocation and spectrum quantization schemes, which emphasize perceptually important spectrum while efficiently coding the full spectrum, were introduced into the low bit-rate MDCT coder. Further, small-symbol switched Huffman coding is exploited to reduce the bit consumption of quantizing the band energies of the spectrum. Finally, the performance of the coder is illustrated with listening test results.
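The idea of adaptive bit allocation that emphasizes perceptually important spectrum can be sketched with a classic greedy rule: repeatedly give one bit to the band whose quantization-noise proxy is currently largest, with noise dropping roughly 6 dB per allocated bit. This is a textbook simplification under assumed energies, not the EVS scheme itself.

```python
import numpy as np

def allocate_bits(band_energies, total_bits):
    """Greedy bit allocation: each iteration assigns one bit to the band
    whose residual-noise proxy (energy scaled down by 6 dB per bit,
    i.e. a factor of 4 per bit) is largest."""
    bits = np.zeros(len(band_energies), dtype=int)
    energies = np.asarray(band_energies, dtype=float)
    for _ in range(total_bits):
        noise = energies / (4.0 ** bits)   # distortion proxy per band
        bits[np.argmax(noise)] += 1
    return bits

# low-frequency bands typically carry most energy, so they receive
# most of the bit budget
energies = [100.0, 50.0, 10.0, 2.0]
bits = allocate_bits(energies, total_bits=12)
print(bits, bits.sum())
```

A real coder would additionally weight the energies perceptually and enforce per-band minima, but the greedy loop above is the skeleton such schemes refine.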
Geodesic Invariant Feature: A Local Descriptor in Depth
Yazhou Liu, Pongsak Lasang, Mel W. Siegel, Quan-Sen Sun
IEEE Transactions on Image Processing, Vol. 24, Issue 1, Jan. 2015
Unlike photometric images, depth images resolve the distance ambiguity of the scene; however, their properties, such as weak texture, high noise, and low resolution, limit the representation ability of well-developed descriptors that were elaborately designed for photometric images. In this paper, a novel depth descriptor, the geodesic invariant feature (GIF), is presented for representing the parts of articulated objects in depth images. GIF is a multilevel feature representation framework designed around the nature of depth images. At the low level, a geodesic gradient is introduced to obtain invariance to articulated motion as well as to scale and rotation variation. At the mid level, superpixel clustering is applied to reduce depth image redundancy, resulting in faster processing and better robustness to noise. At the high level, a deep network is used to exploit the nonlinearity of the data, which further improves classification accuracy. The proposed descriptor encodes the local structures in depth data effectively and efficiently. Comparisons with state-of-the-art methods reveal the superiority of the proposed method.
Learning Hierarchical Feature Representation in Depth Image
Yazhou Liu, Pongsak Lasang, Mel W. Siegel, Quan-Sen Sun
Proc. Asian Conference on Computer Vision (ACCV), November 2014, pp. 593-608
This paper presents a novel descriptor, the geodesic invariant feature (GIF), for representing objects in depth images. Especially in the context of part classification for articulated objects, it encodes the invariance of local structures effectively and efficiently. The contributions of this paper lie in our multi-level feature extraction hierarchy. (1) The low-level feature encodes invariance to articulation. A geodesic gradient is introduced, which is covariant with the non-rigid deformation of objects and is used to rectify the feature extraction process. (2) The mid-level feature reduces noise and improves efficiency. With unsupervised clustering, the primitives of objects are changed from pixels to superpixels. The benefit is two-fold: firstly, superpixels reduce the effect of the noise introduced by depth sensors; secondly, the processing speed is improved by a large margin. (3) The high-level feature captures nonlinear dependencies between the dimensions. A deep network is used to discover the high-level feature representation. As the feature propagates towards the deeper layers of the network, its ability to capture the data's underlying regularities improves. Comparisons with state-of-the-art methods reveal the superiority of the proposed method.
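The geodesic idea behind the low-level feature can be illustrated with Dijkstra's algorithm on the pixel grid, where stepping between neighbors costs the 3-D surface distance (one pixel in the plane plus the depth jump). Distances measured along the surface are stable under articulation, while depth discontinuities act as barriers. This is an illustrative sketch of the intuition, with an invented toy scene, not the paper's feature pipeline.

```python
import heapq
import numpy as np

def geodesic_distance(depth, seed):
    """Dijkstra over the pixel grid; the cost of moving to a 4-neighbour
    is the Euclidean length of the step on the depth surface."""
    h, w = depth.shape
    dist = np.full((h, w), np.inf)
    dist[seed] = 0.0
    heap = [(0.0, seed)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[r, c]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                step = np.hypot(1.0, depth[nr, nc] - depth[r, c])
                if d + step < dist[nr, nc]:
                    dist[nr, nc] = d + step
                    heapq.heappush(heap, (d + step, (nr, nc)))
    return dist

# toy scene: a near surface (depth 1.0) next to a far background
# (depth 5.0); geodesic distance from a seed on the near surface
# jumps sharply across the depth discontinuity
depth = np.full((10, 10), 5.0)
depth[:, :5] = 1.0
dist = geodesic_distance(depth, seed=(5, 2))
print(dist[5, 4], dist[5, 6])  # same-surface vs across-the-edge pixel
```

Gradients of such a distance map change smoothly along a deforming limb but spike at object boundaries, which is what makes them useful for rectifying feature extraction on articulated objects.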
Cher Keng Heng, Samantha Yue Ying Lim, Zhiheng Niu, Bo Li
The Visual Object Tracking VOT2014 Challenge in conjunction with ECCV 2014, 2014.09
The PLT 14 tracker is an improved version of the PLT tracker used in VOT 2013 [35], adding size adaptation for the tracked object. PLT 14 uses discriminative pixel features to compute the scanning-window score in a tracking-by-detection framework. The window score is 'back projected' to its contributing pixels: for each pixel, the pixel score is computed by summing the back-projected scores of the windows that use that pixel. This score helps estimate which pixels belong to the object during tracking and determine the best bounding box.
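The back-projection step described above is simple to sketch: each scanning window adds its score to every pixel it covers, so pixels supported by many high-scoring windows accumulate high evidence. The window coordinates and scores below are invented toy values; this is a minimal illustration, not the full PLT 14 tracker.

```python
import numpy as np

def back_project(window_scores, windows, shape):
    """Sum each scanning window's score back onto the pixels it covers.
    Windows are (r0, c0, r1, c1) with exclusive end coordinates."""
    pixel_score = np.zeros(shape)
    for score, (r0, c0, r1, c1) in zip(window_scores, windows):
        pixel_score[r0:r1, c0:c1] += score
    return pixel_score

# two overlapping detection windows on an 8x8 frame
windows = [(1, 1, 5, 5), (3, 3, 7, 7)]
scores = [2.0, 1.0]
pmap = back_project(scores, windows, (8, 8))
print(pmap[2, 2], pmap[4, 4], pmap[6, 6])  # 2.0 3.0 1.0
```

Thresholding or weighting such a pixel-score map is one natural way to fit a bounding box whose size adapts to the object, as the size-adaptation step requires.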
Novel edge preserve and depth image recovery method in RGB-D camera systems
Pongsak Lasang, Shengmei Shen and Wuttipong Kumwilaisak
Proc. IEEE International Conference on Consumer Electronics – Berlin, September 2014, pp. 346-349.
We propose a new edge-preserving depth image recovery method for RGB-D camera systems that recovers sharp and accurate object shapes from a depth map with noisy boundaries. The edges of the input depth image are detected, and the noisy pixels around them are removed from the depth image. An anisotropic diffusion edge tensor of the input RGB image is computed, and the missing depth pixels are then recovered using total generalized variation optimization guided by this RGB edge tensor. As a result, accurate object depth boundaries are obtained, well aligned with the object edges in the RGB image. Missing or invalid depth pixels in large hole areas and on thin objects can also be recovered. Experimental results show improved edge preservation and depth recovery compared with previous works, at the expense of higher computational complexity.
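The principle of RGB-guided depth hole filling can be illustrated with a much simpler scheme than the paper's TGV optimization: iteratively average each invalid depth pixel from its neighbors, weighting each neighbor by how similar its guide-image intensity is, so filled depth does not bleed across RGB edges. Everything here (weighting, iteration count, toy scene) is an assumption for illustration.

```python
import numpy as np

def recover_depth(depth, guide, n_iter=200, sigma=0.1):
    """Fill NaN depth pixels by iterated colour-guided neighbourhood
    averaging: neighbours with similar guide intensity get weight ~1,
    neighbours across a guide edge get weight ~0."""
    d = np.where(np.isnan(depth), np.nanmean(depth), depth)
    valid = ~np.isnan(depth)
    h, w = depth.shape
    for _ in range(n_iter):
        new = d.copy()
        for r in range(h):
            for c in range(w):
                if valid[r, c]:
                    continue  # keep measured depth untouched
                num = den = 0.0
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < h and 0 <= nc < w:
                        wgt = np.exp(-((guide[nr, nc] - guide[r, c]) ** 2)
                                     / (2 * sigma ** 2))
                        num += wgt * d[nr, nc]
                        den += wgt
                new[r, c] = num / den
        d = new
    return d

# toy scene: a depth hole straddling an RGB edge at column 4;
# guided filling keeps the two sides of the hole at their own depths
guide = np.zeros((8, 8))
guide[:, 4:] = 1.0
depth = np.full((8, 8), 1.0)
depth[:, 4:] = 3.0
depth[3:5, 3:6] = np.nan            # punch a hole across the edge
out = recover_depth(depth, guide)
print(out[3, 3], out[3, 5])  # stays near 1.0 / near 3.0
```

TGV adds a higher-order smoothness term and uses the full anisotropic edge tensor rather than a scalar weight, but the guidance principle, letting RGB edges stop depth diffusion, is the same.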