Driver Pose Estimation Using Recurrent Lightweight Network and Virtual Data Augmented Transfer Learning
Driver pose recognition comprises three tasks, body joint, head angle, and face landmark estimation, which are of paramount interest for advanced driver assistance systems (ADAS). Recently proposed methods tend to use deeper and more complicated networks to achieve better performance, which leads to heavy models that are not feasible for resource-limited applications such as ADAS. To resolve this issue, we have worked on the following aspects: 1) a lightweight network model, referred to as the recurrent multi-task thin net (RM-ThinNet), is proposed, designed specifically for computationally and memory-limited devices; 2) a recurrent structure is introduced to handle the scale differences and dependencies between the tasks; this recurrence ensures the tasks are accomplished at different stages so that their outputs can augment each other; and 3) a virtual data synthesis pipeline and a coupled transfer learning method are presented, by which the network can be learned effectively with a relatively small amount of real data. Comparisons with state-of-the-art methods reveal the superiority of the proposed method: competitive performance is achieved with a smaller model size and faster speed.
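The recurrent multi-task structure in point 2) can be illustrated with a minimal PyTorch sketch in which each task head also receives the predictions of the earlier heads, so later tasks are conditioned on earlier outputs. The backbone, layer sizes, and task output dimensions below are illustrative assumptions, not the published RM-ThinNet configuration.

```python
# Minimal sketch of a recurrent multi-task head (assumed dimensions):
# body joints (17 x,y pairs), head angle (yaw/pitch/roll), 68 face landmarks.
import torch
import torch.nn as nn

class RecurrentMultiTaskSketch(nn.Module):
    def __init__(self, feat_dim=128, task_dims=(34, 3, 136)):
        super().__init__()
        # lightweight shared backbone standing in for the "thin" network
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # one head per task; each later head also sees earlier predictions
        self.heads = nn.ModuleList()
        extra = 0
        for dim in task_dims:
            self.heads.append(nn.Linear(feat_dim + extra, dim))
            extra += dim

    def forward(self, x):
        feature = self.backbone(x)
        outputs, carried = [], feature
        for head in self.heads:
            pred = head(carried)
            outputs.append(pred)
            carried = torch.cat([carried, pred], dim=1)  # feed prediction forward
        return outputs  # [body joints, head angle, face landmarks]

predictions = RecurrentMultiTaskSketch()(torch.randn(2, 3, 128, 128))
```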
Deep Face Recognition Model Compression via Knowledge Transfer and Distillation
Jayashree Karlekar, Jiashi Feng, Zi Sian Wong, Sugiri Pranata
arXiv, 2019, 2019.06
Fully convolutional networks (FCNs) have become the de facto tool for achieving very high performance on many vision and non-vision tasks in general and face recognition in particular. Such high accuracies are normally obtained by very deep networks or their ensembles. However, deploying such high-performing models to resource-constrained devices or real-time applications is challenging. In this paper, we present a novel model compression approach based on the student-teacher paradigm for face recognition applications. The proposed approach trains the teacher FCN at a larger image resolution, while student FCNs are trained at lower image resolutions than that of the teacher FCN. We explored three different approaches to training the student FCNs: knowledge transfer (KT), knowledge distillation (KD) and their combination. Experimental evaluation on the LFW and IJB-C datasets demonstrates comparable improvements in accuracy with these approaches. Training low-resolution student FCNs from a higher-resolution teacher offers the fourfold advantage of accelerated training, accelerated inference, reduced memory requirements and improved accuracy. We evaluated all models on the IJB-C dataset and achieved state-of-the-art results on this benchmark. The teacher network and some student networks even achieved Top-1 performance on the IJB-C dataset. The proposed approach is simple and hardware friendly, thus enabling the deployment of high-performing face recognition deep models on resource-constrained devices.
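As a rough illustration of the student-teacher setup described above, the sketch below computes a knowledge transfer (KT) loss by matching embeddings and a knowledge distillation (KD) loss by matching softened logits, with the student fed a downsampled copy of the teacher's input. The specific networks, the shared classifier head, the 2x resolution gap, and the temperature are assumptions for illustration only.

```python
# Hypothetical KT/KD losses for a low-resolution student and a frozen
# high-resolution teacher; `teacher`, `student` and `classifier` are any
# modules mapping images -> embeddings and embeddings -> identity logits.
import torch
import torch.nn.functional as F

def kt_kd_losses(teacher, student, classifier, images_hr, temperature=4.0):
    # student sees a downsampled copy of the teacher's input
    images_lr = F.interpolate(images_hr, scale_factor=0.5,
                              mode="bilinear", align_corners=False)
    with torch.no_grad():
        t_emb = teacher(images_hr)          # teacher is kept frozen
    s_emb = student(images_lr)

    kt_loss = F.mse_loss(s_emb, t_emb)      # knowledge transfer: feature matching

    t_logits = classifier(t_emb) / temperature
    s_logits = classifier(s_emb) / temperature
    kd_loss = F.kl_div(F.log_softmax(s_logits, dim=1),
                       F.softmax(t_logits, dim=1),
                       reduction="batchmean") * temperature ** 2
    return kt_loss, kd_loss                 # combine or use individually
```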
Anomaly Detection with Adversarial Dual Autoencoders
Ha Son Vu, Daisuke Ueta, Kiyoshi Hashimoto, Kazuki Maeno, Sugiri Pranata, Sheng Mei Shen
arXiv, 2019, 2019.02
Semi-supervised and unsupervised Generative Adversarial Network (GAN)-based methods have been gaining popularity in the anomaly detection task recently. However, GAN training is somewhat challenging and unstable. Inspired by previous work on GAN-based image generation, we introduce a GAN-based anomaly detection framework, Adversarial Dual Autoencoders (ADAE), which consists of two autoencoders acting as generator and discriminator to increase training stability. We also employ the discriminator's reconstruction error as the anomaly score for better detection performance. Experiments across datasets of varying complexity show strong evidence of a robust model that can be used in different scenarios, one of which is brain tumor detection.
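A toy sketch of the scoring idea is given below: both generator and discriminator are autoencoders, and at test time the discriminator's reconstruction error on the generator's output is used as the anomaly score. The tiny fully connected autoencoders and the exact scoring formula are simplifying assumptions, not the ADAE architecture used in the paper.

```python
# Toy ADAE-style anomaly scoring: larger score => more anomalous.
import torch
import torch.nn as nn

def make_autoencoder(dim=784, hidden=64):
    return nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, dim))

generator = make_autoencoder()      # trained to reconstruct normal data
discriminator = make_autoencoder()  # adversarially trained, also an autoencoder

def anomaly_score(x):
    with torch.no_grad():
        x_gen = generator(x)                        # generator reconstruction
        x_disc = discriminator(x_gen)               # discriminator reconstruction
        return ((x_gen - x_disc) ** 2).mean(dim=1)  # per-sample score

scores = anomaly_score(torch.randn(8, 784))
```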
Look Across Elapse: Disentangled Representation Learning and Photorealistic Cross-Age Face Synthesis for Age-Invariant Face Recognition
Jian Zhao, Yu Cheng, Yi Cheng, Yang Yang, Fang Zhao, Jianshu Li, Hengzhu Liu, Shuicheng Yan, Jiashi Feng
The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), 2019.01
Despite the remarkable progress in face recognition technologies, reliably recognizing faces across ages remains a big challenge. The appearance of a human face changes substantially over time, resulting in significant intra-class variations. As opposed to current techniques for age-invariant face recognition, which either directly extract age-invariant features for recognition or first synthesize a face that matches the target age before feature extraction, we argue that it is more desirable to perform both tasks jointly so that they can leverage each other. To this end, we propose a deep Age-Invariant Model (AIM) for face recognition in the wild with three distinct novelties. First, AIM presents a novel unified deep architecture that jointly performs cross-age face synthesis and recognition in a mutually boosting way. Second, AIM achieves continuous face rejuvenation/aging with remarkable photorealistic and identity-preserving properties, avoiding the requirement of paired data and the true age of testing samples. Third, we develop effective and novel training strategies for end-to-end learning of the whole deep architecture, which generates powerful age-invariant face representations explicitly disentangled from the age variation. Moreover, we propose a new large-scale Cross-Age Face Recognition (CAFR) benchmark dataset to facilitate existing efforts and push the frontiers of age-invariant face recognition research. Extensive experiments on both our CAFR and several other cross-age datasets (MORPH, CACD and FG-NET) demonstrate the superiority of the proposed AIM model over the state of the art. Benchmarking our model on one of the most popular unconstrained face recognition datasets, IJB-C, additionally verifies the promising generalizability of AIM in recognizing faces in the wild.
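The "perform both tasks jointly" idea can be caricatured with the sketch below, where a shared encoder feeds both an identity classifier and an age-conditioned decoder so that recognition and synthesis losses are optimized together. This deliberately omits the adversarial, disentanglement, and unpaired-data aspects of AIM; every module, the one-hot age code, and the loss weights are illustrative assumptions.

```python
# Hypothetical joint objective: identity classification + age-conditioned
# reconstruction from a shared face embedding.
import torch
import torch.nn.functional as F

def joint_step(encoder, decoder, id_head, faces, identities, target_ages,
               num_age_bins=10, w_rec=1.0):
    emb = encoder(faces)                                   # shared embedding
    id_loss = F.cross_entropy(id_head(emb), identities)    # recognition branch
    age_code = F.one_hot(target_ages, num_classes=num_age_bins).float()
    synthesized = decoder(torch.cat([emb, age_code], dim=1))
    rec_loss = F.l1_loss(synthesized, faces)               # synthesis branch
    return id_loss + w_rec * rec_loss
```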
Image Denoising with Deep Convolutional and Multi-directional LSTM Networks under Poisson Noise Environments
Teerawat Piriyatharawet, Wuttipong Kumwilaisak, Pongsak Lasang
18th International Symposium on Communications and Information Technologies (ISCIT) 2018, 2018.09
Image denoising, especially in low-light conditions, is a very challenging task because the noise characteristics of different noise sources, contributed in varying proportions at different signal levels, are difficult to capture. Inaccurate modeling of the noise often produces undesirable artifacts in the restored images. This paper presents a new image denoising method for Poisson noise called the Deep Convolutional and Multi-directional Long Short-Term Memory Network (DCSLNet). A deep Convolutional Neural Network (CNN) is first used to extract features and estimate the noise components of the images. A multi-directional Long Short-Term Memory (LSTM) network is then introduced in the second stage to effectively capture the long-range correlations of the noise in the deeper CNN layers. The proposed DCSLNet is trainable end-to-end to restore the clean image from the input noisy image. Experimental results on both subjective and objective quality show that the proposed DCSLNet is very competitive in denoising low-light images under heavy-noise conditions compared with other state-of-the-art Poisson image denoising methods.
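A much-simplified sketch of the two-stage design is shown below: a small CNN extracts per-pixel features, bidirectional LSTMs then sweep those features along rows and columns to capture long-range noise correlations, and a 1x1 convolution predicts the noise residual that is subtracted from the input. The channel counts, the number of scan directions, and the residual formulation are assumptions for illustration, not the DCSLNet specification.

```python
# Simplified CNN + row/column LSTM denoiser sketch (grayscale input).
import torch
import torch.nn as nn

class CnnLstmDenoiserSketch(nn.Module):
    def __init__(self, feat=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.row_lstm = nn.LSTM(feat, feat // 2, batch_first=True, bidirectional=True)
        self.col_lstm = nn.LSTM(feat, feat // 2, batch_first=True, bidirectional=True)
        self.out = nn.Conv2d(2 * feat, 1, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        f = self.cnn(x)                                     # (b, c, h, w)
        rows = f.permute(0, 2, 3, 1).reshape(b * h, w, -1)  # scan each row
        rows, _ = self.row_lstm(rows)
        rows = rows.reshape(b, h, w, -1).permute(0, 3, 1, 2)
        cols = f.permute(0, 3, 2, 1).reshape(b * w, h, -1)  # scan each column
        cols, _ = self.col_lstm(cols)
        cols = cols.reshape(b, w, h, -1).permute(0, 3, 2, 1)
        noise = self.out(torch.cat([rows, cols], dim=1))
        return x - noise                                    # predicted clean image

denoised = CnnLstmDenoiserSketch()(torch.randn(2, 1, 32, 32))
```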
ThinNet: An Efficient Convolutional Neural Network for Object Detection
Sen Cao, Yazhou Liu, Changxin Zhou, Quan-Sen Sun, Pongsak Lasang, Shengmei Shen
24th International Conference on Pattern Recognition (ICPR) 2018, 2018.08
Great advances have been made in deep networks, but relatively high memory and computation requirements limit their application on embedded devices. In this paper, we introduce a class of efficient network architectures named ThinNet, mainly for object detection applications on memory- and computation-limited platforms. The new architecture is based on two proposed modules: the Front module and the Tinier module. The Front module reduces the information loss from raw input images by utilizing more convolution layers with small filters. The Tinier module uses pointwise convolution layers before the conventional convolution layer to decrease model size and computation while maintaining detection accuracy. Experimental evaluations on the ImageNet classification and PASCAL VOC object detection datasets demonstrate the superior performance of ThinNet over other popular models. Our pretrained classification model (ThinNet_C) attains the same top-1 and top-5 performance as the classic AlexNet but with only 1/50th of the parameters. The detection model also obtains significant improvements over other detection methods, while requiring a smaller model size to achieve high performance.
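The Tinier-module idea of placing a pointwise convolution before the conventional convolution can be sketched as below; the squeeze ratio and the use of batch normalization are illustrative assumptions rather than the published ThinNet settings.

```python
# Hypothetical Tinier-style block: 1x1 "squeeze" then a conventional 3x3 conv.
import torch.nn as nn

def tinier_block(in_ch, out_ch, squeeze_ratio=4):
    mid = max(in_ch // squeeze_ratio, 1)
    return nn.Sequential(
        nn.Conv2d(in_ch, mid, kernel_size=1, bias=False),              # pointwise squeeze
        nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
        nn.Conv2d(mid, out_ch, kernel_size=3, padding=1, bias=False),  # conventional conv
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )
```

With 256 input and output channels and a squeeze to 64 channels, for example, the convolution weights drop from 256x256x3x3 (about 590K) to 256x64 + 64x256x3x3 (about 164K), which is the kind of saving such a block targets.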
3D-Aided Deep Pose-Invariant Face Recognition
Jian Zhao, Lin Xiong, Yu Cheng, Yi Cheng, Jianshu Li, Li Zhou, Yan Xu, Jayashree Karlekar, Sugiri Pranata, Shengmei Shen, Junliang Xing, Shuicheng Yan, Jiashi Feng
2018 International Joint Conference on Artificial Intelligence (IJCAI), 2018.07
Learning from synthetic faces, though perhaps appealing for high data efficiency, may not bring satisfactory performance due to the distribution discrepancy between synthetic and real face images. To mitigate this gap, we propose a 3D-Aided Deep Pose-Invariant Face Recognition Model (3D-PIM), which automatically recovers realistic frontal faces from arbitrary poses through a 3D face model in a novel way. Specifically, 3D-PIM incorporates a simulator with the aid of a 3D Morphable Model (3DMM) to obtain shape and appearance priors for accelerating face normalization learning, requiring less training data. It further leverages a global-local Generative Adversarial Network (GAN) with multiple critical improvements as a refiner to enhance the realism of both the global structures and local details of the face simulator's output using unlabelled real data only, while preserving the identity information. Qualitative and quantitative experiments on both controlled and in-the-wild benchmarks clearly demonstrate the superiority of the proposed model over the state of the art.
Pose variation is one key challenge in face recognition. As opposed to current techniques for pose-invariant face recognition, which either directly extract pose-invariant features for recognition or first normalize profile face images to a frontal pose before feature extraction, we argue that it is more desirable to perform both tasks jointly so that they can benefit from each other. To this end, we propose a Pose Invariant Model (PIM) for face recognition in the wild, with three distinct novelties. First, PIM is a novel and unified deep architecture, containing a Face Frontalization sub-Net (FFN) and a Discriminative Learning sub-Net (DLN), which are jointly learned end to end. Second, FFN is a well-designed dual-path Generative Adversarial Network (GAN) which simultaneously perceives global structures and local details, incorporating unsupervised cross-domain adversarial training and a "learning to learn" strategy for high-fidelity and identity-preserving frontal view synthesis. Third, DLN is a generic Convolutional Neural Network (CNN) for face recognition with our enforced cross-entropy optimization strategy for learning discriminative yet generalized feature representations. Qualitative and quantitative experiments on both controlled and in-the-wild benchmarks demonstrate the superiority of the proposed model over the state of the art.
Dual-Mode Vehicle Motion Pattern Learning for High Performance Road Traffic Anomaly Detection
Yan Xu, Xi Ouyang, Yu Cheng, Shining Yu, Lin Xiong, Sugiri Pranata, Shengmei Shen, Junliang Xing
2018 International Joint Conference on Artificial Intelligence (IJCAI), 2018.06
Anomaly detection on road traffic is an important task due to its great potential in urban traffic management and road safety. It is also a very challenging task since abnormal events happen very rarely and exhibit diverse behaviors. In this work, we present a model to detect anomalies in road traffic by learning vehicle motion patterns in two distinctive yet correlated modes, i.e., the static mode and the dynamic mode. The static-mode analysis is learned from background modeling followed by a vehicle detection procedure to find abnormal vehicles that remain still on the road. The dynamic-mode analysis is learned from detected and tracked vehicle trajectories to find abnormal trajectories that deviate from the dominant motion patterns. The results from the dual-mode analyses are finally fused, driven by a re-identification model, to obtain the final anomalies. Experimental results on the Track 2 testing set of the NVIDIA AI CITY CHALLENGE show the effectiveness of the proposed dual-mode learning model and its robustness in different real scenes. Our result ranks first on the final leaderboard of Track 2.
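As a rough illustration of the static-mode analysis, the OpenCV sketch below lets stopped vehicles be absorbed into a running background model and then flags regions of that learned background that were not present at the start of the clip. The subtractor parameters, thresholds, and area filter are assumptions, and in the actual pipeline a trained vehicle detector (rather than a simple size filter) would verify the candidates.

```python
# Toy static-mode candidate extraction via background modeling.
import cv2

def stalled_vehicle_candidates(frames, learning_rate=0.01, min_area=400):
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=False)
    background = None
    for frame in frames:                       # frames: list of BGR images
        subtractor.apply(frame, learningRate=learning_rate)
        background = subtractor.getBackgroundImage()
    # regions present in the learned background but absent from the first frame
    diff = cv2.cvtColor(cv2.absdiff(background, frames[0]), cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(diff, 40, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
```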
Audio-Visual Emotion Recognition with Capsule-like Feature Representation and Model-Based Reinforcement Learning
Xi Ouyang, Srikanth Nagisetty, Gue Hua Ester Goh, Shengmei Shen, Wan Ding, Huaiping Ming, Dong-Yan Huang
2018 First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia), 2018.05
This paper presents the techniques used in our contribution to the Multimodal Emotion Recognition Challenge (MEC 2017). The purpose of the challenge is to classify the eight basic emotions (happy, sad, angry, worried, anxious, surprise, disgust and neutral) in the Chinese Natural Audio-Visual Emotion Database (CHEAVD) 2.0, selected from Chinese movies and TV programs. As facial expressions are caused by the movement of facial features such as the mouth and eyebrows, a capsule-like feature representation is proposed to capture not only the existence of static facial emotions in video frames but also their instantiation parameters. To further improve emotion classification accuracy, model-based reinforcement learning is proposed for the audio-visual fusion method, which exploits feedback from submissions on the challenge testing dataset as rewards to learn the fusion model. The overall accuracy of the proposed approach on the test dataset is 52.3% and the macro average precision is 39.7%. This performance ranks in the top 2 of the MEC 2017 audio-visual sub-challenge.
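The reward-driven fusion can be caricatured by the sketch below, which combines audio and visual class probabilities with a single weight and greedily adjusts that weight in whichever direction improves an external reward (standing in for feedback on challenge submissions). This simple greedy search is a stand-in for the model-based reinforcement learning used in the paper; the fusion rule, step size, and reward function are assumptions.

```python
# Toy reward-driven late fusion of audio and visual emotion probabilities.
import numpy as np

def fuse(audio_probs, visual_probs, alpha):
    return alpha * audio_probs + (1.0 - alpha) * visual_probs

def tune_alpha(audio_probs, visual_probs, reward_fn, alpha=0.5, step=0.05, iters=20):
    best_alpha = alpha
    best_reward = reward_fn(fuse(audio_probs, visual_probs, best_alpha))
    for _ in range(iters):
        for candidate in (best_alpha + step, best_alpha - step):
            candidate = float(np.clip(candidate, 0.0, 1.0))
            reward = reward_fn(fuse(audio_probs, visual_probs, candidate))
            if reward > best_reward:
                best_alpha, best_reward = candidate, reward
    return best_alpha
```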