Multimedia Laboratory @
Nanyang Technological University
Affiliated with S-Lab



MMLab@NTU was formed on 1 August 2018, with a research focus on computer vision and deep learning. Its sister lab is MMLab@CUHK. It is now a group with three faculty members and more than 40 other members, including research fellows, research assistants, and PhD students.

Members of MMLab@NTU conduct research primarily in low-level vision, image and video understanding, creative content creation, and 3D scene understanding and reconstruction. Have a look at the overview of our research. All publications are listed here.

We are always looking for motivated PhD students, postdocs, and research assistants who share our interests. Check out the careers page and follow us on Twitter.

New Challenges

07/2022: We are hosting the PointCloud-C Challenge (robustness of 3D models, deadline: Sep 19, 2022) and the OmniBenchmark Challenge (generalization of 2D models, deadline: Oct 9, 2022).


ECCV 2022

07/2022: The team has a total of 18 papers (including 3 oral papers) accepted to ECCV 2022. We will release our papers and code.


CVPR 2022

03/2022: The team has a total of 18 papers (including 6 oral papers) accepted to CVPR 2022.



12/2021: Haonan Qiu, Bo Li, Yuhan Wang, Siyao Li, Quanzhou Li, and Jianyi Wang have been awarded the competitive AISG Fellowship 2022 to pursue their PhD studies. Congrats!


Check Out

News and Highlights

  • 12/2021: We release MMHuman3D, a new toolbox under OpenMMLab, for the use of 3D human parametric models in computer vision and computer graphics.
  • 09/2021: Kelvin Chan and Fangzhou Hong are awarded the very competitive and prestigious Google PhD Fellowship 2021 under the area “Machine Perception, Speech Technology and Computer Vision”.
  • 09/2021: The team has a total of 8 papers accepted to NeurIPS 2021.
  • 09/2021: Six outstanding ICCV 2021 reviewers from our team! Congrats to Chongyi Li, Kelvin Chan, Jingkang Yang, Liang Pan, Zhongang Cai, and Kaiyang Zhou.
  • 07/2021: We organized two challenges in conjunction with the ICCV 2021 Sensing, Understanding and Synthesizing Humans Workshop, namely the MVP Point Cloud Challenge and the Face Forgery Analysis Challenge. The deadline has passed. Check out the workshop for more details.
  • 07/2021: The team has a total of 11 papers accepted to ICCV 2021 (including one oral).


ECCV 2022

MIPI Workshop

We organize a new workshop called Mobile Intelligent Photography and Imaging (MIPI) in conjunction with ECCV 2022. We invited a cool lineup of speakers from both academia and industry to share their recent work.

Submit a paper to the workshop (deadline is August 8, 2022) or check out the website to participate in exciting challenges and win great prizes!



AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars
F. Hong, M. Zhang, L. Pan, Z. Cai, L. Yang, Z. Liu
ACM Transactions on Graphics, 2022 (SIGGRAPH - TOG)
[arXiv] [Project Page]

We propose AvatarCLIP, a zero-shot text-driven framework for 3D avatar generation and animation. Unlike professional software that requires expert knowledge, AvatarCLIP empowers lay users to customize a 3D avatar with the desired shape and texture, and to drive the avatar with the described motions, using natural language alone.

Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer
S. Yang, L. Jiang, Z. Liu, C. C. Loy
in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022 (CVPR)
[arXiv] [Project Page] [YouTube]

We extend StyleGAN to accept style condition from new domains while preserving its style control in the original domain. This results in an interesting application of high-resolution exemplar-based portrait style transfer with a friendly data requirement. DualStyleGAN, with an additional style path to StyleGAN, can effectively model and modulate the intrinsic and extrinsic styles for flexible and diverse artistic portrait generation.

Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data
L. Jiang, B. Dai, W. Wu, C. C. Loy
in Proceedings of Neural Information Processing Systems, 2021 (NeurIPS)
[arXiv] [PDF] [Project Page] [YouTube]

We introduce a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator. As an alternative method to existing approaches that rely on standard data augmentations or model regularization, APA alleviates overfitting by employing the generator itself to augment the real data distribution with generated images, which deceives the discriminator adaptively.
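
The adaptive mixing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the scalar `overfit_signal`, and the `threshold` and `step` values are all hypothetical choices made for illustration only.

```python
import random


def mix_real_batch(real_batch, fake_batch, p, rng):
    """Swap each real sample for a generated one with probability p.

    The swapped-in samples are still presented to the discriminator as
    "real", which is the deception the abstract refers to.
    """
    return [
        fake if rng.random() < p else real
        for real, fake in zip(real_batch, fake_batch)
    ]


def update_probability(p, overfit_signal, threshold=0.6, step=0.01):
    """Adjust the deception probability p and clamp it to [0, 1].

    overfit_signal is a hypothetical scalar that grows as the
    discriminator overfits; threshold and step are assumed values.
    """
    p = p + step if overfit_signal > threshold else p - step
    return min(1.0, max(0.0, p))


rng = random.Random(0)
p = update_probability(0.0, overfit_signal=0.9)  # signal above threshold, so p rises
batch = mix_real_batch(["r1", "r2", "r3"], ["g1", "g2", "g3"], p, rng)
```

With `p` at 0 the discriminator sees only real data, so the scheme falls back to standard GAN training whenever the overfitting signal stays low.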

K-Net: Towards Unified Image Segmentation
W. Zhang, J. Pang, K. Chen, C. C. Loy
in Proceedings of Neural Information Processing Systems, 2021 (NeurIPS)
[arXiv] [Project Page]

Semantic, instance, and panoptic segmentations have been addressed using different and specialized frameworks despite their underlying connections. This paper presents a unified, simple, and effective framework for these essentially similar tasks. The framework, named K-Net, segments both instances and semantic categories consistently by a group of learnable kernels, where each kernel is responsible for generating a mask for either a potential instance or a stuff class.

Unsupervised Object-Level Representation Learning from Scene Images
J. Xie, X. Zhan, Z. Liu, Y. S. Ong, C. C. Loy
in Proceedings of Neural Information Processing Systems, 2021 (NeurIPS)
[arXiv] [Project Page]

We introduce Object-level Representation Learning (ORL), a new self-supervised learning framework for scene images. Extensive experiments on COCO show that ORL significantly improves the performance of self-supervised learning on scene images, even surpassing supervised ImageNet pre-training on several downstream tasks.

Path-Restore: Learning Network Path Selection for Image Restoration
K. Yu, X. Wang, C. Dong, X. Tang, C. C. Loy
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021 (TPAMI)
[DOI] [arXiv] [Project Page]

We observe that some corrupted image regions are inherently easier to restore than others, since the distortion and content vary within an image. To exploit this, we propose Path-Restore, a multi-path CNN with a pathfinder that can dynamically select an appropriate route for each image region. We train the pathfinder using reinforcement learning with a difficulty-regulated reward. This reward depends on the performance, the complexity, and the difficulty of restoring a region.

3D Human Texture Estimation from a Single Image with Transformers
X. Xu, C. C. Loy
in Proceedings of IEEE/CVF International Conference on Computer Vision, 2021 (ICCV, Oral)
[PDF] [arXiv] [Project Page]

Texformer estimates high-quality 3D human texture from a single image. The Transformer-based method allows efficient information interchange between the image space and UV texture space.

Focal Frequency Loss for Image Reconstruction and Synthesis
L. Jiang, B. Dai, W. Wu, C. C. Loy
in Proceedings of IEEE/CVF International Conference on Computer Vision, 2021 (ICCV)
[PDF] [arXiv] [Supplementary Material] [Project Page]

We show that narrowing gaps in the frequency domain can further improve image reconstruction and synthesis quality. We propose a novel focal frequency loss, which allows a model to adaptively focus on frequency components that are hard to synthesize by down-weighting the easy ones. This objective function is complementary to existing spatial losses and offers strong resistance to the loss of important frequency information caused by the inherent bias of neural networks.
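
The down-weighting idea can be sketched on a 1D signal as follows. This is an illustration under stated assumptions rather than the paper's formulation: the weight of each frequency is assumed to be its spectral error normalized by the largest error and raised to a hypothetical exponent `alpha`, and a naive DFT stands in for a 2D image transform.

```python
import cmath


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a 1D real sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def focal_frequency_loss(real, fake, alpha=1.0):
    """Weighted mean squared spectral distance between two signals.

    Frequencies with a small error ("easy" ones) receive a small weight,
    so the loss concentrates on frequencies that are poorly matched.
    The weighting rule and alpha are assumptions made for this sketch.
    """
    if len(real) != len(fake):
        raise ValueError("signals must have the same length")
    diffs = [abs(r - f) for r, f in zip(dft(real), dft(fake))]
    max_diff = max(diffs)
    if max_diff == 0.0:
        return 0.0  # identical spectra, nothing to penalize
    weights = [(d / max_diff) ** alpha for d in diffs]
    return sum(w * d * d for w, d in zip(weights, diffs)) / len(diffs)


real = [0.0, 1.0, 0.0, -1.0]
fake = [0.0, 0.5, 0.0, -1.0]
loss = focal_frequency_loss(real, fake)
```

Setting `alpha=0` makes every weight equal to 1, which reduces the sketch to a plain mean squared error in the frequency domain.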

FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation
Y. Zang, C. Huang, C. C. Loy
in Proceedings of IEEE/CVF International Conference on Computer Vision, 2021 (ICCV)
[PDF] [arXiv] [Supplementary Material] [Project Page]

We propose a simple yet effective method, Feature Augmentation and Sampling Adaptation (FASA), that addresses the data scarcity issue by augmenting the feature space especially for rare classes. FASA is a fast, generic method that can be easily plugged into standard or long-tailed segmentation frameworks, with consistent performance gains and little added cost.

Talk-to-Edit: Fine-Grained Facial Editing via Dialog
Y. Jiang, Z. Huang, X. Pan, C. C. Loy, Z. Liu
in Proceedings of IEEE/CVF International Conference on Computer Vision, 2021 (ICCV)
[PDF] [arXiv] [Supplementary Material] [Project Page]

We propose Talk-to-Edit, an interactive facial editing framework that performs fine-grained attribute manipulation through dialog between the user and the system. Unlike previous works that regard editing as traversing straight lines in the latent space, here fine-grained editing is formulated as finding a curving trajectory that respects the fine-grained attribute landscape on the semantic field. The curvature at each step is location-specific and determined by the input image as well as the user’s language requests.

GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution
K. C. K. Chan, X. Wang, X. Xu, J. Gu, C. C. Loy
in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021 (CVPR, Oral)
[PDF] [arXiv] [Supplementary Material] [Project Page]

We show that pre-trained Generative Adversarial Networks (GANs), e.g., StyleGAN, can be used as a latent bank to improve the restoration quality of large-factor image super-resolution (SR). Switching the bank allows the method to deal with images from diverse categories, e.g., cat, building, human face, and car. Images upscaled by GLEAN show clear improvements in terms of fidelity and texture faithfulness in comparison to existing methods.