Multimedia Laboratory @
Nanyang Technological University
Affiliated with S-Lab



MMLab@NTU was formed on the 1 August 2018, with a research focus on computer vision and deep learning. Its sister lab is MMLab@CUHK. It is now a group with four faculty members and more than 35 members including research fellows, research assistants, and PhD students.

Members in MMLab@NTU conduct research primarily in low-level vision, image and video understanding, creative content creation, 3D scene understanding and reconstruction. Have a look at the overview of our research. All publications are listed here.

We are always looking for motivated PhD students, postdocs, research assistants who have the same interests like us. Check out the careers page and follow us on Twitter.


09/2021: Kelvin Chan and Fangzhou Hong are awarded the very competitive and prestigious Google PhD Fellowship 2021 under the area “Machine Perception, Speech Technology and Computer Vision”!

View more

NeurIPS 2021

09/2021: The team has a total of 8 papers accepted to NeurIPS 2021.

View more

ICCV 2021

07/2021: The team has a total of 11 papers accepted to ICCV 2021 (including one oral).

View more

Three Champions in NTIRE 2021 Challenge

04/2021: NTIRE is the most competitive challenge for low-level vision tasks. With BasicVSR++, we won three Champions in the tracks for video super-resolution and quality enhancement of heavily compressed videos. Congrats to the team!

View more

Check Out

News and Highlights

View more



Path-Restore: Learning Network Path Selection for Image Restoration
K. Yu, X. Wang, C. Dong, X. Tang, C. C. Loy
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021 (TPAMI)
[DOI] [arXiv] [Project Page]

We observe that some corrupted image regions are inherently easier to restore than others since the distortion and content vary within an image. To this end, we propose Path-Restore, a multi-path CNN with a pathfinder that can dynamically select an appropriate route for each image region. We train the pathfinder using reinforcement learning with a difficulty-regulated reward. This reward is related to the performance, complexity and "the difficulty of restoring a region".

GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution
K. C. K. Chan, X. Wang, X. Xu, J. Gu, C. C. Loy
in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021 (CVPR, Oral)
[PDF] [arXiv] [Supplementary Material] [Project Page]

We show that pre-trained Generative Adversarial Networks (GANs), e.g., StyleGAN, can be used as a latent bank to improve the restoration quality of large-factor image super-resolution (SR). Switching the bank allows the method to deal with images from diverse categories, e.g., cat, building, human face, and car. Images upscaled by GLEAN show clear improvements in terms of fidelity and texture faithfulness in comparison to existing methods.

3D Human Texture Estimation from a Single Image with Transformers
X. Xu, C. C. Loy
in Proceedings of IEEE/CVF International Conference on Computer Vision, 2021 (ICCV, Oral)
[PDF] [arXiv] [Project Page]

Texformer estimates high-quality 3D human texture from a single image. The Transformer-based method allows efficient information interchange between the image space and UV texture space.

Focal Frequency Loss for Image Reconstruction and Synthesis
L. Jiang, B. Dai, W. Wu, C. C. Loy
in Proceedings of IEEE/CVF International Conference on Computer Vision, 2021 (ICCV)
[PDF] [arXiv] [Supplementary Material] [Project Page]

We show that narrowing gaps in the frequency domain can ameliorate image reconstruction and synthesis quality further. We propose a novel focal frequency loss, which allows a model to adaptively focus on frequency components that are hard to synthesize by down-weighting the easy ones. This objective function is complementary to existing spatial losses, offering great impedance against the loss of important frequency information due to the inherent bias of neural networks.

FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation
Y. Zang, C. Huang, C. C. Loy
in Proceedings of IEEE/CVF International Conference on Computer Vision, 2021 (ICCV)
[PDF] [arXiv] [Supplementary Material] [Project Page]

We propose a simple yet effective method, Feature Augmentation and Sampling Adaptation (FASA), that addresses the data scarcity issue by augmenting the feature space especially for rare classes. FASA is a fast, generic method that can be easily plugged into standard or long-tailed segmentation frameworks, with consistent performance gains and little added cost.

Talk-to-Edit: Fine-Grained Facial Editing via Dialog
Y. Jiang, Z. Huang, X. Pan, C. C. Loy, Z. Liu
in Proceedings of IEEE/CVF International Conference on Computer Vision, 2021 (ICCV)
[PDF] [arXiv] [Supplementary Material] [Project Page]

We propose Talk-to-Edit, an interactive facial editing framework that performs fine-grained attribute manipulation through dialog between the user and the system. Unlike previous works that regard the editing as traversing straight lines in the latent space, here the fine-grained editing is formulated as finding a curving trajectory that respects fine-grained attribute landscape on the semantic field. 2) The curvature at each step is location-specific and determined by the input image as well as the users’ language requests.

Unsupervised Object-Level Representation Learning from Scene Images
J. Xie, X. Zhan, Z. Liu, Y. S. Ong, C. C. Loy
in Proceedings of Neural Information Processing Systems, 2021 (NeurIPS)
[arXiv] [Project Page]

We introduce Object-level Representation Learning (ORL), a new self-supervised learning framework towards scene images. Extensive experiments on COCO show that ORL significantly improves the performance of self-supervised learning on scene images, even surpassing supervised ImageNet pre-training on several downstream tasks.