Multimedia Laboratory @
Nanyang Technological University
Affiliated with S-Lab



MMLab@NTU was formed on 1 August 2018, with a research focus on computer vision and deep learning. Its sister lab is MMLab@CUHK. It is now a group of four faculty members and more than 35 other members, including research fellows, research assistants, and PhD students.

Members of MMLab@NTU conduct research primarily in low-level vision, image and video understanding, creative content creation, and 3D scene understanding and reconstruction. Have a look at the overview of our research. All publications are listed here.

We are always looking for motivated PhD students, postdocs, and research assistants who share our interests. Check out the careers page and follow us on Twitter.

MVP Point Cloud Challenge @ ICCV 2021

07/2021: Join this challenge to evaluate your point cloud completion and registration methods. Deadline on September 12, 2021.

ForgeryNet: Face Forgery Analysis Challenge @ ICCV 2021

07/2021: Join this challenge to benchmark your anti-deepfake methods on the largest face forgery dataset. Deadline on September 12, 2021.

Three Champions in NTIRE 2021 Challenge

04/2021: NTIRE is among the most competitive challenges for low-level vision tasks. With BasicVSR++, we won three championships in the tracks for video super-resolution and quality enhancement of heavily compressed videos. Congrats to the team!

ICCV 2021

07/2021: The team has a total of 11 papers accepted to ICCV 2021 (including one oral).

Check Out

News and Highlights



Path-Restore: Learning Network Path Selection for Image Restoration
K. Yu, X. Wang, C. Dong, X. Tang, C. C. Loy
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021 (TPAMI)
[DOI] [arXiv] [Project Page]

We observe that some corrupted image regions are inherently easier to restore than others, since the distortion and content vary within an image. To this end, we propose Path-Restore, a multi-path CNN with a pathfinder that can dynamically select an appropriate route for each image region. We train the pathfinder using reinforcement learning with a difficulty-regulated reward. This reward depends on the performance, the complexity, and the difficulty of restoring a region.

GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution
K. C. K. Chan, X. Wang, X. Xu, J. Gu, C. C. Loy
in Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021 (CVPR, Oral)
[PDF] [arXiv] [Supplementary Material] [Project Page]

We show that pre-trained Generative Adversarial Networks (GANs), e.g., StyleGAN, can be used as a latent bank to improve the restoration quality of large-factor image super-resolution (SR). Switching the bank allows the method to deal with images from diverse categories, e.g., cat, building, human face, and car. Images upscaled by GLEAN show clear improvements in terms of fidelity and texture faithfulness in comparison to existing methods.

3D Human Texture Estimation from a Single Image with Transformers
X. Xu, C. C. Loy
in Proceedings of IEEE/CVF International Conference on Computer Vision, 2021 (ICCV, Oral)
[PDF] [Project Page]

Texformer estimates high-quality 3D human texture from a single image. The Transformer-based method allows efficient information interchange between the image space and UV texture space.

Focal Frequency Loss for Image Reconstruction and Synthesis
L. Jiang, B. Dai, W. Wu, C. C. Loy
in Proceedings of IEEE/CVF International Conference on Computer Vision, 2021 (ICCV)
[PDF] [arXiv] [Project Page]

We show that narrowing gaps in the frequency domain can further improve image reconstruction and synthesis quality. We propose a novel focal frequency loss, which allows a model to adaptively focus on frequency components that are hard to synthesize by down-weighting the easy ones. This objective function is complementary to existing spatial losses, offering strong resistance to the loss of important frequency information caused by the inherent bias of neural networks.
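
As a rough illustration of the down-weighting idea described above, the following is a minimal sketch and not the authors' implementation. It assumes a small grayscale image stored as a list of rows, a naive 2D DFT, and weights obtained by normalizing the per-frequency spectral distance to [0, 1]. The names dft2, frequency_weighted_loss, and alpha are hypothetical, and the formulation in the paper may differ in its normalization and other details.

```python
import cmath
import math


def dft2(image):
    """Naive 2D discrete Fourier transform of a small image (list of rows)."""
    h, w = len(image), len(image[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            total = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2 * math.pi * (u * y / h + v * x / w)
                    total += image[y][x] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def frequency_weighted_loss(pred, target, alpha=1.0):
    """Squared spectral distance, weighted toward poorly matched frequencies.

    Illustrative assumption: the weight of each frequency is its distance
    divided by the largest distance, raised to the power alpha.
    """
    fp, ft = dft2(pred), dft2(target)
    h, w = len(fp), len(fp[0])
    # Distance between the two spectra at every frequency.
    dist = [[abs(fp[u][v] - ft[u][v]) for v in range(w)] for u in range(h)]
    peak = max(max(row) for row in dist)
    if peak == 0.0:
        return 0.0  # identical spectra, nothing to penalize
    total = 0.0
    for row in dist:
        for d in row:
            # Weight in [0, 1]: near 1 for hard frequencies, small for easy ones.
            weight = (d / peak) ** alpha
            total += weight * d ** 2
    return total / (h * w)


if __name__ == "__main__":
    target = [[0.0, 1.0], [1.0, 0.0]]
    pred = [[0.2, 1.0], [0.5, 0.0]]
    print(frequency_weighted_loss(pred, target, alpha=1.0))
    print(frequency_weighted_loss(pred, target, alpha=0.0))  # unweighted baseline
```

With alpha set to 0 every weight is 1 and the function reduces to a plain mean squared spectral distance; larger values of alpha shift more of the penalty onto the frequencies that are matched worst. In a training setting the weights would typically be treated as constants rather than differentiated through, and a fast FFT routine would replace the naive transform.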

FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation
Y. Zang, C. Huang, C. C. Loy
in Proceedings of IEEE/CVF International Conference on Computer Vision, 2021 (ICCV)
[PDF] [arXiv] [Supplementary Material] [Project Page]

We propose a simple yet effective method, Feature Augmentation and Sampling Adaptation (FASA), that addresses the data scarcity issue by augmenting the feature space especially for rare classes. FASA is a fast, generic method that can be easily plugged into standard or long-tailed segmentation frameworks, with consistent performance gains and little added cost.

Talk-to-Edit: Fine-Grained Facial Editing via Dialog
Y. Jiang, Z. Huang, X. Pan, C. C. Loy, Z. Liu
in Proceedings of IEEE/CVF International Conference on Computer Vision, 2021 (ICCV)
[PDF] [arXiv] [Project Page]

We propose Talk-to-Edit, an interactive facial editing framework that performs fine-grained attribute manipulation through dialog between the user and the system. Unlike previous works that regard the editing as traversing straight lines in the latent space, here the fine-grained editing is formulated as finding a curving trajectory that respects the fine-grained attribute landscape on the semantic field. The curvature at each step is location-specific and determined by the input image as well as the user's language requests.

Unsupervised Object-Level Representation Learning from Scene Images
J. Xie, X. Zhan, Z. Liu, Y. S. Ong, C. C. Loy
Technical report, arXiv:2106.11952, 2021
[arXiv] [Project Page]

We introduce Object-level Representation Learning (ORL), a new self-supervised learning framework for scene images. Extensive experiments on COCO show that ORL significantly improves the performance of self-supervised learning on scene images, even surpassing supervised ImageNet pre-training on several downstream tasks.



ICCV 2021 - The 3rd Workshop on

Sensing, Understanding and Synthesizing Humans

In our workshop this year, we are organizing two exciting challenges.

In the MVP Point Cloud Challenge, you can compete with others using your point cloud completion and registration methods on the newly proposed MVP dataset, a high-quality multi-view partial point cloud dataset. It contains over 100,000 high-quality scans of partial 3D shapes, rendered from 26 uniformly distributed camera poses for each 3D CAD model.

In the ForgeryNet: Face Forgery Analysis Challenge, you will benchmark your anti-deepfake methods on ForgeryNet, the largest face forgery dataset.

The deadline for both challenges is September 19, 2021. Great prizes are up for grabs!