MMLab@NTU

Multimedia Laboratory @ Nanyang Technological University
Affiliated with S-Lab

About MMLab@NTU

MMLab@NTU was formed on 1 August 2018, with a research focus on computer vision and deep learning. Its sister lab is MMLab@CUHK. The group now comprises three faculty members and more than 40 members, including research fellows, research assistants, and PhD students.

Members of MMLab@NTU conduct research primarily in low-level vision, image and video understanding, creative content creation, and 3D scene understanding and reconstruction. Have a look at the overview of our research. All publications are listed here.

We are always looking for motivated PhD students, postdocs, and research assistants who share our research interests. Check out the careers page and follow us on Twitter.

The AI Talks

09/2022: We are launching a new initiative, The AI Talks, inviting active researchers from across the globe to share their latest research in AI, machine learning, computer vision, and related fields. Subscribe to the newsletter here.

New Challenges

07/2022: We are hosting the PointCloud-C Challenge (robustness of 3D models, deadline: Sep 19, 2022) and the OmniBenchmark Challenge (generalization of 2D models, deadline: Oct 9, 2022).

ECCV 2022

07/2022: The team has a total of 18 papers (including 3 oral papers) accepted to ECCV 2022.

CVPR 2022

03/2022: The team has a total of 18 papers (including 6 oral papers) accepted to CVPR 2022.

News and Highlights

  • 09/2022: Call for Papers: IJCV Special Issue on The Promises and Dangers of Large Vision Models. The full paper submission deadline is March 1, 2023.
  • 07/2022: Our journal paper 'Learning to Enhance Low-Light Image via Zero-Reference Deep Curve Estimation' was selected as the 'Most Popular Article' by IEEE Transactions on Pattern Analysis and Machine Intelligence in July 2022.
  • 12/2021: Haonan Qiu, Bo Li, Yuhan Wang, Siyao Li, Quanzhou Li, and Jianyi Wang are awarded the competitive AISG Fellowship 2022 to pursue their PhD studies. Congrats!
  • 12/2021: We release MMHuman3D, a new toolbox under OpenMMLab, for the use of 3D human parametric models in computer vision and computer graphics.
  • 09/2021: Kelvin Chan and Fangzhou Hong are awarded the very competitive and prestigious Google PhD Fellowship 2021 in the area of “Machine Perception, Speech Technology and Computer Vision”.
  • 09/2021: The team has a total of 8 papers accepted to NeurIPS 2021.
  • 09/2021: Six outstanding ICCV 2021 reviewers from our team! Congrats to Chongyi Li, Kelvin Chan, Jingkang Yang, Liang Pan, Zhongang Cai, and Kaiyang Zhou.
  • 07/2021: We organized two challenges in conjunction with the ICCV 2021 Sensing, Understanding and Synthesizing Humans Workshop, namely the MVP Point Cloud Challenge and the Face Forgery Analysis Challenge. The deadline has passed; check out the workshop for more details.
  • 07/2021: The team has a total of 11 papers accepted to ICCV 2021 (including one oral).

ECCV 2022 MIPI Workshop

We are organizing a new workshop, Mobile Intelligent Photography and Imaging (MIPI), in conjunction with ECCV 2022 (Sunday, Oct 23). We have invited a cool lineup of speakers from both academia and industry to share their recent work. Come and join us!

Recent Projects

VToonify: Controllable High-Resolution Portrait Video Style Transfer
S. Yang, L. Jiang, Z. Liu, C. C. Loy
ACM Transactions on Graphics, 2022 (SIGGRAPH ASIA - TOG)
[arXiv] [Project Page] [YouTube] [Demo]

We present VToonify, a novel framework for controllable high-resolution portrait video style transfer. VToonify leverages the mid- and high-resolution layers of StyleGAN to render high-quality artistic portraits based on multi-scale content features extracted by an encoder, which better preserves frame details. The resulting fully convolutional architecture accepts non-aligned faces in videos of variable size as input, producing complete face regions with natural motions in the output.
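
In outline, this is a fully convolutional encoder-generator pair: the encoder extracts multi-scale content features from an unaligned frame, and mid-/high-resolution generator blocks fuse those features while upsampling, so frames of arbitrary size are handled end to end. The sketch below illustrates that structure in PyTorch; module names, channel widths, and the fusion rule are assumptions, and plain convolutions stand in for the StyleGAN layers used in VToonify.

# Toy, fully convolutional encoder + decoder sketch of the fusion idea above.
# Module names, channel widths, and the fusion rule are illustrative assumptions;
# plain convolutions stand in for the StyleGAN mid-/high-resolution layers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentEncoder(nn.Module):
    """Extracts multi-scale content features from a (possibly unaligned) frame."""
    def __init__(self, ch=64):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(3, ch, 3, 2, 1), nn.ReLU())       # 1/2 scale
        self.down2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, 2, 1), nn.ReLU())  # 1/4 scale

    def forward(self, x):
        f1 = self.down1(x)
        f2 = self.down2(f1)
        return [f2, f1]  # coarse-to-fine content features

class ToonifyGenerator(nn.Module):
    """Upsampling blocks that fuse the content features at matching resolutions."""
    def __init__(self, ch=64):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(ch * 2, ch, 3, 1, 1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(ch * 2, ch, 3, 1, 1), nn.ReLU())
        self.to_rgb = nn.Conv2d(ch, 3, 3, 1, 1)

    def forward(self, feats):
        x = self.block1(feats[0])                         # start from the coarse features
        x = F.interpolate(x, scale_factor=2.0)            # 1/4 -> 1/2 resolution
        x = self.block2(torch.cat([x, feats[1]], dim=1))  # fuse the finer content features
        x = F.interpolate(x, scale_factor=2.0)            # 1/2 -> full resolution
        return self.to_rgb(x)

frame = torch.randn(1, 3, 256, 320)  # a non-square, non-aligned frame is fine
stylized = ToonifyGenerator()(ContentEncoder()(frame))
print(stylized.shape)                # torch.Size([1, 3, 256, 320])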

Text2Light: Zero-Shot Text-Driven HDR Panorama Generation
Z. Chen, G. Wang, Z. Liu
ACM Transactions on Graphics, 2022 (SIGGRAPH ASIA - TOG)
[arXiv] [Project Page] [YouTube] [Demo]

We propose a zero-shot text-driven framework, Text2Light, to generate 4K+ resolution HDRIs without paired training data.

Extract Free Dense Labels from CLIP
C. Zhou, C. C. Loy, B. Dai
European Conference on Computer Vision, 2022 (ECCV, Oral)
[arXiv] [Project Page]

We examine the intrinsic potential of CLIP for pixel-level dense prediction, specifically in semantic segmentation. With minimal modification, we show that MaskCLIP yields compelling segmentation results on open concepts across various datasets in the absence of annotations and fine-tuning. By adding pseudo labeling and self-training, MaskCLIP+ surpasses SOTA transductive zero-shot semantic segmentation methods by large margins.
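
The zero-shot assignment itself reduces to a cosine-similarity lookup: each pixel's dense feature (from the modified CLIP image encoder) is compared with the CLIP text embeddings of the candidate class names, and the best match becomes that pixel's label, with no annotations or fine-tuning involved. The sketch below substitutes random tensors for real CLIP features; all shapes and names are assumptions for illustration.

# Minimal sketch of zero-shot dense prediction with CLIP-style embeddings:
# label each pixel by cosine similarity to class-name text embeddings.
# Random tensors stand in for real CLIP features; shapes are illustrative.
import torch
import torch.nn.functional as F

D, H, W = 512, 32, 32                            # embedding dim, feature-map size
class_names = ["person", "dog", "grass", "sky"]

dense_feats = torch.randn(D, H, W)               # per-pixel features (would come from the
                                                 # modified CLIP image encoder)
text_embeds = torch.randn(len(class_names), D)   # CLIP text embeddings of the class names

pix = F.normalize(dense_feats.flatten(1).t(), dim=-1)   # (H*W, D)
txt = F.normalize(text_embeds, dim=-1)                  # (C, D)
logits = pix @ txt.t()                                  # cosine similarity, (H*W, C)

seg_map = logits.argmax(dim=-1).view(H, W)       # per-pixel class index = segmentation map
print(seg_map.shape, [class_names[int(i)] for i in seg_map.unique()])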

Open-Vocabulary DETR with Conditional Matching
Y. Zang, W. Li, K. Zhou, C. Huang, C. C. Loy
European Conference on Computer Vision, 2022 (ECCV, Oral)
[arXiv] [Project Page]

We propose a novel open-vocabulary detector based on DETR which, once trained, can detect any object given its class name or an exemplar image. This first end-to-end Transformer-based open-vocabulary detector achieves non-trivial improvements over the current state of the art.
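
Conditional matching can be pictured as conditioning the object queries themselves: each query is combined with a CLIP-style embedding of the target class name (or of an exemplar image), and the decoder predicts boxes together with a score for whether each query matches that condition. The snippet below is a hedged toy sketch with made-up module names and dimensions, not the released OV-DETR code.

# Toy sketch of conditional queries for open-vocabulary detection: object queries are
# conditioned on a text (or exemplar-image) embedding, and the head predicts a box plus
# a score for whether the query matches that condition. Module names, dimensions, and
# the fusion rule are illustrative assumptions.
import torch
import torch.nn as nn

class ConditionalHead(nn.Module):
    def __init__(self, d_model=256, clip_dim=512, num_queries=100):
        super().__init__()
        self.queries = nn.Embedding(num_queries, d_model)
        self.cond_proj = nn.Linear(clip_dim, d_model)  # project the CLIP embedding
        self.decoder = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.box_head = nn.Linear(d_model, 4)          # (cx, cy, w, h)
        self.match_head = nn.Linear(d_model, 1)        # does this query match the condition?

    def forward(self, image_tokens, cond_embed):
        # Condition every object query on the class-name (or exemplar) embedding.
        q = self.queries.weight.unsqueeze(0) + self.cond_proj(cond_embed).unsqueeze(1)
        h = self.decoder(q, image_tokens)              # cross-attend to the image features
        return self.box_head(h).sigmoid(), self.match_head(h).sigmoid()

image_tokens = torch.randn(1, 600, 256)  # encoder output, e.g. a flattened feature map
cond_embed = torch.randn(1, 512)         # CLIP text embedding of a class name such as
                                         # "zebra", or an exemplar-image embedding
boxes, match_scores = ConditionalHead()(image_tokens, cond_embed)
print(boxes.shape, match_scores.shape)   # torch.Size([1, 100, 4]) torch.Size([1, 100, 1])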

HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling
Z. Cai, D. Ren, A. Zeng, Z. Lin, T. Yu, W. Wang, X. Fan, Y. Gao, Y. Yu, L. Pan, F. Hong, M. Zhang, C. C. Loy, L. Yang, Z. Liu
European Conference on Computer Vision, 2022 (ECCV, Oral)
[arXiv] [Project Page] [YouTube]

We contribute HuMMan, a large-scale multi-modal 4D human dataset with 1000 human subjects, 400k sequences, and 60M frames. HuMMan has several appealing properties: 1) multi-modal data and annotations, including color images, point clouds, keypoints, SMPL parameters, and textured meshes; 2) a popular mobile device is included in the sensor suite; 3) a set of 500 actions designed to cover fundamental movements; 4) multiple tasks, such as action recognition, pose estimation, parametric human recovery, and textured mesh reconstruction, are supported and evaluated.
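
For a concrete sense of what the multi-modal data and annotations could mean per frame, the dictionary below sketches a hypothetical sample built only from the modalities listed above; every field name and array shape is an assumption for illustration, not the dataset's actual schema.

# A hypothetical layout for one multi-modal HuMMan frame, based only on the modalities
# listed above. Field names and array shapes are illustrative assumptions, not the
# dataset's actual schema.
import numpy as np

frame = {
    "color":       np.zeros((1080, 1920, 3), dtype=np.uint8),  # RGB image
    "point_cloud": np.zeros((30000, 3), dtype=np.float32),     # XYZ points from a depth sensor
    "keypoints":   np.zeros((17, 3), dtype=np.float32),        # 3D body keypoints (17 assumed)
    "smpl": {                                                   # SMPL parametric human model
        "betas": np.zeros(10, dtype=np.float32),                # shape coefficients
        "body_pose": np.zeros((23, 3), dtype=np.float32),       # per-joint axis-angle rotations
        "global_orient": np.zeros(3, dtype=np.float32),
    },
    "mesh_vertices": np.zeros((6890, 3), dtype=np.float32),     # textured mesh in SMPL topology
    "action_label": "jumping jacks",                            # one of the 500 designed actions
}
print(sorted(frame.keys()))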

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars
F. Hong, M. Zhang, L. Pan, Z. Cai, L. Yang, Z. Liu
ACM Transactions on Graphics, 2022 (SIGGRAPH - TOG)
[arXiv] [Project Page]

We propose AvatarCLIP, a zero-shot text-driven framework for 3D avatar generation and animation. Unlike professional software that requires expert knowledge, AvatarCLIP empowers non-expert users to customize a 3D avatar with the desired shape and texture and to drive the avatar with the described motions, using only natural language.
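
The zero-shot machinery behind this kind of text-driven generation is CLIP guidance: render the current avatar, embed the rendering with CLIP's image encoder, and optimize the avatar's parameters so the embedding moves toward the text prompt's embedding. The loop below is a deliberately tiny sketch in which a fake renderer and a random projection stand in for the real renderer and CLIP encoders; it is not the AvatarCLIP code.

# Hedged sketch of a CLIP-guidance loop: render, embed, and maximize similarity
# to the text embedding. `render_avatar` and `clip_image_embed` are placeholders.
import torch
import torch.nn.functional as F

IMG = 64  # tiny "render" resolution to keep the sketch light

def render_avatar(params):
    # Placeholder for a differentiable renderer of the avatar's shape and texture;
    # it just squashes the raw parameters into a fake RGB image.
    return torch.tanh(params).view(1, 3, IMG, IMG)

# Placeholder for CLIP's frozen image encoder: a fixed random projection to 512-D.
clip_proj = torch.randn(3 * IMG * IMG, 512)

def clip_image_embed(img):
    return img.flatten(1) @ clip_proj

text_embed = torch.randn(1, 512)  # stands in for the CLIP text embedding of,
                                  # e.g., "a tall and skinny ninja"

avatar_params = (0.01 * torch.randn(1, 3 * IMG * IMG)).requires_grad_()
optimizer = torch.optim.Adam([avatar_params], lr=0.05)

for step in range(200):
    rendering = render_avatar(avatar_params)
    # Maximize cosine similarity between the rendered avatar and the text prompt.
    loss = 1.0 - F.cosine_similarity(clip_image_embed(rendering), text_embed).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 50 == 0:
        print(f"step {step:3d}  CLIP loss {loss.item():.3f}")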

Text2Human: Text-Driven Controllable Human Image Generation
Y. Jiang, S. Yang, H. Qiu, W. Wu, C. C. Loy, Z. Liu
ACM Transactions on Graphics, 2022 (SIGGRAPH - TOG)
[arXiv] [Project Page] [YouTube]

We present Text2Human, a text-driven controllable framework for high-quality and diverse human generation. We synthesize full-body human images starting from a given human pose in two dedicated steps: 1) Given text describing the shapes of clothes, the human pose is first translated into a human parsing map. 2) The final human image is then generated by providing the system with further attributes describing the textures of the clothes.
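
The two dedicated steps form a simple pipeline: pose plus shape text yields a parsing map, and the parsing map plus texture text yields the image. The sketch below wires up that structure with placeholder modules; the text encoder, channel counts, and number of parsing classes are all assumptions for illustration.

# Toy sketch of the two-step pipeline above: (pose + shape text) -> parsing map,
# then (parsing map + texture text) -> image. The placeholder text encoder, channel
# counts, and number of parsing classes are assumptions only.
import torch
import torch.nn as nn

def encode_text(prompt, dim=64):
    # Placeholder text encoder: a deterministic random vector keyed on the prompt.
    g = torch.Generator().manual_seed(abs(hash(prompt)) % (2 ** 31))
    return torch.randn(1, dim, generator=g)

class Stage(nn.Module):
    """Maps a spatial input plus a text embedding to a spatial output."""
    def __init__(self, in_ch, out_ch, txt_dim=64):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + txt_dim, out_ch, 3, 1, 1)

    def forward(self, x, txt):
        t = txt.view(1, -1, 1, 1).expand(-1, -1, x.shape[2], x.shape[3])
        return self.conv(torch.cat([x, t], dim=1))  # broadcast the text over the spatial grid

pose_map = torch.randn(1, 18, 128, 128)  # e.g. 18 keypoint heatmaps for the given pose
shape_text = encode_text("a lady wearing a short-sleeve T-shirt and a long skirt")
texture_text = encode_text("the T-shirt is made of cotton and the skirt of denim")

pose_to_parsing = Stage(in_ch=18, out_ch=24)   # 24 parsing classes (assumed)
parsing_to_image = Stage(in_ch=24, out_ch=3)   # RGB output

parsing_logits = pose_to_parsing(pose_map, shape_text)                       # step 1
human_image = parsing_to_image(parsing_logits.softmax(dim=1), texture_text)  # step 2
print(parsing_logits.shape, human_image.shape)  # (1, 24, 128, 128) and (1, 3, 128, 128)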
