MMLab@NTU
Multimedia Laboratory @
Nanyang Technological University
Affiliated with S-Lab
About
MMLab@NTU
MMLab@NTU was formed on 1 August 2018, with a research focus on computer vision and deep learning. Its sister lab is MMLab@CUHK. It is now a group with three faculty members and more than 40 members, including research fellows, research assistants, and PhD students.
Members of MMLab@NTU conduct research primarily in low-level vision, image and video understanding, creative content creation, and 3D scene understanding and reconstruction. Have a look at the overview of our research. All publications are listed here.
We are always looking for motivated PhD students, postdocs, and research assistants who share our research interests. Check out the careers page and follow us on Twitter.
CVPR 2023
03/2023: The team has a total of 14 papers (including 4 highlights) accepted to CVPR 2023.
ICLR 2023
01/2023: The team has a total of 5 papers (including 2 orals and 1 spotlight) accepted to ICLR 2023.
Google PhD Fellowship 2022
11/2022: Yuming Jiang and Jiawei Ren were awarded the highly competitive and prestigious Google PhD Fellowship 2022 in the area of “Machine Perception, Speech Technology and Computer Vision”.
Check Out
News and Highlights
- 02/2023: Call for Papers: IJCV Special Issue on Mobile Intelligent Photography and Imaging. Full paper submission deadline is June 1st, 2023.
- 10/2022: Chongyi Li, Shuai Yang and Kaiyang Zhou were selected as outstanding reviewers of ECCV 2022. Congrats!
- 09/2022: Call for Papers: IJCV Special Issue on The Promises and Dangers of Large Vision Models. Full paper submission deadline is March 1st, 2023.
- 07/2022: The team has a total of 18 papers (including 3 oral papers) accepted to ECCV 2022.
CVPR 2023
Second MIPI Workshop
The second Mobile Intelligent Photography and Imaging (MIPI) workshop will be held in conjunction with CVPR 2023 (Sunday, June 18). We are organizing several challenge tracks and also calling for workshop papers.
- Paper submission deadline: Feb 12, 2023
- Challenge start date: Dec 25, 2022
- Challenge end date: Feb 20, 2023
Recent
Projects
VToonify: Controllable High-Resolution Portrait Video Style Transfer
S. Yang, L. Jiang, Z. Liu, C. C. Loy
ACM Transactions on Graphics, 2022 (SIGGRAPH ASIA - TOG)
[arXiv]
[Project Page]
[YouTube]
[Demo]
We present VToonify, a novel framework for controllable high-resolution portrait video style transfer. VToonify leverages the mid- and high-resolution layers of StyleGAN to render high-quality artistic portraits from multi-scale content features extracted by an encoder, which better preserves frame details. The resulting fully convolutional architecture accepts non-aligned faces in videos of variable size as input, producing complete face regions with natural motions in the output.
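To make the data flow concrete, here is a toy PyTorch sketch of the encoder-plus-decoder layout described above. All module names, shapes, and layer counts are illustrative assumptions; this is not the released VToonify code.

```python
# Toy sketch of VToonify's layout: a fully convolutional encoder feeds
# multi-scale content features into a StyleGAN-like decoder (a stand-in here).
# All modules and shapes are hypothetical, for illustration only.
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Extracts multi-scale content features from an unaligned video frame."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, 64, 3, stride=2, padding=1)    # 1/2 resolution
        self.mid = nn.Conv2d(64, 128, 3, stride=2, padding=1)   # 1/4 resolution

    def forward(self, x):
        f1 = torch.relu(self.stem(x))
        f2 = torch.relu(self.mid(f1))
        return [f2, f1]  # coarse-to-fine features for the decoder

class ToyDecoder(nn.Module):
    """Stands in for StyleGAN's mid- and high-resolution layers; fusing the
    encoder's content features back in is what preserves frame details."""
    def __init__(self):
        super().__init__()
        self.up1 = nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1)
        self.up2 = nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1)

    def forward(self, feats):
        x = torch.relu(self.up1(feats[0])) + feats[1]  # skip-style fusion
        return torch.tanh(self.up2(x))

frame = torch.randn(1, 3, 256, 256)       # any size divisible by 4 works
out = ToyDecoder()(ToyEncoder()(frame))   # fully convolutional: (1, 3, 256, 256)
```

Because every layer is convolutional, the same weights apply to frames of any spatial size, which is why non-aligned, variable-size video input is possible.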
Text2Light: Zero-Shot Text-Driven HDR Panorama Generation
Z. Chen, G. Wang, Z. Liu
ACM Transactions on Graphics, 2022 (SIGGRAPH ASIA - TOG)
[arXiv]
[Project Page]
[YouTube]
[Demo]
We propose a zero-shot text-driven framework, Text2Light, to generate 4K+ resolution HDRIs without paired training data.
Extract Free Dense Labels from CLIP
C. Zhou, C. C. Loy, B. Dai
European Conference on Computer Vision, 2022 (ECCV, Oral)
[arXiv]
[Project Page]
We examine the intrinsic potential of CLIP for pixel-level dense prediction, specifically semantic segmentation. With minimal modification, we show that MaskCLIP yields compelling segmentation results on open concepts across various datasets without annotations or fine-tuning. By adding pseudo labeling and self-training, MaskCLIP+ surpasses state-of-the-art transductive zero-shot semantic segmentation methods by large margins.
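As a rough illustration of the core idea, the sketch below classifies dense CLIP image features against text embeddings of class names. How the dense features are obtained is taken as given (the paper derives them from CLIP's image encoder with its final attention pooling modified); the `segment` helper and the prompt template are assumptions, not the authors' released code.

```python
# Hedged sketch of annotation-free segmentation with CLIP text classifiers.
# Requires the openai/CLIP package (pip install git+https://github.com/openai/CLIP.git).
import torch
import torch.nn.functional as F
import clip

model, _ = clip.load("ViT-B/16", device="cpu")
class_names = ["sky", "tree", "person", "road"]

# Text embeddings act as fixed per-class pixel classifiers.
with torch.no_grad():
    tokens = clip.tokenize([f"a photo of a {c}" for c in class_names])
    text_emb = F.normalize(model.encode_text(tokens).float(), dim=-1)  # (C, D)

def segment(dense_feats):
    """dense_feats: (D, H, W) per-pixel CLIP features, assumed to come from
    the modified image encoder described in the paper."""
    feats = F.normalize(dense_feats.float(), dim=0)           # unit norm per pixel
    logits = torch.einsum("cd,dhw->chw", text_emb, feats)     # cosine similarity
    return logits.argmax(dim=0)                               # (H, W) class ids
```

Because the classifiers are just text embeddings, swapping in new class names extends the label set with no retraining, which is what makes segmentation of open concepts possible.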
Open-Vocabulary DETR with Conditional Matching
Y. Zang, W. Li, K. Zhou, C. Huang, C. C. Loy
European Conference on Computer Vision, 2022 (ECCV, Oral)
[arXiv]
[Project Page]
We propose a novel open-vocabulary detector based on DETR which, once trained, can detect any object given its class name or an exemplar image. This first end-to-end Transformer-based open-vocabulary detector achieves non-trivial improvements over the current state of the art.
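One way to picture the conditional matching is the toy module below, in which DETR-style object queries are conditioned on a CLIP embedding of either a class name or an exemplar image, so the decoder only has to match boxes for the queried concept. The module and its dimensions are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch: conditioning DETR object queries on a CLIP embedding
# (text or exemplar image). Hypothetical layout, for illustration only.
import torch
import torch.nn as nn

class ConditionalQueries(nn.Module):
    def __init__(self, num_queries=100, d_model=256, clip_dim=512):
        super().__init__()
        self.queries = nn.Embedding(num_queries, d_model)  # learned object queries
        self.proj = nn.Linear(clip_dim, d_model)           # CLIP space -> DETR space

    def forward(self, concept_emb):
        """concept_emb: (B, clip_dim) CLIP embedding of a class name or an
        exemplar image; every query is shifted toward that concept."""
        cond = self.proj(concept_emb).unsqueeze(1)          # (B, 1, d_model)
        return self.queries.weight.unsqueeze(0) + cond      # (B, Q, d_model)

queries = ConditionalQueries()(torch.randn(2, 512))  # would feed the DETR decoder
```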
HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling
Z. Cai, D. Ren, A. Zeng, Z. Lin, T. Yu, W. Wang, X. Fan, Y. Gao, Y. Yu, L. Pan, F. Hong, M. Zhang, C. C. Loy, L. Yang, Z. Liu
European Conference on Computer Vision, 2022 (ECCV, Oral)
[arXiv]
[Project Page]
[YouTube]
We contribute HuMMan, a large-scale multi-modal 4D human dataset with 1000 human subjects, 400k sequences, and 60M frames. HuMMan has several appealing properties: 1) multi-modal data and annotations, including color images, point clouds, keypoints, SMPL parameters, and textured meshes; 2) a popular mobile device is included in the sensor suite; 3) a set of 500 actions designed to cover fundamental movements; 4) support for and evaluation of multiple tasks, such as action recognition, pose estimation, parametric human recovery, and textured mesh reconstruction.
AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars
F. Hong, M. Zhang, L. Pan, Z. Cai, L. Yang, Z. Liu
ACM Transactions on Graphics, 2022 (SIGGRAPH - TOG)
[arXiv]
[Project Page]
We propose AvatarCLIP, a zero-shot text-driven framework for 3D avatar generation and animation. Unlike professional software that requires expert knowledge, AvatarCLIP empowers lay users to customize a 3D avatar with the desired shape and texture, and to drive the avatar with described motions, using natural language alone.
Text2Human: Text-Driven Controllable Human Image Generation
Y. Jiang, S. Yang, H. Qiu, W. Wu, C. C. Loy, Z. Liu
ACM Transactions on Graphics, 2022 (SIGGRAPH - TOG)
[arXiv]
[Project Page]
[YouTube]
We present Text2Human, a text-driven controllable framework for high-quality and diverse human generation. We synthesize full-body human images starting from a given human pose in two dedicated steps: 1) given texts describing the shapes of clothes, the human pose is first translated into a human parsing map; 2) the final human image is then generated by providing the system with additional attributes describing the textures of clothes.
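The two stages can be summarized in the schematic below. Both stage networks are passed in as opaque callables, and every name here is hypothetical; the trained models are available via the project page.

```python
# Schematic of the two-stage Text2Human pipeline (hypothetical names only).
def text2human(pose, shape_text, texture_text, pose_to_parsing, parsing_to_image):
    """pose: a target human pose; *_text: clothing descriptions;
    the last two arguments stand in for the trained stage networks."""
    # Stage 1: translate the pose into a human parsing map, guided by the
    # clothing-shape description (e.g. "a long-sleeve dress").
    parsing = pose_to_parsing(pose, shape_text)
    # Stage 2: render the final image from the parsing map, guided by the
    # clothing-texture description (e.g. "denim", "floral").
    return parsing_to_image(parsing, texture_text)
```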