FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation

ICCV 2021


Abstract

Recent methods for long-tailed instance segmentation still struggle on rare object classes with few training samples. We propose a simple yet effective method, Feature Augmentation and Sampling Adaptation (FASA), that addresses the data-scarcity issue by augmenting the feature space, especially for rare classes. Both the Feature Augmentation (FA) and feature sampling components are adaptive to the actual training status: FA is informed by the feature mean and variance of real samples observed in past iterations, and the generated virtual features are sampled in a loss-adapted manner to avoid over-fitting.

FASA does not require any elaborate loss design, and it removes the need for inter-class transfer learning, which often incurs high cost and requires manually defined head/tail class groups. FASA is a fast, generic method that can be easily plugged into standard or long-tailed segmentation frameworks, yielding consistent performance gains at little added cost. FASA is also applicable to other tasks such as long-tailed classification, where it achieves state-of-the-art performance.

Class imbalance and the comparison of the Mask R-CNN baseline with and without FASA on the LVIS v1.0 dataset. (a) Through adaptive feature augmentation and sampling, FASA largely alleviates the imbalance issue, especially for rare classes. (b) Comparison of the average per-category probability scores predicted by FASA and by the Mask R-CNN baseline. The baseline predicts near-zero scores for rare classes, while FASA significantly boosts rare-class scores, which benefits final performance. (c) FASA brings consistent improvements in mask APr (mask AP on rare classes) across different backbones. These gains come at very low cost (training time increases by only 3% on average).

The Framework


The proposed framework consists of two components:

  1. Adaptive Feature Augmentation (FA), which generates virtual features to augment the feature space of all classes, especially the rare ones (a minimal sketch follows this list)
  2. Adaptive Feature Sampling (FS), which dynamically adjusts the per-class sampling probability of virtual features (sketched after the pipeline caption below)
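
A minimal PyTorch sketch of component 1, assuming momentum-updated Gaussian statistics per class; the FeatureBank name, the momentum value, and the exact update rule are illustrative choices, not taken from the paper.

import torch

class FeatureBank:
    """Hypothetical helper: tracks class-wise feature mean/variance online."""

    def __init__(self, num_classes, feat_dim, momentum=0.9):
        self.momentum = momentum                        # illustrative value
        self.mean = torch.zeros(num_classes, feat_dim)
        self.var = torch.ones(num_classes, feat_dim)

    def update(self, feats, labels):
        # Momentum-update per-class statistics with the real RoI features
        # observed at the current training iteration.
        for c in labels.unique():
            f = feats[labels == c]
            m = self.momentum
            self.mean[c] = m * self.mean[c] + (1 - m) * f.mean(dim=0)
            self.var[c] = m * self.var[c] + (1 - m) * f.var(dim=0, unbiased=False)

    def generate(self, cls, n):
        # Draw n virtual features for class `cls` from its Gaussian estimate;
        # these augment the classification branch alongside real features.
        eps = torch.randn(n, self.mean.size(1))
        return self.mean[cls] + eps * self.var[cls].sqrt()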

(a) The pipeline of Mask R-CNN combined with the proposed FASA, a standalone module that generates virtual features to augment the classification branch for better performance on long-tailed data. FASA maintains class-wise feature mean and variance online, followed by (b) adaptive feature augmentation and (c) adaptive feature sampling.
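
A minimal sketch of the loss-adapted sampling in component 2, under the assumption of a simple multiplicative rule: classes whose loss worsens get sampled more, classes whose loss improves get sampled less, to avoid over-fitting on virtual features. The adapt_sampling_probs helper and the step size are hypothetical, not the paper's exact procedure.

import torch

def adapt_sampling_probs(probs, prev_loss, curr_loss, step=0.1):
    # Nudge sampling up for classes whose loss increased (still under-fit)
    # and down for classes whose loss decreased (risk of over-fitting on
    # virtual features). `step` is an illustrative hyper-parameter.
    delta = curr_loss - prev_loss
    probs = probs * (1.0 + step * torch.sign(delta))
    probs = probs.clamp(min=1e-4)   # keep every class sampled occasionally
    return probs / probs.sum()      # renormalize to a valid distribution

The number of virtual features drawn per class at each iteration would then be proportional to these probabilities.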

Experimental Results

Table 1: Comparison of state-of-the-art methods with and without our FASA on the LVIS v1.0 validation set. We compare with the Mask R-CNN baseline and with state-of-the-art approaches: the re-sampling method Repeat Factor Sampling (RFS), Equalization Loss (EQL), Classifier Re-training (cRT), Balanced Group Softmax (BAGS), and Seesaw Loss. The 'Uniform' sampler draws training images uniformly at random. Methods are trained under their respective schedules (24 or 12+12 epochs) using the public code. All methods use a ResNet-50 backbone for fair comparison.

Loss       | Sampler     | #Epoch | FASA | AP   | APr  | APc  | APf
Softmax CE | Uniform     | 24     |  ✗   | 19.3 |  1.2 | 17.4 | 29.3
Softmax CE | Uniform     | 24     |  ✓   | 22.6 | 10.2 | 21.6 | 29.2
Softmax CE | RFS         | 24     |  ✗   | 22.8 | 12.9 | 21.6 | 28.3
Softmax CE | RFS         | 24     |  ✓   | 24.1 | 17.3 | 22.9 | 28.5
EQL        | Uniform     | 24     |  ✗   | 22.1 |  5.1 | 22.4 | 29.3
EQL        | Uniform     | 24     |  ✓   | 24.4 | 15.4 | 23.5 | 29.4
cRT        | Uniform/RFS | 12+12  |  ✗   | 22.4 | 12.2 | 20.4 | 29.1
cRT        | Uniform/RFS | 12+12  |  ✓   | 23.6 | 15.1 | 22.0 | 29.1
BAGS       | Uniform/RFS | 12+12  |  ✗   | 22.8 | 12.4 | 22.2 | 28.3
BAGS       | Uniform/RFS | 12+12  |  ✓   | 24.0 | 15.2 | 23.4 | 28.3
Seesaw     | RFS         | 24     |  ✗   | 26.4 | 19.6 | 26.1 | 29.8
Seesaw     | RFS         | 24     |  ✓   | 27.5 | 21.0 | 27.5 | 30.1

Table 2: Comparison of state-of-the-art methods with and without FASA using larger backbones (R101 = ResNet-101, X101 = ResNeXt-101-32x8d) and Cascade Mask R-CNN.

Method             | Loss       | Sampler | Backbone | FASA | AP   | APr  | APc  | APf
Mask R-CNN         | Softmax CE | RFS     | R101     |  ✗   | 24.4 | 13.2 | 24.7 | 30.3
Mask R-CNN         | Softmax CE | RFS     | R101     |  ✓   | 26.3 | 19.1 | 25.4 | 30.6
Mask R-CNN         | Softmax CE | RFS     | X101     |  ✗   | 26.1 | 16.1 | 24.9 | 32.0
Mask R-CNN         | Softmax CE | RFS     | X101     |  ✓   | 27.7 | 20.7 | 26.6 | 32.0
Cascade Mask R-CNN | Softmax CE | RFS     | R101     |  ✗   | 25.4 | 13.7 | 24.8 | 31.4
Cascade Mask R-CNN | Softmax CE | RFS     | R101     |  ✓   | 27.7 | 19.8 | 27.3 | 31.6
Cascade Mask R-CNN | Seesaw     | RFS     | R101     |  ✗   | 30.1 | 21.4 | 30.0 | 33.9
Cascade Mask R-CNN | Seesaw     | RFS     | R101     |  ✓   | 31.5 | 24.1 | 31.9 | 34.9

Result Visualization

To better interpret the results, we show segmentation outputs on selected rare classes. Without FASA, the prediction scores for rare classes are low, or the objects are missed entirely. With FASA, the rare classes are classified accurately.

Prediction results of the Mask R-CNN framework without and with FASA on the LVIS v1.0 validation set. We select six rare classes for visualization: 'saucepan', 'crouton', 'date (fruit)', 'koala', 'softball', and 'bonnet'. With FASA, Mask R-CNN yields more correct classifications than the baseline.

Paper

Citation

@InProceedings{zang2021fasa,
 author = {Zang, Yuhang and Huang, Chen and Loy, Chen Change},
 title = {FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation},
 booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
 year = {2021}
}

Related

Projects

  • Open-Vocabulary DETR with Conditional Matching
    Y. Zang, W. Li, K. Zhou, C. Huang, C. C. Loy
    in Proceedings of the European Conference on Computer Vision, 2022 (ECCV, Oral)
    [arXiv] [Project Page]
  • Seesaw Loss for Long-Tailed Instance Segmentation
    J. Wang, W. Zhang, Y. Zang, Y. Cao, J. Pang, T. Gong, K. Chen, Z. Liu, C. C. Loy, D. Lin
    in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021 (CVPR)
    [PDF] [Supplementary Material] [arXiv]
  • Hybrid Task Cascade for Instance Segmentation
    K. Chen, J. Pang, J. Wang, Y. Xiong, X. Li, S. Sun, W. Feng, Z. Liu, J. Shi, W. Ouyang, C. C. Loy, D. Lin
    in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019 (CVPR)
    [PDF] [arXiv] [Project Page]

Contact


Yuhang Zang
Email: zang0012 at e.ntu.edu.sg