Meta R-CNN: Towards General Solver for Instance-level Low-shot Learning

Xiaopeng Yan1* · Ziliang Chen1* · Anni Xu1 · Xiaoxi Wang1 · Xiaodan Liang1,2 · Liang Lin1,2

1Sun Yat-sen University  ·  2DarkMatter AI Research

ICCV 2019

Abstract

Resembling the rapid learning capability of humans, low-shot learning empowers vision systems to understand new concepts from only a few training samples. Leading approaches are derived from meta-learning on images containing a single visual object. Because detection and segmentation images involve complex backgrounds and multiple objects, these approaches struggle to advance research on low-shot object detection/segmentation. In this work, we present a flexible and general methodology for these tasks.

Meta R-CNN

Meta R-CNN architecture diagram

Our Meta R-CNN consists of 1) Faster/Mask R-CNN and 2) a Predictor-head Remodeling Network (PRN). Faster/Mask R-CNN receives an image and produces RoI features by applying RoIAlign to the region proposals extracted by the RPN. In parallel, the PRN receives K-shot, m-class resized images with their structure labels (bounding boxes / segmentation masks) and infers m class-attentive vectors. Given a class-attentive vector representing class c, it applies channel-wise soft attention to each RoI feature, encouraging the Faster/Mask R-CNN predictor heads to detect or segment class-c objects from the RoI features in the image. Since class c is dynamically determined by the inputs of the PRN, Meta R-CNN is a meta-learner.
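To make the channel-wise remodeling concrete, below is a minimal PyTorch-style sketch of the PRN attention step described above. It is an illustration under our own assumptions rather than the authors' released code: the module name `PRNAttention`, the feature dimension, and the averaging over the K shots are hypothetical choices, and the sigmoid-based soft attention simply follows the description in the previous paragraph.

```python
import torch
import torch.nn as nn


class PRNAttention(nn.Module):
    """Sketch of the PRN idea (hypothetical names/shapes, not the official code):
    K-shot support examples per class are encoded into class-attentive vectors,
    which then modulate each RoI feature channel-wise before the predictor heads."""

    def __init__(self, feat_dim: int = 2048):
        super().__init__()
        self.feat_dim = feat_dim
        self.sigmoid = nn.Sigmoid()  # squashes vectors into (0, 1) soft attention

    def class_attentive_vectors(self, support_feats: torch.Tensor) -> torch.Tensor:
        # support_feats: (m_classes, K_shots, feat_dim) embeddings of the
        # resized support images with their structure labels.
        # Average over shots, then apply sigmoid -> (m_classes, feat_dim).
        return self.sigmoid(support_feats.mean(dim=1))

    def forward(self, roi_feats: torch.Tensor, support_feats: torch.Tensor) -> torch.Tensor:
        # roi_feats: (num_rois, feat_dim) RoI features from RoIAlign.
        attn = self.class_attentive_vectors(support_feats)       # (m, D)
        # Channel-wise modulation: one remodeled copy of every RoI per class.
        remodeled = roi_feats.unsqueeze(0) * attn.unsqueeze(1)   # (m, num_rois, D)
        return remodeled


if __name__ == "__main__":
    prn = PRNAttention(feat_dim=2048)
    rois = torch.randn(8, 2048)        # 8 region proposals from one image
    support = torch.randn(5, 3, 2048)  # m = 5 classes, K = 3 shots
    out = prn(rois, support)
    print(out.shape)                   # torch.Size([5, 8, 2048])
```

Each of the m remodeled RoI feature sets would then be fed to the Faster/Mask R-CNN predictor heads, so that prediction for class c is conditioned on the class-c support examples supplied to the PRN.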

Low-shot Object Detection

Low-shot object detection results

AP and mAP on the VOC2007 test set for the novel and base classes of the first base/novel split. Performance is evaluated with 3/10-shot novel-class examples using FRCN with a ResNet-101 backbone. RED / BLUE indicate the SOTA / second-best results (best viewed in color).

Low-shot Object Segmentation

Low-shot object segmentation results

Low-shot detection and instance-segmentation performance on the COCO minival set for novel classes, using Mask R-CNN with a ResNet-50 backbone. Evaluation is based on 5/10/20-shot objects from the novel classes.