Our Meta R-CNN consists of 1) Faster/MaskR-CNN;2)Predictor-head Remodeling Network (PRN). Faster/ Mask RCNN (module) receives an image to produce RoI features, by taking RoIAlign on the image region proposals extracted by RPN.In parallel,our PRN receives K-shot m-class resized images with their structure labels (bounding boxes/segmentaion masks) to infer m class-attentive vectors. Given a class attentive vector representing class c,it takes a channel-wise soft-attention on each RoI feature,encouraging the Faster/ Mask R-CNN predictor heads to detect or segment class-c objects based on the RoI features in the image. As the class c is dynamically determined by the inputs of PRN, Meta R-CNN is a meta-learner.
|