Faster R-CNN for object detection
- Scan the image and generate proposals (areas likely to contain an object).
- Classify the proposals and generate bounding boxes (Mask R-CNN additionally generates masks).
ResNet50/ResNet101: feature extractor (early layers detect low-level features, later layers detect high-level features).
Better representation of objects at multiple scales.
Scan the image (actually the backbone feature map) in parallel in a sliding-window fashion and find areas that contain objects.
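The sliding-window scan can be pictured as placing a fixed set of candidate boxes (anchors) at every cell of the backbone feature map. The following is a minimal sketch under assumptions that are not stated in these notes: the function name `generate_anchors`, the `stride`, `scales` and `ratios` parameters, and the `(y1, x1, y2, x2)` box layout are all illustrative. A real implementation evaluates every position at once on the GPU; the loop here is sequential only for readability.

```python
from itertools import product


def generate_anchors(feat_h, feat_w, stride, scales, ratios):
    """Return one set of anchors per feature-map cell as (y1, x1, y2, x2) in image pixels.

    Hypothetical helper: `stride` is the assumed downsampling factor between
    the image and the feature map; `ratios` are width / height ratios.
    """
    anchors = []
    for row, col in product(range(feat_h), range(feat_w)):
        # Centre of this feature-map cell, mapped back to image coordinates.
        cy = (row + 0.5) * stride
        cx = (col + 0.5) * stride
        for scale, ratio in product(scales, ratios):
            # Keep the area close to scale**2 while changing the aspect ratio.
            h = scale / ratio ** 0.5
            w = scale * ratio ** 0.5
            anchors.append((cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2))
    return anchors


# A 2x3 feature map with 3 aspect ratios gives 2 * 3 * 3 anchors.
anchors = generate_anchors(feat_h=2, feat_w=3, stride=16, scales=[32], ratios=[0.5, 1.0, 2.0])
```

Each anchor would then be scored as object or background, and the highest-scoring ones become the proposals passed to the next stage.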
Generate a class and a BBox refinement for each proposal.
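A BBox refinement is usually a small correction applied to a proposal rather than a fresh box. The sketch below assumes the common Faster R-CNN style parameterization, in which the deltas `(dy, dx, dh, dw)` shift the box centre by a fraction of the box size and scale the height and width by `exp(dh)` and `exp(dw)`. The function name `apply_box_deltas` and the `(y1, x1, y2, x2)` layout are illustrative, not taken from these notes.

```python
import math


def apply_box_deltas(box, deltas):
    """Apply (dy, dx, dh, dw) refinements to a (y1, x1, y2, x2) box.

    Assumed parameterization: centre shifts are fractions of the box size,
    and size changes are given in log space.
    """
    y1, x1, y2, x2 = box
    dy, dx, dh, dw = deltas
    h, w = y2 - y1, x2 - x1
    cy, cx = y1 + 0.5 * h, x1 + 0.5 * w
    # Shift the centre, then rescale height and width.
    cy += dy * h
    cx += dx * w
    h *= math.exp(dh)
    w *= math.exp(dw)
    return (cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w)


# Move a 40x60 proposal down by half its height, keeping its size.
refined = apply_box_deltas((10, 20, 50, 80), (0.5, 0.0, 0.0, 0.0))
```

With all deltas at zero the proposal is returned unchanged, which is why the network only has to learn small corrections.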
Since ROI boxes can have different sizes, ROI pooling is used to crop a part of the feature map and resize it to a fixed size.
## (Special for Mask R-CNN) Segmentation Masks
A CNN takes the positive regions from the ROI stage and generates masks for them. The masks are soft, represented by float numbers.
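Both the fixed-size ROI crop and the soft masks can be sketched in a few lines. This is a minimal pure-Python illustration: `roi_max_pool` divides an ROI into an `out_size` x `out_size` grid of bins and keeps the maximum of each bin, and `binarize_mask` turns a soft mask into a binary one. The function names, the end-exclusive `(y1, x1, y2, x2)` ROI given in feature-map cells, and the 0.5 threshold are assumptions made for the example, not details from these notes.

```python
def roi_max_pool(feature_map, roi, out_size):
    """Crop roi = (y1, x1, y2, x2) from a 2-D feature map and max-pool it
    to out_size x out_size. The ROI must cover at least one row and column."""
    y1, x1, y2, x2 = roi
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_size):
        # Bin i spans rows [floor(i*h/out), ceil((i+1)*h/out)) inside the ROI.
        r0 = y1 + (i * h) // out_size
        r1 = y1 + ((i + 1) * h + out_size - 1) // out_size
        row = []
        for j in range(out_size):
            c0 = x1 + (j * w) // out_size
            c1 = x1 + ((j + 1) * w + out_size - 1) // out_size
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


def binarize_mask(soft_mask, threshold=0.5):
    """Convert a soft (float) mask to 0/1; the 0.5 threshold is an assumed default."""
    return [[1 if p >= threshold else 0 for p in row] for row in soft_mask]


# Two ROIs of different sizes both come out as 2x2 grids.
fm = [[r * 6 + c for c in range(6)] for r in range(4)]
small = roi_max_pool(fm, (0, 1, 4, 5), out_size=2)
large = roi_max_pool(fm, (1, 0, 4, 5), out_size=2)
binary = binarize_mask([[0.1, 0.7], [0.5, 0.49]])
```

Keeping the masks as floats until the last step preserves more detail when a small predicted mask is scaled up to the size of its bounding box; thresholding happens only at the end.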