total # positive classes <<< total # negative classes
example: identifying fraudulent claims
There may not be many fraudulent claims, so the classifier will tend to classify fraudulent claims as genuine.
- Model 1: classified 7/10 fraudulent transactions as genuine. 10/10,000 genuine transactions as fraudulent = 17 "mistakes"
- Model 2: classified 2/10 fraudulent transactions as genuine. 100/10,000 genuine transactions as fraudulent = 102 "mistakes"
Since we want to minimize the number of fraudulent transactions classified as genuine, Model 2 actually performs better even though it made more "mistakes" overall. Therefore, it is better to base performance not on raw mistake counts, but on the true positive (TP) rate, true negative (TN) rate, FP rate, and FN rate.
| Formula | Performance |
|---|---|
| TP Rate = TP / (TP + FN) | Close to 1 = good |
| TN Rate = TN / (TN + FP) | Close to 1 = good |
| FP Rate = FP / (FP + TN) | Close to 0 = good |
| FN Rate = FN / (FN + TP) | Close to 0 = good |
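As a quick check, here is a minimal plain-Python sketch that computes these four rates for the two hypothetical fraud models above (the function name `rates` is illustrative, not from any library):

```python
def rates(tp, fn, fp, tn):
    """Compute the four rates from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),  # true positive rate: close to 1 = good
        "TNR": tn / (tn + fp),  # true negative rate: close to 1 = good
        "FPR": fp / (fp + tn),  # false positive rate: close to 0 = good
        "FNR": fn / (fn + tp),  # false negative rate: close to 0 = good
    }

# Model 1: missed 7/10 frauds (FN=7, TP=3), flagged 10 genuine (FP=10, TN=9990)
model1 = rates(tp=3, fn=7, fp=10, tn=9990)
# Model 2: missed 2/10 frauds (FN=2, TP=8), flagged 100 genuine (FP=100, TN=9900)
model2 = rates(tp=8, fn=2, fp=100, tn=9900)

print(model1["TPR"], model2["TPR"])  # 0.3 vs 0.8: Model 2 catches far more fraud
```

Despite its higher raw mistake count, Model 2's TPR (0.8 vs 0.3) shows it is the better fraud detector.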
Cost Function Based Approach
- Treat one false negative as worse than one false positive (weigh false negatives more).
- i.e. a claim classified as genuine that was actually fraudulent (a false negative) is weighted with a larger cost, while a claim classified as fraudulent that was actually genuine (a false positive) is less bad and therefore has a lower cost.
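A minimal sketch of scoring the two fraud models under asymmetric costs. The weights (`COST_FN = 50`, `COST_FP = 1`) are hypothetical, chosen only to illustrate how cost weighting can flip the ranking that raw mistake counts would give:

```python
# Hypothetical asymmetric costs: one missed fraud (FN) costs 50x a false alarm (FP)
COST_FN = 50
COST_FP = 1

def total_cost(fn, fp):
    """Total misclassification cost under the weights above."""
    return COST_FN * fn + COST_FP * fp

# Counts from the fraud example: Model 1 (FN=7, FP=10), Model 2 (FN=2, FP=100)
cost1 = total_cost(fn=7, fp=10)    # 50*7 + 10  = 360
cost2 = total_cost(fn=2, fp=100)   # 50*2 + 100 = 200
# Model 2 is cheaper despite making more raw "mistakes" (102 vs 17)
```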
Sampling Based Approach
- oversampling: adding more instances of the minority class; might have to deal with overfitting to the minority class
- undersampling: removing instances of the majority class; may risk removing representative instances of the majority class
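A minimal sketch of both strategies using random duplication and random removal (the simplest variants; function names are illustrative, not from a library):

```python
import random

def oversample(minority, target_size, seed=0):
    """Random oversampling: duplicate minority examples until target_size.
    Risk: the model may overfit to the repeated minority examples."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra

def undersample(majority, target_size, seed=0):
    """Random undersampling: keep a random subset of the majority class.
    Risk: representative majority examples may be thrown away."""
    rng = random.Random(seed)
    return rng.sample(majority, target_size)

fraud = ["f1", "f2", "f3"]                # minority class
genuine = [f"g{i}" for i in range(100)]   # majority class
balanced = oversample(fraud, 100) + undersample(genuine, 100)
```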
Downsampling
- Reduces the # of pixels in the image, i.e. shrinking the image. Then, when you want to make the image the same size as it was previously, you will need to increase the pixel size.
- Example: reducing a 512x512 image to 256x256 = factor-of-2 downsampling in the horizontal and vertical directions

Upsampling
- Increases the # of pixels in the image, i.e. enlarging the image. The added pixels are estimated from surrounding samples.
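A minimal sketch of factor-of-2 downsampling and upsampling on a nested-list "image". For upsampling it uses nearest-neighbor replication, the simplest scheme; a real resizer would estimate the added pixels from surrounding samples via interpolation, as described above:

```python
def downsample2x(img):
    """Factor-of-2 downsampling: keep every other pixel in each direction."""
    return [row[::2] for row in img[::2]]

def upsample2x(img):
    """Factor-of-2 nearest-neighbor upsampling: replicate each pixel as a
    2x2 block (interpolation would give smoother estimates)."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

img = [[1, 2], [3, 4]]        # a tiny 2x2 "image"
small = downsample2x(img)     # 1x1: [[1]]
big = upsample2x(img)         # 4x4: each original pixel replicated
```

The same logic applied to a 512x512 image yields the 256x256 example above.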
- Used for recognizing objects at vastly different scales; scale-invariant because the object's scale change is offset by shifting its level in the pyramid.
- Feature maps close to the image layer are composed of low-level structures, which are not effective for accurate object detection.
Feature Pyramid Network (FPN) is composed of a bottom-up and a top-down pathway.
- The bottom-up pathway is useful for feature extraction (spatial resolution decreases as you go up to the top layers of the pyramid and view a smaller version of the object, i.e. the semantic value increases).
- FPN uses the top-down pathway to construct higher resolution layers from a semantically rich layer.
- The bottom-up pathway uses ResNet.
- Because a CNN has shared weights, it is not able to estimate the absolute position of an object in an image; anchor boxes make this possible, so the CNN only needs to predict the relative transformation for each anchor box (the anchor box is the bounding box reference).
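One common way to decode such a relative transformation is the Faster R-CNN-style box parameterization: shift the anchor center by a fraction of the anchor size, and scale width/height in log space. A sketch under that assumption (not RetinaNet's exact code):

```python
import math

def decode_anchor(anchor, deltas):
    """Decode predicted relative deltas (dx, dy, dw, dh) against an anchor
    box given as (cx, cy, w, h), Faster R-CNN-style."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = deltas
    return (cx + dx * w,        # center shift is relative to anchor width
            cy + dy * h,        # center shift is relative to anchor height
            w * math.exp(dw),   # width scaled in log space
            h * math.exp(dh))   # height scaled in log space

# Zero deltas reproduce the anchor itself; the network only learns offsets
box = decode_anchor(anchor=(100, 100, 32, 32), deltas=(0.5, 0.0, 0.0, 0.0))
```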
- RetinaNet can match the speed of one-stage detectors and surpass the accuracy of two-stage detectors.
- One-stage detectors have typically had worse accuracy than two-stage detectors. Why? -> the class imbalance problem.
- RetinaNet addresses the problem one-stage detectors have with class imbalance between foreground and background of the image during training of dense detectors. How? -> by reshaping the standard cross entropy loss, i.e. it down-weights the loss assigned to well-classified examples. (We want to minimize loss, and now well-classified examples don't contribute as much to the loss.)
- The loss will focus training on a sparse set of hard examples and prevent the large number of easy negatives from overwhelming the detector. This loss is called Focal Loss.
- Uses a dense sampling of object locations in an input image, an in-network feature pyramid, and anchor boxes.
- C_i denotes a convolution layer, for example, conv5 = 256 3x3 filters at stride 1, pad 1.
- In the top-down pathway, apply a 1x1 convolution filter.
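A 1x1 convolution is just a per-pixel linear projection across channels (no spatial mixing), which is how FPN's lateral connections unify channel depth across pyramid levels. A minimal pure-Python sketch (nested lists stand in for tensors):

```python
def conv1x1(fmap, weights):
    """Apply a 1x1 convolution: at every spatial position, linearly project
    the C_in input channels to C_out output channels.
    fmap: H x W x C_in nested lists; weights: C_out x C_in."""
    return [[[sum(w_c * px[c] for c, w_c in enumerate(w_out))
              for w_out in weights]
             for px in row]
            for row in fmap]

# 2x2 feature map with 3 channels, projected down to 1 channel
fmap = [[[1, 2, 3], [4, 5, 6]],
        [[7, 8, 9], [1, 1, 1]]]
weights = [[1, 0, -1]]          # C_out = 1, C_in = 3
out = conv1x1(fmap, weights)    # each pixel -> [ch0 - ch2]
```

Note that every output pixel depends only on its own input pixel's channels, which is why a 1x1 conv can change channel depth without touching spatial structure.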
- Well-classified examples: p_t > 0.5. The scaling factor (1 - p_t)^gamma decays to 0 as confidence in the correct class increases (loss is low for well-classified examples).
- gamma = 5, p_t = 0.1 (badly classified): -(1 - 0.1)^5 * log(0.1) = 1.36 loss
- gamma = 5, p_t = 0.9 (well classified): -(1 - 0.9)^5 * log(0.9) = 1.05E-6 loss ~ 0 loss
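The two worked values above can be reproduced with a few lines of Python (per-example loss only; a full detector would sum this over all anchors):

```python
import math

def focal_loss(p_t, gamma):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t),
    where p_t is the model's probability for the true class."""
    return -((1 - p_t) ** gamma) * math.log(p_t)

def cross_entropy(p_t):
    """Standard cross entropy = focal loss with gamma = 0."""
    return -math.log(p_t)

# gamma = 5: a badly classified example keeps a large loss...
print(focal_loss(0.1, gamma=5))   # ~1.36
# ...while a well-classified example is down-weighted almost to zero
print(focal_loss(0.9, gamma=5))   # ~1.05e-6, vs ~0.105 for plain cross entropy
```

This is exactly the down-weighting described above: easy negatives (high p_t) contribute almost nothing, so the hard examples dominate training.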
- RetinaNet outperforms Faster R-CNN, a two-stage detector
- SSD does not select bottom layers of the pyramid for object detection, since their semantic value is not high enough to justify the significant reduction in speed their use would cause. (SSD uses upper layers for detection, so it performs worse on small objects.)
One-stage detectors
- Must process a much larger set of candidate object locations regularly sampled across an image (the background part of the image still dominates even when using a sampling heuristic)
- Examples: RetinaNet, YOLO, SSD
Two-stage detectors
- Stage 1: class imbalance is addressed through the proposal stage (Selective Search, Edge Boxes, DeepMask, RPN) to narrow down the # of candidate object locations, filtering out most background samples
- Stage 2: sampling heuristics, such as a fixed foreground-to-background ratio, are performed to maintain a balance between foreground and background
- Examples: Faster R-CNN, Mask R-CNN










