@ZhangMenghe
Last active April 3, 2019 00:30
[NfP] Unsupervised Learning of Depth, Normal and Ego-motion from Video

Tags: Unsupervised Learning, Monocular Video, KITTI

  • An unsupervised learning framework for the task of monocular depth and camera-motion estimation from unstructured video sequences.

Background

Warping-based view synthesis

  • Create novel views of a specific subject from images taken from different viewpoints.
  • One route is to explicitly reconstruct an accurate 3D model.
  • Alternatively, methods are forced to learn intermediate predictions of geometry and/or correspondences.
  • View synthesis itself is mostly a graphics problem and can work in an end-to-end learning-based framework; however, the geometric correspondences are then lost.
  • Related papers:
    • Image-based rendering using image-based priors (Fitzgibbon et al., 2005)
    • DeepStereo (learning-based)
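The warping at the heart of these methods can be sketched as a per-pixel reprojection: back-project a target pixel using its (predicted) depth, transform the 3D point by the (predicted) relative camera pose, and project it into the source view. A minimal NumPy sketch, with function and variable names of my own choosing (not from any of the papers):

```python
import numpy as np

def reproject(pt, depth, K, K_inv, T):
    """Map a target pixel to its source-view location: p_s ~ K T D(p_t) K^{-1} p_t.

    pt       : (u, v) pixel in the target view
    depth    : predicted depth at that pixel
    K, K_inv : 3x3 camera intrinsics and its inverse
    T        : 4x4 relative pose (target frame -> source frame)
    """
    u, v = pt
    # Back-project the pixel to a 3-D point in the target camera frame.
    p_cam = depth * (K_inv @ np.array([u, v, 1.0]))
    # Transform the point into the source camera frame.
    p_src = (T @ np.append(p_cam, 1.0))[:3]
    # Project with the intrinsics; divide by the last coordinate to get pixels.
    uvw = K @ p_src
    return uvw[:2] / uvw[2]

# Toy example: 10 cm sideways camera translation, pixel at the principal point.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
T = np.eye(4)
T[0, 3] = 0.1
p = reproject((320, 240), 2.0, K, np.linalg.inv(K), T)  # -> (345.0, 240.0)
```

Sampling the source image at `p` (bilinearly, so the operation stays differentiable) and comparing with the target pixel gives the photometric supervision signal; pixels whose `p` falls outside the source image bounds are exactly those the fly-out mask mentioned below excludes.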

Unsupervised learning from videos

  • Pretext tasks learn visual features from video data that can later be re-purposed for other vision tasks.
  • Example: exploiting geometric constraints in an auto-encoder framework.

Summary of Related Papers

  • Zhou et al., CVPR 2017
  • This is the original idea of bringing view synthesis into an unsupervised learning framework. To remain unsupervised, they "warp" the source image back to the target view (after predicting the transformation between frames) and compare it with the original target frame.
  • Other loss terms include:
    • multi-scale and smoothness losses on the depth map: counteract low-texture regions and poorly constrained far-away estimates
    • an explainability mask
  • Add-on: pre-compute instance segmentation masks for moving objects (vehicles). The static background is fit only by the ego-motion model, while each object is first fit by the ego-motion model E and then by an object-motion model (same structure, applied to each object individually).
  • Add-on: normal constraints for depth. Pixels on the same surface are assumed to share the same normal. One constraint enforces the orthogonality relation between depth and normals. They propose inferring the surface normal directly from the depth map by taking cross products of the vectors formed by each pixel's 8 neighbors, then recomputing the depth map from the normal map (a standard method), since no ground-truth normal maps are available.
  • Edge-awareness: image gradients + an edge network.
  • Fly-out mask: after warping, pixels that fall outside the image boundaries should be excluded from training.
  • CVPR 2007
  • A really important algorithm that outputs a dense set of rectangular patches covering the surfaces visible in the input images.
  • It was proposed for reconstructing a single model with calibrated multi-view stereopsis.
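The normal-from-depth idea noted above (cross products of vectors to neighboring points) can be sketched as follows. This minimal version back-projects the depth map to a point cloud and uses a single right/down neighbor pair per pixel, whereas the paper averages over the 8-neighbor pairs; names are mine:

```python
import numpy as np

def normals_from_depth(depth, K_inv):
    """Per-pixel surface normals inferred from a depth map.

    Back-project each pixel to a 3-D camera-frame point, then take the cross
    product of the vectors to the right and lower neighbours. Returns an
    (h-1, w-1, 3) array of unit normals (one per interior pixel pair).
    """
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1).astype(float)
    pts = depth[..., None] * (pix @ K_inv.T)   # (h, w, 3) camera-frame points
    dx = pts[:-1, 1:] - pts[:-1, :-1]          # vector to the right neighbour
    dy = pts[1:, :-1] - pts[:-1, :-1]          # vector to the lower neighbour
    n = np.cross(dx, dy)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

# Sanity check: a fronto-parallel plane (constant depth, identity intrinsics)
# should yield the same normal, (0, 0, 1), everywhere.
n = normals_from_depth(np.full((4, 4), 2.0), np.eye(3))
```

The sign convention of the normal (toward or away from the camera) depends on the order of the cross product and varies between implementations.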
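The edge-aware smoothness idea mentioned above (depth smoothness down-weighted by image gradients) is commonly written as |∂d| · exp(−|∂I|), summed over the x and y directions. A hedged NumPy sketch of that common form, not necessarily the exact loss used in any of the papers discussed:

```python
import numpy as np

def edge_aware_smoothness(depth, image):
    """First-order depth smoothness, down-weighted at image edges.

    Depth gradients are penalised, but the penalty decays exponentially
    where the (grayscale) image has strong gradients, so depth is allowed
    to change sharply at likely object boundaries.
    """
    dx_d = np.abs(depth[:, 1:] - depth[:, :-1])   # horizontal depth gradient
    dy_d = np.abs(depth[1:, :] - depth[:-1, :])   # vertical depth gradient
    dx_i = np.abs(image[:, 1:] - image[:, :-1])   # horizontal image gradient
    dy_i = np.abs(image[1:, :] - image[:-1, :])   # vertical image gradient
    return np.mean(dx_d * np.exp(-dx_i)) + np.mean(dy_d * np.exp(-dy_i))

# A perfectly flat depth map incurs zero smoothness penalty.
loss = edge_aware_smoothness(np.full((4, 4), 1.0), np.arange(16.0).reshape(4, 4))
```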