[1609.03677v1] Unsupervised Monocular Depth Estimation with Left-Right Consistency

Unsupervised Monocular Depth Estimation. Awesome red-eye read!  #deeplearning #depth #stereo

  • Abstract: Learning based methods have shown very promising results for the task of depth estimation in single images.
  • Most existing approaches treat depth prediction as a supervised regression problem and, as a result, require vast quantities of corresponding ground truth depth data for training.
  • We propose a novel training objective that enables our convolutional neural network to learn to perform single image depth estimation, despite the absence of ground truth depth data.
  • By exploiting epipolar geometry constraints, we generate disparity images by training our network with an image reconstruction loss (a minimal sketch of such a loss follows this list).
  • Our method produces state of the art results for monocular depth estimation on the KITTI driving dataset, even outperforming supervised methods that have been trained with ground truth depth.
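For intuition, here is a minimal PyTorch sketch (not the authors’ released code) of an image reconstruction loss with a left-right consistency term: the network predicts a disparity map for each stereo view, each image is reconstructed by warping the other with the predicted disparity, and the two disparity maps are encouraged to agree. Function names, sign conventions, and the normalized-disparity convention are illustrative assumptions; the paper’s full objective is richer than this.

```python
import torch
import torch.nn.functional as F

def warp_with_disparity(img, disp):
    """Resample `img`, shifted horizontally by `disp` of shape (B, 1, H, W),
    with disparity given in normalized coordinates ([-1, 1] spans the width)."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=img.device),
        torch.linspace(-1, 1, w, device=img.device),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + disp.squeeze(1)    # (B, H, W), shifted by disparity
    grid_y = ys.unsqueeze(0).expand(b, -1, -1)    # (B, H, W), unchanged
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2) sampling grid
    return F.grid_sample(img, grid, align_corners=True)

def unsupervised_stereo_loss(left, right, disp_l, disp_r, lr_weight=1.0):
    # Photometric term: reconstruct each image by sampling the other
    # view with the predicted disparity -- no ground truth depth needed.
    photo = (F.l1_loss(warp_with_disparity(right, -disp_l), left) +
             F.l1_loss(warp_with_disparity(left, disp_r), right))
    # Left-right consistency: the left disparity map should agree with
    # the right disparity map warped into the left view.
    lr = F.l1_loss(disp_l, warp_with_disparity(disp_r, -disp_l))
    return photo + lr_weight * lr
```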

Continue reading “[1609.03677v1] Unsupervised Monocular Depth Estimation with Left-Right Consistency”

Up to Speed on Deep Learning: July Update, Part 2

Up to Speed on #DeepLearning: July Update, Part 2

  • The series introduces machine learning in four detailed segments, spanning from an introduction to machine learning to an in-depth convolutional neural network implementation for face recognition.
  • Part 4 of Adam’s series Machine Learning is Fun.
  • The three prior parts are part 1, part 2, and part 3.
  • Isaac’s background is in machine learning and artificial intelligence; he was previously an entrepreneur and data scientist.
  • Learn about artificial neural networks and how they’re being used for machine learning, as applied to speech and object recognition, image segmentation, modeling language and human motion, etc.


Check out this second installment of deep learning stories that made news in July. See if there are any items of note you missed.

Continue reading “Up to Speed on Deep Learning: July Update, Part 2”

DeepHand Tackles Tough Challenge Using Our Hands in VR

Researchers develop a #deeplearning-powered system for interpreting hand movements in #VR.

  • “I’ve always wanted to design and develop our hands as a key part of a user interface element, because we do so much in the real world with our hands so naturally,” Ramani said. “In the real world, our hands are our guides. The use of hand gestures offers smart and intuitive communication with 3D objects.”
  • Parts of the fingers and hands often block the view of the camera, making interpretation of hand motions sometimes impossible.
  • By combining depth-sensing cameras and a convolutional neural network trained on GPUs to interpret 2.5 million hand poses and configurations, the team has taken us a large step closer to being able to use our dexterity while interacting with 3D virtual objects.

DeepHand is a deep learning-powered system for interpreting hand movements in virtual environments.
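As a rough illustration of the pipeline described above, and not the Purdue team’s actual architecture, one could embed each depth frame with a small CNN and retrieve the closest stored hand configuration from descriptors precomputed over the database of rendered poses. Every class, layer choice, and function name below is a hypothetical sketch.

```python
import torch
import torch.nn as nn

class DepthPoseEmbedder(nn.Module):
    """Hypothetical CNN mapping a single-channel depth frame to a compact
    descriptor used to look up hand configurations."""
    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, dim)

    def forward(self, depth):               # depth: (B, 1, H, W)
        return self.head(self.features(depth).flatten(1))

def nearest_pose(descriptor, db_descriptors, db_poses):
    """Return the joint configuration whose precomputed descriptor
    (db_descriptors: (N, dim), built offline from the pose database)
    is closest in L2 distance to the query descriptor."""
    dists = torch.cdist(descriptor, db_descriptors)  # (B, N)
    return db_poses[dists.argmin(dim=1)]             # (B, n_joints)
```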
Continue reading “DeepHand Tackles Tough Challenge Using Our Hands in VR”

Reflectance Modeling by Neural Texture Synthesis

Normal and texture maps from a single photograph by machine learning, @jaakkolehtinen et al.

  • We extend parametric texture synthesis to capture rich, spatially varying parametric reflectance models from a single image.
  • Our input is a single head-lit flash image of a mostly flat, mostly stationary (textured) surface, and the output is a tile of SVBRDF parameters that reproduce the appearance of the material.
  • Our key insight is to make use of a recent, powerful texture descriptor based on deep convolutional neural network statistics for “softly” comparing the model prediction and the exemplars without requiring an explicit point-to-point correspondence between them.
  • The work was supported by the Academy of Finland (grant 277833).
  • To appear in ACM Transactions on Graphics (Proc. SIGGRAPH 2016).

Read the full article here.




We extend parametric texture synthesis to capture rich, spatially varying parametric reflectance models from a single image. Our input is a single head-lit flash image of a mostly flat, mostly stationary (textured) surface, and the output is a tile of SVBRDF parameters that reproduce the appearance of the material. No user intervention is required. Our key insight is to make use of a recent, powerful texture descriptor based on deep convolutional neural network statistics for “softly” comparing the model prediction and the exemplars without requiring an explicit point-to-point correspondence between them. This is in contrast to traditional reflectance capture that requires pointwise constraints between inputs and outputs under varying viewing and lighting conditions. Seen through this lens, our method is an indirect algorithm for fitting photorealistic SVBRDFs. The problem is severely ill-posed and non-convex. To guide the optimizer towards desirable solutions, we introduce a soft Fourier-domain prior for encouraging spatial stationarity of the reflectance parameters and their correlations, and a complementary preconditioning technique that enables efficient exploration of such solutions by L-BFGS, a standard non-linear numerical optimizer.
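The “texture descriptor based on deep convolutional neural network statistics” is, in spirit, a Gram-matrix descriptor over VGG feature maps. Below is a minimal PyTorch sketch of such a “soft” comparison; the VGG-19 layer indices and equal layer weighting are assumptions for illustration, input normalization is omitted, and the paper’s actual pipeline, which renders an SVBRDF prediction and fits its parameters with L-BFGS under the Fourier-domain prior, is not shown.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

# Indices of the relu1_1 .. relu5_1 activations in vgg19().features,
# a typical Gatys-style layer choice (the paper's exact set may differ).
GRAM_LAYERS = {1, 6, 11, 20, 29}

def gram_features(img, vgg_features):
    """Gram matrices of CNN feature maps: spatially averaged statistics,
    so two textures can be compared without pixel correspondence."""
    grams, x = [], img
    for i, layer in enumerate(vgg_features):
        x = layer(x)
        if i in GRAM_LAYERS:
            b, c, h, w = x.shape
            f = x.reshape(b, c, h * w)
            grams.append(f @ f.transpose(1, 2) / (c * h * w))
    return grams

def texture_loss(pred, exemplar, vgg_features):
    """'Softly' compare a rendered prediction against the exemplar photo
    through feature statistics instead of per-pixel differences."""
    return sum(F.mse_loss(p, e) for p, e in
               zip(gram_features(pred, vgg_features),
                   gram_features(exemplar, vgg_features)))

# Usage sketch: frozen ImageNet-pretrained VGG-19 as the descriptor network.
vgg_features = vgg19(weights=VGG19_Weights.DEFAULT).features.eval()
```

Because Gram matrices average feature co-activations over all spatial positions, two images of the same material score as similar even when their texture elements sit at different locations, which is exactly why no explicit point-to-point correspondence is needed.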

