Worldsheet: Wrapping the World in a 3D Sheet
for View Synthesis from a Single Image

Ronghang Hu1         Nikhila Ravi1         Alexander C. Berg1         Deepak Pathak2
1Facebook AI Research (FAIR)          2Carnegie Mellon University

ICCV 2021 (Oral)

Worldsheet synthesizes novel views of a scene from a single image, using a 3D mesh sheet as the scene representation.

Abstract

We present Worldsheet, a method for novel view synthesis using just a single RGB image as input. This is a challenging problem, as it requires an understanding of the 3D geometry of the scene as well as texture mapping to generate both visible and occluded regions from new viewpoints. Our main insight is that simply shrink-wrapping a planar mesh sheet onto the input image, consistent with the learned intermediate depth, captures underlying geometry sufficient to generate photorealistic unseen views with arbitrarily large viewpoint changes. To operationalize this, we propose a novel differentiable texture sampler that allows our wrapped mesh sheet to be textured and then transformed into a target image via differentiable rendering. Our approach is category-agnostic, is trainable end-to-end without any 3D supervision, and requires only a single image at test time. Worldsheet consistently outperforms prior state-of-the-art methods on single-image view synthesis across several datasets. Furthermore, this simple idea captures novel views surprisingly well on a wide range of high-resolution in-the-wild images, converting them into navigable 3D pop-ups.
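As an illustration of the differentiable texturing idea, the sketch below bilinearly samples the input image at the warped 2D grid positions with torch.nn.functional.grid_sample, so gradients flow back into the sheet coordinates. This is a generic stand-in, not the paper's actual texture sampler, and the function name sample_texture is hypothetical.

# Generic stand-in for the differentiable texturing step: bilinearly sample
# the input view at the warped sheet positions. The paper proposes its own
# texture sampler; this only illustrates that sampling is differentiable
# w.r.t. both the image and the grid coordinates.
import torch
import torch.nn.functional as F

def sample_texture(image, warped_grid):
    """image:       (1, 3, H, W) input view
    warped_grid: (1, gh, gw, 2) sheet vertex positions in [-1, 1] image coords
    returns:     (1, 3, gh, gw) per-vertex colors
    """
    return F.grid_sample(image, warped_grid, mode="bilinear",
                         padding_mode="border", align_corners=True)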



Method

Given an input view, we build a scene mesh by warping a planar grid sheet onto the scene geometry via predicted grid offsets and depth. We then differentiably sample the UV texture map of the scene mesh from the input image and render the textured mesh from the target camera pose to produce the novel view. The mesh warping is learned end-to-end using only 2D rendering losses on the novel view.
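To make the warping step concrete, here is a minimal sketch of building the sheet geometry in PyTorch. It assumes a pinhole camera with inverse intrinsics K_inv and a sheet parameterized over normalized image coordinates; the function names and (grid_h, grid_w) parameterization are illustrative, not the released implementation.

# Minimal sketch of the mesh-sheet construction: warp a planar grid in 2D
# with predicted offsets, then lift each vertex to 3D with predicted depth.
import torch

def build_sheet_vertices(grid_h, grid_w, offsets, depth, K_inv):
    """offsets: (grid_h, grid_w, 2) predicted (dx, dy) in image coords
    depth:   (grid_h, grid_w)    predicted per-vertex depth
    K_inv:   (3, 3)              inverse camera intrinsics
    """
    ys, xs = torch.meshgrid(
        torch.linspace(0, 1, grid_h),
        torch.linspace(0, 1, grid_w),
        indexing="ij",
    )
    uv = torch.stack([xs, ys], dim=-1) + offsets          # warped 2D grid
    ones = torch.ones(grid_h, grid_w, 1)
    pix = torch.cat([uv, ones], dim=-1)                   # homogeneous pixels
    rays = pix @ K_inv.T                                  # back-projected rays
    verts = rays * depth.unsqueeze(-1)                    # lift to 3D by depth
    return verts.reshape(-1, 3)                           # (grid_h*grid_w, 3)

def sheet_faces(grid_h, grid_w):
    """Two triangles per grid cell, indexing the flattened vertex list."""
    idx = torch.arange(grid_h * grid_w).reshape(grid_h, grid_w)
    tl, tr = idx[:-1, :-1].reshape(-1), idx[:-1, 1:].reshape(-1)
    bl, br = idx[1:, :-1].reshape(-1), idx[1:, 1:].reshape(-1)
    return torch.cat([torch.stack([tl, bl, tr], -1),
                      torch.stack([tr, bl, br], -1)], dim=0)

Two triangles per grid cell yield a connected sheet that a differentiable renderer such as PyTorch3D can rasterize from the target camera pose.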


Comparison to prior state-of-the-art

We evaluate and compare with previous approaches on three benchmark datasets: Matterport, Replica, and RealEstate10K. Our model outperforms prior work by a large margin in PSNR and other metrics (see the paper for details).
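For reference, PSNR measures reconstruction fidelity in decibels as 10 * log10(MAX^2 / MSE) between the synthesized and ground-truth views. A minimal implementation for images scaled to [0, 1]:

# Peak signal-to-noise ratio between two images with values in [0, max_val].
import torch

def psnr(pred, target, max_val=1.0):
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)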


Compared to SynSin (Wiles et al., 2020), a prior state-of-the-art method based on point clouds, Worldsheet generalizes better to large viewpoint changes and produces fewer artifacts.



Failure cases

Artifacts sometimes occur where the depth is discontinuous around object boundaries; for instance, the boundary of the flower or the tree becomes blurry. We hope to address these issues in future work by segmenting the Worldsheet around depth boundaries.


BibTeX

@inproceedings{hu2021worldsheet,
  title={Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image},
  author={Hu, Ronghang and Ravi, Nikhila and Berg, Alexander C. and Pathak, Deepak},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021}
}