FOCUS - Multi-View Foot Reconstruction from Synthetically Trained Dense Correspondences

3DV 2025

Oliver Boyne, Roberto Cipolla

University of Cambridge

arXiv · Code · Synthetic dataset · Reconstruction dataset

At a glance

  • FOCUS introduces two methods for reconstructing feet from multi-view images. Both run on uncalibrated image sets and videos, and complete in under a minute with no GPU*.
  • The SynFoot dataset is extended to include dense correspondences.
  • A model is trained on synthetic data to predict these correspondences on in-the-wild images.
  • A 3D model is reconstructed from multi-view images using the correspondences, by either (i) matching and triangulation, or (ii) fitting the FIND model.

*FOCUS-SfM only.

Synthetic data

We extend the synthetic dataset SynFoot to SynFoot2, including more diversity, articulated feet, and dense correspondences (relative to the FIND model).

These were created using our custom library BlenderSynth, and are available for download.
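The dataset's exact correspondence format isn't described here, but the idea is that each foot pixel in a render is tagged with its location on the canonical FIND surface. A minimal sketch of such an encoding, storing normalised per-pixel canonical coordinates as a float image (illustrative only, not the dataset's actual format):

```python
import numpy as np

def encode_correspondences(canonical_xyz, mask, bounds_min, bounds_max):
    """Normalise per-pixel canonical surface coordinates (H, W, 3) into
    [0, 1] so they can be stored alongside each render. Pixels outside
    the foot mask are zeroed. Illustrative encoding only - the SynFoot2
    format may differ."""
    norm = (canonical_xyz - bounds_min) / (bounds_max - bounds_min)
    norm = np.clip(norm, 0.0, 1.0)
    norm[~mask] = 0.0  # background pixels carry no correspondence
    return norm
```

Storing correspondences as an image channel like this lets a standard dense-prediction network regress them directly, pixel for pixel.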

Samples from the synthetic dataset

Dense correspondence prediction

We train a network to predict dense correspondences, surface normals, and related uncertainties.

See our predictions on in-the-wild images below.
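Predicting an uncertainty alongside each regressed quantity is commonly done with a heteroscedastic loss, where the network outputs a per-pixel log-variance that down-weights unreliable regions. A sketch of that formulation (a standard choice; the paper's exact loss may differ):

```python
import numpy as np

def heteroscedastic_loss(pred, target, log_var):
    """Gaussian negative log-likelihood for dense regression: the network
    predicts a per-pixel log-variance alongside each output, so confident
    pixels are penalised more heavily for errors than uncertain ones.
    Standard formulation - not necessarily the paper's exact loss."""
    sq_err = (pred - target) ** 2
    return np.mean(0.5 * np.exp(-log_var) * sq_err + 0.5 * log_var)
```

At test time, the predicted variances can be used to discard low-confidence correspondences before reconstruction.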

In-the-wild predictions

3D Reconstruction

We introduce two methods that fit to our dense correspondences:

FOCUS-SfM: Collect correspondences across views, triangulate, and run Poisson reconstruction to recover a surface.
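The triangulation step can be sketched with standard linear (DLT) triangulation: each view contributes two linear constraints on the 3D point, and the least-squares solution comes from an SVD. This is a generic sketch of the step, assuming known projection matrices; FOCUS-SfM's exact solver may differ:

```python
import numpy as np

def triangulate(proj_mats, pixels):
    """Linear (DLT) triangulation of one correspondence track seen in
    several views. proj_mats: list of 3x4 camera matrices; pixels: list
    of (u, v) image points. Returns the 3D point minimising the
    algebraic error. Generic sketch of the triangulation step."""
    A = []
    for P, (u, v) in zip(proj_mats, pixels):
        # each view gives two rows of the homogeneous linear system A X = 0
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]                # null-space direction = homogeneous 3D point
    return X[:3] / X[3]       # de-homogenise
```

Triangulating every correspondence track yields a point cloud, from which Poisson reconstruction recovers the surface.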

FOCUS-O: Fit the parameterized FIND model directly to the dense correspondences.
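Fitting a parametric model to correspondences typically means minimising the reprojection error of the model's surface points against the pixels they were predicted at. A sketch of that objective (a hypothetical helper for illustration; the real FIND fitting also optimises shape and pose codes through the model itself):

```python
import numpy as np

def reprojection_residuals(model_points, cam_R, cam_t, K, pixels):
    """Residuals for correspondence-based fitting: each dense prediction
    ties a pixel to a point on the model surface, and fitting minimises
    the reprojection error of those points across all views.
    Hypothetical helper illustrating the objective only."""
    cam = model_points @ cam_R.T + cam_t    # world -> camera frame
    proj = cam @ K.T                        # apply pinhole intrinsics
    uv = proj[:, :2] / proj[:, 2:3]         # perspective divide
    return uv - pixels                      # per-point 2D residuals
```

Because the correspondences tie pixels directly to the model surface, this objective needs no differentiable renderer.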

Our methods produce better surface normal reconstructions than COLMAP, evaluated on the Foot3D foot reconstruction benchmark.

Our methods require as few as 3 views, whereas COLMAP needs 15+.

Our methods outperform both FOUND and COLMAP: they beat COLMAP on surface normal accuracy and FOUND on Chamfer error.
They also run faster and require no differentiable rendering; FOCUS-SfM can even run without a GPU.

Thanks to our dense correspondences, we can reconstruct from a completely uncalibrated image set, even when backgrounds are untextured and conventional camera calibration would fail. Try our demo on GitHub.

Acknowledgements

We acknowledge the collaboration and financial support of Trya Srl.

If you make use of this project, please cite the following paper:
@inproceedings{boyne2025focus,
    title={FOCUS: Multi-View Foot Reconstruction from Synthetically Trained Dense Correspondences},
    author={Boyne, Oliver and Cipolla, Roberto},
    booktitle={2025 International Conference on 3D Vision (3DV)},
    year={2025}
}