NPC: Neural Point Characters from Video
ICCV 2023

Abstract


High-fidelity human 3D models can now be learned directly from videos, typically by combining a template-based surface model with neural representations. However, obtaining a template surface requires expensive multi-view capture systems, laser scans, or strictly controlled conditions. Previous methods avoid using a template but rely on a costly or ill-posed mapping from observation to canonical space. We propose a hybrid point-based representation for reconstructing animatable characters that does not require an explicit surface model, while being generalizable to novel poses. For a given video, our method automatically produces an explicit set of 3D points representing approximate canonical geometry, and learns an articulated deformation model that produces pose-dependent point transformations. The points serve both as a scaffold for high-frequency neural features and as an anchor for efficiently mapping between observation and canonical space. We demonstrate on established benchmarks that our representation overcomes limitations of prior work operating in either canonical or observation space. Moreover, our automatic point extraction approach enables learning models of human and animal characters alike, matching the performance of methods that use rigged surface templates despite being more general.


We blur all faces for anonymity.

Video

Overview


NPC produces a volume rendering of a character with a NeRF F_ψ that is locally conditioned on features aggregated from a dynamically deformed point cloud. Given a raw video, we first estimate a canonical point cloud p with an implicit body model. A GNN then deforms the canonical points p conditioned on the skeleton pose θ and produces a set of pose-dependent per-point features. Every 3D query point q_o in observation space aggregates features from its k nearest neighbors in the posed point cloud. The aggregated feature is passed to F_ψ for volume rendering. Our model is supervised directly with the input videos.
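To make the pipeline concrete, the following is a minimal sketch of the forward pass described above, written in PyTorch. It is not the released implementation: the module names (deform_net, nerf_mlp), the MLP stand-in for the GNN, the pose dimensionality, and the inverse-distance aggregation are illustrative assumptions.

import torch
import torch.nn as nn

class NPCSketch(nn.Module):
    def __init__(self, num_points=2048, feat_dim=32, pose_dim=69, k=8):
        super().__init__()
        self.k = k
        # canonical point cloud p and its learnable per-point features f_p
        self.canonical_points = nn.Parameter(0.1 * torch.randn(num_points, 3))
        self.point_feats = nn.Parameter(torch.randn(num_points, feat_dim))
        # MLP stand-ins for the pose-conditioned GNN and the NeRF F_psi
        self.deform_net = nn.Sequential(
            nn.Linear(3 + pose_dim, 128), nn.ReLU(),
            nn.Linear(128, 3 + feat_dim))
        self.nerf_mlp = nn.Sequential(
            nn.Linear(2 * feat_dim + 3, 128), nn.ReLU(),
            nn.Linear(128, 4))  # outputs (r, g, b, sigma) per query

    def forward(self, query_pts, pose):
        # query_pts: (Q, 3) samples q_o along camera rays; pose: (pose_dim,) skeleton pose theta
        # 1) deform canonical points and predict pose-dependent features f_theta
        pose_in = pose.expand(self.canonical_points.shape[0], -1)
        out = self.deform_net(torch.cat([self.canonical_points, pose_in], dim=-1))
        delta_p, f_theta = out[:, :3], out[:, 3:]
        posed_points = self.canonical_points + delta_p

        # 2) k nearest posed points for every query point
        dists = torch.cdist(query_pts, posed_points)        # (Q, N)
        knn_d, knn_idx = dists.topk(self.k, largest=False)  # (Q, k)

        # 3) inverse-distance weighted aggregation of per-point features
        w = 1.0 / (knn_d + 1e-8)
        w = w / w.sum(dim=-1, keepdim=True)
        feats = torch.cat([self.point_feats, f_theta], dim=-1)[knn_idx]  # (Q, k, 2F)
        agg = (w.unsqueeze(-1) * feats).sum(dim=1)                       # (Q, 2F)

        # 4) NeRF head predicts color and density, composited later by volume rendering
        return self.nerf_mlp(torch.cat([agg, query_pts], dim=-1))

For example, NPCSketch()(torch.rand(4096, 3), torch.zeros(69)) yields per-sample color and density that a standard volume-rendering integrator would composite along each ray.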



Point Feature Encoding


Our core idea is to employ a point cloud p as an anchor that carries features from the canonical to the observation space, forming an efficient mapping between the two. (1) Each point p carries a learnable feature f_p, and its position queries a feature f_s from a canonical field. (2) The GNN adds pose-dependent features f_θ and deformations Δp. (3) The view direction and distance are added in bone-relative space. (4) The k nearest neighbors of q_o are used to establish forward and backward mappings from a query point to both the posed and canonical points.
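As one concrete reading of step (4), the hedged PyTorch sketch below shows how a query point q_o might gather features from its k nearest posed points while keeping both the posed-space offsets (forward mapping) and the corresponding canonical anchors (backward mapping). The tensor shapes, the concatenation layout, and the function name aggregate_point_features are assumptions for illustration, not the paper's exact formulation.

import torch

def aggregate_point_features(q_o, view_dir, posed_pts, canon_pts, f_p, f_s, f_theta, k=8):
    # q_o:       (Q, 3) query points in observation space
    # view_dir:  (Q, 3) viewing directions (bone-relative in the paper)
    # posed_pts: (N, 3) deformed points p + delta_p
    # canon_pts: (N, 3) canonical points p
    # f_p, f_s, f_theta: (N, F) learnable, canonical-field, and pose-dependent features
    dists = torch.cdist(q_o, posed_pts)              # (Q, N)
    knn_d, knn_idx = dists.topk(k, largest=False)    # (Q, k)

    # forward mapping: offsets from q_o to its k nearest posed neighbors
    offset_posed = posed_pts[knn_idx] - q_o.unsqueeze(1)        # (Q, k, 3)
    # backward mapping: the same neighbors expressed as canonical anchors
    canon_nb = canon_pts[knn_idx]                               # (Q, k, 3)

    # features carried by the neighbors, plus per-neighbor view and distance terms
    feats = torch.cat([f_p, f_s, f_theta], dim=-1)[knn_idx]     # (Q, k, 3F)
    vdir = view_dir.unsqueeze(1).expand(-1, k, -1)              # (Q, k, 3)

    # per-neighbor encoding to be fused (e.g. by weighted pooling) and fed to F_psi
    return torch.cat([offset_posed, canon_nb, vdir, knn_d.unsqueeze(-1), feats], dim=-1)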



Citation

Datasets

All data sourcing, modeling code, and experiments were developed at the University of British Columbia. Meta did not obtain the data or code, nor conduct any experiments for this work.

Human3.6M

MonoPerfCap

AIST++

SURREAL

Acknowledgements

We thank Shaofei Wang and Ruilong Li for helpful discussions related to ARAH and TAVA. We thank Luis A. Bolaños for his help and discussions, and Frank Yu, Chunjin Song, Xingzhe He and Eric Hedlin for their insightful feedback. We also thank Advanced Research Computing at the University of British Columbia and Compute Canada for providing computational resources.
The website template was borrowed from Michaël Gharbi.