NPC: Neural Point Characters from Video
ICCV 2023


We propose a hybrid point-based representation for reconstructing animatable characters from a given video without an explicit surface model. Our method automatically produces an explicit set of 3D points representing approximate canonical geometry, and learns an articulated deformation model that produces pose-dependent point transformations. The points serve both as a scaffold for high-frequency neural features and as an anchor for efficiently mapping between observation and canonical space.


We blur all faces for anonymity.

Novel Pose Comparisons on H36M and AIST++


Novel View Comparisons on H36M


Ablation Study


Video

Overview


NPC produces a volume rendering of a character with a NeRF Fψ locally conditioned on features aggregated from a dynamically deformed point cloud. Given a raw video, we first estimate a canonical point cloud p with an implicit body model. A GNN then deforms the canonical points p conditioned on the skeleton pose θ and produces a set of pose-dependent per-point features. Every 3D query point qo in observation space aggregates features from its k-nearest neighbors in the posed point cloud. The aggregated feature is passed to Fψ for volume rendering. Our model is supervised directly with the input videos.
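To make the pipeline concrete, the following is a minimal PyTorch sketch of the rendering step described above: features are gathered from the k nearest posed points with inverse-distance weights, passed through a small NeRF-style MLP standing in for Fψ, and composited along each ray with standard volume rendering. The network sizes, feature dimensions, weighting scheme, and all helper names are illustrative assumptions, not the released implementation.

```python
# Minimal sketch (assumptions, not the authors' code): k-NN feature aggregation
# at query points q_o from a posed point cloud, followed by volume rendering.
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Stand-in for F_psi: maps an aggregated point feature to (density, rgb)."""
    def __init__(self, feat_dim=32, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),            # 1 density + 3 rgb channels
        )

    def forward(self, feat):
        out = self.mlp(feat)
        sigma = torch.relu(out[..., :1])     # non-negative density
        rgb = torch.sigmoid(out[..., 1:])    # colors in [0, 1]
        return sigma, rgb

def aggregate_knn_features(q_o, posed_pts, point_feats, k=4):
    """Inverse-distance weighted average of features from the k nearest posed points."""
    # q_o: (Q, 3), posed_pts: (P, 3), point_feats: (P, F)
    dists = torch.cdist(q_o, posed_pts)                    # (Q, P)
    knn_d, knn_idx = dists.topk(k, dim=-1, largest=False)  # k smallest distances
    w = 1.0 / (knn_d + 1e-6)
    w = w / w.sum(dim=-1, keepdim=True)                    # (Q, k) normalized weights
    neigh = point_feats[knn_idx]                           # (Q, k, F)
    return (w.unsqueeze(-1) * neigh).sum(dim=1)            # (Q, F)

def volume_render(sigma, rgb, deltas):
    """Standard quadrature along each ray; sigma: (R, S, 1), rgb: (R, S, 3), deltas: (R, S)."""
    alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * deltas)
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                     1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]
    weights = alpha * trans
    return (weights.unsqueeze(-1) * rgb).sum(dim=1)        # (R, 3) pixel colors

# Toy usage: 2 rays x 16 samples queried against 500 posed points.
P, F, R, S = 500, 32, 2, 16
posed_pts, point_feats = torch.rand(P, 3), torch.rand(P, F)
samples = torch.rand(R, S, 3)
feats = aggregate_knn_features(samples.reshape(-1, 3), posed_pts, point_feats)
sigma, rgb = TinyNeRF(F)(feats)
pixels = volume_render(sigma.view(R, S, 1), rgb.view(R, S, 3),
                       deltas=torch.full((R, S), 0.02))
```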



Point Feature Encoding


Our core idea is to employ a point cloud p as an anchor that carries features from the canonical to the observation space, forming an efficient mapping between the two. (1) Each point p carries a learnable feature fp, and its position queries a feature fs from a canonical field. (2) The GNN adds pose-dependent features fθ and a deformation Δp. (3) The view direction and distance are encoded in bone-relative space. (4) The k-nearest neighbors of qo are used to establish forward and backward mappings from a query point to both the posed and canonical points. A small sketch of steps (1)–(4) follows.
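Below is a small, self-contained sketch of how the per-point feature bundle in steps (1)–(4) could be assembled, assuming a simple one-round message-passing layer in place of the actual GNN and a single bone frame for the view encoding. All names, dimensions, and layers are illustrative assumptions rather than the paper's exact architecture.

```python
# Minimal sketch (assumptions, not the released implementation) of the per-point
# feature bundle: learnable code f_p, canonical-field feature f_s, pose-dependent
# feature f_theta plus displacement dp from a graph network, and a bone-relative
# view direction/distance.
import torch
import torch.nn as nn

class PointGNN(nn.Module):
    """One round of mean message passing over a fixed k-NN graph of canonical points."""
    def __init__(self, in_dim, pose_dim, out_dim):
        super().__init__()
        self.msg = nn.Linear(in_dim + pose_dim, out_dim)
        self.head = nn.Linear(out_dim, out_dim + 3)   # pose feature f_theta + displacement dp

    def forward(self, feats, pose, knn_idx):
        # feats: (P, C), pose: (pose_dim,), knn_idx: (P, k) neighbor indices.
        x = torch.cat([feats, pose.unsqueeze(0).expand(feats.shape[0], -1)], dim=-1)
        m = torch.relu(self.msg(x))[knn_idx].mean(dim=1)   # average neighbor messages
        out = self.head(m)
        return out[..., :-3], out[..., -3:]                # f_theta, dp

def bone_relative_view(q_o, view_dir, bone_R, bone_t):
    """Express view direction and distance of a query point in a bone's local frame."""
    q_local = (q_o - bone_t) @ bone_R        # rotate into bone coordinates
    d_local = view_dir @ bone_R
    dist = q_local.norm(dim=-1, keepdim=True)
    return torch.cat([d_local, dist], dim=-1)

# Toy usage: 500 canonical points, 8 graph neighbors, a 69-D pose vector.
P, C, k = 500, 32, 8
points = torch.rand(P, 3)
f_p = nn.Parameter(torch.randn(P, C))        # (1) learnable per-point features
f_s = torch.rand(P, C)                       # (1) features queried from a canonical field
knn_idx = torch.cdist(points, points).topk(k, largest=False).indices
f_theta, dp = PointGNN(2 * C, 69, C)(torch.cat([f_p, f_s], -1),
                                     torch.rand(69), knn_idx)        # (2)
posed_points = points + dp                   # deformed (posed) point cloud
view_code = bone_relative_view(torch.rand(1, 3), torch.rand(1, 3),
                               torch.eye(3), torch.zeros(3))         # (3)
```

The concatenation of fp, fs, fθ, and the bone-relative view code would then form the per-point feature that the k-NN aggregation in the rendering sketch above consumes, corresponding to step (4).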



Citation

Datasets

All data sourcing, modeling code, and experiments were developed at the University of British Columbia. Meta did not obtain the data or code, nor did it conduct any experiments for this work.

Human3.6M

MonoPerfCap

AIST++

SURREAL

Acknowledgements

We thank Shaofei Wang and Ruilong Li for helpful discussions related to ARAH and TAVA. We thank Luis A. Bolaños for his help and discussions, and Frank Yu, Chunjin Song, Xingzhe He and Eric Hedlin for their insightful feedback. We also thank Advanced Research Computing at the University of British Columbia and Compute Canada for providing computational resources.
The website template was borrowed from Michaël Gharbi.