Creating relightable and animatable human avatars from monocular videos is a rising research topic with a range of applications, e.g., virtual reality, sports, and video games. Previous works utilize neural fields together with physically based rendering (PBR) to estimate geometry and disentangle the appearance properties of human avatars. However, one drawback of these methods is the slow rendering speed due to expensive Monte Carlo ray tracing. To tackle this problem, we propose distilling the knowledge from an implicit neural field (teacher) into an explicit 2D Gaussian splatting (student) representation, taking advantage of the fast rasterization of Gaussian splatting. To avoid ray tracing, we employ the split-sum approximation for PBR appearance. We also propose novel part-wise ambient occlusion probes for shadow computation: shadows are predicted by querying these probes only once per pixel, which paves the way for real-time relighting of avatars. Combined, these techniques give high-quality relighting results with realistic shadow effects. Our experiments demonstrate that the proposed student model achieves relighting results comparable to our teacher model while being 370 times faster at inference time, reaching a rendering speed of 67 FPS.
Given a monocular video, we first train an implicit teacher model via ray-tracing-based PBR to decompose the intrinsic properties, including geometry, albedo, roughness, and metallic. Then, an explicit point-based student model (2D Gaussian splatting, 2DGS) is optimized under the guidance of the teacher model. To avoid time-consuming ray-tracing-based PBR, we adopt an approximated PBR with part-wise occlusion probes to compute the shading color and model shadowing effects. We regularize the student model by distilling the implicit property fields from our teacher model.
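To illustrate the approximated PBR, the sketch below shades one point with the standard split-sum approximation: a diffuse term from a cosine-weighted irradiance lookup, plus a specular term from a prefiltered environment lookup scaled by a BRDF integration LUT, with a single ambient-occlusion value attenuating the result. The lookup functions (`irradiance_fn`, `prefiltered_env_fn`, `brdf_lut_fn`) are hypothetical stand-ins for precomputed environment maps and are not part of the paper's actual implementation.

```python
import numpy as np

def split_sum_shading(albedo, roughness, metallic, normal, view_dir,
                      irradiance_fn, prefiltered_env_fn, brdf_lut_fn, ao=1.0):
    """Shade one surface point via the split-sum approximation (a sketch).

    irradiance_fn(n)         -> diffuse irradiance along normal n, shape (3,)
    prefiltered_env_fn(r, a) -> prefiltered env radiance along r, shape (3,)
    brdf_lut_fn(n_dot_v, a)  -> (scale, bias) from the BRDF integration LUT
    All three lookups are stand-ins for precomputed environment maps.
    """
    n = normal / np.linalg.norm(normal)
    v = view_dir / np.linalg.norm(view_dir)
    n_dot_v = max(float(n @ v), 1e-4)
    refl = 2.0 * n_dot_v * n - v  # mirror reflection of the view direction

    # Diffuse term: albedo times cosine-weighted irradiance.
    diffuse = albedo * irradiance_fn(n)

    # Specular term: prefiltered radiance times the LUT-integrated BRDF.
    f0 = 0.04 * (1.0 - metallic) + albedo * metallic  # dielectric/metal blend
    scale, bias = brdf_lut_fn(n_dot_v, roughness)
    specular = prefiltered_env_fn(refl, roughness) * (f0 * scale + bias)

    # A single ambient-occlusion value attenuates both terms (shadowing).
    return ao * (diffuse + specular)
```

Because both lookups are rasterization-friendly texture fetches, this shading runs in constant time per pixel, which is what removes the Monte Carlo ray-tracing cost of the teacher model.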
Our method generalizes to various people with different body shapes and diverse clothing styles, and performs well under different environment maps.
Our proposed part-wise occlusion probes enable fast and high-fidelity shadow modeling (ambient occlusion, AO) under novel poses. Our method also disentangles other intrinsic properties, such as albedo, metallic, normal, and roughness, which are used by our approximated PBR pipeline.
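The single-query-per-pixel probe lookup can be sketched as follows, assuming each body part stores a small 3D grid of AO values in its local frame; the class name, grid layout, and `world_to_local` transform are illustrative assumptions, not the paper's actual data structure.

```python
import numpy as np

class PartAOProbes:
    """Hypothetical part-wise ambient-occlusion probes (a sketch).

    Each body part stores a small 3D AO grid in its local frame. At render
    time every pixel queries the probe of its part exactly once, instead of
    tracing shadow rays toward the environment.
    """

    def __init__(self):
        self.grids = {}   # part name -> (D, H, W) AO grid, values in [0, 1]
        self.bounds = {}  # part name -> (lo, hi) corners of the local AABB

    def add_part(self, name, grid, lo, hi):
        self.grids[name] = np.asarray(grid, dtype=np.float64)
        self.bounds[name] = (np.asarray(lo, float), np.asarray(hi, float))

    def query(self, name, p_world, world_to_local):
        # Map the world-space point into the part's local probe volume.
        p = world_to_local(p_world)
        lo, hi = self.bounds[name]
        g = self.grids[name]
        # Continuous grid coordinates, clamped to the volume.
        t = np.clip((p - lo) / (hi - lo), 0.0, 1.0) * (np.array(g.shape) - 1)
        i0 = np.floor(t).astype(int)
        i1 = np.minimum(i0 + 1, np.array(g.shape) - 1)
        f = t - i0
        # Trilinear interpolation over the 8 neighbouring cells.
        ao = 0.0
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    idx = (i1[0] if dx else i0[0],
                           i1[1] if dy else i0[1],
                           i1[2] if dz else i0[2])
                    w = ((f[0] if dx else 1.0 - f[0]) *
                         (f[1] if dy else 1.0 - f[1]) *
                         (f[2] if dz else 1.0 - f[2]))
                    ao += w * g[idx]
        return ao
```

Storing the grids in part-local (canonical) frames is what lets the probes generalize to novel poses: the bone transform carries the query point into the frame where the AO was precomputed.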
Our experiments demonstrate that the proposed student model achieves relighting results comparable to, or even better than, our teacher model while being 370 times faster at inference time, reaching a rendering speed of 67 FPS.
The bias imposed by our implicit teacher model helps our student model achieve reasonable renderings for out-of-distribution poses. In comparison, state-of-the-art 3DGS-based avatar models tend to fail on out-of-distribution poses, especially around joints.
The majority of this work was done while Zeren Jiang was a master's student at ETH Zürich.
We thank Zhiyin Qian and Zinuo You for helpful suggestions and discussions. We also thank Angel He for proofreading.
@misc{DNF-Avatar,
title={DNF-Avatar: Distilling Neural Fields for Real-time Animatable Avatar Relighting},
author={Jiang, Zeren and Wang, Shaofei and Tang, Siyu},
year={2025},
archivePrefix={arXiv},
primaryClass={cs.CV}
}