Reconstructing Dynamic Digital Humans from Monocular Videos

Friday 28 February 2025


A team of researchers has developed a new method for reconstructing dynamic, disentangled digital humans from monocular videos. The approach, called D3-Human, uses a combination of explicit and implicit representations to model the decoupled clothed human body.


Reconstructing 3D models of people from video has long been a challenging task, as it requires accurately capturing their movements, clothing, and underlying body shape. Monocular videos, which are captured with a single camera, add an extra layer of complexity, as they don’t provide the same level of depth information as stereo or multi-view setups.


D3-Human addresses this challenge by using a novel human manifold signed distance field (hmSDF) to segment the visible clothing and body. This allows the system to reconstruct the decoupled clothed human body in a way that’s both accurate and efficient.
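To make the idea of a signed distance field more concrete, here is a minimal sketch in Python. It is not the hmSDF from the paper, whose exact definition is not given in this article. It only illustrates the general principle that a signed field assigns each point a value whose sign can split points into two groups. The sphere, the sample points, and the function names sphere_sdf and label_by_sign are assumptions made for illustration.

```python
import math

def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside.

    A sphere is an illustrative stand-in; it is NOT the paper's hmSDF.
    """
    return math.dist(point, center) - radius

def label_by_sign(points, sdf):
    """Split points into two groups according to the sign of the field."""
    negative = [p for p in points if sdf(p) < 0.0]
    non_negative = [p for p in points if sdf(p) >= 0.0]
    return negative, non_negative

# Hypothetical sample points, chosen only for this example.
samples = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
inside, outside = label_by_sign(samples, sphere_sdf)
print("inside:", inside)
print("outside:", outside)
```

In this toy case the two groups are "inside the sphere" and "outside the sphere". According to the article, D3-Human uses its hmSDF in a comparable role to separate visible clothing from visible body, though the field itself is presumably learned rather than written down in closed form.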


The approach is based on a neural network that takes a monocular video as input and outputs a 3D model of the person, with clothing and underlying body shape represented separately.


Explicit representations are used to capture the detailed geometry of the person’s clothing and body, while implicit representations are used to model the underlying structure of the body and its movements. This allows the system to accurately reconstruct the person’s pose, movement, and clothing in a way that’s both realistic and efficient.
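The difference between the two kinds of representation can be sketched with a toy example in Python. An explicit representation stores the surface directly, for instance as mesh vertices and triangles, while an implicit representation stores a function whose zero level set is the surface. The shapes and helper names below (mesh_area, implicit_sphere, find_surface_along_ray) are hypothetical and chosen only for illustration; they do not reproduce the networks or data structures used in D3-Human.

```python
import math

# Explicit: the surface is stored directly as vertices and triangles.
# This toy mesh is a tetrahedron with three right-angled faces.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
triangles = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]

def triangle_area(a, b, c):
    """Area of a triangle from half the length of the cross product of two edges."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cross = (
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    )
    return 0.5 * math.sqrt(sum(x * x for x in cross))

def mesh_area(verts, tris):
    """Total surface area, read straight off the stored geometry."""
    return sum(triangle_area(*(verts[i] for i in tri)) for tri in tris)

# Implicit: the surface is wherever this function returns zero.
def implicit_sphere(p, radius=1.0):
    return math.sqrt(sum(x * x for x in p)) - radius

def find_surface_along_ray(field, direction, t_lo=0.0, t_hi=4.0, steps=50):
    """Bisection for the zero crossing along the ray t * direction from the origin.

    Assumes the field is negative at t_lo and positive at t_hi.
    """
    for _ in range(steps):
        t_mid = 0.5 * (t_lo + t_hi)
        point = tuple(t_mid * d for d in direction)
        if field(point) < 0.0:
            t_lo = t_mid
        else:
            t_hi = t_mid
    return 0.5 * (t_lo + t_hi)

area = mesh_area(vertices, triangles)
hit = find_surface_along_ray(implicit_sphere, (1.0, 0.0, 0.0))
print("explicit mesh area:", area)
print("implicit surface found at t =", hit)
```

The explicit mesh can be measured and rendered directly, but its topology is fixed by the triangle list. The implicit function has to be queried to locate the surface, yet it can describe any shape its function can express. A hybrid method can, in principle, draw on both properties.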


The researchers tested D3-Human using a variety of monocular videos, including those taken from different angles and with varying levels of lighting and occlusion. The results show that the approach is capable of producing high-quality 3D models of people, complete with their clothing and underlying body shape.


One potential application of D3-Human is virtual try-on technology, where users can see how clothes would look on them without physically putting them on. Another is video game development, where the system could be used to create realistic character animations and movements.


Overall, D3-Human marks a step forward for computer vision and graphics: it reconstructs dynamic digital humans from monocular video while keeping clothing and body separate, which is what makes applications such as those above plausible.


Cite this article: “Reconstructing Dynamic Digital Humans from Monocular Videos”, The Science Archive, 2025.


Computer Vision, Digital Humans, 3D Modeling, Monocular Videos, Neural Networks, Human Body Shape, Clothing Reconstruction, Virtual Try-On Technology, Video Game Development, Graphics.


Reference: Honghu Chen, Bo Peng, Yunfan Tao, Juyong Zhang, “D$^3$-Human: Dynamic Disentangled Digital Human from Monocular Video” (2025).

