Neural USD: A Novel Approach to Object-Centric Image Editing

Thursday 04 December 2025

Computer graphics and visual effects have changed how we experience movies, video games, and virtual reality. For creators, though, editing and manipulating objects in digital scenes can be tedious and time-consuming. A new paper from researchers at Google DeepMind introduces an approach to object-centric image editing that aims to make it easier to control and manipulate individual objects in a scene.

The team’s framework is called Neural USD, short for Neural Universal Scene Descriptor. It represents scenes and objects in a structured, hierarchical manner, allowing precise control over the appearance, geometry, and pose of individual objects within the scene. Think of it as a virtual puppeteer: each object can be manipulated independently without affecting the rest of the scene.
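To make the idea of a structured, per-object scene description concrete, here is a minimal sketch in Python. It is not the paper's data format: the class names, field names, and the use of a plain string for appearance are all assumptions chosen for illustration. It only shows how a hierarchy of objects, each with its own appearance, geometry, and pose, lets one object be edited while the others stay untouched.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class ObjectNode:
    """One object in the scene. Every field name here is hypothetical."""
    appearance: str                        # stand-in for an appearance embedding
    size: Tuple[float, float, float]       # geometry: box extents (width, height, depth)
    position: Tuple[float, float, float]   # pose: translation
    yaw_degrees: float                     # pose: rotation about the vertical axis


@dataclass(frozen=True)
class Scene:
    """A scene is a mapping from object names to object nodes."""
    objects: Dict[str, ObjectNode] = field(default_factory=dict)

    def edit(self, name: str, **changes) -> "Scene":
        """Return a new scene in which only the named object is changed."""
        updated = dict(self.objects)
        updated[name] = replace(self.objects[name], **changes)
        return Scene(objects=updated)


scene = Scene(objects={
    "chair": ObjectNode("wooden chair", (0.5, 1.0, 0.5), (0.0, 0.0, 2.0), 0.0),
    "table": ObjectNode("glass table", (1.2, 0.8, 0.8), (1.0, 0.0, 2.5), 0.0),
})

# Rotate the chair; the table entry is carried over unchanged.
edited = scene.edit("chair", yaw_degrees=90.0)
```

Because each edit returns a new scene rather than mutating the old one, a sequence of edits can be applied step by step while earlier versions remain available.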

To achieve this level of precision, Neural USD combines computer vision techniques with machine learning. The system first extracts 2D and 3D bounding boxes from source images, which describe where each object sits in the scene and how much space it occupies. From there, the model can be conditioned to change specific aspects of an object, such as its pose, appearance, or geometry.
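The relationship between a 3D bounding box, its pose, and the 2D bounding box seen in an image can be sketched with a little geometry. The code below is an illustration only, not the paper's pipeline: the function names, the pinhole camera model, and the camera parameters are assumptions. It builds the eight corners of a 3D box, rotates them about the vertical axis, and projects them to get a 2D box in pixel coordinates.

```python
import math
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def box_corners(center: Point3, size: Point3, yaw: float) -> List[Point3]:
    """Eight corners of a 3D box rotated by `yaw` radians about the y (up) axis."""
    cx, cy, cz = center
    w, h, d = size
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            for sz in (-0.5, 0.5):
                x, y, z = sx * w, sy * h, sz * d
                # Rotate the local offset about the y axis, then translate.
                xr = cos_y * x + sin_y * z
                zr = -sin_y * x + cos_y * z
                corners.append((cx + xr, cy + y, cz + zr))
    return corners


def project_to_2d_box(corners: List[Point3], focal: float,
                      cx_img: float, cy_img: float) -> Tuple[float, float, float, float]:
    """Pinhole projection of 3D corners; returns (x_min, y_min, x_max, y_max).

    Assumes every corner lies in front of the camera (z > 0).
    """
    us = [focal * x / z + cx_img for x, _, z in corners]
    vs = [focal * y / z + cy_img for _, y, z in corners]
    return (min(us), min(vs), max(us), max(vs))


# A box 2 units wide and 1 unit deep, 4 units in front of a hypothetical camera.
original = project_to_2d_box(
    box_corners((0.0, 0.0, 4.0), (2.0, 1.0, 1.0), yaw=0.0),
    focal=500.0, cx_img=320.0, cy_img=240.0)

# The same box turned a quarter turn: its narrow side now faces the camera.
rotated = project_to_2d_box(
    box_corners((0.0, 0.0, 4.0), (2.0, 1.0, 1.0), yaw=math.pi / 2),
    focal=500.0, cx_img=320.0, cy_img=240.0)
```

Changing only the yaw angle changes the projected 2D box, which is the kind of pose signal a model could be conditioned on when asked to redraw the object in a new orientation.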

The researchers demonstrated Neural USD’s capabilities by testing it on a range of datasets, including videos of everyday objects moving around in real-world environments. They showed that the system could accurately manipulate object poses and appearances while preserving the rest of the scene. For example, they could take an image of a chair and change its pose without affecting the surrounding environment.

The implications of Neural USD could be significant for several industries, including computer-aided design (CAD), special effects in film and television, and virtual reality. With this kind of technology, designers and artists could build and adjust complex digital scenes more easily, supporting more realistic and immersive experiences.

One potential application is the creation of realistic CGI environments for movies and TV shows. Imagine being able to replace a background or swap out objects in a scene without having to re-render an entire animation. Neural USD is a step toward that kind of workflow.

The researchers acknowledge that there are still limitations to their approach, particularly when it comes to generalizing to new object categories. However, they believe that co-training on more readily available 2D bounding box datasets could help alleviate these issues.
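The article does not say how such co-training would be carried out. One common pattern, shown here purely as an assumption, is to mix examples from both kinds of dataset into each training batch and carry a flag saying whether the 3D box is available, so that a model can ignore the missing signal. All names below, including the `has_3d` flag and the `frac_2d` mixing ratio, are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

Box2D = Tuple[float, float, float, float]                       # x_min, y_min, x_max, y_max
Box3D = Tuple[float, float, float, float, float, float, float]  # center, size, yaw


def make_example(box2d: Box2D, box3d: Optional[Box3D] = None) -> Dict:
    """Bundle the conditioning signals with a flag marking whether 3D is present."""
    return {"box2d": box2d, "box3d": box3d, "has_3d": box3d is not None}


def sample_batch(with_3d: List[Dict], only_2d: List[Dict], batch_size: int,
                 frac_2d: float, rng: random.Random) -> List[Dict]:
    """Draw a batch in which roughly `frac_2d` of the examples lack 3D boxes."""
    batch = []
    for _ in range(batch_size):
        source = only_2d if rng.random() < frac_2d else with_3d
        batch.append(rng.choice(source))
    return batch


# Tiny stand-in datasets with made-up annotations.
with_3d = [make_example((10.0, 10.0, 50.0, 80.0), (0.0, 0.0, 4.0, 1.0, 1.0, 1.0, 0.0))]
only_2d = [make_example((30.0, 20.0, 90.0, 120.0))]

batch = sample_batch(with_3d, only_2d, batch_size=8, frac_2d=0.5, rng=random.Random(0))
```

A training loop could then read the `has_3d` flag for each example and skip or mask the 3D conditioning where it is missing.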

Overall, Neural USD represents a significant step forward in the field of computer graphics and visual effects. Its potential applications are vast, from creating realistic digital environments for entertainment to enhancing the way we interact with virtual reality.

Cite this article: “Neural USD: A Novel Approach to Object-Centric Image Editing”, The Science Archive, 2025.

Computer Graphics, Visual Effects, Object-Centric Image Editing, Neural USD, Universal Scene Descriptor, Machine Learning Algorithms, Computer Vision Techniques, 2D and 3D Bounding Boxes, Scene Manipulation, Virtual Reality Applications

Reference: Alejandro Escontrela, Shrinu Kushagra, Sjoerd van Steenkiste, Yulia Rubanova, Aleksander Holynski, Kelsey Allen, Kevin Murphy, Thomas Kipf, “Neural USD: An object-centric framework for iterative editing and control” (2025).
