EdgeTAM: A Novel Model for Efficient Video Object Segmentation on Mobile Devices

Friday 07 March 2025


Researchers have developed a model that brings video object segmentation, a technology used in applications such as robotics, surveillance, and autonomous vehicles, to mobile devices. The research team, led by Chenchen Zhu at Meta Reality Labs, calls the model EdgeTAM and designed it to segment objects in videos efficiently on phone-class hardware.


Traditional video object segmentation models are often computationally intensive and require powerful hardware to run. This limitation makes it difficult to deploy these models on mobile devices, which have limited processing power and memory. EdgeTAM addresses this issue by introducing a novel 2D spatial perceiver that reduces the computational cost of the model without compromising its accuracy.


The key innovation behind EdgeTAM is the way it compresses its memory with a lightweight transformer. The transformer holds a fixed set of learnable queries that encode the frame-level memories stored in a memory bank, so the rest of the model works with a small, fixed number of tokens instead of every stored feature. Because segmentation is a dense, per-pixel task, the encoding is designed to keep the spatial structure of the memories, which is essential for accurate object boundaries.
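
To make the idea of learnable queries encoding a memory more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the split into "global" and "patch" queries, the window size, the names `compress_memory` and `cross_attend`, and the random values standing in for learned weights are all assumptions chosen for illustration, and a real model would add learned projections and multi-head attention.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, tokens):
    """Scaled dot-product attention of one query vector over a list of tokens."""
    dim = len(query)
    scores = [sum(q * t for q, t in zip(query, tok)) / math.sqrt(dim) for tok in tokens]
    weights = softmax(scores)
    return [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]


def compress_memory(memory, global_queries, patch_queries, window):
    """Compress an H x W grid of feature vectors (hypothetical layout).

    Global queries attend to every memory token. Each patch query attends
    only to its own window, so the output grid keeps a coarse 2D layout.
    """
    height, width = len(memory), len(memory[0])
    if height % window or width % window:
        raise ValueError("grid size must be divisible by the window size")
    all_tokens = [vec for row in memory for vec in row]
    global_out = [cross_attend(q, all_tokens) for q in global_queries]
    patch_out = []
    for top in range(0, height, window):
        out_row = []
        for left in range(0, width, window):
            local = [memory[y][x]
                     for y in range(top, top + window)
                     for x in range(left, left + window)]
            out_row.append(cross_attend(patch_queries[top // window][left // window], local))
        patch_out.append(out_row)
    return global_out, patch_out


# Toy sizes; random vectors stand in for learned queries and stored features.
rng = random.Random(0)
DIM, H, W, WINDOW, NUM_GLOBAL = 4, 8, 8, 4, 2


def rand_vec():
    return [rng.gauss(0, 1) for _ in range(DIM)]


memory = [[rand_vec() for _ in range(W)] for _ in range(H)]
global_queries = [rand_vec() for _ in range(NUM_GLOBAL)]
patch_queries = [[rand_vec() for _ in range(W // WINDOW)] for _ in range(H // WINDOW)]

global_out, patch_out = compress_memory(memory, global_queries, patch_queries, WINDOW)
tokens_before = H * W
tokens_after = len(global_out) + sum(len(row) for row in patch_out)
print(tokens_before, "->", tokens_after)  # 64 -> 6
```

In this toy setup, 64 memory tokens shrink to 6, and the four patch-level outputs still sit on a 2 x 2 grid that mirrors the layout of the original frame.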


EdgeTAM has been tested on several benchmark datasets, including DAVIS 2017, MOSE, SA-V val, and SA-V test. According to the reported results, EdgeTAM outperforms previous models in both speed and accuracy. On an iPhone 15 Pro Max, for example, it runs at 16 frames per second (FPS) while maintaining high segmentation accuracy.


The researchers have also proposed a distillation pipeline to further improve the performance of EdgeTAM without increasing its computational cost. This pipeline uses a teacher-student framework to transfer knowledge from a larger, more accurate model to EdgeTAM. The result is a highly efficient and accurate video object segmentation model that can be deployed on mobile devices.
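
A teacher-student setup of this kind is often implemented by adding a feature-matching term to the usual training loss. The sketch below assumes that form. The function names, the use of mean squared error, the `weight` parameter, and the toy feature values are illustrative assumptions, not details taken from the EdgeTAM paper.

```python
def mse(student_vec, teacher_vec):
    """Mean squared error between two equal-length feature vectors."""
    if len(student_vec) != len(teacher_vec):
        raise ValueError("feature vectors must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student_vec, teacher_vec)) / len(student_vec)


def distillation_loss(task_loss, student_feats, teacher_feats, weight=1.0):
    """Task loss plus a weighted feature-alignment penalty.

    student_feats / teacher_feats: one feature vector per matched layer.
    The teacher's features are treated as fixed targets.
    """
    align = sum(mse(s, t) for s, t in zip(student_feats, teacher_feats)) / len(student_feats)
    return task_loss + weight * align


# Toy example: two matched layers, the second already agrees with the teacher.
student_feats = [[0.0, 1.0], [2.0, 2.0]]
teacher_feats = [[0.0, 3.0], [2.0, 2.0]]
total = distillation_loss(task_loss=0.5, student_feats=student_feats,
                          teacher_feats=teacher_feats, weight=0.5)
print(total)  # 1.0
```

The extra term is only computed during training, which is why distillation can improve the smaller model without adding any cost at inference time.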


The potential applications of EdgeTAM are vast. In robotics, for instance, the model could enable robots to quickly and accurately detect and track objects in their environment. In surveillance systems, EdgeTAM could be used to improve object detection and tracking capabilities, allowing for more effective monitoring and response to events. Autonomous vehicles could also benefit from EdgeTAM’s ability to quickly segment objects from videos, enabling them to better navigate complex environments.


Overall, the development of EdgeTAM represents a significant step forward in the field of video object segmentation. Its efficiency and accuracy make it an attractive solution for applications where processing power is limited, such as mobile devices.


Cite this article: “EdgeTAM: A Novel Model for Efficient Video Object Segmentation on Mobile Devices”, The Science Archive, 2025.


Video Object Segmentation, EdgeTAM, Transformer Architecture, Lightweight Model, Mobile Devices, Robotics, Surveillance, Autonomous Vehicles, Distillation Pipeline, Teacher-Student Framework.


Reference: Chong Zhou, Chenchen Zhu, Yunyang Xiong, Saksham Suri, Fanyi Xiao, Lemeng Wu, Raghuraman Krishnamoorthi, Bo Dai, Chen Change Loy, Vikas Chandra, et al., “EdgeTAM: On-Device Track Anything Model” (2025).

