AniMer+: A Deep Learning System for Analyzing and Predicting Animal Movements

Monday 25 August 2025

Deep learning has revolutionized many fields, from image recognition to speech synthesis. Now a team of researchers has applied it to the study of animal movement. They have developed an AI system that estimates the 3D poses and shapes of mammals and birds from images, and they hope it will improve our understanding of these animals’ behaviors and habitats.

The challenge lies in the vast diversity of animals and their complex body structures. To tackle this problem, researchers have created a neural network called AniMer+, which is capable of learning from a wide range of species. This system combines two key innovations: a family-aware Vision Transformer (ViT) that can recognize patterns across different families of animals, and a Mixture-of-Experts (MoE) design that enables the network to adapt to specific species.
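To make the Mixture-of-Experts idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the class name `FamilyAwareMoE`, the family labels, the choice to condition the router by adding a per-family embedding to the token, and all sizes are assumptions made for illustration. It shows only the general technique, in which a router scores several expert sub-networks and blends the outputs of the best-scoring ones.

```python
import math
import random


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class FamilyAwareMoE:
    """Hypothetical MoE layer whose router also sees a family embedding."""

    def __init__(self, dim, families, num_experts, top_k=1, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.top_k = top_k
        # One small linear "expert" per slot (a real expert would be an MLP).
        self.experts = [matrix(dim, dim) for _ in range(num_experts)]
        # Assumed conditioning signal: one learned vector per animal family.
        self.family_embed = {f: [rng.gauss(0.0, 0.1) for _ in range(dim)] for f in families}
        # The router maps a conditioned token to one score per expert.
        self.router = matrix(num_experts, dim)

    def gate(self, token, family):
        """Return {expert_index: weight} for the top-k experts."""
        conditioned = [t + e for t, e in zip(token, self.family_embed[family])]
        probs = softmax(linear(self.router, conditioned))
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.top_k]
        norm = sum(probs[i] for i in top)
        return {i: probs[i] / norm for i in top}

    def forward(self, token, family):
        """Blend the selected experts' outputs using the gate weights."""
        out = [0.0] * len(token)
        for index, weight in self.gate(token, family).items():
            expert_out = linear(self.experts[index], token)
            out = [o + weight * y for o, y in zip(out, expert_out)]
        return out


# Usage: the same image token routed under two hypothetical family labels.
moe = FamilyAwareMoE(dim=4, families=["Felidae", "Canidae"], num_experts=3, top_k=2)
token = [1.0, 0.5, -0.5, 0.0]
cat_features = moe.forward(token, "Felidae")
dog_features = moe.forward(token, "Canidae")
```

Because the router's input depends on the family label, the same token can be sent to different experts for different families, which is one plausible way a single network could specialize without training a separate model per group.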

To train AniMer+, the researchers needed large amounts of data. Unfortunately, 3D training data is scarce, especially for birds. To overcome this limitation, they developed a diffusion-based conditional image generation pipeline that produces synthetic datasets. The generated images are designed to mimic real-world conditions while carrying the 3D labels that real photographs usually lack.

The resulting dataset, called CtrlAni3D, consists of over 10,000 quadruped images with pixel-aligned SMAL labels (SMAL is a parametric 3D model of animal body shape and pose). For birds, the researchers created CtrlAVES3D, a dataset of around 7,000 images with pixel-aligned labels for AVES, a comparable parametric model for birds. These synthetic datasets help resolve single-view depth ambiguities and improve the accuracy of AniMer+.
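A short sketch can illustrate what "single-view depth ambiguity" means. It assumes a simple pinhole camera, and the function name `project`, the focal length, and the image center are illustrative values rather than anything taken from the paper. Two 3D points on the same viewing ray land on the same pixel, so one image alone cannot tell a small nearby animal from a larger one farther away.

```python
def project(point, focal, cx, cy):
    """Project a 3D camera-space point (x, y, z) to pixel coordinates.

    Assumes an ideal pinhole camera: focal length in pixels and a
    principal point at (cx, cy). The point must be in front of the camera.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal * x / z + cx, focal * y / z + cy)


# Illustrative camera: 500 px focal length, 256 x 256 image.
FOCAL, CX, CY = 500.0, 128.0, 128.0

near_point = (0.25, -0.125, 2.0)  # a joint 2 m from the camera
far_point = (0.5, -0.25, 4.0)     # twice as far along the same ray

near_pixel = project(near_point, FOCAL, CX, CY)
far_pixel = project(far_point, FOCAL, CX, CY)

# Both points fall on the same pixel, so depth is not recoverable
# from this single view without extra information.
same_pixel = near_pixel == far_pixel
```

A pixel-aligned label is one whose 3D model, projected this way, overlays the animal in the image. Training images that come with such labels give the network examples in which the correct 3D answer behind each 2D view is known.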

The AI system has been tested on a range of benchmarks, including the challenging Animal Kingdom dataset. Results show that AniMer+ outperforms existing approaches in predicting animal poses and shapes across different species. The researchers also conducted ablation studies to demonstrate the effectiveness of their novel network architecture and synthetic datasets.

This achievement could have implications for various fields, such as wildlife conservation, veterinary medicine, and even filmmaking. By analyzing animal movements, scientists can better understand migration patterns and the behavior changes caused by habitat destruction, climate change, or other human activities. In veterinary medicine, AniMer+ could aid in diagnosing and treating animals with injuries or diseases.

Moreover, this technology has potential applications in virtual reality and animation. Imagine being able to create realistic animal simulations for educational programs or feature films. The work illustrates how deep learning can help model complex biological systems and suggests how much potential the approach still has.

Cite this article: “AniMer+: A Deep Learning System for Analyzing and Predicting Animal Movements”, The Science Archive, 2025.

AI, Animal Movements, Deep Learning, Neural Network, AniMer+, ViT, MoE, Synthetic Datasets, CtrlAni3D, CtrlAVES3D

Reference: Jin Lyu, Liang An, Li Lin, Pujin Cheng, Yebin Liu, Xiaoying Tang, “AniMer+: Unified Pose and Shape Estimation Across Mammalia and Aves via Family-Aware Transformer” (2025).
