Adaptive Image Compression through Content-Adaptive Tokenization

Sunday 02 March 2025


A team of researchers has developed a new image tokenizer that adapts how strongly it compresses each image to that image's complexity. The approach, called Content-Adaptive Tokenization (CAT), uses large language models to judge how intricate an image is and then chooses its compression ratio accordingly, so that simpler images are encoded into fewer tokens.


Most existing image tokenizers, which convert images into compact sequences of latent tokens for generative models, encode every image at the same fixed compression ratio. A fixed ratio either loses detail in complex images or spends more tokens than necessary on simple ones. CAT instead uses a caption-based evaluation system to choose a compression ratio for each individual image: large language models (LLMs) read a text description of the image and assign it a complexity score.
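To make the cost of a fixed ratio concrete, here is a minimal sketch of how a compression ratio translates into a token budget. It assumes (our reading, not a detail stated in the article) that an "Nx" ratio downsamples each side of the image by N, so every N x N pixel patch becomes one latent token; the function name token_count is hypothetical.

```python
# Assumption: an "Nx" ratio downsamples each side by N, so every N x N pixel
# patch becomes one latent token. The function name is illustrative only.

def token_count(height: int, width: int, ratio: int) -> int:
    """Latent tokens for an image downsampled by `ratio` along each side."""
    if height % ratio or width % ratio:
        raise ValueError("image sides must be divisible by the ratio")
    return (height // ratio) * (width // ratio)


if __name__ == "__main__":
    for ratio in (8, 16, 32):
        print(f"{ratio}x: {token_count(256, 256, ratio)} tokens")
    # 8x: 1024 tokens, 16x: 256 tokens, 32x: 64 tokens
```

Under this reading, moving from 8x to 16x cuts a 256 x 256 image from 1,024 tokens to 256, a factor of four, which is why choosing the ratio per image rather than once for a whole dataset can make a large difference.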


The pipeline works in two steps. Given a new image, a model trained on large collections of images and captions first generates a caption describing it; an LLM then reads that caption and assigns a complexity score, which determines the image's compression ratio. Complex images with intricate details therefore receive less compression, and so more tokens, while simpler images are compressed more aggressively.
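CAT encodes simpler images into fewer tokens, so the final step of the pipeline is a mapping from complexity score to compression ratio. The sketch below shows one way such a mapping could look. The 1 to 10 score scale, the thresholds, and the candidate ratios of 8x, 16x, and 32x are all illustrative assumptions, and choose_ratio and total_tokens are hypothetical names rather than the authors' code.

```python
# Hypothetical score-to-ratio rule. The 1-10 score scale, the thresholds, and the
# candidate ratios (8x, 16x, 32x) are illustrative assumptions, not CAT's settings.

def choose_ratio(score: float, low: float = 4.0, high: float = 7.0) -> int:
    """Return a per-side compression ratio for a complexity score in [1, 10].

    Low scores (simple images) get the most aggressive ratio, 32x;
    high scores (complex images) get the gentlest, 8x, and hence the most tokens.
    """
    if not 1.0 <= score <= 10.0:
        raise ValueError("score must lie in [1, 10]")
    if score < low:
        return 32
    if score < high:
        return 16
    return 8


def total_tokens(scores, side: int = 256) -> int:
    """Total latent tokens for square images of `side` pixels, one per score."""
    return sum((side // choose_ratio(s)) ** 2 for s in scores)


if __name__ == "__main__":
    scores = [2.5, 5.0, 9.0]  # e.g. a plain backdrop, a street scene, a dense chart
    print([choose_ratio(s) for s in scores])  # [32, 16, 8]
    print(total_tokens(scores), "tokens adaptive vs", 3 * (256 // 8) ** 2, "at a fixed 8x")
```

A step function keeps the set of ratios small, which matters if the tokenizer is trained on only a handful of ratios; a finer-grained mapping would need a tokenizer able to handle more intermediate sizes.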


To test the efficacy of CAT, researchers trained the system on a dataset of over 10 million images from various sources, including the COCO (Common Objects in Context) and ChartQA datasets. They then evaluated its performance using a range of metrics, including reconstruction Fréchet Inception Distance (rFID), peak signal-to-noise ratio (PSNR), and the structural similarity index measure (SSIM).
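PSNR and SSIM are standard full-reference metrics that compare a reconstruction with its original, and both are simple enough to compute directly. The minimal sketch below works on grayscale images stored as flat lists of pixel values. PSNR follows the usual definition, 10 * log10(MAX^2 / MSE); the SSIM here is a simplified global version computed over the whole image at once, whereas the standard metric averages over local windows (typically 11 x 11 Gaussian). Higher is better for both, and SSIM is at most 1. rFID is omitted because it requires features from a pretrained Inception network.

```python
import math
from typing import Sequence


def _check(ref: Sequence[float], test: Sequence[float]) -> None:
    if len(ref) != len(test) or len(ref) == 0:
        raise ValueError("images must be non-empty and the same size")


def psnr(ref: Sequence[float], test: Sequence[float], max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    _check(ref, test)
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_val ** 2 / mse)


def global_ssim(ref: Sequence[float], test: Sequence[float], max_val: float = 255.0) -> float:
    """SSIM evaluated once over the whole image (simplified: no sliding window)."""
    _check(ref, test)
    n = len(ref)
    c1 = (0.01 * max_val) ** 2  # stabilizing constants from the original SSIM paper
    c2 = (0.03 * max_val) ** 2
    mu_x = sum(ref) / n
    mu_y = sum(test) / n
    var_x = sum((a - mu_x) ** 2 for a in ref) / n
    var_y = sum((b - mu_y) ** 2 for b in test) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(ref, test)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    ref = [float(v) for v in range(0, 256, 16)]  # a 4x4 gradient, flattened
    noisy = [min(255.0, v + 8.0) for v in ref]
    print(f"PSNR {psnr(ref, noisy):.2f} dB, SSIM {global_ssim(ref, noisy):.4f}")
```

For real evaluations, a windowed SSIM implementation from an established image-processing library is the safer choice; the global version is only meant to show what the formula measures.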


The researchers report that CAT compares favorably with fixed-ratio baselines while spending fewer tokens on simple images. On the COCO dataset, for instance, CAT achieved an rFID score of 0.65, compared with 0.51 for a fixed 8x compression ratio; on the ChartQA dataset, CAT scored 5.27, against 3.07 for a fixed 16x compression ratio.


Beyond its benchmark results, CAT may have practical applications. It could be used in web development to optimize image loading times and reduce bandwidth usage. It may also find uses in data storage and retrieval systems, where efficient compression is crucial for large-scale image archives.


To further explore the capabilities of CAT, the researchers used its variable-length latent representations to train Diffusion Transformers (DiT-XL) for image generation on ImageNet. DiT is an existing architecture that replaces the convolutional U-Net backbone common in latent diffusion models with a transformer operating on patches of the latent image.
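Because CAT assigns different compression ratios to different images, a transformer trained on its latents sees token sequences of different lengths. One generic way to batch such sequences is to pad them to a common length and pass a mask so attention ignores the padding. The sketch below illustrates that technique; it is not necessarily how the authors batch their data. It assumes a DiT-style patch size of 2 (as in DiT-XL/2) and ratios of 8x, 16x, and 32x, and dit_sequence_length and pad_and_mask are hypothetical names.

```python
from typing import List, Sequence, Tuple

Token = List[float]


def dit_sequence_length(image_side: int, ratio: int, patch: int = 2) -> int:
    """Tokens a DiT-style model sees for a square image.

    The latent grid is (image_side / ratio) on each side, and the transformer
    groups it into non-overlapping patch x patch cells (patch=2 as in DiT-XL/2).
    """
    latent_side = image_side // ratio
    if latent_side % patch:
        raise ValueError("latent side must be divisible by the patch size")
    return (latent_side // patch) ** 2


def pad_and_mask(seqs: Sequence[Sequence[Token]]) -> Tuple[List[List[Token]], List[List[bool]]]:
    """Right-pad token sequences with zero vectors to the longest length.

    Returns the padded batch and a mask (True = real token, False = padding)
    that an attention layer can use to ignore padded positions.
    Assumes every sequence is non-empty and all tokens share one dimension.
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    dim = len(seqs[0][0])
    max_len = max(len(s) for s in seqs)
    padded, mask = [], []
    for s in seqs:
        n_pad = max_len - len(s)
        padded.append([list(t) for t in s] + [[0.0] * dim for _ in range(n_pad)])
        mask.append([True] * len(s) + [False] * n_pad)
    return padded, mask


if __name__ == "__main__":
    lengths = [dit_sequence_length(256, r) for r in (8, 16, 32)]
    print(lengths)  # [256, 64, 16]
    batch = [[[0.1] * 4 for _ in range(n)] for n in lengths]  # toy 4-dim tokens
    padded, mask = pad_and_mask(batch)
    print(len(padded[2]), sum(mask[2]))  # 256 16
```

Padding is simple but spends compute on masked positions; packing several short sequences into one row is a common alternative when lengths vary widely.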


Cite this article: “Adaptive Image Compression through Content-Adaptive Tokenization”, The Science Archive, 2025.


Image Compression, Content-Adaptive Tokenization, Artificial Intelligence, Adaptive Compression, Complexity Score, Large Language Models, Image Quality, File Size, Reconstruction FID, Peak Signal-To-Noise Ratio, Structural Similarity Index.


Reference: Junhong Shen, Kushal Tirumala, Michihiro Yasunaga, Ishan Misra, Luke Zettlemoyer, Lili Yu, Chunting Zhou, “CAT: Content-Adaptive Image Tokenization” (2025).

