Monday 03 March 2025
The lack of an accurate measure of data complexity has long been a thorn in the side of researchers and developers alike. For too long, metrics like Shannon entropy and Kolmogorov complexity have dominated the field, but they’re poorly suited to human-readable data: both give random noise close to the highest possible score, so they cannot separate meaningful structure from mere unpredictability.
Enter a new approach, one that seeks to quantify complexity by identifying patterns and structures within the data itself. This framework, dubbed Local Compositional Complexity (LCC), is designed specifically with communicative signals in mind – think text, images, and audio.
The LCC score is calculated by measuring the cost of describing a given piece of data using a combination of codebooks and residual strings. Think of it like this: if you were trying to explain a sentence to someone, you’d start by identifying common patterns or themes (codebooks), and then fill in the gaps with additional information (residual strings). The LCC score reflects how efficiently this process can be done.
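To make the codebook-plus-residual idea concrete, here is a minimal sketch of a two-part description cost. It is an illustration only, not the method from the paper: the function name `two_part_cost`, the greedy longest-match strategy, the flat 8 bits per character, and the 1-bit flag per token are all assumptions chosen for simplicity, and the result is a toy cost, not an LCC score.

```python
import math


def two_part_cost(data: str, codebook: list[str]) -> dict[str, float]:
    """Toy cost, in bits, of describing `data` with a codebook plus residuals.

    Assumptions (illustrative only): characters cost 8 bits each, every
    token carries a 1-bit flag (codebook reference vs. literal), and
    codebook indices are coded uniformly.
    """
    # Drop empty patterns and try the longest patterns first.
    patterns = sorted({p for p in codebook if p}, key=len, reverse=True)
    index_bits = math.log2(len(patterns)) if len(patterns) > 1 else 0.0

    # Cost of writing the codebook itself down.
    codebook_bits = float(sum(8 * len(p) for p in patterns))

    reference_bits = 0.0  # cost of pointing into the codebook
    residual_bits = 0.0   # cost of characters the codebook cannot explain
    i = 0
    while i < len(data):
        match = next((p for p in patterns if data.startswith(p, i)), None)
        if match is not None:
            reference_bits += 1 + index_bits
            i += len(match)
        else:
            residual_bits += 1 + 8
            i += 1

    return {
        "codebook_bits": codebook_bits,
        "reference_bits": reference_bits,
        "residual_bits": residual_bits,
        "total_bits": codebook_bits + reference_bits + residual_bits,
    }


text = "the cat the dog the cat "
with_codebook = two_part_cost(text, ["the ", "cat ", "dog "])
without_codebook = two_part_cost(text, [])
print(with_codebook["total_bits"], without_codebook["total_bits"])
```

In this toy setup the repetitive sentence is cheaper to describe with the codebook than without it, because the cost of storing three short patterns is repaid by the short references that replace them. For a string with no repeated patterns, the codebook would add cost without saving anything.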
Researchers have been testing the LCC framework on a variety of datasets, including text, audio, and images. The results are striking – LCC is able to accurately distinguish between human-readable data and noise or random patterns, even when other metrics would struggle to do so.
One key advantage of LCC is its ability to capture the local nature of complexity. In other words, it’s not just about the overall distribution of symbols in a dataset, but also how patterns change from one region of the data to another. This makes it particularly well-suited for tasks like image and audio processing, where subtle variations in texture or tone can have a huge impact on meaning.
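The difference between a global and a local view can be illustrated with a small sketch. It uses windowed Shannon entropy as a stand-in, which is not how LCC itself is computed; the function names and the window size are assumptions made for the example.

```python
import math
from collections import Counter


def shannon_entropy(symbols: str) -> float:
    """Shannon entropy of the symbol distribution, in bits per symbol."""
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def windowed_entropy(symbols: str, window: int) -> list[float]:
    """Entropy of each non-overlapping window of length `window`."""
    return [
        shannon_entropy(symbols[i:i + window])
        for i in range(0, len(symbols), window)
    ]


blocky = "a" * 8 + "b" * 8  # two uniform regions
mixed = "ab" * 8            # the same symbols, interleaved

# Both strings have the same overall symbol distribution...
print(shannon_entropy(blocky), shannon_entropy(mixed))
# ...but they look very different region by region.
print(windowed_entropy(blocky, 8), windowed_entropy(mixed, 8))
```

The two strings are indistinguishable to a measure that only sees the overall symbol counts, yet their per-window values differ. A locally sensitive measure can pick up on that kind of regional structure.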
The researchers behind LCC are quick to point out that this is just the beginning – there’s still much work to be done before this framework can be widely adopted. But the potential implications are vast – imagine being able to develop AI systems that can truly understand and generate human-like language, or create algorithms that can identify patterns in medical images with unprecedented accuracy.
As the field continues to evolve, one thing is clear: the search for a more accurate measure of data complexity has finally taken a major leap forward. And with LCC at the forefront, we’re poised to unlock new possibilities in fields from machine learning to natural language processing.
Cite this article: “A New Framework for Measuring Data Complexity”, The Science Archive, 2025.
Data Complexity, Shannon Entropy, Kolmogorov Complexity, Local Compositional Complexity, Codebooks, Residual Strings, Human-Readable Data, Noise, Random Patterns, Machine Learning, Natural Language Processing.
Reference: Louis Mahon, “Local Compositional Complexity: How to Detect a Human-readable Message” (2025).







