Breakthrough in Protein Function Prediction Accelerates Understanding of Biological Systems

Friday 07 March 2025


Researchers have reported a significant advance in protein function prediction, a crucial task in understanding the intricacies of biological systems. By extending an existing deep learning architecture, they improved prediction accuracy and enabled the model to analyze longer sequences.


Protein function prediction is essential for understanding how proteins interact with each other and with their environment. Proteins are complex molecules that perform various functions in cells, such as catalyzing chemical reactions or providing structural support. However, predicting their functions from their primary sequence alone has proven challenging.


The new architecture, called ESM2 long and quantized, is an extension of the original ESM2 model, which was pre-trained on millions of protein sequences. The updated model can now handle sequences up to 2,048 amino acids in length, roughly double the previous limit. This increased capacity allows researchers to analyze larger protein families and identify patterns that may have been missed before.


The key innovation is the use of quantization, a technique that reduces the precision of the model’s weights and activations from floating-point numbers to integers. This reduction in precision enables faster computations and reduced memory usage, making it possible to process longer sequences.
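To make the idea concrete, here is a minimal sketch of one common form of quantization: symmetric 8-bit quantization with a single scale factor. It is an illustration only; the article does not say which quantization scheme or bit width the authors used, and the function names and example weights below are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] using one shared scale.

    Assumption: symmetric per-tensor quantization, chosen for simplicity.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, not taken from any real model.
weights = [0.50, -1.27, 0.03, 1.00]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

print(codes)     # small integers that fit in one byte each
print(restored)  # close to the original weights, with rounding error
```

Each integer code fits in one byte, whereas a 32-bit float takes four, so storing weights this way cuts their memory footprint to about a quarter at the cost of a rounding error of at most half the scale per value.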


In testing the new architecture, researchers found significant improvements in accuracy compared to the original ESM2 model. For example, on a dataset containing proteins with more than 1,024 amino acids, the ESM2 long and quantized model achieved an Fmax score of 0.516, outperforming the standard ESM2 model by 3.5 percentage points.
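For readers unfamiliar with the metric, the sketch below shows how an Fmax score is commonly computed in protein function prediction benchmarks: precision and recall are averaged over proteins at each confidence threshold, and the best resulting F-score is reported. This assumes the usual protein-centric definition; the article does not spell out the exact evaluation protocol, and the function name, labels, and scores here are made up for illustration.

```python
def fmax(predictions, truths, thresholds=None):
    """Protein-centric Fmax.

    predictions: list of dicts mapping a function label to a score in [0, 1]
    truths:      list of sets of true function labels (same order)
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for scores, true_terms in zip(predictions, truths):
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            # Precision counts only proteins with at least one prediction.
            if predicted:
                precisions.append(hits / len(predicted))
            # Recall is averaged over every protein.
            recalls.append(hits / len(true_terms) if true_terms else 0.0)
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Hypothetical example: one protein, two predicted labels, one of them correct.
example_predictions = [{"label_a": 0.9, "label_b": 0.9}]
example_truths = [{"label_a"}]
print(fmax(example_predictions, example_truths))
```

In this toy case precision is 0.5 and recall is 1.0 at every threshold where anything is predicted, so the score is the harmonic mean of the two, about 0.67. A score of 1.0 would mean some threshold separates correct from incorrect labels perfectly.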


The increased accuracy and capacity of the new architecture have far-reaching implications for protein function prediction. Proteins longer than the original limit, up to 2,048 amino acids, now fit within the model's input, which may reveal patterns relevant to complex biological processes. This information could be used to develop novel therapies, improve our understanding of diseases, and design more effective biomolecules.


In addition to its applications in basic research, the ESM2 long and quantized architecture has the potential to transform the biotechnology industry. By enabling faster and more accurate protein function prediction, this technology could accelerate the development of new drugs, vaccines, and other biological products.


The advancement also highlights the power of collaboration between researchers from different fields. The development of the ESM2 long and quantized architecture involved expertise in computer science, biology, and engineering, demonstrating the value of interdisciplinary research.


Cite this article: “Breakthrough in Protein Function Prediction Accelerates Understanding of Biological Systems”, The Science Archive, 2025.


Protein Function Prediction, Deep Learning Models, Protein Sequences, Biological Systems, Accuracy, Precision, Memory Usage, Computational Efficiency, Biotechnology Industry, Interdisciplinary Research


Reference: Gabriel Bianchin de Oliveira, Helio Pedrini, Zanoni Dias, “Scaling Up ESM2 Architectures for Long Protein Sequences Analysis: Long and Quantized Approaches” (2025).

