Tuesday 22 July 2025
A team of scientists has developed a new method that improves the accuracy of spliced alignment, a process central to working out where genes lie in a genome and what they encode.
Spliced alignment is the alignment of messenger RNA (mRNA) or protein sequences to eukaryotic genomes, allowing for the introns that interrupt genes in the genome but are absent from mature mRNA. It is a critical step in gene annotation and the study of gene functions. However, current aligners rely on simple models of splice sites, which can hurt accuracy when the query sequence differs substantially from the genome.
The new method, developed by researchers at Harvard Medical School and Dana-Farber Cancer Institute, uses deep learning to model splice sites, the boundaries between exons and introns where introns are cut out of pre-mRNA molecules. By using the model's predictions during alignment, the team obtained more accurate spliced alignments for both gene annotation and the study of gene functions.
The researchers trained a compact model with 7,026 parameters on vertebrate and insect genomes. They then tested the method on human long-read RNA-seq data and cross-species protein datasets. Junction accuracy, meaning how often the aligner places splice junctions correctly, improved substantially, especially for noisy long RNA-seq reads and for proteins with distant homology.
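To make the idea of junction accuracy concrete, here is a minimal Python sketch. It assumes a junction is described by a chromosome name plus the intron's start and end coordinates, and it counts a predicted junction as correct only if it exactly matches an annotated one. The function name, the tuple layout, and the exact-match rule are illustrative assumptions, not the evaluation procedure used in the study.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: (chromosome, intron start, intron end)
Junction = Tuple[str, int, int]


def junction_accuracy(predicted: Iterable[Junction],
                      annotated: Iterable[Junction]) -> float:
    """Fraction of predicted junctions that exactly match an annotated one.

    Assumption: exact coordinate match; real evaluations may use
    different matching rules.
    """
    known: Set[Junction] = set(annotated)
    pred = list(predicted)
    if not pred:
        return 0.0  # no predictions, so nothing can be counted as correct
    matched = sum(1 for j in pred if j in known)
    return matched / len(pred)


# Toy data, invented purely for illustration
annotated = [("chr1", 100, 200), ("chr1", 300, 400)]
predicted = [
    ("chr1", 100, 200),  # exact match
    ("chr1", 300, 401),  # off by one base: counted as wrong
    ("chr1", 300, 400),  # exact match
    ("chr2", 100, 200),  # wrong chromosome
]
print(junction_accuracy(predicted, annotated))  # 0.5
```

In this toy case, a junction that is off by a single base counts as an error, which is why small improvements in splice-site modeling can translate into visible gains in accuracy.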
The model captures splice signals that are conserved across phyla. It also reveals features specific to mammalian and bird genomes that are not present in other organisms.
The team implemented the approach in a software tool called minisplice, which estimates an empirical splicing probability for every GT and AG in a genome; these two dinucleotides mark the start and end of most introns. The precomputed probabilities can then be used during alignment to further improve accuracy.
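As a rough illustration of what "every GT and AG" means, the Python sketch below scans the forward strand of a DNA string and lists each candidate donor (GT) and acceptor (AG) position, then attaches a score from a user-supplied function. This is not minisplice's code or interface: the function names are hypothetical, and the scorer is a placeholder standing in for a trained model.

```python
from typing import Callable, Iterator, List, Tuple


def candidate_splice_sites(seq: str) -> Iterator[Tuple[int, str]]:
    """Yield (0-based position, kind) for each GT and AG on the forward strand."""
    s = seq.upper()
    for i in range(len(s) - 1):
        dinuc = s[i:i + 2]
        if dinuc == "GT":
            yield i, "donor"      # GT typically starts an intron
        elif dinuc == "AG":
            yield i, "acceptor"   # AG typically ends an intron


def score_sites(seq: str,
                scorer: Callable[[str, int, str], float]
                ) -> List[Tuple[int, str, float]]:
    """Attach a score to every candidate site.

    `scorer` is a hypothetical stand-in for a trained model; it receives
    the sequence, the position, and the site kind.
    """
    return [(pos, kind, scorer(seq, pos, kind))
            for pos, kind in candidate_splice_sites(seq)]


# Toy sequence, invented for illustration
seq = "CAGGTAAGTTTTCAGG"
sites = list(candidate_splice_sites(seq))
print(sites)
# [(1, 'acceptor'), (3, 'donor'), (6, 'acceptor'), (7, 'donor'), (13, 'acceptor')]

# Placeholder scorer that returns a constant; a real model would look at
# the sequence context around each position.
scored = score_sites(seq, lambda s, pos, kind: 0.5)
```

Most GT and AG dinucleotides in a genome are not real splice sites, which is why a per-site probability is useful: it lets an aligner prefer plausible junctions over coincidental ones.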
The practical implications are concrete: more accurate splice junctions mean more reliable gene annotation, which in turn underpins studies of gene regulation, gene expression, and genetic disorders.
The research also highlights the usefulness of deep learning in bioinformatics: even a small neural network can capture subtle patterns in genomic data that simple hand-built models miss.
As genomics advances, improvements of this kind will matter both for annotating newly sequenced genomes and for studying human biology and disease. With this method, scientists have a more accurate tool for mapping transcripts and proteins onto genomes.
Cite this article: “Deep Learning Breakthrough in Genomics: Accurate Spliced Alignment for Gene Annotation and Function Study”, The Science Archive, 2025.
Genomics, Spliced Alignment, Deep Learning, Gene Annotation, Gene Functions, mRNA, Protein Sequences, Eukaryotic Genomes, RNA-seq, Bioinformatics