Normalizing RNA Sequencing Data: A New Approach Using Non-Differentially Expressed Genes

Friday 14 March 2025


Researchers have long sought a way to make sense of the vast amounts of data generated by RNA sequencing, a technique that measures gene activity by quantifying the RNA transcripts present in cells. The problem is that each platform introduces its own systematic variation into the data, making it difficult to compare results across different studies.


To tackle this issue, scientists have developed various normalization methods that adjust for these differences so that the data can be compared accurately. However, most of these approaches rely on biological or statistical assumptions that do not always hold.


In a recent study, researchers explored an alternative approach by selecting non-differentially expressed genes (NDEGs) as a reference for normalization. These genes are thought to remain relatively stable across different conditions and platforms, making them ideal candidates for normalizing RNA sequencing data.


The team used breast cancer datasets from The Cancer Genome Atlas (TCGA) to test their method. They first identified the NDEGs using an analysis of variance (ANOVA) approach and then applied these genes as a reference for normalization. The results showed that this method outperformed other popular normalization techniques, such as log-transformed quantile normalization (LOG_QN).
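The two steps described above, selecting stable genes with ANOVA and then using them as a normalization reference, can be illustrated with a minimal sketch. It is not the study's implementation. It assumes that genes are ranked by their one-way ANOVA F statistic and the lowest-scoring ones are kept (for a fixed set of samples, a lower F corresponds to a higher p-value), and that normalization means dividing each sample by the mean expression of its reference genes on a linear scale. The function names, the toy gene names, and the `n_keep` parameter are hypothetical and chosen only for illustration.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        # No within-group spread: F is undefined, so treat any
        # between-group difference as maximally "differential".
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_ndegs(expr, labels, n_keep):
    """Return the n_keep genes with the lowest F statistic.

    expr:   {gene: [expression value per sample]}
    labels: [class label per sample], same order as the values
    Assumption: rank-based selection stands in for a p-value cutoff.
    """
    classes = sorted(set(labels))
    scores = {}
    for gene, values in expr.items():
        groups = [[v for v, c in zip(values, labels) if c == cls]
                  for cls in classes]
        scores[gene] = anova_f(groups)
    return sorted(scores, key=scores.get)[:n_keep]


def normalize_by_reference(expr, ref_genes):
    """Divide each sample by the mean of its reference genes.

    Assumption: values are on a linear scale; for log-scale data one
    would subtract the reference mean instead of dividing by it.
    """
    n_samples = len(next(iter(expr.values())))
    factors = [mean(expr[g][j] for g in ref_genes) for j in range(n_samples)]
    return {gene: [v / f for v, f in zip(values, factors)]
            for gene, values in expr.items()}


# Toy data: four samples, two classes, three hypothetical genes.
labels = ["x", "x", "y", "y"]
expr = {
    "GENE_A": [10.0, 11.0, 10.0, 11.0],  # same mean in both classes
    "GENE_B": [5.0, 6.0, 20.0, 21.0],    # strongly class-dependent
    "GENE_C": [7.0, 9.0, 8.0, 8.5],      # noisy but roughly stable
}

refs = select_ndegs(expr, labels, n_keep=2)
norm = normalize_by_reference(expr, refs)
print(refs)
```

On this toy input the class-dependent gene is excluded from the reference set, and after normalization the reference genes average to 1 in every sample. In practice a tested ANOVA routine from a statistics library and a proper p-value threshold would replace the hand-rolled ranking.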


The researchers also compared their approach with other methods using different machine learning models, including neural networks and random forests. These tests revealed that the NDEG-based normalization performed well across a range of classification tasks, including identifying molecular subtypes of breast cancer.


One of the key advantages of this method is its ability to adapt to different datasets and platforms without requiring extensive prior knowledge or assumptions about the biology of the system being studied. This makes it a promising tool for researchers seeking to analyze and compare RNA sequencing data from diverse sources.


The study’s findings also highlight the importance of considering non-differentially expressed genes in normalization methods. By leveraging these stable genes, scientists may be able to improve the accuracy and consistency of their results, ultimately leading to a better understanding of complex biological processes.


In addition to its applications in RNA sequencing analysis, this approach could have broader implications for other high-throughput technologies that rely on normalization techniques. As researchers continue to push the boundaries of data-driven discovery, the need for effective normalization methods will only grow more pressing. By developing strategies like NDEG-based normalization, scientists can unlock new insights and improve our understanding of the intricate relationships between genes, environment, and disease.


Cite this article: “Normalizing RNA Sequencing Data: A New Approach Using Non-Differentially Expressed Genes”, The Science Archive, 2025.


RNA Sequencing, Normalization Methods, Non-Differentially Expressed Genes, Data Analysis, Machine Learning, Breast Cancer, The Cancer Genome Atlas, ANOVA, Neural Networks, Random Forests


Reference: Fei Deng, Catherine H Feng, Nan Gao, Lanjing Zhang, “Normalization and selecting non-differentially expressed genes improve machine learning modelling of cross-platform transcriptomic data” (2025).

