Correcting Errors in Composite DNA Storage with Run-Length-Limited Codes

Saturday 15 March 2025


DNA data storage is often promoted as a way to store vast amounts of information in a compact and durable form. But one major obstacle stands in the way of widespread adoption: errors caused by strand breaks during DNA synthesis, which leave fragments in place of complete strands. Researchers have proposed various solutions to mitigate these errors, but few have focused on the specific challenges posed by composite DNA storage.


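Composite DNA storage exploits the fact that synthesis produces many copies of each strand. Instead of writing a single base at each position, it writes a composite letter: a mixture of the four bases in a chosen ratio, realized across those copies. This enlarges the alphabet and so increases the information stored per synthesis cycle. However, it also introduces new challenges, because the information now lives in a pool of strands rather than in any single strand, which complicates recovery when individual strands are damaged.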
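To make the idea of a composite letter concrete, here is a minimal sketch. It assumes the model commonly used in the composite DNA literature, in which one position holds a mixture of the four bases in a fixed ratio and the reader estimates that ratio from many reads. The names sample_reads and estimate_ratio are hypothetical, and the sketch ignores sequencing errors and strand breaks.

```python
import random
from collections import Counter

BASES = "ACGT"

def sample_reads(ratio, num_reads, rng):
    """Draw one base per read at a single composite position.

    ratio gives the (assumed) mixing weights for A, C, G, T.
    """
    return rng.choices(BASES, weights=ratio, k=num_reads)

def estimate_ratio(reads):
    """Estimate the mixing ratio from the observed bases."""
    counts = Counter(reads)
    return tuple(counts[base] / len(reads) for base in BASES)

# A composite letter that is half A and half C.
rng = random.Random(0)
reads = sample_reads((0.5, 0.5, 0.0, 0.0), 1000, rng)
print(estimate_ratio(reads))  # close to (0.5, 0.5, 0.0, 0.0)
```

The estimate only approaches the written ratio as the number of reads grows, which is one reason the composite setting is harder to decode than ordinary single-base storage.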


A recent paper proposes a novel coding scheme specifically designed for composite DNA storage. The authors introduce a channel model that reflects the realistic behavior of strand breaks during DNA synthesis. They then develop a marker-based coding scheme to retrieve information when strands experience single breaks, aligning with experimental observations.


The key innovation here is the use of run-length-limited (RLL) codes to support recovery from strand breaks. An RLL code restricts the number of consecutive identical symbols in a sequence; it constrains which sequences may be written rather than adding parity checks. The authors generalize these codes for the composite setting and derive both lower and upper bounds on their redundancy.
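The run-length constraint can be illustrated with a short check. This is a sketch for ordinary four-letter DNA strings only, not the composite generalization developed in the paper; the function names and the limit of 3 are assumptions chosen for the example.

```python
from itertools import groupby

def max_run_length(seq: str) -> int:
    """Return the length of the longest run of identical consecutive symbols."""
    return max((sum(1 for _ in group) for _, group in groupby(seq)), default=0)

def satisfies_rll(seq: str, max_run: int) -> bool:
    """True if no symbol repeats more than max_run times in a row."""
    return max_run_length(seq) <= max_run

print(satisfies_rll("ACGTTTAC", 3))   # True: the longest run is "TTT"
print(satisfies_rll("ACGTTTTAC", 3))  # False: "TTTT" has length 4
```

A code built on this constraint only ever writes sequences that pass the check, so the redundancy of the code is the price paid for excluding all the sequences that fail it.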


The practical implications of this research are significant. By developing coding schemes that can correct errors caused by strand breaks, researchers can improve the reliability and efficiency of DNA data storage systems. This could enable widespread adoption of DNA storage in industries such as medicine, finance, and entertainment.


One potential application of this technology is in medical research. Imagine being able to store vast amounts of genomic data in a compact and durable form, allowing for more efficient analysis and comparison of genetic sequences. With the ability to correct errors caused by strand breaks, researchers could focus on uncovering new insights into human health and disease without worrying about data corruption.


The authors also propose a code construction that aims to minimize the redundancy required for reliable recovery. This approach places a marker sequence at the beginning and end of each DNA sequence, so that after a break the resulting fragments can be identified as prefixes or suffixes. The RLL codes are then applied to the data section of the sequence.
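The role of the markers can be sketched in a few lines. This is an illustration under simplifying assumptions, not the authors' construction: it uses ordinary (non-composite) strands, hypothetical markers "AAAA" and "CCCC", and data whose runs are at most 3 long so that neither marker can occur inside the data.

```python
START_MARKER = "AAAA"  # hypothetical marker, chosen for illustration
END_MARKER = "CCCC"    # hypothetical marker, chosen for illustration

def frame(data: str) -> str:
    """Wrap the data section between a start marker and an end marker."""
    return START_MARKER + data + END_MARKER

def break_once(strand: str, pos: int) -> tuple[str, str]:
    """Model a single strand break at index pos."""
    return strand[:pos], strand[pos:]

def classify(fragment: str) -> str:
    """Label a fragment by the markers it carries."""
    starts = fragment.startswith(START_MARKER)
    ends = fragment.endswith(END_MARKER)
    if starts and ends:
        return "intact"
    if starts:
        return "prefix"
    if ends:
        return "suffix"
    return "unknown"

strand = frame("GATTACAGT")            # data runs are at most 2 long
left, right = break_once(strand, 9)    # break inside the data section
print(classify(left), classify(right)) # prefix suffix
```

The sketch does not handle a break that falls inside a marker, and it says nothing about matching a prefix to its suffix across a pool of strands, which is the harder problem in the composite setting.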


While this research is still in its early stages, addressing the challenges posed by strand breaks is a step toward more reliable and efficient DNA data storage systems.


Cite this article: “Correcting Errors in Composite DNA Storage with Run-Length-Limited Codes”, The Science Archive, 2025.


DNA Data Storage, Strand Breaks, Error Correction, Composite DNA Storage, Coding Scheme, Marker-Based Coding, Run-Length-Limited Codes, Redundancy, Genomic Data, Medical Research


Reference: Frederik Walter, Yonatan Yehezkeally, “Coding for Strand Breaks in Composite DNA” (2025).

