Collaborative Framework for Secure Synthetic Genomic Data Sharing

Saturday 22 February 2025


The quest for private data sharing has led scientists down a complex path of cryptography and statistical analysis. A 2024 study by Sikha Pentyala and colleagues tackles this challenge with an end-to-end collaborative framework for publishing synthetic genomic data.


Synthetic data generation is a technique used to create artificial datasets that mimic the characteristics of real-world data while protecting sensitive information. This method has gained popularity as a way to share data without compromising privacy. However, generating high-quality synthetic genomic data remains an open challenge.


The study proposes a novel framework that enables multiple data custodians to collaborate in publishing synthetic genomic data. The approach involves a secure multiparty computation (MPC) protocol that allows parties to jointly generate and evaluate synthetic data while preserving privacy.


In traditional data sharing scenarios, each custodian holds its own dataset, and anyone who wants access to sensitive data held elsewhere must navigate lengthy administrative processes. The framework lowers these barriers by letting custodians contribute their individual datasets to a joint computation, without revealing them to one another, and work together to create a single, more comprehensive synthetic dataset.


The MPC protocol is the backbone of this approach, ensuring that each party’s contribution remains private and secure throughout the process. The protocol utilizes cryptographic techniques, such as replicated secret sharing, to protect sensitive information from unauthorized access or manipulation.
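To make the idea of replicated secret sharing concrete, here is a minimal sketch for three parties. It is an illustration only, not the study's implementation: the ring size `MODULUS` and the function names `share`, `reconstruct`, and `add_shared` are assumptions chosen for this example. A secret is split into three random-looking parts that sum to the secret, and each party receives two of the three, so no single party can recover the value alone but any two can.

```python
import secrets

MODULUS = 2**64  # assumed ring Z_{2^64}; chosen only for this sketch


def share(secret):
    """Split `secret` into three parts and hand each party two of them."""
    x1 = secrets.randbelow(MODULUS)
    x2 = secrets.randbelow(MODULUS)
    x3 = (secret - x1 - x2) % MODULUS
    # Party i holds (x_i, x_{i+1}); each party is missing exactly one part.
    return [(x1, x2), (x2, x3), (x3, x1)]


def reconstruct(shares, i):
    """Recover the secret from party i and its neighbour (i + 1) mod 3."""
    first, second = shares[i]
    _, third = shares[(i + 1) % 3]
    return (first + second + third) % MODULUS


def add_shared(a, b):
    """Add two shared values; every party only touches its own pair."""
    return [((a0 + b0) % MODULUS, (a1 + b1) % MODULUS)
            for (a0, a1), (b0, b1) in zip(a, b)]


if __name__ == "__main__":
    shared_total = add_shared(share(20), share(22))
    print(reconstruct(shared_total, 0))
```

Addition needs no communication between the parties, which is one reason this kind of sharing is attractive for aggregating contributions; multiplication requires an extra exchange step that is omitted here.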


Once the parties have contributed their individual datasets, the MPC protocol generates a synthetic dataset that accurately reflects the characteristics of the real-world data. This synthetic dataset can then be shared openly, allowing researchers to conduct studies and analysis without compromising privacy.
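As a rough picture of how pooled statistics can drive a synthetic dataset, the toy sketch below has each custodian secret-share its local counts for one categorical feature, combines the shares into pooled counts, and samples synthetic records from them. This is not the generator used in the study, which the article does not describe. The genotype labels, the modulus, and the function names are all hypothetical, and opening the pooled histogram in the clear is a simplification made for this example.

```python
import random
import secrets

MODULUS = 2**61 - 1  # hypothetical modulus for this sketch


def additive_shares(value, n_parties):
    """Split `value` into n random parts that sum to it modulo MODULUS."""
    parts = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    parts.append((value - sum(parts)) % MODULUS)
    return parts


def pooled_counts(local_histograms):
    """Combine per-custodian histograms without exposing any single one."""
    n = len(local_histograms)
    n_bins = len(local_histograms[0])
    # held[p][b] accumulates the shares that party p receives for bin b.
    held = [[0] * n_bins for _ in range(n)]
    for hist in local_histograms:
        for b, count in enumerate(hist):
            for p, s in enumerate(additive_shares(count, n)):
                held[p][b] = (held[p][b] + s) % MODULUS
    # Only the per-bin totals are opened, never an individual contribution.
    return [sum(held[p][b] for p in range(n)) % MODULUS
            for b in range(n_bins)]


def sample_synthetic(categories, counts, n_records, seed=None):
    """Draw synthetic records in proportion to the pooled counts."""
    rng = random.Random(seed)
    return rng.choices(categories, weights=counts, k=n_records)


if __name__ == "__main__":
    genotypes = ["AA", "AG", "GG"]            # hypothetical categories
    custodians = [[3, 1, 0], [2, 2, 5]]       # hypothetical local counts
    totals = pooled_counts(custodians)
    print(totals)
    print(sample_synthetic(genotypes, totals, 5, seed=0))
```

A real generator would model many features jointly and keep more of the computation under MPC; the sketch only shows that pooled statistics, rather than any custodian's raw records, are what the synthetic records are drawn from.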


The study demonstrates the feasibility of this approach using leukemia genomic data as an example. The results show that the proposed framework can generate high-quality synthetic data that meets the requirements for AI research while preserving privacy.


While there are still challenges to overcome, this innovative approach has significant implications for the field of genomics and beyond. As researchers continue to push the boundaries of data sharing and analysis, this collaborative framework provides a promising solution for ensuring the security and privacy of sensitive information.


Cite this article: “Collaborative Framework for Secure Synthetic Genomic Data Sharing”, The Science Archive, 2025.


Genomics, Data Sharing, Synthetic Data, Cryptography, Statistical Analysis, Collaborative Framework, End-To-End Encryption, Multiparty Computation, Replicated Secret Sharing, AI Research.


Reference: Sikha Pentyala, Geetha Sitaraman, Trae Claar, Martine De Cock, “End to End Collaborative Synthetic Data Generation” (2024).

