Thursday 27 March 2025
Real-time Magnetic Resonance Imaging (rtMRI) has revolutionized our understanding of speech production by providing a non-invasive way to visualize the vocal tract in action. By using rtMRI, researchers have been able to track the movement of the tongue, lips, and other articulatory organs as people produce different sounds.
One of the key challenges in analyzing rtMRI data is segmenting the air-tissue boundary (ATB), which is essential for understanding how speech sounds are produced. The ATB is the interface between the air cavity inside the mouth and the surrounding tissues, such as the tongue and lips. Accurate segmentation of this boundary is crucial for reconstructing the movement of the articulatory organs during speech production.
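To make the idea of an air-tissue boundary concrete, here is a minimal sketch of how boundary pixels could be read off a frame that has already been labelled as air or tissue. It is an illustration only, not the study's method: the function name `air_tissue_boundary`, the 4-neighbour rule, and the toy 5x5 mask are all assumptions made for this example.

```python
def air_tissue_boundary(mask):
    """Return the set of (row, col) tissue pixels that touch at least one air pixel.

    `mask` is a list of equal-length rows; 1 marks tissue, 0 marks air.
    Neighbours are the four pixels directly above, below, left, and right.
    """
    rows, cols = len(mask), len(mask[0])
    boundary = set()
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1:
                continue  # only tissue pixels can lie on the boundary
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] == 0:
                    boundary.add((r, c))
                    break  # one air neighbour is enough
    return boundary


# Hypothetical toy frame: a horizontal air channel (0) between two tissue regions (1).
frame_mask = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
boundary_pixels = air_tissue_boundary(frame_mask)
```

In this toy frame the boundary consists of the two tissue rows that border the air channel. Real rtMRI frames are grayscale, so the hard part in practice is producing the air/tissue labelling in the first place, which is the segmentation problem the article describes.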
Researchers have developed various techniques to segment the ATB in rtMRI data, including unsupervised methods and supervised machine learning methods that rely on manually annotated training data. However, the supervised approaches often require extensive training datasets, and annotating them is time-consuming and labor-intensive.
In a new study, researchers have proposed an approach to segmenting the ATB using deep neural networks. The method involves fine-tuning pre-trained models with a small amount of data from unseen subjects, so that accurate segmentation remains possible when little data is available for a new speaker.
The researchers used two different datasets to test their approach. One dataset consisted of rtMRI videos from five female and five male subjects, each speaking 460 sentences. The other dataset was a subset of the USC 75-speaker speech rtMRI video database, which contains videos of various Vowel-Consonant-Vowel (VCV) sequences and sentences.
The results showed that pre-trained models fine-tuned on a small amount of data from unseen subjects can achieve high segmentation accuracy. The researchers found that as few as 15 frames from an unseen subject’s video were sufficient to adapt the model and achieve accurate ATB segmentation.
This approach has significant implications for speech production research, particularly in scenarios where extensive training datasets are not available. Because a model can be adapted to a new speaker from only a few annotated frames, researchers can analyze rtMRI data more efficiently and accurately, leading to a better understanding of how speech sounds are produced.
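The adaptation step described above can be sketched in miniature. The example below is a deliberately simplified stand-in, not the study's network: a one-feature logistic classifier (pixel intensity to air or tissue) replaces the deep model, only its bias is fine-tuned, and the 15 labelled samples stand in for the 15 annotated frames. Every number, name, and parameter here is invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_tissue(x, w, b):
    """Classify one intensity as tissue (True) or air (False)."""
    return w * x + b > 0


def accuracy(samples, w, b):
    """Fraction of (intensity, label) pairs classified correctly."""
    correct = sum(predict_tissue(x, w, b) == bool(y) for x, y in samples)
    return correct / len(samples)


def fine_tune_bias(samples, w, b, lr=1.0, steps=200):
    """Gradient descent on the mean logistic loss, updating only the bias.

    Keeping `w` fixed loosely mirrors freezing most of a pre-trained network.
    """
    for _ in range(steps):
        grad = sum(sigmoid(w * x + b) - y for x, y in samples) / len(samples)
        b -= lr * grad
    return b


# Assumed "pre-trained" parameters: decision threshold at intensity 0.5.
pretrained_w, pretrained_b = 10.0, -5.0

# Hypothetical unseen subject whose air pixels are brighter than the
# pre-trained threshold expects. 15 labelled samples: 0 = air, 1 = tissue.
adaptation_samples = [
    (0.30, 0), (0.35, 0), (0.40, 0), (0.45, 0), (0.50, 0), (0.55, 0), (0.60, 0),
    (0.80, 1), (0.82, 1), (0.85, 1), (0.88, 1), (0.90, 1), (0.92, 1), (0.95, 1),
    (1.00, 1),
]

# Held-out samples from the same hypothetical subject, used only for evaluation.
held_out = [
    (0.32, 0), (0.48, 0), (0.52, 0), (0.58, 0), (0.62, 0),
    (0.78, 1), (0.84, 1), (0.91, 1), (0.97, 1),
]

acc_before = accuracy(held_out, pretrained_w, pretrained_b)
adapted_bias = fine_tune_bias(adaptation_samples, pretrained_w, pretrained_b)
acc_after = accuracy(held_out, pretrained_w, adapted_bias)
```

On this toy data the pre-trained threshold mislabels the subject's brighter air pixels, and the 15 adaptation samples shift the threshold enough to classify every held-out sample correctly. The point is the workflow (start from existing parameters, update them on a small subject-specific set, evaluate on separate data), not the particular model or numbers.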
The study’s findings also have potential applications in other fields, such as speech therapy and language learning. For example, the approach could be used to develop personalized speech training programs for individuals with speech disorders or to create interactive tools for language learners to practice articulation.
Cite this article: “Segmenting Air-Tissue Boundaries in Real-Time Magnetic Resonance Imaging Data Using Deep Neural Networks”, The Science Archive, 2025.
Magnetic Resonance Imaging, Real-Time MRI, Speech Production, Articulatory Organs, Air-Tissue Boundary, Segmentation, Deep Neural Networks, Fine-Tuning, Pre-Trained Models, Unseen Subjects







