Thursday 27 March 2025
Machine translation, a staple of modern communication, has reached an impressive level of sophistication in recent years. But despite its many advances, there’s still one major hurdle to overcome: making it work for technical texts.
Think about it – when you’re trying to troubleshoot a complex software issue or understand the intricacies of a new technology, accuracy is paramount. A single misplaced word or phrase can render an entire document useless. And yet, most machine translation systems are designed with general language in mind, not specialized domains like software engineering.
That’s why a team of researchers has been working on developing more domain-specific machine translation models. Their latest paper presents some promising results, showing that these models can significantly outperform their general-purpose counterparts when translating technical texts.
The researchers used a dataset of bug reports from the popular Visual Studio Code editor as their test bed. These reports are perfect examples of technical text: they’re filled with code snippets, jargon, and specific terminology that’s unique to software development. The team evaluated three different machine translation systems on this data: DeepL, AWS Translate, and ChatGPT.
The results were striking. When it came to translating technical terms and preserving the original meaning of the text, DeepL consistently outperformed the other two systems. It was able to accurately translate complex code snippets and domain-specific terminology, which is crucial for software engineers trying to collaborate across languages.
AWS Translate, on the other hand, excelled at handling paraphrases: its translations conveyed the same information as the reference even when the wording differed. This matters because the same technical idea can be phrased in many ways, and a translation has to keep the meaning precise even when the wording changes.
ChatGPT, unfortunately, lagged behind its competitors. Its translations were less accurate and less context-sensitive than those of DeepL and AWS Translate. This isn’t surprising – ChatGPT is a large language model trained on a wide range of text data, which can make it less effective when dealing with specialized domains like software engineering.
The researchers used multiple evaluation metrics to assess the performance of each system, including BLEU, METEOR, ROUGE, and BERTScore. These metrics provide different insights into how well each system is translating technical texts – for example, BLEU measures n-gram overlap, while METEOR looks at synonyms and paraphrases.
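To make "n-gram overlap" concrete, here is a minimal Python sketch of the clipped n-gram precision that sits at the core of BLEU. It is a simplification for illustration only: full BLEU also combines several n-gram orders and applies a brevity penalty, and nothing here is taken from the researchers' own evaluation code. The function names and the two example sentences are invented for this sketch.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous run of n tokens in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Share of candidate n-grams that also occur in the reference.

    Each n-gram is credited at most as many times as it appears in the
    reference, so repeating a correct word does not inflate the score.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Hypothetical sentences, not drawn from the study's dataset.
reference = "the editor crashes when opening a large file"
candidate = "the editor crashes when a big file opens"

print(clipped_precision(candidate, reference, 1))  # 0.75 (6 of 8 words match)
print(clipped_precision(candidate, reference, 2))  # 3 of 7 word pairs match
```

The candidate keeps the meaning but swaps "large" for "big" and reorders the ending, so its score drops sharply once word pairs are compared. That sensitivity to exact wording is why overlap-based scores are usually read alongside metrics such as METEOR and BERTScore, which give credit for synonyms or semantic similarity.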
The results showed that DeepL’s strong performance was consistent across multiple metrics, indicating that it’s a robust model that can handle the complexities of technical text.
Cite this article: “Machine Translation in Technical Text: A Step Towards Improved Accuracy and Collaboration”, The Science Archive, 2025.
Machine Translation, Technical Texts, Software Engineering, Domain-Specific Models, DeepL, AWS Translate, ChatGPT, Bug Reports, Code Snippets, Paraphrases







