Sunday 23 February 2025
A team of researchers has developed a method for detecting changes in scanned documents, with potential applications in industries that depend on document integrity. The method combines computer vision and machine learning techniques to identify altered text images at the word level.
Document change detection is particularly challenging for scanned documents, because text can become distorted or blurred during scanning. Traditional OCR (Optical Character Recognition) methods often struggle to recognize characters accurately in these cases, which leads to recognition errors.
To address this issue, the researchers compare image pairs of scanned documents at the word level. The method uses convolutional neural networks (CNNs) together with attention mechanisms to identify changes between the original and scanned text images.
The first step in the process is to detect layout elements within the document, such as tables, headers, and footers. This is done using a layout detection algorithm that separates the text into distinct regions. The researchers then use a unit detector to identify individual words within each region.
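To make the idea of splitting a region into word units concrete, here is a minimal sketch. It does not reproduce the researchers' unit detector; it swaps in a classic projection-profile heuristic, and the function name `split_into_words` and the `min_gap` parameter are hypothetical choices made for illustration.

```python
from typing import List, Tuple


def split_into_words(ink_per_column: List[int], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) column spans, end exclusive, one per word in a text line.

    ink_per_column[i] is the number of dark pixels in column i of the line image.
    A run of at least `min_gap` empty columns is treated as a word boundary.
    This is an illustrative heuristic, not the detector described in the article.
    """
    spans: List[Tuple[int, int]] = []
    start = None     # first inked column of the word being built
    last_ink = None  # most recent column that contained ink
    for col, ink in enumerate(ink_per_column):
        if ink <= 0:
            continue
        if start is None:
            start = col
        elif col - last_ink > min_gap:
            # The gap since the last inked column is wide enough: close the word.
            spans.append((start, last_ink + 1))
            start = col
        last_ink = col
    if start is not None:
        spans.append((start, last_ink + 1))
    return spans
```

With the profile `[0, 3, 4, 0, 2, 0, 0, 5, 5, 0]` and `min_gap=2`, the single empty column at index 3 stays inside the first word, while the two empty columns at indices 5 and 6 split the line into two spans.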
Next, the team uses a CNN-based model to generate feature maps for each word image. These feature maps capture the spatial and contextual information of the characters within each word. An attention mechanism is then applied to these feature maps to focus on specific regions of interest, such as changes in character ratio or position.
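The weighting step can be illustrated with a small sketch of softmax attention over spatial locations. It assumes the per-location feature vectors and relevance scores have already been computed elsewhere (in the article's setting, by the CNN); the function name `spatial_attention_pool` is hypothetical, and the researchers' actual attention design may differ.

```python
import math
from typing import List


def spatial_attention_pool(features: List[List[float]], scores: List[float]) -> List[float]:
    """Pool per-location feature vectors into one vector using softmax attention.

    features[i] is the feature vector at spatial location i.
    scores[i] is an unnormalized relevance score for that location.
    Both inputs are assumed to come from an upstream model.
    """
    if not features or len(features) != len(scores):
        raise ValueError("features and scores must be non-empty and equally long")
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    # Weighted sum of the feature vectors, one output value per feature dimension.
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
```

Equal scores reduce this to a plain average of the locations, while one dominant score makes the output follow that location's features almost exclusively, which is the sense in which attention "focuses" on a region.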
The final step is to compare the feature maps between the original and scanned text images using a binary classification approach. This involves generating segmentation maps that identify areas where the text has been modified. The researchers have developed a two-way segmentation approach that generates maps for both directions, from source to target and vice versa.
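The two-direction idea can be sketched on binarized word images. This is a deliberate simplification: the article describes segmentation maps produced from learned feature maps, whereas the sketch below uses raw pixel differences. The names `two_way_change_maps`, `is_changed`, and `min_pixels` are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]  # 1 = ink, 0 = background


def two_way_change_maps(source: Grid, target: Grid) -> Tuple[Grid, Grid]:
    """Return (source_to_target, target_to_source) change maps.

    source_to_target marks ink that exists in the source but not in the target.
    target_to_source marks ink that exists in the target but not in the source.
    Pixel differences stand in for the learned segmentation in the article.
    """
    same_shape = len(source) == len(target) and all(
        len(s_row) == len(t_row) for s_row, t_row in zip(source, target)
    )
    if not same_shape:
        raise ValueError("source and target must have the same shape")
    src_to_tgt = [
        [1 if s and not t else 0 for s, t in zip(s_row, t_row)]
        for s_row, t_row in zip(source, target)
    ]
    tgt_to_src = [
        [1 if t and not s else 0 for s, t in zip(s_row, t_row)]
        for s_row, t_row in zip(source, target)
    ]
    return src_to_tgt, tgt_to_src


def is_changed(source: Grid, target: Grid, min_pixels: int = 1) -> bool:
    """Binary decision: changed if the two maps together mark enough pixels."""
    src_to_tgt, tgt_to_src = two_way_change_maps(source, target)
    marked = sum(map(sum, src_to_tgt)) + sum(map(sum, tgt_to_src))
    return marked >= min_pixels
```

Keeping the two maps separate distinguishes content that disappeared from content that was added, which a single symmetric difference map would merge.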
Experimental results show that the new method outperforms state-of-the-art change detection models on several benchmark datasets. For example, it achieves an F1 score of 76.4% on a remote sensing change detection dataset, compared to 74.8% for BIT-CD and 72.3% for SARAS-Net.
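For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from counts of true positives, false positives, and false negatives; the counts in the usage note are made up for illustration and are not taken from the reported experiments.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1 score, the harmonic mean of precision and recall.

    Returns 0.0 when there are no true positives, which avoids division by zero.
    """
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

For instance, 3 true positives, 1 false positive, and 3 false negatives give a precision of 0.75 and a recall of 0.5, so the F1 score is 0.6.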
The researchers have also tested their method on a real-world contract document dataset, which consists of more than 344 contracts in multiple languages and layouts. On this dataset, the method detects changes in text images with an average processing time of 2.82 seconds per page.
Overall, this new approach has significant potential for applications in industries such as finance, law, and logistics, where accurate detection of document changes is critical.
Cite this article: “Detecting Changes in Scanned Documents with Computer Vision and Machine Learning”, The Science Archive, 2025.
Computer Vision, Machine Learning, Document Change Detection, Scanned Documents, OCR, Convolutional Neural Networks, Attention Mechanisms, Layout Detection, Text Images, Binary Classification







