Saturday 08 March 2025
Phishing attacks have become a pervasive threat in the digital age, with scammers using increasingly sophisticated tactics to steal sensitive information and deploy malware. To combat this scourge, researchers have been developing more effective detection methods, and a new study offers some promising results.
The study, published recently, focuses on the integration of open-source intelligence (OSINT) tools and machine learning models to improve phishing detection across multilingual datasets. OSINT, for those unfamiliar with the term, refers to the gathering and analysis of publicly available information from sources such as social media, online forums, and other digital platforms.
The researchers began by extracting 17 features from a dataset comprising English and Arabic emails, using tools like Nmap and theHarvester to gather information on domain names, IP addresses, and open ports. They then trained five machine learning algorithms – decision tree, random forest, support vector machine, XGBoost, and multinomial Naive Bayes – on the OSINT-enhanced dataset.
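The article does not list the 17 features, so the sketch below only illustrates the kind of email-level signals such a pipeline might start from, assuming the sender address and the plain-text body have already been pulled out of each message. The helper `extract_features` and every feature name in it (`num_links`, `has_ip_link`, `link_domain_mismatch`, and so on) are hypothetical, not the study's feature set. The OSINT lookups the researchers ran with Nmap and theHarvester (open ports, harvested hosts) are not reproduced here; they would contribute additional columns.

```python
import ipaddress
import re
from email.utils import parseaddr
from urllib.parse import urlparse

# Rough URL matcher for plain-text bodies; adequate for a sketch, not for production.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def _is_ip(host: str) -> bool:
    """True if the host is a literal IPv4/IPv6 address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def _link_hosts(body: str) -> list[str]:
    """Return the host part of every http(s) link found in the body."""
    hosts = []
    for url in URL_RE.findall(body):
        try:
            host = urlparse(url).hostname  # lower-cased, port stripped
        except ValueError:  # e.g. malformed IPv6 brackets
            continue
        if host:
            hosts.append(host)
    return hosts


def extract_features(sender: str, body: str) -> dict:
    """Hypothetical email-level features; NOT the study's 17-feature set."""
    _, address = parseaddr(sender)
    sender_domain = address.rpartition("@")[2].lower()
    hosts = _link_hosts(body)

    def foreign(host: str) -> bool:
        # A link is "foreign" if it points outside the sender's domain and its subdomains.
        return host != sender_domain and not host.endswith("." + sender_domain)

    return {
        "sender_domain": sender_domain,
        "num_links": len(hosts),
        "num_link_domains": len(set(hosts)),
        "has_ip_link": any(_is_ip(h) for h in hosts),
        "link_domain_mismatch": any(foreign(h) for h in hosts),
    }
```

For example, `extract_features("PayPal <service@paypal.com>", "Verify now: http://192.168.10.5/login")` flags both a raw-IP link and a link outside the sender's domain. Because the features are computed from URLs and headers rather than words, they apply equally to English and Arabic message bodies, which is one plausible reason such signals help in a multilingual setting.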
The results were impressive: all five models showed improved accuracy after the OSINT features were added, with random forest emerging as the top performer at 97.37% accuracy on both the English and Arabic datasets. The study also found that the models detected phishing emails more effectively when trained on multilingual data.
One of the most significant findings concerns false positives, instances where a legitimate email is incorrectly flagged as malicious. Incorporating the OSINT features substantially reduced the number of false positives, making the models more reliable tools for detecting phishing attacks.
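Accuracy and false positives are both read off a confusion matrix. The sketch below uses the standard definitions, accuracy = (TP + TN) / (TP + TN + FP + FN) and false-positive rate = FP / (FP + TN), with phishing as the positive class. It is not the study's own evaluation code, and the labels in the example are made up. Tracking the false-positive rate separately matters because a single accuracy figure can hide how many legitimate messages are being flagged.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    tp: int  # phishing correctly flagged
    tn: int  # legitimate correctly passed
    fp: int  # legitimate wrongly flagged as phishing (false positive)
    fn: int  # phishing missed (false negative)


def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Confusion:
    """Count outcomes, treating `positive` (phishing) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, tn, fp, fn)


def accuracy(c: Confusion) -> float:
    """Share of all emails classified correctly."""
    total = c.tp + c.tn + c.fp + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def false_positive_rate(c: Confusion) -> float:
    """Share of legitimate emails that were wrongly flagged."""
    negatives = c.fp + c.tn
    return c.fp / negatives if negatives else 0.0


if __name__ == "__main__":
    # Hypothetical labels: 1 = phishing, 0 = legitimate.
    y_true = [1, 1, 0, 0, 0, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 0, 0]
    c = confusion_from_labels(y_true, y_pred)
    print(c)  # Confusion(tp=2, tn=4, fp=1, fn=1)
    print(f"accuracy={accuracy(c):.2f}, fpr={false_positive_rate(c):.2f}")  # accuracy=0.75, fpr=0.20
```

In this toy run, six of eight emails are classified correctly (75% accuracy), and one of five legitimate emails is wrongly flagged (a 20% false-positive rate).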
The study’s authors also noted that the OSINT-enhanced models were better at identifying phishing emails with non-English content, which could have significant implications for organizations operating globally. Higher accuracy and fewer false positives make these models particularly valuable in environments where language barriers are a concern.
While the study is promising, there are still limitations to consider. For instance, the dataset used was relatively small, which may limit the generalizability of the results. Additionally, the researchers acknowledged that the performance of the models could be improved further with larger and more diverse datasets.
Despite these caveats, the study’s findings offer a glimpse into the potential of integrating OSINT tools and machine learning models for phishing detection. As cybercriminals continue to evolve their tactics, it’s essential for researchers and organizations alike to stay ahead of the curve by developing innovative solutions that can effectively detect and prevent these attacks.
Cite this article: “Enhancing Phishing Detection with Open-Source Intelligence and Machine Learning”, The Science Archive, 2025.
Phishing, OSINT, Machine Learning, Detection, Multilingual, Datasets, Accuracy, False Positives, Cybersecurity, Malware







