Thursday 23 January 2025
A team of researchers has developed a new approach to automatically extracting location coordinates from news articles, which can be used for applications such as tracking disease outbreaks and natural disasters. The system, called RACCOON, uses a combination of natural language processing (NLP) and geospatial knowledge to identify locations mentioned in news articles and determine their corresponding coordinates.
The challenge of geocoding is that location references may be ambiguous or share names with non-locations, making it difficult for computers to accurately identify the correct location. RACCOON addresses this problem by using a retrieval-augmented generation (RAG) framework, which retrieves candidate locations from a database and then uses an LLM (large language model) to generate the final coordinates.
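The retrieve-then-generate flow described above can be illustrated with a minimal Python sketch. It is not RACCOON's actual code: the `Candidate` record, the tiny in-memory `GAZETTEER` with approximate values, the prompt wording, and the `llm` callable are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    name: str
    country: str
    feature_type: str
    population: int
    lat: float
    lon: float


# Toy stand-in for a gazetteer database; entries and values are approximate
# and purely illustrative.
GAZETTEER = [
    Candidate("Paris", "FR", "PPLC", 2100000, 48.85, 2.35),
    Candidate("Paris", "US", "PPL", 25000, 33.66, -95.56),
    Candidate("London", "GB", "PPLC", 8900000, 51.51, -0.13),
]


def retrieve(toponym: str, gazetteer: List[Candidate] = GAZETTEER) -> List[Candidate]:
    """Retrieval step: look up every entry whose name matches the place name."""
    return [c for c in gazetteer if c.name.lower() == toponym.lower()]


def build_prompt(toponym: str, context: str, candidates: List[Candidate]) -> str:
    """Augmentation step: put the retrieved candidates in front of the model."""
    listing = "\n".join(
        f"{i}. {c.name}, {c.country} ({c.feature_type}, pop. {c.population}): {c.lat}, {c.lon}"
        for i, c in enumerate(candidates, 1)
    )
    return (
        f"Text: {context}\n"
        f"Place name: {toponym}\n"
        f"Candidates:\n{listing}\n"
        "Reply with the best coordinates as 'lat, lon'."
    )


def geocode(toponym: str, context: str, llm: Callable[[str], str]) -> Tuple[float, float]:
    """Generation step: ask the model (any callable taking a prompt) for coordinates."""
    reply = llm(build_prompt(toponym, context, retrieve(toponym)))
    numbers = re.findall(r"-?\d+(?:\.\d+)?", reply)
    if len(numbers) < 2:
        raise ValueError(f"could not parse coordinates from reply: {reply!r}")
    return float(numbers[0]), float(numbers[1])
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM provider; in practice, `llm` would wrap a call to whichever model is in use.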
The system has five key components: country-assisted retrieval, a shortlist of 20 candidate location entries, GeoNames feature types, a population heuristic, and state-level context. The country-assisted retrieval component narrows the search by inferring the country to which a mentioned location most likely belongs. The LLM then uses this information to generate the final coordinates.
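As a rough illustration of how three of these components could fit together, the Python sketch below filters candidates by an inferred country, ranks them by population, and keeps at most 20. The article does not spell out these details, so the ranking rule, the fallback when no candidate matches the country, and the dictionary keys are all assumptions.

```python
from typing import Dict, List, Optional

MAX_CANDIDATES = 20  # shortlist size mentioned in the article


def select_candidates(
    entries: List[Dict],
    country: Optional[str],
    limit: int = MAX_CANDIDATES,
) -> List[Dict]:
    """Shortlist gazetteer entries for one place name.

    Each entry is assumed to be a dict with "country" and "population" keys.
    """
    # Country-assisted retrieval: prefer entries in the inferred country.
    pool = [e for e in entries if e["country"] == country] if country else []
    # Assumed fallback: if nothing matches, consider all entries.
    if not pool:
        pool = list(entries)
    # Population heuristic (assumed form): larger places rank first.
    pool.sort(key=lambda e: e["population"], reverse=True)
    return pool[:limit]
```

For example, given entries for Paris in France and Paris in Texas, passing `country="US"` would leave only the Texas entry, while passing `None` would rank the French capital first.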
The researchers evaluated RACCOON on three datasets: GeoVirus, GeoWebNews, and Local-Global Lexicon (LGL). They found that RACCOON outperformed two baseline models: one that used a traditional NLP approach and one that used an LLM without the RAG framework. RACCOON performed well on all four reported metrics: mean error, accuracy@161km (the share of predictions that fall within 161 km, about 100 miles, of the true location), country accuracy, and area under the curve (AUC).
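For readers unfamiliar with these metrics, the Python sketch below shows one way to compute three of them from predicted and true coordinates. The article does not give formulas, so this follows a formulation commonly used in geocoding evaluations: great-circle error distance in kilometres, and an AUC taken over the sorted log errors and normalised by the log of the largest possible error. The constants and function names are assumptions.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0
MAX_ERROR_KM = 20039.0  # assumed cap: roughly half of Earth's circumference


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def mean_error(errors: Sequence[float]) -> float:
    """Average error distance in km (lower is better)."""
    return sum(errors) / len(errors)


def accuracy_at(errors: Sequence[float], threshold_km: float = 161.0) -> float:
    """Fraction of predictions within the threshold (higher is better)."""
    return sum(e <= threshold_km for e in errors) / len(errors)


def auc(errors: Sequence[float]) -> float:
    """Normalised area under the sorted log-error curve (lower is better)."""
    if len(errors) < 2:
        raise ValueError("need at least two error values")
    logs = sorted(math.log(e + 1) for e in errors)
    # Trapezoidal rule with unit spacing between consecutive predictions.
    area = sum((logs[i] + logs[i + 1]) / 2 for i in range(len(logs) - 1))
    return area / (math.log(MAX_ERROR_KM) * (len(logs) - 1))
```

Mean error is sensitive to a few very distant mistakes, which is one reason evaluations typically report it alongside threshold accuracy and a log-scaled measure such as this AUC.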
One potential limitation of RACCOON is its reliance on pre-trained language models, which may not perform well in certain domains or languages. Additionally, the system may suffer from population bias, where locations with smaller populations are less accurately geocoded.
Despite these limitations, RACCOON represents an important step forward in geocoding and has potential applications in a wide range of fields, including epidemiology, disaster response, and environmental monitoring. As the amount of digital data continues to grow, developing more accurate and efficient methods for extracting location coordinates from text will become increasingly important.
RACCOON’s ability to retrieve candidate locations from a database and then use an LLM to generate final coordinates makes it particularly well-suited for applications where accuracy is critical. For example, in the event of a natural disaster, quickly and accurately identifying affected areas can be crucial for response efforts.
Cite this article: “RACCOON: An Advanced System for Geocoding News Articles”, The Science Archive, 2025.
Keywords: Geocoding, RACCOON, Natural Language Processing, Location Coordinates, News Articles, Disease Outbreaks, Natural Disasters, Retrieval-Augmented Generation, Large Language Model, GeoNames.







