Saturday 15 March 2025
In recent years, accurate IP geolocation has become increasingly important in fields such as cybersecurity, online advertising, and network optimization. However, traditional regression-based methods, which predict a device’s physical location from its IP address, suffer from noisy data and inaccurate results.
A new approach to IP geolocation, dubbed HMCGeo, seeks to address these issues by framing the problem as a hierarchical multi-label classification task. Instead of relying solely on regression models, HMCGeo uses a combination of local and global outputs to predict the target host’s region across multiple granularities.
The authors of HMCGeo begin by dividing a city into regions at several granularities, based on administrative boundaries, postal codes, and census blocks. They then map landmark hosts to regions at each granularity and summarize the hierarchical relationships between those regions using a topology-based landmark selection process.
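The region hierarchy described above can be pictured as a tree in which each fine-grained region has exactly one coarser parent. The following is a minimal sketch of that idea only, not the authors' implementation: the landmark records, region codes, level names, and the helper functions `build_hierarchy` and `label_path` are all hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical landmark records: (ip, district, postal_code, census_block).
# The IPs come from documentation ranges; the region codes are made up.
LANDMARKS = [
    ("198.51.100.7", "D1", "Z10", "B100"),
    ("198.51.100.9", "D1", "Z10", "B101"),
    ("203.0.113.4", "D1", "Z11", "B110"),
    ("203.0.113.8", "D2", "Z20", "B200"),
]


def build_hierarchy(landmarks):
    """Return (children, parent) maps linking each region to the next finer level."""
    children = defaultdict(set)
    parent = {}
    for _ip, *regions in landmarks:
        # Pair each level with the next finer one: (district, zip), (zip, block).
        for coarse, fine in zip(regions, regions[1:]):
            # A fine region must always sit under the same coarse region.
            if parent.setdefault(fine, coarse) != coarse:
                raise ValueError(f"{fine} is assigned to two different parents")
            children[coarse].add(fine)
    return dict(children), parent


def label_path(region, parent):
    """Walk from a region up to the coarsest level; return the path coarse to fine."""
    path = [region]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


children, parent = build_hierarchy(LANDMARKS)
```

Under these assumptions, `label_path("B110", parent)` yields the full multi-label target for a host in that block, one label per granularity, which is the kind of target a hierarchical multi-label classifier is trained on.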
In the classification phase, HMCGeo uses residual connection-based feature extraction units to pass coarse-grained features to finer granularities. The model also employs attention prediction units to focus on relevant features at each granularity level. This allows HMCGeo to accurately predict regions at multiple scales and adapt to noisy input data.
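To make the flow of information concrete, here is a toy forward pass in plain Python. It is a sketch under stated assumptions rather than the HMCGeo architecture: the layer shapes, the weight names `W`, `A`, and `C`, the use of a softmax over regions at each level, and the functions `feature_unit`, `attention_predict`, and `forward` are all illustrative choices. It shows only the local (per-level) outputs and omits the global output mentioned earlier.

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(matrix, v):
    return [sum(w * x for w, x in zip(row, v)) for row in matrix]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def feature_unit(x, h_prev, W):
    """Residual unit: features for this level plus the coarser level's features."""
    h = relu(matvec(W, x))
    return [a + b for a, b in zip(h, h_prev)]


def attention_predict(h, A, C):
    """Reweight features with attention scores, then score each region."""
    weights = softmax(matvec(A, h))
    attended = [w * x for w, x in zip(weights, h)]
    return softmax(matvec(C, attended))


def forward(x, layers):
    """Return one probability vector per granularity, coarse to fine.

    Assumes every level uses the same hidden size so the residual sum is valid.
    """
    h = [0.0] * len(layers[0]["W"])
    outputs = []
    for layer in layers:
        h = feature_unit(x, h, layer["W"])
        outputs.append(attention_predict(h, layer["A"], layer["C"]))
    return outputs
```

Because `feature_unit` adds `h_prev` to the new features, whatever the coarse level learned remains available to every finer level, which is the role the residual connections play in the description above.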
To further improve the accuracy of HMCGeo, the authors introduce a probabilistic classification loss function that leverages hierarchical constraints between regions at different granularities. This approach enables the model to learn more robust relationships between regions and reduce errors caused by noisy data.
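The article does not give the exact form of this loss, so the sketch below swaps in a common formulation for hierarchical multi-label problems: a binary cross-entropy term per region, plus a penalty whenever a child region is predicted as more probable than its parent. The function name `hierarchical_loss`, the `penalty` weight, and the dictionary-based inputs are assumptions made for illustration.

```python
import math


def hierarchical_loss(probs, targets, parent, penalty=1.0):
    """Binary cross-entropy plus a hierarchy-violation penalty.

    probs:   dict mapping region -> predicted probability in (0, 1)
    targets: set of regions that truly contain the host (one per granularity)
    parent:  dict mapping each child region -> its parent region
    """
    eps = 1e-12  # guards against log(0)
    bce = 0.0
    for region, p in probs.items():
        y = 1.0 if region in targets else 0.0
        bce -= y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps)
    bce /= len(probs)

    # A host inside a child region is also inside its parent, so the child's
    # probability should not exceed the parent's. Penalize any excess.
    violation = sum(
        max(0.0, probs[child] - probs[par]) for child, par in parent.items()
    )
    return bce + penalty * violation
```

For example, with `parent = {"Z10": "D1"}`, the prediction `{"D1": 0.3, "Z10": 0.8}` is penalized because it claims the host is more likely to be in postal code Z10 than in the district that contains it, whereas `{"D1": 0.9, "Z10": 0.8}` incurs no penalty.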
Experiments on datasets from New York City, Los Angeles, and Shanghai demonstrate the effectiveness of HMCGeo in predicting IP geolocation at various granularities. Compared to traditional regression-based methods, HMCGeo achieves significantly better performance across all three cities, particularly at finer granularities such as zip codes and census blocks.
A key advantage of HMCGeo is its robustness to noisy input data, a common problem in IP geolocation. By combining local and global outputs, it can still predict regions accurately when the input contains errors or inconsistencies.
Another benefit of HMCGeo is its scalability, as it can be easily extended to larger cities and more granularities without significant computational overhead. This makes it a promising solution for large-scale IP geolocation applications such as network optimization and cybersecurity.
Beyond these technical advantages, HMCGeo has the potential to improve applications that depend on IP geolocation data, including cybersecurity, online advertising, and network optimization.
Cite this article: “Accurate IP Geolocation with Hierarchical Multi-Label Classification”, The Science Archive, 2025.
IP Geolocation, Hierarchical Multi-Label Classification, Cybersecurity, Online Advertising, Network Optimization, Regression-Based Methods, Noisy Data, Inaccurate Results, Probabilistic Classification Loss Function, Scalability