Friday 07 March 2025
The River Thames, a lifeline for London and its inhabitants, has long been plagued by issues of water quality. In recent years, concerns have grown about the impact of human activity on the river’s ecosystem, leading to a surge in efforts to monitor and improve its health. A new study published today sheds light on the complex dynamics at play, employing cutting-edge machine learning techniques to analyze decades’ worth of data.
The research team, led by Hankun He and Christian Beck, from Queen Mary University of London, set out to better understand the relationships between various water quality indicators, including dissolved oxygen levels, temperature, electrical conductivity, pH, ammonium, turbidity, and rainfall. By examining time series data from 1992 to 2020, they aimed to identify patterns and correlations that could inform more effective management strategies.
Their approach involved combining empirical mode decomposition (EMD) with superstatistical methods, a novel application of this technique in the field of hydrology. EMD is a signal processing algorithm capable of decomposing complex time series into its constituent components, revealing hidden structures and trends. Superstatistics, on the other hand, allows researchers to model non-Gaussian distributions, which are common in natural systems.
By integrating these tools, the team was able to identify key factors influencing dissolved oxygen levels, such as temperature, pH, and rainfall patterns. They also discovered that the distance from the sea played a significant role, with oxygen concentrations decreasing as one moves upstream. This finding highlights the importance of considering geographical context when assessing water quality.
To further explore these relationships, the researchers employed machine learning algorithms to predict dissolved oxygen levels based on historical data. Their results showed that the Light Gradient Boosting Machine (LGBM) outperformed other models in terms of accuracy and precision. SHapley Additive exPlanations (SHAP), a technique for explaining complex model predictions, revealed that temperature, pH, and time of year were crucial factors driving these predictions.
The study’s findings have significant implications for policymakers and water quality management authorities. By better understanding the dynamics at play in the River Thames, they can develop targeted strategies to mitigate pollution and improve overall ecosystem health. This approach can be applied to other rivers and aquatic systems worldwide, informing more effective conservation efforts.
In addition to its practical applications, this research demonstrates the potential of machine learning and data-driven approaches in hydrology.
Cite this article: “River Thames Water Quality Dynamics Revealed Through Machine Learning Analysis”, The Science Archive, 2025.
Machine Learning, River Thames, Water Quality, Ecosystem Health, Hydrology, Data Analysis, Time Series, Empirical Mode Decomposition, Superstatistics, Light Gradient Boosting Machine







