Saturday 01 February 2025
The development of Artificial Intelligence (AI) in the military domain has raised concerns about its responsible use and potential harm to humans. A recent session at the Responsible AI in the Military Domain conference aimed to address these concerns by highlighting the need for human-centred test and evaluation (T&E) of AI systems.
Traditionally, T&E protocols have focused on testing AI algorithms without considering the human component. However, as AI systems are increasingly designed to work alongside humans, human considerations must be incorporated into the testing process. This requires a shift from focusing solely on algorithmic performance to understanding how humans interact with and respond to AI systems.
One of the key challenges in human-centred T&E is predicting the real-world performance of AI-enabled capabilities. The community faces significant gaps in understanding and predicting human behaviour, even with simple systems. As AI systems grow more complex, testing their performance across every possible scenario becomes intractable.
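As a back-of-envelope illustration of why exhaustive scenario testing breaks down, consider how quickly the test space grows when operational factors combine multiplicatively; the factors and counts in this Python sketch are invented for illustration.

```python
# Hypothetical operational factors and the number of settings each can take.
factors = {
    "environment": 6,
    "adversary behaviour": 8,
    "sensor condition": 5,
    "operator experience": 4,
    "mission phase": 5,
}

# The scenario space is the product of the settings of every factor.
total = 1
for settings in factors.values():
    total *= settings

print(f"{total} distinct scenario combinations")  # 4800, before any human variability
```

Human variability (training level, fatigue, trust in the system) multiplies this space still further, which is why exhaustive coverage is out of reach even for modest systems.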
Another challenge is that typical performance measures, which assess the algorithm in isolation, are inadequate for evaluating the effectiveness of AI systems in operational contexts. Human factors such as training, mental models, and understanding of the system play a crucial role in determining how well an AI-enabled capability actually performs. T&E protocols must therefore account for these human factors to ensure that AI systems are designed and tested with humans in mind.
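To make this distinction concrete, the minimal sketch below contrasts an algorithm-only accuracy score with two human-centred measures: the outcome of the combined human-AI decision, and a proxy for calibrated operator reliance. The trial structure, metric names, and data are all hypothetical, not drawn from the session.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    ai_correct: bool         # did the algorithm's recommendation match ground truth?
    operator_accepted: bool  # did the operator act on the recommendation?
    final_correct: bool      # was the human-machine team's final decision correct?

def algorithmic_accuracy(trials):
    """Classic T&E measure: algorithm performance in isolation."""
    return sum(t.ai_correct for t in trials) / len(trials)

def team_accuracy(trials):
    """Human-centred measure: outcome of the combined human-AI decision."""
    return sum(t.final_correct for t in trials) / len(trials)

def appropriate_reliance(trials):
    """Fraction of trials where the operator accepted correct advice or
    rejected incorrect advice -- a rough proxy for calibrated trust."""
    calibrated = sum(
        (t.ai_correct and t.operator_accepted)
        or (not t.ai_correct and not t.operator_accepted)
        for t in trials
    )
    return calibrated / len(trials)

# Hypothetical trial data: a strong algorithm can still yield a weak team
# if operators over- or under-trust its recommendations.
trials = [
    Trial(ai_correct=True,  operator_accepted=False, final_correct=False),
    Trial(ai_correct=True,  operator_accepted=True,  final_correct=True),
    Trial(ai_correct=False, operator_accepted=True,  final_correct=False),
    Trial(ai_correct=True,  operator_accepted=True,  final_correct=True),
]
print(f"algorithmic accuracy: {algorithmic_accuracy(trials):.2f}")  # 0.75
print(f"team accuracy:        {team_accuracy(trials):.2f}")         # 0.50
print(f"appropriate reliance: {appropriate_reliance(trials):.2f}")  # 0.50
```

The gap between the first two numbers is exactly what algorithm-only T&E cannot see: the algorithm is right three times out of four, yet the human-machine team is right only half the time.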
The session emphasized the need for a comprehensive approach to T&E that treats the user as part of the system. This requires ongoing testing and evaluation across the system's lifecycle, including training, deployment, and maintenance. Moreover, T&E results must be communicated to those who use, and make decisions about, AI-based systems.
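One way to picture this lifecycle view, purely as an illustrative sketch with hypothetical stage names and checks, is a schedule of human-centred checks that recur at each stage rather than running once at acceptance.

```python
from enum import Enum

class Stage(Enum):
    TRAINING = "training"
    DEPLOYMENT = "deployment"
    MAINTENANCE = "maintenance"

# Each lifecycle stage carries its own human-centred checks in addition
# to the usual algorithmic ones.
LIFECYCLE_CHECKS = {
    Stage.TRAINING: [
        "operators complete scenario-based training with the system",
        "operator mental model assessed against actual system behaviour",
    ],
    Stage.DEPLOYMENT: [
        "team (human + AI) performance measured under operational conditions",
        "operator trust and workload surveyed",
    ],
    Stage.MAINTENANCE: [
        "re-test after each model update or data-drift alert",
        "T&E results reported to users and decision-makers",
    ],
}

def run_stage_checks(stage: Stage) -> None:
    """Print (in a real system: execute and record) the checks for a stage."""
    print(f"--- {stage.value} ---")
    for check in LIFECYCLE_CHECKS[stage]:
        print(f"  [ ] {check}")

for stage in Stage:
    run_stage_checks(stage)
```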
To address these challenges, the session proposed several strategies. Firstly, digital engineering can be used to prepare test and evaluation environments for emerging AI-enabled systems. Secondly, traditional military governance and assurance models may need to be adapted to the dynamic and unpredictable nature of AI systems. Finally, robust governance frameworks must be in place at every level to ensure that the autonomy granted to AI systems is carefully managed and aligned with operational objectives.
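As a toy illustration of the autonomy-management point (the levels, thresholds, and field names below are all assumptions, not anything proposed at the session), a governance gate might cap the autonomy granted to a system at the level its accumulated T&E evidence supports.

```python
from dataclasses import dataclass

# Hypothetical autonomy levels, ordered lowest to highest.
AUTONOMY_LEVELS = ["advisory", "human-approved", "human-supervised"]

@dataclass
class TEEvidence:
    team_accuracy: float       # measured with operators in the loop
    scenarios_covered: int     # distinct operational scenarios tested
    last_retest_days_ago: int  # staleness of the evidence

def supported_level(evidence: TEEvidence) -> str:
    """Map T&E evidence to the highest autonomy level it justifies."""
    if evidence.last_retest_days_ago > 180:
        return "advisory"  # stale evidence: fall back to the lowest level
    if evidence.team_accuracy >= 0.95 and evidence.scenarios_covered >= 50:
        return "human-supervised"
    if evidence.team_accuracy >= 0.85 and evidence.scenarios_covered >= 20:
        return "human-approved"
    return "advisory"

def grant(requested: str, evidence: TEEvidence) -> str:
    """Never grant more autonomy than the evidence supports."""
    supported = supported_level(evidence)
    if AUTONOMY_LEVELS.index(requested) <= AUTONOMY_LEVELS.index(supported):
        return requested
    return supported

evidence = TEEvidence(team_accuracy=0.91, scenarios_covered=30, last_retest_days_ago=40)
print(grant("human-supervised", evidence))  # -> "human-approved"
```

The design choice worth noting is that stale evidence degrades the grant automatically, keeping autonomy tied to ongoing T&E rather than a one-time certification.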
The development of human-centred T&E protocols will require collaboration between technologists, policymakers, and operators. It is essential to establish standards and requirements for testing and evaluating AI systems that take human factors into account. Moreover, communication between technical and non-technical communities must improve so that operators and policymakers understand the risks associated with system use.
Cite this article: “Human-Centred Testing and Evaluation of Artificial Intelligence Systems in Military Contexts”, The Science Archive, 2025.
Artificial Intelligence, Military, Responsible AI, Human-Centred Test and Evaluation, Algorithmic Performance, Human Behaviour, Operational Contexts, Digital Engineering, Governance Frameworks, Autonomy Management.