Saturday 15 March 2025
Researchers have made significant progress in understanding how humans communicate with each other, but a crucial aspect of this process has been largely overlooked – identifying who is being addressed during a conversation. In a recent study, scientists have developed a benchmark for recognizing addressees in multi-party dialogues, shedding light on the complexities involved.
Traditionally, research has focused on dyadic interactions, that is, conversations between two people, and has largely ignored the more intricate dynamics that arise when three or more individuals talk together. To bridge this gap, researchers created a corpus of spontaneous, multi-modal, triadic discussions, in which three participants debated topics such as which city would make the best alternative capital for Japan.
The team then annotated a subset of these conversations to identify who was being addressed during each turn. The results revealed that in approximately 20% of instances, the addressee is explicitly specified, while in most cases (around 80%), no specific individual is targeted, and anyone can take the next turn.
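To make the annotation statistic concrete, the short Python sketch below computes the share of turns that name a specific addressee. The label scheme ("ALL" for turns open to anyone, single letters for participants) and the ten sample labels are made up for illustration; the study's actual annotation format is not described here.

```python
from collections import Counter

# Hypothetical annotations, one label per turn. "ALL" marks turns with no
# specific addressee; "B" and "C" are made-up participant IDs.
turn_addressees = ["ALL", "B", "ALL", "ALL", "ALL", "C", "ALL", "ALL", "ALL", "ALL"]


def addressee_distribution(labels):
    """Return the share of turns carrying each addressee label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def explicit_share(labels, unaddressed_label="ALL"):
    """Return the fraction of turns that name a specific addressee."""
    if not labels:
        return 0.0
    explicit = sum(1 for label in labels if label != unaddressed_label)
    return explicit / len(labels)


shares = addressee_distribution(turn_addressees)
print(shares)
print(explicit_share(turn_addressees))
```

With these sample labels, two of the ten turns are explicitly addressed, mirroring the roughly 20/80 split reported above.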
To evaluate the task’s complexity, researchers tested a large language model (LLM) on addressee recognition. The model was given prompts with context information from the conversations, including textual and visual cues such as gaze behavior. However, the results were disappointing – the LLM struggled to identify addressees, achieving an accuracy only marginally above chance.
Intriguingly, when researchers added simple gaze features to the prompt, the performance actually decreased. This suggests that current LLMs may not be equipped to effectively incorporate contextual information, including non-verbal cues like eye gaze.
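The comparison described above, the same prompt with and without gaze features, can be sketched roughly as follows. This is an illustrative Python outline only: the `Turn` structure, the `[gaze: ...]` notation, the prompt wording, and the sample dialogue are all assumptions, not the prompt format used in the study, and no language model is actually called.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    speaker: str
    text: str
    # Hypothetical feature: the participant the speaker mostly looked at.
    gaze_target: Optional[str] = None


def format_turn(turn: Turn, use_gaze: bool) -> str:
    """Render one turn as a line of text, optionally with a gaze tag."""
    line = f"{turn.speaker}: {turn.text}"
    if use_gaze and turn.gaze_target is not None:
        line += f" [gaze: {turn.gaze_target}]"
    return line


def build_prompt(history: List[Turn], current: Turn,
                 participants: List[str], use_gaze: bool = False) -> str:
    """Assemble a text prompt asking who the current turn addresses."""
    others = [p for p in participants if p != current.speaker]
    options = ", ".join(others + ["ALL"])
    lines = ["Conversation so far:"]
    lines += [format_turn(t, use_gaze) for t in history]
    lines.append("Current turn:")
    lines.append(format_turn(current, use_gaze))
    lines.append(f"Who is being addressed? Answer with one of: {options}.")
    return "\n".join(lines)


def accuracy(predictions: List[str], gold: List[str]) -> float:
    """Fraction of turns where the predicted addressee matches the annotation."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Made-up example dialogue among participants A, B, and C.
history = [Turn("A", "I think Osaka would work.", gaze_target="B")]
current = Turn("B", "What about the cost, though?", gaze_target="A")

text_only = build_prompt(history, current, ["A", "B", "C"])
with_gaze = build_prompt(history, current, ["A", "B", "C"], use_gaze=True)
print(with_gaze)
```

Scoring the model's answers for both prompt variants with `accuracy` against the human annotations would then show whether the added gaze tags help or hurt.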
The study’s findings have significant implications for developing advanced dialogue systems capable of navigating complex, real-world conversations. Current models may be limited in their ability to recognize addressees and respond accordingly, potentially disrupting the natural flow of discussions.
To move forward, researchers will need to explore more sophisticated methods for incorporating contextual information and develop new models that can better capture the intricate interplay between participants in multi-party conversations. By doing so, they can create systems that are more intuitive and effective in understanding human communication.
Cite this article: “Deciphering Who’s Being Addressed: The Elusive Task of Addressee Recognition in Multi-Party Dialogues”, The Science Archive, 2025.
Human Communication, Dialogue Systems, Addressee Recognition, Multi-Party Dialogues, Gaze Behavior, Language Models, Non-Verbal Cues, Contextual Information, Conversation Dynamics, Triadic Discussions







