Advances in Faithful Forgetting for Language Models

Sunday 30 March 2025


Getting language models to forget faithfully remains an open problem in artificial intelligence. These models can generate text, answer questions, and even create art, but they have no straightforward way to discard information that is no longer relevant or accurate. Left unchecked, knowledge that cannot be removed can perpetuate biases and inaccuracies and can create security risks.


Researchers at Seoul National University and Adobe Research have addressed this problem with FAITHUN, a benchmark that evaluates how faithfully a language model unlearns. Such an evaluation is needed to confirm that a model has actually forgotten the information it was asked to forget, rather than merely appearing to.


The team’s approach involves constructing a dataset consisting of Wikidata triples, which are essentially knowledge statements in the form of subject-predicate-object relationships. These triples are then used to generate questions and answers, allowing researchers to evaluate the performance of language models on tasks such as question answering, paraphrasing, and multi-hop reasoning.
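As a rough illustration of how triples can be turned into questions, the sketch below builds single-hop and two-hop question-answer pairs from subject-predicate-object statements. The relation names, question templates, and helper functions are hypothetical and chosen only for this example; they are not taken from the benchmark itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str


# Hypothetical templates keyed by relation name (assumptions for this sketch).
QUESTIONS = {
    "capital": "What is the capital of {subject}?",
    "country": "Which country is {subject} located in?",
}
# Noun phrases used to refer to a triple's object without naming it.
DESCRIPTIONS = {
    "country": "the country where {subject} is located",
}


def to_question(triple):
    """Turn one triple into a (question, answer) pair."""
    return QUESTIONS[triple.relation].format(subject=triple.subject), triple.obj


def multi_hop(first, second):
    """Chain two triples when the object of the first is the subject of the second."""
    if first.obj != second.subject:
        raise ValueError("triples do not chain")
    bridge = DESCRIPTIONS[first.relation].format(subject=first.subject)
    return QUESTIONS[second.relation].format(subject=bridge), second.obj


located = Triple("the Eiffel Tower", "country", "France")
capital = Triple("France", "capital", "Paris")

single_hop_pair = to_question(capital)
two_hop_pair = multi_hop(located, capital)
```

The two-hop question never mentions France, yet answering it depends on the same fact about France's capital. That dependence is what makes chained questions useful for probing whether related knowledge was removed along with the target.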


A central concept behind FAITHUN is superficial unlearning: the phenomenon in which an unlearning method either fails to erase interconnected knowledge that should be removed or unintentionally erases knowledge that is unrelated to the target. In the first case the model still reveals the supposedly forgotten fact when asked indirectly, for example through a paraphrased or multi-hop question. In the second, the model loses knowledge it should have kept.
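A check for superficial unlearning might look something like the sketch below, which scores a model's answers after unlearning on several question sets. The split names, the thresholds, and the extra "unrelated" split used to detect collateral damage are assumptions made for illustration, not the benchmark's actual metrics.

```python
def accuracy(pairs):
    """Fraction of (prediction, gold) pairs that match, ignoring case and outer spaces."""
    hits = sum(p.strip().lower() == g.strip().lower() for p, g in pairs)
    return hits / len(pairs)


# Hypothetical split names, chosen only for this sketch.
SHOULD_FORGET = ("target", "paraphrase", "multi_hop")
SHOULD_RETAIN = ("unrelated",)


def report(results, forget_max=0.2, retain_min=0.8):
    """Flag splits where forgotten knowledge leaks or kept knowledge is damaged."""
    scores = {split: accuracy(pairs) for split, pairs in results.items()}
    leaked = [s for s in SHOULD_FORGET if scores[s] > forget_max]
    damaged = [s for s in SHOULD_RETAIN if scores[s] < retain_min]
    return {
        "scores": scores,
        "leaked": leaked,
        "damaged": damaged,
        "superficial": bool(leaked or damaged),
    }


# Toy post-unlearning answers: the direct question is refused,
# but the paraphrased and multi-hop questions still reveal the fact.
results = {
    "target": [("I don't know", "Paris")],
    "paraphrase": [("Paris", "Paris")],
    "multi_hop": [("paris ", "Paris")],
    "unrelated": [("Berlin", "Berlin")],
}
summary = report(results)
```

In this toy case the model looks as if it has forgotten when asked directly, but the report still marks the unlearning as superficial because the fact leaks through the other two question types.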


To address this issue, the researchers developed a novel unlearning method called Knowledge-Localized Unlearning (KLUE). KLUE identifies the neurons in a language model that are relevant to the knowledge being removed and updates only those neurons, using gradient ascent on the knowledge to be forgotten together with an auxiliary retention loss term. This approach allows KLUE to erase the targeted knowledge selectively while preserving knowledge that should remain.
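The general shape of a localized update with a retention term can be sketched on a toy linear model. This is only an illustration under stated assumptions: the attribution score |w_i * x_i|, the squared-error loss, the top-k selection, and all hyperparameters are stand-ins chosen for this sketch, not the authors' implementation.

```python
def predict(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))


def loss_and_grad(weights, x, y):
    """Squared error and its gradient with respect to each weight."""
    err = predict(weights, x) - y
    return err * err, [2.0 * err * xi for xi in x]


def localize(weights, x, k):
    """Pick the k weights with the largest |w_i * x_i| on the forget input.

    This attribution score is a simple stand-in assumed for the sketch.
    """
    scores = [abs(w * xi) for w, xi in zip(weights, x)]
    ranked = sorted(range(len(weights)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def unlearn(weights, forget, retain, k=2, lr=0.1, lam=1.0, steps=5):
    """Update only the localized weights; all others stay exactly as they were."""
    weights = list(weights)
    selected = localize(weights, forget[0], k)
    for _ in range(steps):
        _, g_forget = loss_and_grad(weights, *forget)
        _, g_retain = loss_and_grad(weights, *retain)
        for i in selected:
            # Ascend the forget loss, descend the retention loss.
            weights[i] += lr * (g_forget[i] - lam * g_retain[i])
    return weights


weights = [1.0, 2.0, -1.0, 0.5]
forget = ([1.0, 0.0, 2.0, 0.0], -0.9)  # (input, target) to be forgotten
retain = ([0.0, 1.0, 1.0, 0.0], 1.0)   # (input, target) to be kept

updated = unlearn(weights, forget, retain)
without_retention = unlearn(weights, forget, retain, lam=0.0)
```

Because the retained example shares one weight with the forgotten one, even this toy model shows the trade-off: the forget loss rises, the unselected weights are untouched, and the retention term reduces the damage to the retained example without eliminating it.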


The team ran experiments on several language models, including Llama-3.2 and Gemma-2, to evaluate existing unlearning methods and KLUE on FAITHUN. The results showed that existing unlearning methods often exhibit superficial unlearning, whereas KLUE mitigates the issue by erasing the targeted knowledge selectively while leaving unrelated knowledge intact.


The researchers also performed ablation studies to understand the relative importance of each component in KLUE. These experiments revealed that regularization, localization, and sample selection are all crucial for enhancing the faithfulness of unlearning. Regularization helps to reduce overfitting, while localization enables the model to focus on relevant knowledge neurons. Sample selection, meanwhile, allows the model to selectively update information based on its relevance.


Cite this article: “Advances in Faithful Forgetting for Language Models”, The Science Archive, 2025.


Language Models, Artificial Intelligence, Faithful Forgetting, Faithun, Language Processing, Unlearning, Knowledge Graph, Wikidata Triples, Superficial Unlearning, Klue


Reference: Nakyeong Yang, Minsung Kim, Seunghyun Yoon, Joongbo Shin, Kyomin Jung, “FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge” (2025).

