About the Joint Workshop

LEGAL 2026 and CALD-pseudo 2026

Access to text and speech data is essential for research, yet personal and sensitive information often prevents open sharing. Techniques such as pseudonymization and anonymization offer potential solutions, but their effectiveness, limitations, and impact on data utility require deeper investigation. Balancing privacy protection with meaningful scientific use remains a key challenge.

At the same time, legal and ethical requirements increasingly shape how language resources can be created, processed, and distributed. Regulatory frameworks, such as the GDPR, the Data Act, and the Artificial Intelligence Act, affect access, reuse, and documentation duties for both text and speech data, creating a complex environment that demands interdisciplinary insight.

The workshop brings these two perspectives together by addressing both the technical and practical aspects of de-identification as well as the legal and ethical obligations governing data handling. Topics include anonymization and pseudonymization methods, compliance in practical workflows, provenance and rights tracking, and emerging approaches to legal metadata. The goal is to foster responsible, legally sound, and technically robust innovation in human language technologies.

Contact

For inquiries, please contact ingo.siegert@ovgu.de for questions about LEGAL2026 or mormor.karl@svenska.gu.se for questions about CALD-pseudo 2026.

Keynote: Maja Bogataj Jančič (Open Data and Intellectual Property Institute ODIPI)

Dr Maja Bogataj Jančič is the founder and head of the Open Data and Intellectual Property Institute ODIPI. She has also been the head of the Institute for Intellectual Property (IPI) since its establishment in 2004. Maja is a copyright expert; her recent work focuses on open science, open data, data governance and artificial intelligence, as well as open science issues and the legal framework of copyright and data for research and science. Maja is the National Coordinator for Slovenia and the Regional Coordinator for the six Western Balkan countries of the Knowledge Rights 21 project. In 2020-2024, she co-chaired the Data Governance Working Group of The Global Partnership on Artificial Intelligence (GPAI).

Maja is a member of the Expert Council of the Slovenian Open Science Community (SSOZ) and the head of the newly established Expert Body for Legal Issues Related to Copyright and Data Governance. She is a a member of the Advisory Committee on Copyright and other Legal Matters (CLM) of the International Federation of Library Associations and Institutions (IFLA). Maja is a president of the supervisory board of the National and University Library (NUK). She is a Senior Research Fellow at the Centre on Knowledge Governance. She is a Vice President of COMMUNIA. She Creative Commons Slovenia since 2003.

Maja graduated from the Faculty of Law in Ljubljana (1996), obtained her LL.M. from the Faculty of Law in Ljubljana (1999, Economics), Harvard Law School (2000, Law) and Facoltà di Giurisprudenza di Torino (2005, Intellectual Property), and her Ph.D. from the Faculty of Law in Ljubljana (2006, Copyright).

Invited Talk: Prof. Dr. Ivan Habernal (Ruhr University Bochum, Germany)

Abstract:

Privacy and anonymization in NLP: Are we barking up the wrong tree? LLMs dominate the world, they are claimed to be the mightiest privacy destroyers, and no anonymized text can escape their capabilities. That’s for sure, but maybe we are barking up the wrong tree. What if our attacks on privacy are flawed? What if we overstate our privacy attack strengths? What if we never defined privacy, anonymity, anonymization, and attackers in the first place? In this talk, I’ll try to address the elephant in the room, both from an empirical and theoretical perspective.

Bio:

Prof. Ivan Habernal holds a full professorship of trustworthy human language technologies at the Ruhr-University Bochum, Germany, and is affiliated with the Research Center for Trustworthy Data Science and Security. Before that, he held academic positions at University of Paderborn, Ludwig Maximilian University of Munich, and Technical University of Darmstadt. His research group focuses on privacy-preserving methods in natural language processing and on legal natural language processing. He's been an active member of the NLP community by serving as an editor of TACL or ARR, an has co-organized the series of the PrivateNLP workshops co-located with the major ACL conferences over the last few years.

Submission

Authors are invited to submit original and unpublished research papers in the following categories:

Long papers (up to 8 pages) for substantial contributions

Short papers (up to 4 pages) for:

Small, focused contributions or ongoing or preliminary work
Extended abstracts for non-technical submissions only, such as conceptual, theoretical, legal, ethical, policy-oriented, or position papers. Extended abstract submissions are expected to be developed into regular papers by the camera-ready submission deadline.

The full papers will be published as workshop proceedings along with the LREC main conference. They should follow the LREC stylesheet, which is available on the conference website on the Author’s kit page. Unlike the main conference, we allow appendices of up to 10 pages already in the review phase. However, the reviewers will not be required to look in the appendices and must be able to review the paper based on everything contained within the main body of the paper (as if there were no appendices).

Submission deadline: 20th of February 2026

Submission link: https://softconf.com/lrec2026/LEGAL2026/

When submitting a paper from the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of your research. Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.) to enable their reuse and replicability of experiments (including evaluation ones).

Topics of interest include:

1. Legal Aspects of Language Data (LEGAL2026)

Regulatory frameworks and global governance: Impact of the GDPR, EU Data Act, Data Governance Act, Digital Services Act, AI Act, and international regulations (e.g., China’s 2023 Draft Rules on Generative AI, U.S. AI Bill of Rights) on access, circulation, and reuse of language and speech data; statutory exceptions for text and data mining.
Intellectual property, data protection, and LLM governance: Legal issues surrounding training data, derivative datasets, and model outputs; copyright, data governance, and data protection obligations in the development and deployment of Large Language Models.
Ethics, fairness, trust, and transparency: Ethical considerations in personal data collection and reuse; ensuring fairness, transparency, and accountability in language and speech technologies.
Compliance in practice: Legal metadata, provenance, consent documentation, usage rights, and machine-readable licensing; practical workflows for lawful data collection, annotation, and sharing.
Operationalizing compliance: Tools and methods that support automated compliance checking, risk detection, consent tracking, and policy-aware data filtering; language technologies assisting in legal compliance.
Emerging and grey areas: Legal uncertainties around synthetic or augmented data, LLM-generated content, and cross-modal leakage; evolving interpretations of anonymization thresholds.
Interdisciplinary and cross-border coordination: Global harmonization of legal and technical approaches; collaboration models between researchers, legal experts, and infrastructure providers; navigating jurisdictional inconsistencies.

2. Pseudonymization, Anonymization, and De-identification: Theoretical, Methodological, and Technical Aspects (CALD-pseudo 2026)

Detection and classification of personal information (PI): Automatic identification of PI in text, speech, and multimodal data; context-dependent and indirect indicators of identity.
Replacement and transformation of PI: Context-sensitive pseudonymization and anonymization methods; substitution, masking, obfuscation; maintaining coherence across discourse and modalities.
Utility and bias after de-identification: Effects of de-identification on downstream task performance, linguistic research validity, readability, and bias amplification or reduction.
Approaches to evaluation and adversarial testing: Metrics and frameworks for assessing de-identification quality; adversarial re-identification attempts; robustness and failure-mode analysis.
Dataset creation for de-identification research: Methodological, ethical, and annotation-related considerations in building corpora for training or evaluating de-identification systems.
Low-resource scenarios: Techniques for de-identification in settings with limited data, scarce annotations, or underrepresented languages; transfer and multilingual approaches.
Speech-specific challenges: Removing speaker identity cues in audio; voice anonymization; cross-modal leakage between text, transcripts, and acoustic features.
Cross-disciplinary applications and challenges: Integrating de-identification techniques into real-world workflows in areas such as linguistics, social sciences, digital humanities, healthcare, and other private- or public-sector data environments.

Important Dates

F̶e̶b̶r̶u̶a̶r̶y̶ ̶2̶0̶ February 22, 2026, 23:59 CET

Deadline for submission

March 16, 2026 (tentative)

Notification of acceptance

March 30, 2026

Submission of final version of accepted papers (strict)

May 12, 2026

Workshop day

Progam

Tuesday, May 12, 2026

9:00 - 10:10

Welcome Session

9:00 - 9:15

Welcome and basic information from the organizers
Workshop Organizers

9:15 - 10:10

Introductory Lecture
Paweł Kamocki

10:10 - 10:30

Oral Presentations I: LEGAL

10:10 - 10:30

Transparency as Architecture: Structural Compliance Gaps in EU AI Act Article 50 II
Vera Schmitt¹, Niklas Kruse², Premtim Sahitaj³, Julius Schöning²
¹TU Berlin, ²Hochschule Osnabrück, ³Technical University of Berlin

10:30 - 11:00

Coffee Break

11:00 - 11:55

Keynote

11:00 - 11:55

Keynote Speech
Maja Bogataj Jančič

11:55 - 13:00

Oral Presentations II: LEGAL

11:55 - 12:15

Towards Robust Evaluation for Privacy QA Systems
Anna Leschanowsky¹, Zahra Kolagar¹, Erion Çano², Ivan Habernal², Dara Hallinan³, Emanuël Habets³, Birgit Popp¹
¹Fraunhofer IIS, ²Ruhr-University Bochum, ³FIZ Karlsruhe – Leibniz Institute for Information Infrastructure, ⁵International Audio Laboratories Erlangen

12:15 - 12:35

LDS Contractual Framework: Principles, Status and Implementation
Penny Labropoulou¹, Kossay Talmoudi², Dimitrios Gkoumas³, Katerina Gkirtzou⁴, Miltos Deligiannis¹, Leon Voukoutis¹, Athanasia Kolovou¹, Khalid Choukri⁵, Stelios Piperidis⁶, Dimitrios Galanis⁷
¹ILSP / Athena RC, ²ELDA, ³Institute for Language and Speech Processing (ILSP), Athena Research Center (ATHENA RC), ⁴ILSP/Athena Research Center, ⁵ELRA/ELDA, ⁶Athena RC/ILSP, ⁷Institute for Language and Speech Processing, Athena Research Center

12:35 - 12:55

Authorship Attribution in the Times of LLMs within the Framework of the CRediT Taxonomy
Pawel Kamocki¹ and Andreas Witt²
¹Leibniz Institute for German Language, ²Leibniz Institute for the German Language

13:00 - 14:00

Lunch Break

14:00 - 14:55

Invited Talk

14:00 - 14:55

Privacy and anonymization in NLP: Are we barking up the wrong tree?
Prof. Dr. Ivan Habernal

14:55 - 16:00

Oral Presentations III: CALD-pseudo

14:55 - 15:15

DeID-Clinic: A Risk-Aware Pseudonymization Framework for Clinical Text De-identification and Re-identification Risk Assessment
Angel Paul¹, Dhivin Shaji¹, Lifeng Han², Warren Del-Pinto¹, Goran Nenadic¹, Suzan Verberne³
¹University of Manchester, ²The University of Manchester, ³LIACS, Leiden University

15:15 - 15:35

Distilling Human-Aligned Privacy Sensitivity Assessment from Large Language Models
Gabriel Loiseau¹, Damien Sileo¹, Damien Riquet², Maxime Meyer², Marc Tommasi³
¹Inria, ²Hornetsecurity, ³Lille University

15:35 - 15:55

Birds of a Feather: Do Embedding Representations of Personal Information Flock Together?
Maria Irena Szawerna and Simon Dobnik
University of Gothenburg

16:00 - 16:30

Coffee Break

16:30 - 17:45

Poster Session

16:30 - 17:45

Modelling Legal Compliance in a Consent Wizard Application as Part of a Research-Centered and User-Oriented Data Infrastructure
Aliena Strathmann¹, Marc-Levin Joppek¹, Maryam Mohammadi¹, Katja Politt², Paul T. Schrader¹, Annett B. Jorschick¹, Hendrik Buschmeier¹
¹Bielefeld University, ²Rostock University

16:30 - 17:45

Balancing FAIR and GDPR: A Governance Framework for Oral Archives
Elvira Mercatanti¹, Monica Monachini², Giovanni Abete³, Silvia Calamai⁴, Sergio Canazza⁵, Alessandro Casellato⁶, Virginia Niri⁴, Cesarina Vecchia³, Giulia Zitelli Conti⁶, Giada Zuccolo⁵
¹CNR-ILC, ²Institute of Computational Linguistics "A. Zampolli" - CNR, ³Università degli Studi di Napoli Federico II, Italia, ⁴Università degli Studi di Siena, Italia, ⁵Università degli Studi di Padova, Italia, ⁶Università Ca' Foscari Venezia, Italia

16:30 - 17:45

Legal Considerations in the Use of Synthetic Data for AI Development and Finetuning: The Case of LLMs4EU
Kossay Talmoudi¹, Khalid Choukri², Amélie Gourgeot¹, Florine Astruc³
¹ELDA, ²ELRA/ELDA, ³ELT-EDIC

16:30 - 17:45

Evaluating Encoder- and LLM-Based Approaches for Robust Indirect Personal Identifier Detection
Christoph Otto¹, Ibrahim Baroud², Akiko Aizawa³, Sebastian Möller⁴, Roland Roller⁵, Lisa Raithel⁶
¹University of Potsdam, ²Technische Universität Berlin, ³National Institute of Informatics, ⁴Quality and Usability Lab, TU Berlin, ⁵DFKI SLT Lab, ⁶Technische Universitaet Berlin, BIFOLD, DFKI GmbH, Charité-IKIM

16:30 - 17:45

VEIL: A Benchmark for Value-Preserving Entity Identification Limitation
Darina Gold¹, Shadi Rastegar¹, Alina Liebel¹, Alessandra Zarcone²
¹Fraunhofer IIS, ²Technische Hochschule Augsburg

17:45 - 18:00

Closing Session

17:45 - 18:00

Closing Remarks
Workshop Organizers

Organizers and Contact of the LEGAL Workshop:

Ingo Siegert, Otto-von-Guericke-Universität Magdeburg, Germany

Kossay Talmoudi, ELRA/ELDA, France

Khalid Choukri, ELRA/ELDA, France

Pawel Kamocki, IDS Mannheim, Germany

Organizers and Contact of the CALD-pseudo Workshop:

Maria Irena Szawerna, University of Gothenburg, Sweden

Simon Dobnik, University of Gothenburg, Sweden

Therese Lindström Tiedemann, University of Helsinki, Finland

Pierre Lison, Norwegian Computing Center & University of Oslo, Norway

Ildikó Pilán, Norwegian Computing Center, Norway

Ricardo Muñoz Sánchez, University of Gothenburg, Sweden

Lisa Södergård, University of Helsinki, Finland

Elena Volodina, University of Gothenburg, Sweden

Xuan-Son Vu, Lund University, Sweden

Program Committee

Khalid Choukri

Claudia Cevenini

Erik Ketzan

Prodromos Tsiavos

Andreas Witt

Paweł Kamocki

Kim Nayyer

Krister Lindèn

Ingo Siegert

Catherine Jasserand

Isabel Trancoso

Henrik Bushschmeier

Annett Jorschick

Lars Ahrenberg

Terhi Ainiala

Emilia Aldrin

Lucas Georges Gabriel Charpentier

Simon Dobnik

Emilie Francis

Linnea Gustafsson

Ivan Habernal

Udo Hahn

Aron Henriksson

Nikolai Ilinykh

Dimitrios Kokkinakis

Herb Lange

Tomas Lehecka

Therese Lindström Tiedemann

Pierre Lison

Peter Ljunglöf

Ricardo Muñoz Sánchez

Ildikó Pilán

Tatjana Scheffler

Maria Irena Szawerna

Lisa Södergård

Vicenç Torra

Thomas Vakili

Shubham Vatsal

Elena Volodina

Xuan-Son Vu

Jan-Ola Östman

LEGAL2026 and

CALD-pseudo 2026

Joint Workshop on Legal and Ethical Issues in Human Language Technologies (LEGAL2026) and Computational Approaches to Language Data Pseudonymization, Anonymization, De-identification, and Data Privacy (CALD-pseudo 2026)

About the Joint Workshop

LEGAL 2026 and CALD-pseudo 2026

Contact

Keynote: Maja Bogataj Jančič (Open Data and Intellectual Property Institute ODIPI)

Invited Talk: Prof. Dr. Ivan Habernal (Ruhr University Bochum, Germany)

Abstract:

Bio:

Submission

Topics of interest include:

1. Legal Aspects of Language Data (LEGAL2026)

2. Pseudonymization, Anonymization, and De-identification: Theoretical, Methodological, and Technical Aspects (CALD-pseudo 2026)

Important Dates

F̶e̶b̶r̶u̶a̶r̶y̶ ̶2̶0̶ February 22, 2026, 23:59 CET

March 16, 2026 (tentative)

March 30, 2026

May 12, 2026

Progam

Organizers and Contact of the LEGAL Workshop:

Organizers and Contact of the CALD-pseudo Workshop:

Program Committee