
Joint Workshop on Legal and Ethical Issues in Human Language Technologies (LEGAL2026) and Computational Approaches to Language Data Pseudonymization, Anonymization, De-identification, and Data Privacy (CALD-pseudo 2026)


About the Joint Workshop

LEGAL2026 and CALD-pseudo 2026

Access to text and speech data is essential for research, yet personal and sensitive information often prevents open sharing. Techniques such as pseudonymization and anonymization offer potential solutions, but their effectiveness, limitations, and impact on data utility require deeper investigation. Balancing privacy protection with meaningful scientific use remains a key challenge.

At the same time, legal and ethical requirements increasingly shape how language resources can be created, processed, and distributed. Regulatory frameworks, such as the GDPR, the Data Act, and the Artificial Intelligence Act, affect access, reuse, and documentation duties for both text and speech data, creating a complex environment that demands interdisciplinary insight.

The workshop brings these two perspectives together by addressing both the technical and practical aspects of de-identification as well as the legal and ethical obligations governing data handling. Topics include anonymization and pseudonymization methods, compliance in practical workflows, provenance and rights tracking, and emerging approaches to legal metadata. The goal is to foster responsible, legally sound, and technically robust innovation in human language technologies.

Contact

For inquiries, please contact mail@legal2026.mobiles.de.

Submission

Authors are invited to submit original and unpublished research papers in the following categories:

Long papers (up to 8 pages) for substantial, completed contributions

Short papers (up to 4 pages) for:

  • Small, focused contributions, or ongoing and preliminary work
  • Extended abstracts, for non-technical submissions only (e.g., conceptual, theoretical, legal, ethical, policy-oriented, or position papers). Accepted extended abstracts are expected to be developed into regular papers by the camera-ready submission deadline.

Accepted papers will be published in the workshop proceedings alongside the LREC main conference proceedings. Submissions should follow the LREC stylesheet, available on the Author's Kit page of the conference website.

Submission deadline: February 20, 2026

The submission link will be provided in due time.

When submitting a paper from the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of the research. Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.) to enable their reuse and the replicability of experiments (including evaluation ones).

Topics of interest include:

1. Legal Aspects of Language Data (LEGAL2026)

  • Regulatory frameworks and global governance: Impact of the GDPR, EU Data Act, Data Governance Act, Digital Services Act, AI Act, and international regulations (e.g., China’s 2023 Draft Rules on Generative AI, U.S. AI Bill of Rights) on access, circulation, and reuse of language and speech data; statutory exceptions for text and data mining.

  • Intellectual property, data protection, and LLM governance: Legal issues surrounding training data, derivative datasets, and model outputs; copyright, data governance, and data protection obligations in the development and deployment of Large Language Models.

  • Ethics, fairness, trust, and transparency: Ethical considerations in personal data collection and reuse; ensuring fairness, transparency, and accountability in language and speech technologies.

  • Compliance in practice: Legal metadata, provenance, consent documentation, usage rights, and machine-readable licensing; practical workflows for lawful data collection, annotation, and sharing.

  • Operationalizing compliance: Tools and methods that support automated compliance checking, risk detection, consent tracking, and policy-aware data filtering; language technologies assisting in legal compliance.

  • Emerging and grey areas: Legal uncertainties around synthetic or augmented data, LLM-generated content, and cross-modal leakage; evolving interpretations of anonymization thresholds.

  • Interdisciplinary and cross-border coordination: Global harmonization of legal and technical approaches; collaboration models between researchers, legal experts, and infrastructure providers; navigating jurisdictional inconsistencies.

2. Pseudonymization, Anonymization, and De-identification: Theoretical, Methodological, and Technical Aspects (CALD-pseudo 2026)

  • Detection and classification of personal information (PI): Automatic identification of PI in text, speech, and multimodal data; context-dependent and indirect indicators of identity.

  • Replacement and transformation of PI: Context-sensitive pseudonymization and anonymization methods; substitution, masking, obfuscation; maintaining coherence across discourse and modalities.

  • Utility and bias after de-identification: Effects of de-identification on downstream task performance, linguistic research validity, readability, and bias amplification or reduction.

  • Approaches to evaluation and adversarial testing: Metrics and frameworks for assessing de-identification quality; adversarial re-identification attempts; robustness and failure-mode analysis.

  • Dataset creation for de-identification research: Methodological, ethical, and annotation-related considerations in building corpora for training or evaluating de-identification systems.

  • Low-resource scenarios: Techniques for de-identification in settings with limited data, scarce annotations, or underrepresented languages; transfer and multilingual approaches.

  • Speech-specific challenges: Removing speaker identity cues in audio; voice anonymization; cross-modal leakage between text, transcripts, and acoustic features.

  • Cross-disciplinary applications and challenges: Integrating de-identification techniques into real-world workflows in areas such as linguistics, social sciences, digital humanities, healthcare, and other private- or public-sector data environments.

Important Dates

February 20, 2026

Deadline for submission

March 11, 2026 (tentative)

Notification of acceptance

March 30, 2026

Submission of final version of accepted papers (strict)

May 11, 12, or 16, 2026

Workshop day

Organizers and Contact of the LEGAL2026 Workshop:

Ingo Siegert, Otto-von-Guericke-Universität Magdeburg, Germany

Kossay Talmoudi, ELRA/ELDA, France

Khalid Choukri, ELRA/ELDA, France

Paweł Kamocki, IDS Mannheim, Germany

Organizers of the CALD-pseudo 2026 Workshop:

Maria Irena Szawerna, University of Gothenburg, Sweden

Simon Dobnik, University of Gothenburg, Sweden

Therese Lindström Tiedemann, University of Helsinki, Finland

Pierre Lison, Norwegian Computing Center & University of Oslo, Norway

Ildikó Pilán, Norwegian Computing Center, Norway

Ricardo Muñoz Sánchez, University of Gothenburg, Sweden

Lisa Södergård, University of Helsinki, Finland

Elena Volodina, University of Gothenburg, Sweden

Xuan-Son Vu, Lund University, Sweden

Program Committee

Khalid Choukri

Claudia Cevenini

Erik Ketzan

Prodromos Tsiavos

Andreas Witt

Paweł Kamocki

Kim Nayyer

Krister Lindén

Ingo Siegert

Catherine Jasserand

Isabel Trancoso

Hendrik Buschmeier

Annett Jorschick

Lars Ahrenberg

Terhi Ainiala

Emilia Aldrin

Lucas Georges Gabriel Charpentier

Simon Dobnik

Emilie Francis

Linnea Gustafsson

Ivan Habernal

Udo Hahn

Aron Henriksson

Nikolai Ilinykh

Dimitrios Kokkinakis

Herb Lange

Therese Lindström Tiedemann

Pierre Lison

Peter Ljunglöf

Ricardo Muñoz Sánchez

Ildikó Pilán

Tatjana Scheffler

Maria Irena Szawerna

Lisa Södergård

Vicenç Torra

Elena Volodina

Thomas Vakili

Xuan-Son Vu

Jan-Ola Östman