About me

News

Bio

I’m an Associate Professor in the Data Science Section at the IT University of Copenhagen, where I’m a part of the NLPnorth natural language processing group. Until 2022, I was also a part of the Computational Linguistics Group at Uppsala University as a Researcher and Associate Professor (Docent) in Computational Linguistics. In 2019-2021, I spent two years as a visitor and Senior Researcher at the School of Informatics of the University of Edinburgh. From 2009 to 2011, I was a member of the machine translation group at Fondazione Bruno Kessler in Trento.

I hold a PhD in Computational Linguistics from Uppsala University. My PhD thesis was on Discourse in Statistical Machine Translation and received the Best Thesis Award of the European Association for Machine Translation in 2015. My supervisors were Joakim Nivre, Jörg Tiedemann and Marcello Federico. I also have an MA in Nordic Philology from the University of Basel.

I am co-lead of the Speech and Language Collaboratory at the Danish Pioneer Centre for Artificial Intelligence and an ELLIS member.

I’m one of the organisers of the Workshop on Computational Approaches to Discourse (CODI) at EMNLP 2020, EMNLP 2021, COLING 2022, ACL 2023, EACL 2024, ACL 2025 and ACL 2026. I also co-organised the Workshops on Gender Bias in Natural Language Processing (GeBNLP) from 2019 to 2022 and the Workshop on Discourse in MT (DiscoMT), last held at EMNLP-IJCNLP 2019. I was one of the programme chairs of NODALIDA 2023, a senior area chair for Discourse and Pragmatics at ACL 2023, EMNLP 2023 and ACL 2024 and an area chair at COLING 2025 and IJCAI 2025.

Research

My work is in computational linguistics/natural language processing (NLP), and I am particularly interested in pragmatic aspects of large language models, where pragmatics means the study of context in language and of how to achieve things with language. My goal is to create NLP systems with a higher awareness of linguistic context and non-linguistic aspects of communicative situations, so that they interpret and generate language appropriately according to the communicative situation in which they are applied. To achieve this, they should use language in the same way humans do (anthropomimesis), without however pretending to be human (anthropomorphism).

Topics I am currently interested in include the following:

Uncertainty quantification and communication

This is the core of my current research programme, and its principles are laid out in a TACL paper on anthropomimetic uncertainty (1). How can we measure how certain or confident large language models are of the output they produce (2, 3, 4)? How can we ensure that their measured confidence is realistic and not overblown, and that the text they produce correctly reflects this confidence? How can large language models effectively convey their level of confidence to diverse user groups? How can we deal with ambiguity in human communication without forcing disambiguation when it isn’t necessary (5), and how can we strategically exploit ambiguity to reduce human overreliance on large language models, particularly in educational contexts?

Healthcare applications

I am a part of several exciting projects and collaborations aiming to apply large language models responsibly in the area of healthcare:

  • Citizen-facing solutions in the context of a Danish regional urgent care helpline (1813AI).
  • Developing solutions for large-scale analysis of therapy conversations to improve our understanding of how psychotherapy works (ALF).
  • Efficient agent-based solutions for medical coding (6).

Toxicity and bias

I am keen on understanding how different identities get affected by NLP systems (7), and I want to model in an explainable way what makes specific forms of toxic language toxic, for instance in the context of dehumanising language (8) and threats. I was also a part of the SafeNet project, in which we studied the response of social media platforms to reports of unsafe language across 19 European countries, and I have studied gender bias in NLP, particularly in the interpretation and generation of referring expressions (9, 10).

Reference

I’m particularly curious about how referring expressions such as pronouns (like she, it, they or this) and lexical noun phrases (like a cat, the house, scrambled eggs or my research) are used across languages, how human translators treat them, what machine translation systems should do with them and how we can use multilingual data to help us interpret them automatically.

These are problems I’ve studied for many years and have approached from many angles:

  • Discourse-level MT (11) and its evaluation (12, 13, 14)
  • Cross-lingual coreference resolution (15), cross-lingual pronoun prediction (16, 17) and neural language modelling (18)
  • Automatic discovery of discourse-related language contrasts in human-translated parallel corpora (19, 20)
  • Cross-lingual studies of the generation and interpretation of referring expressions with human subjects (21, 22, 23)
  • Coreference annotation of multilingual corpora (24, 25)