This is a list of projects that can serve as a basis for student theses and similar coursework. If you are a student at the IT University of Copenhagen and are interested in working on one of these topics for the degree you’re pursuing, please feel free to get in touch with me.

Research-Oriented Projects

Inference with very large language models under resource constraints (added 3/2023)

Recently, very large pretrained language models such as GPT-3 and BLOOM have achieved a performance leap in text generation. These models are very resource-hungry, and ITU currently doesn’t have the hardware to run BLOOM in its recommended setup (8 V100 GPUs). There are various ways to work around this and run the model on more limited configurations with different trade-offs. The goal of this thesis would be to explore these and develop recommendations for running very large models under strong resource constraints.

Multifaceted representation of people in discourse

Bias in machine learning is often studied with respect to isolated traits such as gender, but humans have multifaceted identities and identify with several interacting traits at the same time. The goal of this project would be to learn multifaceted representations of human identities from text corpora with a view to the automatic recognition of biases, stereotypes or bullying.

Recognising veiled toxicity with persona models

Veiled toxicity refers to hurtful things people say to each other without using explicit slurs or foul language. The goal of this project is to replicate prior work on the automatic detection of veiled toxicity and extend it based on a representation of the character traits of the discourse participants.

Politeness and formality in machine translation

Different languages have different means to address people in a polite way or make polite requests, for instance by using special polite pronouns and word forms (such as German du vs. Sie, French tu vs. vous, Danish du vs. De), by using polite words such as sir or please, or by using more elaborate, polite ways of phrasing requests. These linguistic means are language-specific and evolve over time, which poses challenges for translation. Formality transfer in machine translation has mostly relied on explicit coding/annotation of formality levels in the input, but can we also create models that automatically recognise what forms are appropriate in a specific context?

Cross-lingual alignment of referring expressions

Texts contain referring expressions that point to things in the real world. Given parallel text in two languages, the goal of this project is to find ways to match referring expressions that have the same referents and the same functions in the text. This project is ideal for a student with a background or interest in linguistics in addition to good knowledge of NLP and machine learning.

Visualisation of complex linguistic structures across languages

When you work with texts in multiple languages that are marked up for complex linguistic structures, it quickly becomes very difficult to see what’s going on across languages. This project aims to develop a user interface to work efficiently with this type of multidimensional data. It would be a good match for a student with an interest in and experience of user interface development.