Multilingual Annotated Corpus Resources

  • ParCorFull: A Parallel Corpus Annotated with Full Coreference
    ParCorFull is a parallel corpus of texts in English and German manually annotated for coreference. The corpus consists of TED talks and some news articles. We’re currently working on annotating the same texts in French.

    Citation:
    Ekaterina Lapshinova-Koltunski, Christian Hardmeier and Pauline Krielke. ParCorFull: a Parallel Corpus Annotated with Full Coreference. Proc. 11th International Conference on Language Resources and Evaluation (LREC), Miyazaki JP, May 2018, pp. 423–428.

  • ParCor 1.0: A Parallel Pronoun-Coreference Corpus
    ParCor is a parallel corpus of texts in English and German annotated for pronoun coreference (partial coreference chains linking pronouns to their closest antecedents). The TED talks in this corpus and their annotations are a subset of those in ParCorFull. Additionally, the corpus contains a number of publications from the EU Bookshop which are not included in ParCorFull.

    Citation:
    Liane Guillou, Christian Hardmeier, Aaron Smith, Jörg Tiedemann and Bonnie Webber. ParCor 1.0: A Parallel Pronoun-Coreference Corpus to Support Statistical MT. Proc. 9th International Conference on Language Resources and Evaluation (LREC), Reykjavík IS, May 2014, pp. 3191–3198.

Datasets for Pronoun Translation and Cross-lingual Pronoun Prediction

Pronoun-Focused MT Evaluation

Document-Level Decoding for Phrase-Based SMT