Resources
Multilingual Annotated Corpus Resources
ParCorFull: A Parallel Corpus Annotated with Full Coreference
ParCorFull is a parallel corpus of texts in English and German, manually annotated for coreference. The corpus consists of TED talks and some news articles. We’re currently working on annotating the same texts in French.
Citation:
Ekaterina Lapshinova-Koltunski, Christian Hardmeier and Pauline Krielke. ParCorFull: a Parallel Corpus Annotated with Full Coreference. Proc. 11th International Conference on Language Resources and Evaluation (LREC), Miyazaki JP, May 2018, pp. 423–428.
ParCor 1.0: A Parallel Pronoun-Coreference Corpus
ParCor is a parallel corpus of texts in English and German annotated for pronoun coreference (partial coreference chains linking pronouns to their closest antecedents). The TED talks in this corpus and their annotations are a subset of those in ParCorFull. Additionally, the corpus contains a number of publications from the EU Bookshop that are not included in ParCorFull.
Citation:
Liane Guillou, Christian Hardmeier, Aaron Smith, Jörg Tiedemann and Bonnie Webber. ParCor 1.0: A Parallel Pronoun-Coreference Corpus to Support Statistical MT. Proc. 9th International Conference on Language Resources and Evaluation (LREC), Reykjavík IS, May 2014, pp. 3191–3198.
Datasets for Pronoun Translation and Cross-lingual Pronoun Prediction
DiscoMT 2015 Shared Task on Pronoun Translation
Citation:
Christian Hardmeier, Preslav Nakov, Sara Stymne, Jörg Tiedemann, Yannick Versley and Mauro Cettolo. Pronoun-Focused MT and Cross-Lingual Pronoun Prediction: Findings of the 2015 DiscoMT Shared Task on Pronoun Translation. Proc. 2nd Workshop on Discourse in Machine Translation (DiscoMT), Lisbon PT, September 2015, pp. 1–16.
WMT 2016 Shared Task on Cross-Lingual Pronoun Prediction
Citation:
Liane Guillou, Christian Hardmeier, Preslav Nakov, Sara Stymne, Jörg Tiedemann, Yannick Versley, Mauro Cettolo, Bonnie Webber and Andrei Popescu-Belis. Findings of the 2016 WMT shared task on cross-lingual pronoun prediction. Proc. 1st Conference on Machine Translation (WMT), Berlin DE, August 2016, pp. 525–542.
DiscoMT 2017 Shared Task on Cross-Lingual Pronoun Prediction
Citation:
Sharid Loáiciga, Sara Stymne, Preslav Nakov, Christian Hardmeier, Jörg Tiedemann, Mauro Cettolo and Yannick Versley. Findings of the 2017 DiscoMT shared task on cross-lingual pronoun prediction. Proc. 3rd Workshop on Discourse in Machine Translation (DiscoMT), Copenhagen DK, September 2017, pp. 1–16.
Pronoun-Focused MT Evaluation
PROTEST: A Test Suite for Evaluating Pronouns in Machine Translation
This is a test suite for manual or semi-automatic evaluation of pronouns in English-French MT.
Citation:
Liane Guillou and Christian Hardmeier. PROTEST: A Test Suite for Evaluating Pronouns in Machine Translation. Proc. 10th International Conference on Language Resources and Evaluation (LREC), Portorož SI, May 2016, pp. 636–643.
Graphical User Interface for the PROTEST Test Suite
This is a user interface to facilitate manual evaluation of pronouns using the PROTEST test suites. Contact me if you’d like to use it so I can help you set it up.
Citation:
Christian Hardmeier and Liane Guillou. A graphical pronoun evaluation tool for the PROTEST pronoun evaluation test suite. Proc. 19th Annual Conference of the European Association for Machine Translation (EAMT), Riga LV. Baltic Journal of Modern Computing 4 (2), May 2016, pp. 318–330.
AutoPRF Pronoun Evaluation Tool
This is a tool to calculate the precision and recall of pronoun translations in MT output by scoring automatically against a reference translation. See also our paper at EMNLP 2018 for a discussion of this method and a comparison with other approaches.
Citation:
Christian Hardmeier and Marcello Federico. Modelling Pronominal Anaphora in Statistical Machine Translation. Proc. 7th International Workshop on Spoken Language Translation (IWSLT), Paris FR, December 2010, pp. 283–289.
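The precision and recall computation can be illustrated with a simplified sketch. It assumes that the target words aligned to each source pronoun have already been extracted from both the MT output and the reference (the real tool needs word alignments for this step), and it scores them with clipped counts in the spirit of the paper cited above. The function name `pronoun_prf` and the input layout are hypothetical and do not reflect the tool's actual interface.

```python
from collections import Counter


def pronoun_prf(candidate_words, reference_words):
    """Clipped-count precision, recall and F1 over pronoun translations.

    Both arguments are lists with one entry per source pronoun; each entry
    is the list of target words aligned to that pronoun (assumed input
    format, for illustration only).
    """
    matched = cand_total = ref_total = 0
    for cand, ref in zip(candidate_words, reference_words):
        c, r = Counter(cand), Counter(ref)
        # Multiset intersection: each candidate word counts at most as
        # often as it occurs in the reference (clipped count).
        matched += sum((c & r).values())
        cand_total += sum(c.values())
        ref_total += sum(r.values())
    precision = matched / cand_total if cand_total else 0.0
    recall = matched / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy English-French example with three source pronouns.
candidate = [["il"], ["elle"], ["ils", "elles"]]
reference = [["il"], ["il"], ["ils"]]
p, r, f = pronoun_prf(candidate, reference)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy input, two of the four candidate words match the three reference words, so precision is 2/4 and recall is 2/3.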
Document-Level Decoding for Phrase-Based SMT
Docent: Document-Level Local Search Decoder for Phrase-Based SMT
Docent is a decoder for phrase-based statistical machine translation that translates entire documents at a time and allows you to implement feature functions with dependencies across sentences.
Citations:
Christian Hardmeier, Sara Stymne, Jörg Tiedemann and Joakim Nivre. Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation. Proc. 51st Annual Meeting of the Association for Computational Linguistics (ACL), Sofia BG, August 2013, pp. 193–198.
Christian Hardmeier, Joakim Nivre and Jörg Tiedemann. Document-Wide Decoding for Phrase-Based Statistical Machine Translation. Proc. 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Jeju Island KR, July 2012, pp. 1179–1190.
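To give a rough idea of what document-level local search with a cross-sentence feature looks like, here is a toy hill-climbing sketch. It is not Docent's code or API: the candidate lists, the scores, the `document_score` and `hill_climb` functions, and the term-consistency feature are all invented for illustration, and the search changes whole sentences rather than individual phrases.

```python
import random


def document_score(state, candidates, consistency_bonus=1.0):
    """Score a whole document: sentence-level scores plus a cross-sentence
    feature rewarding adjacent sentences that translate a recurring term
    the same way (hypothetical feature, for illustration only)."""
    chosen = [candidates[i][c] for i, c in enumerate(state)]
    local = sum(score for _text, score, _term in chosen)
    terms = [term for _text, _score, term in chosen]
    consistent = sum(1 for a, b in zip(terms, terms[1:]) if a == b)
    return local + consistency_bonus * consistent


def hill_climb(candidates, steps=200, seed=0):
    """Start from the first candidate of every sentence, then repeatedly
    propose a change to one sentence and keep it only if the document
    score improves."""
    rng = random.Random(seed)
    state = [0] * len(candidates)
    best = document_score(state, candidates)
    for _ in range(steps):
        i = rng.randrange(len(candidates))
        proposal = list(state)
        proposal[i] = rng.randrange(len(candidates[i]))
        score = document_score(proposal, candidates)
        if score > best:
            state, best = proposal, score
    return state, best


# Each candidate: (translation, sentence-level score, translation of the term).
candidates = [
    [("the bank approved it", -1.0, "bank"),
     ("the shore approved it", -1.5, "shore")],
    [("the shore raised rates", -0.8, "shore"),
     ("the bank raised rates", -1.2, "bank")],
]
state, best = hill_climb(candidates)
print([candidates[i][c][0] for i, c in enumerate(state)], best)
```

The point of the sketch is that the score is computed over the whole document state, so a feature can look at several sentences at once, which a sentence-by-sentence decoder cannot easily do. Like any hill climber, it may stop at a local optimum.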