ParCorFull: a Parallel Corpus Annotated with Full Coreference

Title: ParCorFull: a Parallel Corpus Annotated with Full Coreference
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Lapshinova-Koltunski E, Hardmeier C, Krielke M-P
Editor: Calzolari N, Choukri K, Cieri C, Declerck T, Goggi S, Hasida K, Isahara H, Maegaard B, Mariani J, Mazo H, Moreno A, Odijk J, Piperidis S, Tokunaga T
Conference Name: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Date Published: May 2018
Publisher: European Language Resources Association (ELRA)
Conference Location: Paris, France
ISBN Number: 979-10-95546-00-9
Keywords: coreference, coreference annotation, cross-lingual coreference resolution, full coreference, linguistic annotation, machine translation, multilingual NLP

In this paper, we describe a parallel corpus annotated with full coreference chains that has been created to address an important problem that machine translation and other multilingual natural language processing (NLP) technologies face – translation of coreference across languages. Recent research in multilingual coreference and automatic pronoun translation has led to important insights into the problem and some promising results. However, the scope of this research has been restricted to pronouns, whereas coreference is not limited to anaphoric pronouns. Our corpus contains parallel texts for the language pair English-German, two major European languages. Despite being typologically very close, these languages still have systemic differences in the realisation of coreference, and thus pose problems for multilingual coreference resolution and machine translation. Our parallel corpus with full annotation of coreference will be a valuable resource with a variety of uses not only for NLP applications, but also for contrastive linguists and researchers in translation studies. This resource supports research on the mechanisms involved in coreference translation in order to develop a better understanding of the phenomenon. The corpus is available from the LINDAT repository at