TED-MDB (TED Multilingual Discourse Bank) is a corpus of TED talk transcripts annotated in the PDTB style for DRDs, together with their binary arguments and senses.
Authors:
Deniz Zeyrek, Amalia Mendes, Sam Gibbon, Yulia Grishina, Maciej Ogrodniczuk, Murathan Kurfalı
Languages covered:
English, German, Polish, Portuguese, Russian, Turkish
Corpus size (documents):
36 transcripts (6 TED talks in 6 languages)
Text type:
instructive
narrative
descriptive
Years of the data origin:
Unit of segmentation used:
DRDs and their binary arguments (Arg1 - Arg2)
Types of DSDs annotated:
Explicit, Implicit, AltLex, EntRel, NoRel
Style/theory of annotation:
PDTB
Citation (text format):
Zeyrek, D., Mendes, A., & Kurfalı, M. (2018). Multilingual Extension of PDTB-Style Annotation: The Case of TED Multilingual Discourse Bank. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC2018).
Citation (bibTeX format):
@InProceedings{Zeyrek2018,
  author    = {Deniz Zeyrek and Amalia Mendes and Murathan Kurfal{\i}},
  title     = {Multilingual Extension of PDTB-Style Annotation: The Case of TED Multilingual Discourse Bank},
  booktitle = {Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC2018)},
  year      = {2018},
  language  = {english}
}
Notes:
Recently contributed annotations:
- 3 Lithuanian transcripts have been annotated and contributed by Giedre Valunaite Oleskevicienė.
- 1 Spanish transcript has been annotated and contributed by Julia Lavid Lopez.
Further info about the discourse relations:
information about arguments of each relation is available
senses/semantic labels are annotated for the relations