TED Multilingual Discourse Bank

TED-MDB is a corpus of TED talk transcripts annotated for DRDs, together with their binary arguments and senses, in the PDTB style.

Corpus acronym: 
TED-MDB
Authors: 
Deniz Zeyrek, Amalia Mendes, Sam Gibbon, Yulia Grishina, Maciej Ogrodniczuk, Murathan Kurfalı
Contact person(s): 
Deniz Zeyrek
Languages covered: 
English, German, Polish, Portuguese, Russian, Turkish
Corpus size (hours): 
-
Corpus size (documents): 
36 transcripts (6 TED talks in 6 languages)
Corpus size (sentences): 
N/A
Corpus size (tokens): 
N/A
Corpus size (other): 
37851 words (in total)
Mode: 
written
Genre: 
Transcripts
Genre (detailed): 
Transcripts of TED Talks
Register: 
semi-formal
formal
Register (2): 
semi-spontaneous
Text type: 
instructive
narrative
descriptive
Years of the data origin: 
2013 - 2014
Document structure: 
N/A
Unit of segmentation used: 
DRDs and their binary arguments (Arg1 - Arg2)
Tools for annotation: 
PDTB Annotator
Types of DSDs annotated: 
Explicit, Implicit, AltLex, EntRel, NoRel
Number of DSD instances: 
3649 (for 6 languages)
Method of annotation: 
manual
Style/theory of annotation: 
PDTB style
Format: 
Pipe delimited file
Citation (text format): 

Zeyrek, D., Mendes, A., & Kurfalı, M. (2018). Multilingual Extension of PDTB-Style Annotation: The Case of TED Multilingual Discourse Bank. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC2018).

Citation (bibTeX format): 

@InProceedings{Zeyrek2018,
  author    = {Deniz Zeyrek and Amalia Mendes and Murathan Kurfal{\i}},
  title     = {Multilingual Extension of PDTB-Style Annotation: The Case of TED Multilingual Discourse Bank},
  booktitle = {Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC2018)},
  year      = {2018},
  language  = {english}
}

Notes: 

Recently contributed annotations:

  • 3 Lithuanian transcripts have been annotated and contributed by Giedre Valunaite Oleskevicienė.
  • 1 Spanish transcript has been annotated and contributed by Julia Lavid Lopez.
Further info about the discourse relations: 
information about arguments of each relation is available
senses/semantic labels are annotated for the relations
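
Since the annotations are distributed as pipe-delimited files, a minimal Python sketch of tallying relation types from such a file might look like the following. The column layout is an assumption made for illustration only: the sample rows, the position of the relation type in the first column, and the function name count_relation_types are hypothetical and are not taken from the corpus documentation.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real TED-MDB column layout may differ.
SAMPLE = """Explicit|10..13|but|Comparison.Contrast
Implicit|||Contingency.Cause
EntRel|||
Explicit|40..47|because|Contingency.Cause
"""


def count_relation_types(handle, type_column=0):
    """Count the values found in one column of a pipe-delimited file.

    type_column is assumed (not documented here) to hold the relation
    type, e.g. Explicit, Implicit, AltLex, EntRel or NoRel.
    """
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        # Skip blank lines and rows too short to contain the column.
        if len(row) > type_column and row[type_column]:
            counts[row[type_column]] += 1
    return counts


counts = count_relation_types(io.StringIO(SAMPLE))
for relation_type, n in sorted(counts.items()):
    print(relation_type, n)
```

For a real annotation file, the in-memory sample would be replaced by an opened file handle, and type_column adjusted to whichever column actually carries the relation type.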