Penn Discourse TreeBank 2.0

Corpus acronym: 
PDTB 2.0
Developer: 
University of Pennsylvania, School of Computer & Information Science
Authors: 
Rashmi Prasad, Aravind Joshi, Eleni Miltsakaki, Alan Lee, Nikhil Dinesh, Livio Robaldo, Geraud Campion, Bonnie Webber
Contact person(s): 
Bonnie Webber
Availability: 
Contact the Linguistic Data Consortium, http://www.ldc.upenn.edu
Languages covered: 
English
Available translations: 
The Penn TreeBank corpus (over whose raw text the PDTB has been annotated) has been translated into Czech; the translation is available as the PCEDT (Prague Czech-English Dependency Treebank).
Corpus size (documents): 
~2400 documents
Corpus size (tokens): 
~1M words, ~40K annotation tokens
Mode: 
written
Genre: 
journalistic
Genre (detailed): 
Corpus contains news, essays, reviews, letters to the editor, errata
Register: 
formal
Register (2): 
non-spontaneous
Text type: 
narrative
expository
Years of the data origin: 
1989
Document structure: 
Raw text preserves sentence and paragraph boundaries; PDTB 2.0 recovers divisions between letters and between "news summaries" found in the original WSJ documents
Tools for annotation: 
Annotation tool available at http://www.seas.upenn.edu/~pdtb
Tools for browsing: 
Browser available at http://www.seas.upenn.edu/~pdtb
Types of DSDs annotated: 
+ explicit inter-sentential discourse connectives and alternative lexicalizations of discourse connectives
+ explicit intra-sentential discourse connectives
+ implicit discourse relations between adjacent sentences within the same paragraph
Number of DSD instances: 
~40K
Method of annotation: 
Manual, with manual adjudication
Style/theory of annotation: 
PDTB style
Format: 
Either pipe-delimited fields or multi-line format
Version number, release date: 

PDTB 2.0, released 2008

Previous versions and their release dates: 

PDTB 1.0, March 2006

Citation (text format): 

Rashmi Prasad, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki, Livio Robaldo, Aravind Joshi and Bonnie Webber. The Penn Discourse Treebank 2.0. Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC). Marrakech, Morocco, 2008.
Rashmi Prasad, Bonnie Webber and Aravind Joshi. Reflections on the Penn Discourse TreeBank, Comparable Corpora and Complementary Annotation. Computational Linguistics 40(4), December 2014.

Citation (bibTeX format): 

@inproceedings{prasad08,
  author = "Rashmi Prasad and Nikhil Dinesh and Alan Lee and Eleni Miltsakaki and Livio Robaldo and Aravind Joshi and Bonnie Webber",
  title = "{The Penn Discourse TreeBank 2.0}",
  booktitle = "{Proceedings, 6th International Conference on Language Resources and Evaluation}",
  address = "Marrakech, Morocco",
  year = "2008",
  pages = "2961--2968"
}

@article{prasad-etal14,
  author = {Rashmi Prasad and Bonnie Webber and Aravind Joshi},
  year = {2014},
  title = {Reflections on the Penn Discourse TreeBank, Comparable Corpora and Complementary Annotation},
  journal = {Computational Linguistics},
  volume = {40(4)},
  pages = {921--950},
  doi = {10.1162/COLI_a_00204}
}

Notes: 
Further info about the discourse relations: 
information about arguments of each relation is available
senses/semantic labels are annotated for the relations
Other annotation layers: 
sentence morphosyntax, parse structure
anaphora (coreference, bridging)
semantic roles
(all of these layers are distributed separately, as the Penn TreeBank, PropBank, and OntoNotes)