Prague Dependency Treebank 3.0

Corpus acronym: 
PDT 3.0
Developer: 
Charles University in Prague, Institute of Formal and Applied Linguistics
Authors: 
Eduard Bejček, Eva Hajičová, Jan Hajič, Pavlína Jínová, Václava Kettnerová, Veronika Kolářová, Marie Mikulová, Jiří Mírovský, Anna Nedoluzhko, Jarmila Panevová, Lucie Poláková, Magda Ševčíková, Jan Štěpánek, Šárka Zikánová
Contact person(s): 
Jiří Mírovský
Availability: 
publicly available in the LINDAT-CLARIN repository: http://hdl.handle.net/11858/00-097C-0000-0023-1AAF-3
Languages covered: 
Czech
Available translations: 
none
Corpus size (documents): 
3,165
Corpus size (sentences): 
49,431
Corpus size (tokens): 
833,124
Mode: 
written
Genre: 
journalistic
Genre (detailed): 
newspaper news, journal articles, interviews, and others
Register: 
formal
Register (2): 
semi-spontaneous
non-spontaneous
Text type: 
instructive
narrative
expository
descriptive
argumentative
Years of the data origin: 
1991-1995
Document structure: 
documents, paragraph boundaries, headings
Tools for annotation: 
TrEd (http://ufal.mff.cuni.cz/tred/)
Tools for browsing: 
TrEd (http://ufal.mff.cuni.cz/tred/)
Tools for querying: 
PML-TQ (http://ufal.mff.cuni.cz/pmltq/)
Types of DSDs annotated: 
Relations marked by explicit connectives, both intra- and inter-sentential.
Number of DSD instances: 
20,556
Method of annotation: 
manual for inter-sentential, automatic with manual correction for intra-sentential
Style/theory of annotation: 
PDTB style adapted for dependency trees
Format: 
XML
Version number, release date: 

3.0, December 2013

Previous versions and their release dates: 

PDiT 1.0 (Nov. 2012), PDT 2.5 (2011, no discourse relations yet)

Citation (text format): 

Bejček Eduard, Hajičová Eva, Hajič Jan, Jínová Pavlína, Kettnerová Václava, Kolářová Veronika, Mikulová Marie, Mírovský Jiří, Nedoluzhko Anna, Panevová Jarmila, Poláková Lucie, Ševčíková Magda, Štěpánek Jan, Zikánová Šárka: Prague Dependency Treebank 3.0. Data/software, Univerzita Karlova v Praze, MFF, ÚFAL, Prague, Czech republic, http://ufal.mff.cuni.cz/pdt3.0/, Dec 2013

Citation (bibTeX format): 

@misc{biblio:BeHaPragueDependency2013,
  title = {Prague Dependency Treebank 3.0},
  author = {Eduard Bej{\v{c}}ek and Eva Haji{\v{c}}ov{\'{a}} and Jan Haji{\v{c}} and Pavl{\'{i}}na J{\'{i}}nov{\'{a}} and V{\'{a}}clava Kettnerov{\'{a}} and Veronika Kol{\'{a}}{\v{r}}ov{\'{a}} and Marie Mikulov{\'{a}} and Ji{\v{r}}{\'{i}} M{\'{i}}rovsk{\'{y}} and Anna Nedoluzhko and Jarmila Panevov{\'{a}} and Lucie Pol{\'{a}}kov{\'{a}} and Magda {\v{S}}ev{\v{c}}{\'{i}}kov{\'{a}} and Jan {\v{S}}t{\v{e}}p{\'{a}}nek and {\v{S}}{\'{a}}rka Zik{\'{a}}nov{\'{a}}},
  year = {2013},
  publisher = {Univerzita Karlova v Praze, {MFF}, {\'{U}}{FAL}},
  organization = {Univerzita Karlova v Praze, {MFF}, {\'{U}}{FAL}},
  address = {Prague, Czech republic},
}

Notes: 

A new version of the discourse annotation of the data was published in 2016 as the Prague Discourse Treebank 2.0 (PDiT 2.0). The main addition in comparison with PDT 3.0 is the annotation of secondary connectives (e.g. in English "for this reason", "due to this", "under these conditions").
We realize that the titles (Prague Discourse Treebank vs. Prague Dependency Treebank, i.e. PDiT vs. PDT) are a mess. Unfortunately, we were not allowed to publish the new discourse annotation as a new version of PDT, hence the name PDiT 2.0.

Further info about the discourse relations: 
information about arguments of each relation is available
senses/semantic labels are annotated for the relations
Other annotation layers: 
sentence morphosyntax, parse structure
anaphora (coreference, bridging)
information structure, tectogrammatics (deep syntax)