Developer:
Charles University, Institute of Formal and Applied Linguistics
Authors:
Rysová Magdaléna
Synková Pavlína
Mírovský Jiří
Hajičová Eva
Nedoluzhko Anna
Ocelák Radek
Pergler Jiří
Poláková Lucie
Scheller Veronika
Zdeňková Jana
Zikánová Šárka
Availability:
publicly available in the LINDAT-CLARIN repository: http://hdl.handle.net/11234/1-1905
Genre (detailed):
newspaper news, journal articles, interviews, and others
Register (2):
semi-spontaneous
non-spontaneous
Text type:
instructive
narrative
expository
descriptive
argumentative
Years of the data origin:
Document structure:
documents, paragraph boundaries, headings
Tools for annotation:
TrEd (http://ufal.mff.cuni.cz/tred/)
Tools for browsing:
TrEd (http://ufal.mff.cuni.cz/tred/)
Tools for querying:
PML-TQ (http://ufal.mff.cuni.cz/pmltq/)
Direct link to the PML-TQ search server: https://lindat.mff.cuni.cz/services/pmltq/#!/treebank/pdit20/help
Types of DSDs annotated:
Relations marked by explicit connectives (primary and secondary), both intra- and inter-sentential.
Method of annotation:
primary connectives: manual for inter-sentential, automatic with manual correction for intra-sentential; secondary connectives: manual at places semi-automatically identified in the texts
Style/theory of annotation:
PDTB style adapted for dependency trees
Version number, release date:
PDiT 2.0 (December 2016)
Previous versions and their release dates:
PDT 3.0 (2013), PDiT 1.0 (2012)
Citation (text format):
Rysová Magdaléna, Synková Pavlína, Mírovský Jiří, Hajičová Eva, Nedoluzhko Anna, Ocelák Radek, Pergler Jiří, Poláková Lucie, Scheller Veronika, Zdeňková Jana, Zikánová Šárka: Prague Discourse Treebank 2.0. Data/software, ÚFAL MFF UK, Prague, Czech Republic, Lindat/Clarin: http://hdl.handle.net/11234/1-1905, Dec 2016
Citation (bibTeX format):
@misc{biblio:RySyMiPragueDiscourse2016,
  title = {Prague Discourse Treebank 2.0},
  author = {Magdal{\'{e}}na Rysov{\'{a}} and Pavl{\'{i}}na J{\'{i}}nov{\'{a}} and Ji{\v{r}}{\'{i}} M{\'{i}}rovsk{\'{y}} and Eva Haji{\v{c}}ov{\'{a}} and Anna Nedoluzhko and Radek Ocel{\'{a}}k and Ji{\v{r}}{\'{i}} Pergler and Lucie Pol{\'{a}}kov{\'{a}} and Jana Zde{\v{n}}kov{\'{a}} and Veronika Scheller and {\v{S}}{\'{a}}rka Zik{\'{a}}nov{\'{a}}},
  year = {2016},
  publisher = {{\'{U}}{FAL} {MFF} {UK}},
  address = {Prague, Czech Republic},
  note = {Lindat/Clarin},
  url = {http://hdl.handle.net/11234/1-1905}
}
Notes:
With respect to the annotation of discourse relations, PDiT 2.0 (Prague Discourse Treebank 2.0) is an update of PDT 3.0 (Prague Dependency Treebank 3.0), which in turn was an update of PDiT 1.0 (Prague Discourse Treebank 1.0). The main addition in comparison with PDT 3.0 is the annotation of secondary connectives (English equivalents include "for this reason", "due to this", and "under these conditions").
We realize that the alternating titles (Prague Discourse Treebank vs. Prague Dependency Treebank, i.e. PDiT vs. PDT) are a mess. Unfortunately, we were not allowed to publish the new discourse annotation as a new version of PDT, hence the name PDiT 2.0.
Further info about the discourse relations:
information about arguments of each relation is available
senses/semantic labels are annotated for the relations
Other annotation layers:
sentence morphosyntax, parse structure
anaphora (coreference, bridging)
information structure
tectogrammatics - deep syntax