Authors:
Toni Badia, Roser Saurí, Teresa Suñol
Availability:
The distribution procedure is currently being negotiated. Please contact the authors by email.
Years of the data origin:
Unit of segmentation used:
Tools for annotation:
For delimiting discourse segments: Brandeis Annotation Tool (BAT, http://www.timeml.org/site/bat/). For setting relations among discourse segments: Cytoscape (http://www.cytoscape.org)
Types of DSDs annotated:
Explicit and implicit. Inter- and intra-sentential.
Style/theory of annotation:
A combination of SDRT and GraphBank.
Format:
XGMML (eXtensible Graph Markup and Modelling Language), XML.
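Since the relations are stored as XGMML graphs, a file can in principle be read with any XML parser. The following is a minimal Python sketch, assuming only the generic XGMML layout (a `graph` root with `node` and `edge` elements in the `http://www.cs.rpi.edu/XGMML` namespace). The sample graph, segment labels, relation names, and the `read_xgmml` helper are invented for illustration and are not taken from the CatDiG files, whose actual attributes may differ.

```python
import xml.etree.ElementTree as ET

# Namespace used by generic XGMML documents (e.g. Cytoscape exports).
XGMML_NS = "http://www.cs.rpi.edu/XGMML"

# Toy graph made up for this example; NOT an excerpt from the corpus.
SAMPLE = """<graph label="toy-document" directed="1" xmlns="http://www.cs.rpi.edu/XGMML">
  <node id="1" label="segment 1"/>
  <node id="2" label="segment 2"/>
  <node id="3" label="segment 3"/>
  <edge source="1" target="2" label="Elaboration"/>
  <edge source="2" target="3" label="Contrast"/>
</graph>
"""


def read_xgmml(text):
    """Return (nodes, edges) from an XGMML string.

    nodes maps node id -> label; edges is a list of
    (source id, target id, label) tuples in document order.
    """
    root = ET.fromstring(text)
    nodes = {
        node.get("id"): node.get("label")
        for node in root.findall(f"{{{XGMML_NS}}}node")
    }
    edges = [
        (edge.get("source"), edge.get("target"), edge.get("label"))
        for edge in root.findall(f"{{{XGMML_NS}}}edge")
    ]
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = read_xgmml(SAMPLE)
    for source, target, relation in edges:
        print(f"{nodes[source]} --{relation}--> {nodes[target]}")
```

In this sketch discourse segments correspond to nodes and relations to labelled edges; whether the corpus stores additional information in nested `att` elements would need to be checked against the distributed files.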
Version number, release date:
Previous versions and their release dates:
Pointers to related corpora:
The CatDiG texts are part of the AnCora Corpus (http://clic.ub.edu/corpus/en). Currently we are annotating a Spanish version of CatDiG: the Spanish Discourse GraphBank (in preparation).
Citation (text format):
Badia, T., R. Saurí, and T. Suñol (2015). The Catalan Discourse GraphBank. Proceedings of the TextLink First Action Conference. Louvain-La-Neuve, Belgium, 26-28 January 2015.
Citation (bibTeX format):
@InProceedings{badiaEtAl_2015,
  author    = {Toni Badia and Roser Saur\'{\i} and Teresa Su\~{n}ol},
  title     = {The Catalan Discourse GraphBank},
  booktitle = {Proceedings of the TextLink First Action Conference},
  address   = {Louvain-La-Neuve, Belgium},
  year      = {2015}
}
Further info about the discourse relations:
Information about the arguments of each relation is available.
Senses (semantic labels) are annotated for the relations.
Other annotation layers:
Sentence morphosyntax, parse structure
Anaphora (coreference, bridging)
Argument structure and thematic roles, semantic classes of verbs, types of deverbal nouns, WordNet synsets for nouns, named entities