RST Spanish Treebank

The RST Spanish Treebank is the first Spanish corpus annotated with rhetorical relations from Rhetorical Structure Theory (RST). It includes written texts from several genres across nine specialized domains: Astrophysics, Earthquake Engineering, Economy, Law, Linguistics, Mathematics, Medicine, Psychology and Sexuality. The corpus was annotated with RSTTool. It can be consulted, queried and downloaded (in full or as a subcorpus) from the project URL.

Corpus acronym: 
RST-SP
Developer: 
Universidad Nacional Autónoma de México (Instituto de Ingeniería), Universitat Pompeu Fabra (Institute for Applied Linguistics), Université d'Avignon et des Pays de Vaucluse (Laboratoire Informatique d’Avignon)
Authors: 
Iria da Cunha
Juan-Manuel Torres-Moreno
Gerardo Sierra
Contact person(s): 
Iria da Cunha
Availability: 
Free download at: http://www.corpus.unam.mx/rst/corpus.html
Languages covered: 
Spanish
Corpus size (documents): 
267
Corpus size (sentences): 
2,256
Corpus size (tokens): 
52,746
Mode: 
written
Genre: 
science
Genre (detailed): 
Abstracts of research papers
Sections of scientific articles
Sections of conference proceedings
Parts of textbooks
Sections of articles and reports from magazines
Sections of associations’ websites
Abstracts of PhD theses
Register: 
formal
Text type: 
instructive
expository
descriptive
argumentative
Years of the data origin: 
1985-2010
Tools for annotation: 
RSTTool (http://www.wagsoft.com/RSTTool/)
Tools for browsing: 
http://www.corpus.unam.mx/rst/corpus.html
Tools for querying: 
See querying tools at: http://www.corpus.unam.mx/rst/corpus.html
Types of DSDs annotated: 
intra- and inter-sentential
Number of DSD instances: 
3,115
Method of annotation: 
manual
Style/theory of annotation: 
RST style
Format: 
rs3, txt, jpg
Version number, release date: 

June 2011

Citation (text format): 

da Cunha, Iria; Torres-Moreno, Juan-Manuel; Sierra, Gerardo (2011). «On the Development of the RST Spanish Treebank». In Proceedings of the 5th Linguistic Annotation Workshop. 49th Annual Meeting of the Association for Computational Linguistics (ACL). Portland, Oregon, USA.

Citation (bibTeX format): 

@inproceedings{daCunha:2011:DRS:2018966.2018967,
  author = {da Cunha, Iria and Torres-Moreno, Juan-Manuel and Sierra, Gerardo},
  title = {On the Development of the RST Spanish Treebank},
  booktitle = {Proceedings of the 5th Linguistic Annotation Workshop},
  series = {LAW V '11},
  year = {2011},
  isbn = {978-1-932432-93-0},
  location = {Portland, Oregon},
  pages = {1--10},
  numpages = {10},
  url = {http://dl.acm.org/citation.cfm?id=2018966.2018967},
  acmid = {2018967},
  publisher = {Association for Computational Linguistics},
  address = {Stroudsburg, PA, USA},
}

Notes:
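The rs3 files listed under Format are XML documents of the kind written by RSTTool, so they can be read with a standard XML parser. The following is a minimal sketch rather than an official tool: it assumes the common rs3 layout in which `segment` and `group` elements carry `id`, `parent` and `relname` attributes. The sample document, its relation names, and the helper names `count_relations` and `list_segments` are hypothetical illustrations, not material taken from the corpus.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature rs3 document, for illustration only.
# Real corpus files may use different relation names and extra attributes.
SAMPLE_RS3 = """<rst>
  <header>
    <relations>
      <rel name="elaboration" type="rst"/>
      <rel name="list" type="multinuc"/>
    </relations>
  </header>
  <body>
    <segment id="1" parent="3" relname="span">Primera unidad.</segment>
    <segment id="2" parent="1" relname="elaboration">Segunda unidad.</segment>
    <group id="3" type="span"/>
  </body>
</rst>"""


def count_relations(rs3_text):
    """Count relation labels on segments and groups, skipping structural 'span' links."""
    root = ET.fromstring(rs3_text)
    counts = Counter()
    for node in root.iter():
        if node.tag in ("segment", "group"):
            relname = node.get("relname")
            if relname and relname != "span":
                counts[relname] += 1
    return counts


def list_segments(rs3_text):
    """Return (id, text) pairs for each elementary discourse unit, in document order."""
    root = ET.fromstring(rs3_text)
    return [(seg.get("id"), (seg.text or "").strip()) for seg in root.iter("segment")]


if __name__ == "__main__":
    print(count_relations(SAMPLE_RS3))
    print(list_segments(SAMPLE_RS3))
```

To apply the same functions to a downloaded file, read its contents into a string first; the file encoding is not stated in this record, so it may need to be specified explicitly when opening the file.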