RST Signalling Corpus

Corpus acronym: 
RST-SC
Developer: 
Simon Fraser University
Authors: 
Debopam Das, Maite Taboada
Contact person(s): 
Maite Taboada
Availability: 
Linguistic Data Consortium
Languages covered: 
English
Corpus size (documents): 
385
Corpus size (other): 
176,000 words, about 21,400 relations
Mode: 
written
Genre: 
journalistic
Genre (detailed): 
Newspaper news, editorials
Register: 
formal
Text type: 
expository
Years of the data origin: 
1990s
Document structure: 
Documents, paragraph boundaries
Tools for annotation: 
UAM CorpusTool
Tools for browsing: 
UAM CorpusTool
Tools for querying: 
UAM CorpusTool
Types of DSDs annotated: 
Discourse markers (intra- and inter-sentential, explicit), plus referential and syntactic features and some punctuation.
Number of DSD instances: 
21,400
Method of annotation: 
Manual
Style/theory of annotation: 
RST-style
Format: 
XML
Version number, release date: 
1.0, to be released June 2015
Citation (text format): 
Das, D. and M. Taboada (2015, forthcoming). RST Signalling Corpus. Distributed through the Linguistic Data Consortium.
Notes: 
Further info about the discourse relations: 
Information about the arguments of each relation is available