ANNODIS

Corpus acronym: 
ANNODIS
Developer: 
CLLE-ERSS, University of Toulouse Jean Jaurès
Authors: 
Stergos D. Afantenos, Nicholas Asher, Farah Benamara, Myriam Bras, Cécile Fabre, Lydia-Mai Ho-Dac, Anne Le Draoulec, Philippe Muller, Marie-Paule Péry-Woodley, Laurent Prévot, Josette Rebeyrolle, Ludovic Tanguy, Marianne Vergez-Couret, Laure Vieu
Contact person(s): 
Lydia-Mai Ho-Dac
Availability: 
Creative Commons By-NC-SA 3.0
Languages covered: 
French
Corpus size (documents): 
156
Corpus size (tokens): 
687,000
Mode: 
written
Genre: 
journalistic
science
reports and encyclopaedia articles
Genre (detailed): 
news in brief (from the daily newspaper Est Républicain), encyclopaedia articles (from Wikipedia), research papers in linguistics, reports and articles from the French think tank IFRI
Register: 
formal
Text type: 
narrative
expository
descriptive
argumentative
Years of the data origin: 
1999-2008
Document structure: 
documents, paragraph boundaries, headings and subheadings, bulleted and numbered lists, examples, citations
Tools for annotation: 
Glozz (http://www.glozz.org/)
Types of DSDs annotated: 
Two types of DSDs are marked up:
- rhetorical relations annotation: Elementary Discourse Units (EDU) and Complex Discourse Units (CDU) linked by rhetorical relations (e.g. contrast, elaboration, result, attribution)
- multi-level structures annotation: Enumerative Structures (ES) and Topical Chains (TC), with their cues
Number of DSD instances: 
3,188 EDU, 1,395 CDU, 3,355 rhetorical relations, 991 ES and 4,649 ES cues, 588 TC and 3,456 TC cues
Method of annotation: 
manual for discourse relations and assisted for multi-level structures
Style/theory of annotation: 
SDRT and Systemic Functional Linguistics
Format: 
Glozz format and XML (TEIP5 encoding)
Version number, release date: 

1.0, September 2012

Citation (text format): 

Péry-Woodley M.-P., Afantenos S. D., Ho-Dac L.-M., Asher N. (2011). La ressource ANNODIS, un corpus enrichi d'annotations discursives. TAL 52(3), pp. 71-101. [http://www.atala.org/La-ressource-ANNODIS-un-corpus]

Citation (bibTeX format): 

@ARTICLE{ANNODIStal2011,
  author = {P{\'e}ry-Woodley, M.-P. and Afantenos, S. D. and Ho-Dac, L.-M. and Asher, N.},
  title = {La ressource ANNODIS, un corpus enrichi d'annotations discursives},
  journal = {TAL},
  year = {2011},
  volume = {52},
  number = {3},
  pages = {71--101}
}

Notes: 
Further info about the discourse relations: 
senses/semantic labels are annotated for the relations