STAC - Linguistic Corpus

The STAC dataset is a corpus of strategic chat conversations manually annotated with negotiation-related information, dialogue acts and discourse structures in the framework of Segmented Discourse Representation Theory (SDRT). This dataset was developed within the context of the STAC (Strategic Conversation) project supported by the European Research Council, Grant n. 269427. It consists of 45 games segmented into elementary discourse units (EDUs) and then annotated using the Glozz tool.

Developer: 
IRIT - Université Paul Sabatier
Authors: 
Nicholas Asher
J. Hunter
M. Morey
F. Benamara
S. Afantenos
Contact person(s): 
Nicholas Asher
Availability: 
publicly available
Languages covered: 
English
Corpus size (documents): 
45 games, 1137 dialogues
Corpus size (sentences): 
N/A
Corpus size (tokens): 
N/A
Corpus size (other): 
12588 elementary discourse units (EDUs), 1450 complex discourse units (CDUs), 14038 discourse units (DUs = elementary + complex)
Genre: 
interactional (social networks, sms, everyday conversation, etc.)
Genre (detailed): 
chats
Tools for annotation: 
Glozz
Tools for querying: 
Glozz
Method of annotation: 
manual
Style/theory of annotation: 
SDRT
Format: 
XML
Citation (text format): 

Asher, N., Hunter, J., Morey, M., Benamara, F. & Afantenos, S. (2016). Discourse structure and dialogue acts in multiparty dialogue: the STAC corpus. In The Tenth International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association, pp. 2721-2727, Portorož.

Notes: 
Other annotation layers: 
dialogue acts