STAC - Situated corpus

The STAC dataset is a corpus of strategic chat conversations manually annotated with negotiation-related information, dialogue acts, and discourse structures in the framework of Segmented Discourse Representation Theory (SDRT). It was developed within the STAC (Strategic Conversation) project, supported by the European Research Council, Grant n. 269427. The dataset consists of 45 games segmented into Elementary Discourse Units (EDUs) and then annotated using the Glozz tool. The STAC-Situated corpus adds a new layer of annotation to the STAC-Linguistic corpus by annotating publicly observable game events.

Developer: 
IRIT - Université Paul Sabatier
Authors: 
Nicholas Asher
J. Hunter
M. Morey
F. Benamara
S. Afantenos
Contact person(s): 
Nicholas Asher
Availability: 
publicly available
Languages covered: 
English
Corpus size (documents): 
45 games, 2595 dialogues
Corpus size (sentences): 
N/A
Corpus size (tokens): 
N/A
Corpus size (other): 
12588 elementary discourse units (EDUs) + 31810 elementary nonlinguistic event units (EEUs) = 44398 elementary units; 7651 complex discourse units (CDUs); 52049 discourse units in total (DUs = elementary + complex)
Genre: 
interactional (social networks, sms, everyday conversation, etc.)
Genre (detailed): 
chats
Tools for annotation: 
Glozz
Tools for querying: 
Glozz
Method of annotation: 
manual
Style/theory of annotation: 
SDRT
Format: 
XML
Citation (text format): 

Asher, N., Hunter, J., Morey, M., Benamara, F. & Afantenos, S. (2016). Discourse structure and dialogue acts in multiparty dialogue: the STAC corpus. In The Tenth International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association, pp. 2721-2727, Portorož.

Notes: 
Other annotation layers: 
dialogue acts