HuComTech Multimodal Corpus

Corpus acronym: 
HMC
Developer: 
Department of General and Applied Linguistics, University of Debrecen, Hungary
Authors: 
László Hunyadi with the Department of General and Applied Linguistics, University of Debrecen
Contact person(s): 
Ágnes Abuczki
Availability: 
Available for academic, non-commercial use (no derivatives) from the META-SHARE repository: http://metashare.nytud.hu/repository/browse/hucomtech-multimodal-corpus-and-database/80230f6e6ba811e2aa7c68b599c26a066e7e04f01c6043b485f6bf2f65945880/
Languages covered: 
Hungarian
Available translations: 
N/A
Corpus size (hours): 
approx. 50 hours
Corpus size (documents): 
222
Corpus size (sentences): 
N/A
Corpus size (tokens): 
430,382 tokens
Corpus size (other): 
222 MP4 video files, 222 WAV (audio/x-wav) audio files, 222 Praat TextGrids and 222 ELAN (.eaf) files, covering 111 informal conversations and 111 simulated job interviews
Mode: 
spoken
Genre: 
interactional (social networks, sms, everyday conversation, etc.)
Genre (detailed): 
2 genres: (1) informal, semi-guided conversation; (2) formal, guided simulated job interview
Register: 
casual
formal
Register (2): 
spontaneous
Text type: 
narrative
descriptive
argumentative
Years of the data origin: 
2010-2011
Document structure: 
N/A
Unit of segmentation used: 
intonational units, utterances
Tools for annotation: 
Praat, ELAN 4.6.1 https://tla.mpi.nl/tools/tla-tools/elan/
Tools for browsing: 
ELAN 4.6.1 https://tla.mpi.nl/tools/tla-tools/elan/
Tools for querying: 
ELAN 4.6.1 https://tla.mpi.nl/tools/tla-tools/elan/
Types of DSDs annotated: 
Following a semasiological approach, the discourse-pragmatic functions of a small set of selected DSDs are annotated, namely multifunctional Hungarian DSDs that had not previously been studied in detail; both intrasentential and intersentential types are included.
Number of DSD instances: 
557 (in the latest version)
Method of annotation: 
manual
Style/theory of annotation: 
a combination and adaptation of Speech Act Theory (Bach & Harnisch 1979), the discourse-pragmatic models of Schiffrin (1987, 2006) and a computational pragmatic model proposed by Petukhova & Bunt (2009)
Format: 
EAF (most annotations also available in XML)
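EAF (ELAN Annotation Format) files are themselves XML documents: time slots are listed in a TIME_ORDER element, and each TIER holds annotations that point to those slots. As a minimal sketch, assuming Python 3 and only the standard library, the time-aligned annotations of an .eaf file can be read tier by tier without opening ELAN. The function name `read_eaf_tiers`, the tier name `demo_tier` and the sample value are hypothetical illustrations; the actual HuComTech tier names are not given in this record. Symbolic (REF_ANNOTATION) annotations are skipped in this sketch.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# (start_ms, end_ms, annotation text); times are None for unaligned slots
Interval = Tuple[Optional[int], Optional[int], str]


def read_eaf_tiers(source) -> Dict[str, List[Interval]]:
    """Hypothetical helper: return {tier_id: [(start_ms, end_ms, value), ...]}.

    `source` is a file path or binary file object accepted by ET.parse.
    Only ALIGNABLE_ANNOTATION elements are read; REF_ANNOTATION is ignored.
    """
    root = ET.parse(source).getroot()

    # Map time-slot IDs to millisecond values; unaligned slots lack TIME_VALUE.
    slots: Dict[str, Optional[int]] = {}
    for slot in root.iter("TIME_SLOT"):
        value = slot.get("TIME_VALUE")
        slots[slot.get("TIME_SLOT_ID")] = int(value) if value is not None else None

    # Collect each tier's time-aligned annotations in document order.
    tiers: Dict[str, List[Interval]] = {}
    for tier in root.iter("TIER"):
        rows: List[Interval] = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((start, end, text.strip()))
        tiers[tier.get("TIER_ID")] = rows
    return tiers


# Tiny hypothetical EAF fragment for illustration (not taken from the corpus).
SAMPLE = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1250"/>
  </TIME_ORDER>
  <TIER TIER_ID="demo_tier">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>szóval</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

if __name__ == "__main__":
    tiers = read_eaf_tiers(io.BytesIO(SAMPLE.encode("utf-8")))
    print(tiers)  # {'demo_tier': [(0, 1250, 'szóval')]}
```

Passing a path such as a downloaded .eaf file instead of the in-memory sample works the same way, since `ET.parse` accepts either.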
Version number, release date: 
January 2013
Previous versions and their release dates: 
N/A
Citation (text format): 
Hunyadi, L. - Bertok, K. - Nemeth T., E. - Szekrenyes, I. - Abuczki, A. - Nagy, G. - Nagy, N. - Nemeti, P. - Bodog, A. 2011. The outlines of a theory and technology of human-computer interaction as represented in the model of the HuComTech project. In: 2nd International Conference on Cognitive Infocommunications (CogInfoCom). Budapest, 7-9 July, 2011. Budapest: IEEE. (E-ISBN: 978-963-8111-78-4, Print ISBN: 978-1-4577-1806-9)
Citation (bibTeX format): 
N/A
Notes: 
N/A
Audio/video annotation: 
alignment of the audio/video to the transcriptions
annotation of prosody
Further info about the discourse relations: 
senses/semantic labels are annotated for the relations
Other annotation layers: 
intonation/prosody
speech acts; discourse-pragmatic functions of the selected DSDs; nonverbal behaviour of the speaker: hand movements, deictic gestures, gaze direction, head nods, facial expressions, emblems