Developer:
Université catholique de Louvain
Availability:
annotation complete; extracted annotations available for the database
Available translations:
the corpus is comparable, not translated
Genre:
journalistic
science
interactional (social networks, sms, everyday conversation, etc.)
Genre (detailed):
interview (face-to-face and radio), conversation, phone calls, news broadcast, political speech, classroom lessons, sports commentaries
Register (2):
spontaneous
semi-spontaneous
non-spontaneous
Years of the data origin:
Unit of segmentation used:
Tools for annotation:
EXMARaLDA (Partitur Editor)
Types of DSDs annotated:
explicit DSDs only, both relational and non-relational (e.g. "because" and "well"), that take scope over at least one unit equal to or larger than a clause (intra-sentential conjunctions are excluded)
Style/theory of annotation:
Format:
.exb (EXMARaLDA), Praat TextGrid, XML ...
Version number, release date:
will be version 1.0, to be finished by the end of 2015
Previous versions and their release dates:
Pointers to related corpora:
ICE-GB, Valibel, CLAPI, Backbone, C-Phonogenre, LOCAS-F, Rhapsodie
Citation (text format):
Crible, L. 2017. "Discourse markers and (dis)fluency across registers: A contrastive usage-based study in English and French". PhD thesis, Université catholique de Louvain.
Citation (bibTeX format):
@phdthesis{crib17, author = "Crible, L.", title = "Discourse markers and (dis)fluency across registers: A contrastive usage-based study in English and French", school = "Université catholique de Louvain", year = "2017"}
Notes:
The transcription and audio files are not mine (they come either from freely available corpora or are available under agreement). Most of them underwent technical processing to homogenize them into a common format; some of them were also sound-aligned. All annotations are mine.
Audio/video annotation:
alignment of the audio/video to the transcriptions
Further info about the discourse relations:
senses/semantic labels are annotated for the relations
Other annotation layers:
syntactic position of the discourse markers; POS; disfluency markers