University of Haifa, Israel, Department of Hebrew Language
Corpus size (documents):
325 audio files + 325 Word files
interactional (social networks, SMS, everyday conversation, etc.)
radio phone-in programs
Genre (detailed):
face-to-face conversation, political radio phone-ins
Years of the data origin:
1993-2014 (and continuing into the present)
Document structure:
Prosody: intonation unit boundaries, intonation contour type, length of pauses, primary and secondary stress, etc.
Unit of segmentation used:
Tools for annotation:
manual annotation, occasional use of Praat
Tools for querying:
SketchEngine (for part of the corpus)
Types of DSDs annotated:
All explicit inter-sentential discourse markers (textual, interpersonal, and cognitive) in approximately 40 minutes of the 17.5 hours of recordings have been identified manually, but they are not annotated in the corpus itself.
Number of DSD instances:
574 tokens, 92 types in 40 minutes (out of the 17.5 hours)
Style/theory of annotation:
Transcription conventions of the University of California at Santa Barbara Linguistics Department (Du Bois, forthcoming).
Word files, XML format available for the majority of the data.
Version number, release date:
Previous versions and their release dates:
Pointers to related corpora:
Citation (bibTeX format):