COST (European Cooperation in Science and Technology) is a pan-European intergovernmental framework. Its mission is to enable break-through scientific and technological developments leading to new concepts and products and thereby contribute to strengthening Europe’s research and innovation capacities.

ISCH COST Action IS1312

The Haifa Corpus of Spoken Hebrew

Submitted by Anonymous (not verified) on Tue, 20/10/2015 - 11:57

Corpus acronym:

N/A

Developer:

University of Haifa, Israel, Department of Hebrew Language

Authors:

Yael Maschler

Contact person(s):

Yael Maschler

Contact person e-mail(s):

maschler@research.haifa.ac.il

Project URL:

http://weblx2.haifa.ac.il/~corpus/corpus_website/

Availability:

by request

Languages covered:

Hebrew

Available translations:

N/A

Corpus size (hours):

Over 17.5 hours

Corpus size (documents):

325 audio files + 325 Word files

Corpus size (sentences):

N/A

Corpus size (tokens):

N/A

Corpus size (other):

N/A

Mode:

spoken

Genre:

interactional (social networks, sms, everyday conversation, etc.)

radio phone-in programs

Genre (detailed):

face-to-face conversation, political radio phone-ins

casual

spontaneous

Text type:

narrative

argumentative

Years of the data origin:

1993-2014 (and continuing into the present)

Document structure:

Prosody: intonation unit boundaries, intonation contour type, length of pauses, primary and secondary stress, etc.

Unit of segmentation used:

intonation unit

Tools for annotation:

manual annotation, occasional use of Praat

Tools for browsing:

Any web browser

Tools for querying:

SketchEngine (for part of the corpus)

Types of DSDs annotated:

All inter-sentential explicit discourse markers (textual, interpersonal, and cognitive) in approximately 40 minutes of the 17.5 hours have been identified manually (but not annotated in the corpus itself).

Number of DSD instances:

574 tokens, 92 types in 40 minutes (out of the 17.5 hours)

Method of annotation:

manual

Style/theory of annotation:

Transcription conventions of the University of California, Santa Barbara, Linguistics Department (Du Bois, forthcoming).

Annotation manual URL:

http://www.linguistics.ucsb.edu/projects/transcription/representing

Format:

Word files; XML format is available for the majority of the data.

Version number, release date:

N/A

Previous versions and their release dates:

N/A

Citation (text format):

Maschler, Yael. 2014. The Haifa Corpus of Spoken Hebrew. http://weblx2.haifa.ac.il/~corpus/corpus_website/

Citation (bibTeX format):

@misc{maschler2014haifa,
  author       = {Maschler, Yael},
  title        = {The Haifa Corpus of Spoken Hebrew},
  year         = {2014},
  howpublished = {\url{http://weblx2.haifa.ac.il/~corpus/corpus_website/}}
}

Notes:

Audio/video annotation:

annotation of prosody

Other annotation layers:

intonation/prosody
