COST (European Cooperation in Science and Technology) is a pan-European intergovernmental framework. Its mission is to enable break-through scientific and technological developments leading to new concepts and products and thereby contribute to strengthening Europe’s research and innovation capacities.

ISCH COST Action IS1312

Corpus for the Analysis of German-English Contrasts in Cohesion

Submitted by Anonymous (not verified) on Tue, 20/10/2015 - 11:57

Corpus acronym:

GECCo

Developer:

GECCo Team at Saarland University, Department of Applied Linguistics, Translation and Interpreting

Authors:

Ekaterina Lapshinova-Koltunski, Kerstin Kunz, Katrin Menzel, Erich Steiner, Jose Manuel Martinez Martinez, Stefania Degaetano-Ortlieb, Marilisa Amoia

Contact person(s):

Ekaterina Lapshinova-Koltunski, Kerstin Kunz

Contact person e-mail(s):

e.lapshinova@mx.uni-saarland.de, kerstin.kunz@iued.uni-heidelberg.de

Project URL:

http://www.gecco.uni-saarland.de

Availability:

The corpus is available via the CLARIN-DE repository (http://hdl.handle.net/11858/00-246C-0000-0023-8CF7-A). The URL for querying is http://corpora.clarin-d.uni-saarland.de/cqpweb/. Access to this corpus is restricted; please contact the authors.

Languages covered:

English, German

Available translations:

English-to-German translations for the written part of the English original subcorpus

German-to-English translations for the written part of the German original subcorpus

Corpus size (documents):

604

Corpus size (sentences):

78,942

Corpus size (tokens):

1,693,386

Mode:

written

spoken

Genre:

fiction

science

interactional (social networks, sms, everyday conversation, etc.)

Genre (detailed):

academic speeches, political essays, fictional texts, interviews, instruction manuals, popular-scientific articles, letters to shareholders, prepared political speeches, tourism leaflets, texts from various corporate websites

casual

semi-formal

formal

spontaneous

semi-spontaneous

non-spontaneous

Text type:

instructive

narrative

expository

descriptive

argumentative

Years of the data origin:

written subcorpora: 1992 - 2006; spoken subcorpora: 2008 - 2012

Document structure:

sentence boundaries, turns in the spoken part, and text boundaries throughout the corpus (texts are not always complete)

Unit of segmentation used:

turns

Tools for annotation:

mpro (morphological analysis), TreeTagger, Stanford Parser, MATE parser, CWB modules, CQP, MMAX2

Tools for browsing:

CQP and CQP Web

Tools for querying:

CQP and CQP Web, also MMAX2

Types of DSDs annotated:

explicit inter-sentential (cohesive) relations and intra-sentential relations between clauses

Number of DSD instances:

51,790

Method of annotation:

semi-automatic (automatic with manual post-correction)

Style/theory of annotation:

Systemic Functional Linguistics

Annotation manual URL:

http://fedora.clarin-d.uni-saarland.de/gecco

Format:

CQP XML

Version number, release date:

GECCO2013, GECCO-SPOKEN2014 (spoken only)

Previous versions and their release dates:

previous GECCo versions

Citation (text format):

Lapshinova-Koltunski, E., K. Kunz and M. Amoia (2012). Compiling a Multilingual Corpus. In Heliana Mello, Massimo Pettorino and Tommaso Raso (eds). Proceedings of the VIIth GSCP-2012 International Conference: Speech and Corpora. Firenze: Firenze University Press. pp. 29-34.

Lapshinova-Koltunski, E. and K. Kunz (2014). Annotating Cohesion for Multilingual Analysis. In Proceedings of the 10th Joint ACL - ISO Workshop on Interoperable Semantic Annotation, Reykjavik, May 26, 2014.

Citation (bibTeX format):

@INPROCEEDINGS{LapshinovaEtal2012,
  author = {Lapshinova-Koltunski, Ekaterina and Kerstin Kunz and Marilisa Amoia},
  editor = {Heliana Mello and Massimo Pettorino and Tommaso Raso},
  title = {Compiling a Multilingual Spoken Corpus},
  booktitle = {Proceedings of the VIIth GSCP International Conference: Speech and corpora},
  publisher = {Firenze University Press},
  address = {Firenze},
  pages = {79--84},
  year = {2012},
  url = {http://store.torrossa.it/resources/9788866553519}
}

@InProceedings{LapshinovaKunz:2014:ISA,
  author = {Lapshinova-Koltunski, Ekaterina and Kunz, Kerstin},
  title = {Annotating Cohesion for Multilingual Analysis},
  booktitle = {Proceedings of the 10th Joint ACL - ISO Workshop on Interoperable Semantic Annotation},
  month = {May},
  year = {2014},
  address = {Reykjavik, Iceland},
  publisher = {LREC}
}

Notes:

Further info about the discourse relations:

senses/semantic labels are annotated for the relations

Other annotation layers:

sentence morphosyntax, parse structure

anaphora (coreference, bridging)

substitution, ellipsis, lexical cohesion
