Extending Automatic Discourse Segmentation for Texts in Spanish to Catalan.

Title: Extending Automatic Discourse Segmentation for Texts in Spanish to Catalan.
Publication Type: Conference Paper
Year of Publication: 2016
Authors: da Cunha I, SanJuan E, Torres-Moreno J-M, Castellón I, Lloberes M
Conference Name: CEUR Proceedings of the First Workshop on Modeling, Learning and Mining for Cross/Multilinguality (MultiLingMine 2016), 38th European Conference on Information Retrieval (ECIR 2016). Vol. 1589. 36-45.
ISBN Number: ISSN 1613-0073

At present, automatic discourse analysis is a relevant research
topic in the field of NLP. However, discourse is one of the most
difficult phenomena to process. Although discourse parsers have
already been developed for several languages, no such tool exists for Catalan.
In order to implement this kind of parser, the first step is to develop
a discourse segmenter. In this article we present the first discourse
segmenter for texts in Catalan. This segmenter is based on Rhetorical
Structure Theory (RST) for Spanish, and uses lexical and syntactic information
to translate rules valid for Spanish into rules for Catalan. We
have evaluated the system against a gold standard corpus of
manually segmented texts, and the results are promising.