Predicting Discourse Connectives in Parallel Texts

During his mission, Dr Hardmeier plans to extend a discourse element prediction model, originally developed for anaphoric pronouns, to the prediction of discourse connectives. The extended model will then be used in experiments on the cross-lingual annotation of discourse properties. This extension to discourse connectives pursues two goals related to the scientific objectives of the TextLink COST Action:
1. During development, the classifier will be tested with different feature configurations and context window sizes. This should make it possible to estimate how much context is needed to predict discourse connectives reliably. This information will be useful when designing semi-automatic annotation methods for discourse connectives, which is one of the goals of the COST Action.
2. Once the classifier is ready, it can be used to predict discourse connectives in parallel texts. Of particular interest are situations where one language lacks an explicit connective and expresses the discourse relation implicitly instead. In these cases, the classifier will be used to predict an explicit discourse connective, which can help disambiguate the implicit relation. This would yield a method for the automatic large-scale annotation of implicit discourse relations, another goal of the COST Action.
Based on these goals, the following work will be carried out:
1. Adaptation of the structure and input features of the pronoun prediction neural network of Hardmeier et al. (EMNLP 2013) to the prediction of discourse connectives (3 weeks).
2. Systematic experiments with different feature sets and context sizes to determine the impact of these choices on prediction performance (1 week).
3. Testing of the resulting predictor on corpus examples where the target language lacks an explicit discourse connective, and evaluation of how well these predictions reflect the nature of the implicit discourse relation (2 weeks).
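To make the notion of "context size" in the experiments more concrete, the sketch below shows one simple way of extracting a symmetric token window around the position of a discourse connective, for several window sizes. It is an illustration only, assuming plain whitespace tokenisation and a fixed-size token window; the function name `context_window` and the example sentence are hypothetical and do not describe the feature extraction of the Hardmeier et al. (EMNLP 2013) network.

```python
from typing import List, Tuple


def context_window(tokens: List[str], slot: int, size: int) -> Tuple[List[str], List[str]]:
    """Return up to `size` tokens to the left and right of a connective slot.

    `slot` is the index of the connective position (hypothetical setup);
    the token at that index is excluded, so the classifier would see only
    the surrounding context when predicting the connective.
    """
    left = tokens[max(0, slot - size):slot]    # clipped at the sentence start
    right = tokens[slot + 1:slot + 1 + size]   # slicing clips at the end automatically
    return left, right


# Hypothetical example: the slot occupied by "because" is the prediction target.
sentence = "He stayed home because it was raining heavily .".split()
slot = sentence.index("because")

# Varying the window size changes how much context the classifier would receive.
for size in (1, 2, 4):
    left, right = context_window(sentence, slot, size)
    print(size, left, right)
```

Comparing prediction performance across such window sizes is one way the question of how much context is needed could be operationalised; richer contexts (e.g. whole preceding sentences) would require a different extraction scheme.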