Workshop Proceedings
Invited talks
Serge Sharoff
University of Leeds, UK
Towards Pan-Slavonic NLP: some experiments with Language Adaptation
Abstract: There is great variation in the amount of NLP resources available for Slavonic languages. For example, the Universal Dependencies treebank has about 2 million words of training resources for Czech and for Russian, but only 950 words for Ukrainian and nothing for Belarusian, Bosnian or Macedonian. Similarly, the Autodesk Machine Translation dataset covers only three Slavonic languages (Czech, Polish and Russian).
In this talk I will discuss a general approach, which can be called Language Adaptation by analogy with Domain Adaptation. In this approach, a model for a particular language processing task is built by lexical transfer of cognate words and by learning a new feature representation for a lesser-resourced language starting from a better-resourced one. More specifically, I will demonstrate how language adaptation works in such training scenarios as Part-of-Speech tagging, syntactic parsing and translation quality estimation.
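As a rough illustration of what "lexical transfer of cognate words" could look like, the sketch below copies Part-of-Speech tags from a better-resourced language to a lesser-resourced one whenever two word forms are similar enough as strings. It is a minimal sketch under stated assumptions, not the method presented in the talk: the function names (`similarity`, `transfer_lexicon`), the 0.7 threshold, and the tiny Russian and Ukrainian word lists are all hypothetical, and plain string similarity from Python's `difflib` stands in for whatever cognate detection the actual experiments use.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a string similarity score in [0, 1] (1.0 means identical)."""
    return SequenceMatcher(None, a, b).ratio()


def transfer_lexicon(source_lexicon, target_words, threshold=0.7):
    """Copy tags from source-language words to similar target-language words.

    source_lexicon: dict mapping source word -> tag (hypothetical toy data).
    target_words:   iterable of untagged target-language words.
    threshold:      assumed cut-off; pairs scoring below it are not treated
                    as cognates and receive no tag.
    """
    transferred = {}
    for word in target_words:
        best_source, best_score = None, 0.0
        # Find the most similar source word for this target word.
        for source_word in source_lexicon:
            score = similarity(word, source_word)
            if score > best_score:
                best_source, best_score = source_word, score
        # Transfer the tag only if the best candidate is similar enough.
        if best_source is not None and best_score >= threshold:
            transferred[word] = source_lexicon[best_source]
    return transferred


# Hypothetical toy data: a tagged Russian lexicon and untagged Ukrainian words.
russian_lexicon = {"молоко": "NOUN", "читать": "VERB", "новый": "ADJ"}
ukrainian_words = ["молоко", "читати", "новий", "але"]

result = transfer_lexicon(russian_lexicon, ukrainian_words)
for word, tag in result.items():
    print(word, tag)
```

A fixed similarity threshold is the simplest possible choice here; words with no close counterpart in the source lexicon are left untagged rather than guessed, which is where learning a new feature representation, as the abstract describes, would have to take over.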
Workshop Schedule
09:00 - 09:10 | Welcome remarks |
09:10 - 10:00 | Toward Pan-Slavic NLP: Some Experiments with Language Adaptation |
Invited talk by Serge Sharoff | |
Session I: Lexical Semantics chair: Tanja Samardžić | |
10:10 - 10:35 | Clustering of Russian Adjective-Noun Constructions using Word Embeddings |
Andrey Kutuzov, Elizaveta Kuzmenko and Lidia Pivovarova | |
10:35 - 11:00 | A Preliminary Study of Croatian Lexical Substitution |
Domagoj Alagić and Jan Šnajder | |
11:00 - 11:30 | Coffee break |
Session II: Development of Linguistic Resources chair: Lidia Pivovarova | |
11:30 - 11:55 | Projecting Multiword Expression Resources on a Polish Treebank |
Agata Savary and Jakub Waszczuk | |
11:55 - 12:20 | Lexicon Induction for Spoken Rusyn – Challenges and Results |
Achim Rabus and Yves Scherrer | |
12:20 - 12:45 | The Universal Dependencies Treebank for Slovenian |
Kaja Dobrovoljc, Tomaž Erjavec and Simon Krek | |
12:45 - 13:10 | Universal Dependencies for Serbian in Comparison with Croatian and Other Slavic Languages |
Tanja Samardžić, Mirjana Starović, Željko Agić and Nikola Ljubešić | |
13:10 - 14:30 | Lunch |
Session III: Processing Non-Standard Language and User-Generated Content chair: Serge Sharoff | |
14:30 - 14:55 | Spelling Correction for Morphologically Rich Language: a Case Study of Russian |
Alexey Sorokin | |
14:55 - 15:20 | Debunking Sentiment Lexicons: A Case of Domain-Specific Sentiment Classification for Croatian |
Paula Gombar, Zoran Medić, Domagoj Alagić and Jan Šnajder | |
15:20 - 15:45 | Adapting a State-of-the-Art Tagger for South Slavic Languages to Non-Standard Text |
Nikola Ljubešić, Tomaž Erjavec and Darja Fišer | |
15:45 - 16:10 | Comparison of Short-Text Sentiment Analysis Methods for Croatian |
Leon Rotim and Jan Šnajder | |
16:10 - 16:30 | Coffee break |
Session IV: Shared Task on Multilingual Named Entity Recognition chair: Jakub Piskorski, Josef Steinberger | |
16:30 - 16:45 | The First Cross-Lingual Challenge on Recognition, Normalization, and Matching of Named Entities in Slavic Languages |
Jakub Piskorski, Lidia Pivovarova, Jan Šnajder, Josef Steinberger and Roman Yangarber | |
16:45 - 16:55 | Language-Independent Named Entity Analysis Using Parallel Projection and Rule-Based Disambiguation |
James Mayfield, Paul McNamee and Cash Costello | |
16:55 - 17:05 | Liner2 — a Generic Framework for Named Entity Recognition |
Michał Marcińczuk, Jan Kocoń and Marcin Oleksy | |
17:05 - 17:15 | Discussion |
Session V: Information Filtering, Retrieval, and Extraction chair: Jan Šnajder | |
17:20 - 17:40 | Comparison of String Similarity Measures for Obscenity Filtering |
Ekaterina Chernyak | |
17:40 - 18:00 | Stylometric Analysis of Parliamentary Speeches: Gender Dimension |
Justina Mandravickaitė and Tomas Krilavičius | |
18:00 - 18:20 | Towards Never Ending Language Learning for Morphologically Rich Languages |
Kseniya Buraya, Lidia Pivovarova, Sergey Budkov and Andrey Filchenkov | |
18:20 - 18:40 | Gender Profiling for Slovene Twitter communication: the Influence of Gender Marking, Content and Style |
Ben Verhoeven, Iza Škrjanec and Senja Pollak | |
END OF WORKSHOP |