Balto-Slavic Natural Language Processing 2013

Workshop Proceedings

Invited talks

Serge Sharoff, University of Leeds, UK

Toward Pan-Slavic NLP: Some Experiments with Language Adaptation

Abstract:

The amount of NLP resources available for Slavonic languages varies greatly. For example, the Universal Dependencies treebanks provide about 2 million words of training data for Czech and for Russian, only 950 words for Ukrainian, and nothing for Belarusian, Bosnian or Macedonian. Similarly, the Autodesk Machine Translation dataset covers only three Slavonic languages (Czech, Polish and Russian).

In this talk I will discuss a general approach that can be called Language Adaptation, by analogy with Domain Adaptation. In this approach, a model for a particular language processing task is built by lexical transfer of cognate words and by learning a new feature representation for a lesser-resourced language, starting from a better-resourced one. More specifically, I will demonstrate how language adaptation works in training scenarios such as part-of-speech tagging, syntactic parsing and translation quality estimation.
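
As a rough illustration of what lexical transfer of cognate words could look like, the sketch below copies part-of-speech tags from a better-resourced language to orthographically similar words in a lesser-resourced one. It is not the method presented in the talk: the character-similarity measure (Python's difflib), the 0.7 threshold, the function names and the toy Russian and Ukrainian word lists are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1] (an assumed stand-in for a cognate score)."""
    return SequenceMatcher(None, a, b).ratio()


def transfer_lexicon(source_lexicon, target_words, threshold=0.7):
    """Give each target word the tag of its most similar source word.

    Words whose best match scores below the (hypothetical) threshold
    are treated as non-cognates and left untagged.
    """
    transferred = {}
    for word in target_words:
        best = max(source_lexicon, key=lambda s: similarity(word, s))
        if similarity(word, best) >= threshold:
            transferred[word] = source_lexicon[best]
    return transferred


# Toy data, illustrative only: Russian source lexicon, Ukrainian target words.
source = {"молоко": "NOUN", "читать": "VERB", "новый": "ADJ"}
target = ["молоко", "читати", "новий", "що"]

result = transfer_lexicon(source, target)
for word, tag in result.items():
    print(word, tag)
```

In this toy run the three cognates inherit a tag, while the function word "що" has no close counterpart in the source lexicon and is skipped. A realistic system would need a similarity measure that accounts for regular sound and spelling correspondences rather than raw character overlap.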

Workshop Schedule

09:00 - 09:10	Welcome remarks
09:10 - 10:00	Toward Pan-Slavic NLP: Some Experiments with Language Adaptation
	Invited talk by Serge Sharoff
	Session I: Lexical Semantics chair: Tanja Samardžić
10:10 - 10:35	Clustering of Russian Adjective-Noun Constructions using Word Embeddings
	Andrey Kutuzov, Elizaveta Kuzmenko and Lidia Pivovarova
10:35 - 11:00	A Preliminary Study of Croatian Lexical Substitution
	Domagoj Alagić and Jan Šnajder
11:00 - 11:30	Coffee break
	Session II: Development of Linguistic Resources chair: Lidia Pivovarova
11:30 - 11:55	Projecting Multiword Expression Resources on a Polish Treebank
	Agata Savary and Jakub Waszczuk
11:55 - 12:20	Lexicon Induction for Spoken Rusyn – Challenges and Results
	Achim Rabus and Yves Scherrer
12:20 - 12:45	The Universal Dependencies Treebank for Slovenian
	Kaja Dobrovoljc, Tomaž Erjavec and Simon Krek
12:45 - 13:10	Universal Dependencies for Serbian in Comparison with Croatian and Other Slavic Languages
	Tanja Samardžić, Mirjana Starović, Željko Agić and Nikola Ljubešić
13:10 - 14:30	Lunch
	Session III: Processing Non-Standard Language and User-Generated Content chair: Serge Sharoff
14:30 - 14:55	Spelling Correction for Morphologically Rich Language: a Case Study of Russian
	Alexey Sorokin
14:55 - 15:20	Debunking Sentiment Lexicons: A Case of Domain-Specific Sentiment Classification for Croatian
	Paula Gombar, Zoran Medić, Domagoj Alagić and Jan Šnajder
15:20 - 15:45	Adapting a State-of-the-Art Tagger for South Slavic Languages to Non-Standard Text
	Nikola Ljubešić, Tomaž Erjavec and Darja Fišer
15:45 - 16:10	Comparison of Short-Text Sentiment Analysis Methods for Croatian
	Leon Rotim and Jan Šnajder
16:10 - 16:30	Coffee break
	Session IV: Shared Task on Multilingual Named Entity Recognition chair: Jakub Piskorski, Josef Steinberger
16:30 - 16:45	The First Cross-Lingual Challenge on Recognition, Normalization, and Matching of Named Entities in Slavic Languages
	Jakub Piskorski, Lidia Pivovarova, Jan Šnajder, Josef Steinberger and Roman Yangarber
16:45 - 16:55	Language-Independent Named Entity Analysis Using Parallel Projection and Rule-Based Disambiguation
	James Mayfield, Paul McNamee and Cash Costello
16:55 - 17:05	Liner2 — a Generic Framework for Named Entity Recognition
	Michał Marcińczuk, Jan Kocoń and Marcin Oleksy
17:05 - 17:15	Discussion
	Session V: Information Filtering, Retrieval, and Extraction chair: Jan Šnajder
17:20 - 17:40	Comparison of String Similarity Measures for Obscenity Filtering
	Ekaterina Chernyak
17:40 - 18:00	Stylometric Analysis of Parliamentary Speeches: Gender Dimension
	Justina Mandravickaitė and Tomas Krilavičius
18:00 - 18:20	Towards Never Ending Language Learning for Morphologically Rich Languages
	Kseniya Buraya, Lidia Pivovarova, Sergey Budkov and Andrey Filchenkov
18:20 - 18:40	Gender Profiling for Slovene Twitter communication: the Influence of Gender Marking, Content and Style
	Ben Verhoeven, Iza Škrjanec and Senja Pollak
	END OF WORKSHOP
