Discriminating Between Similar Nordic Languages

Abstract

Automatic language identification is a challenging problem, and discriminating between closely related languages is especially difficult. This paper presents a machine learning approach to automatic language identification for the Nordic languages, which are often miscategorised by existing state-of-the-art tools. Concretely, we focus on discriminating between six Nordic languages: Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokmål), Faroese, and Icelandic.

Publication
Proceedings of the Workshop on NLP for Similar Languages, Varieties and Dialects
René Haas
ITU Copenhagen

René worked on sample-efficient discrimination between similar languages.

Leon Derczynski
Associate professor

My research interests include NLP for misinformation detection and verification, clinical record processing, online harms, and efficient AI.