Online misogyny, a category of online abusive language, has serious and harmful social consequences. Automatic detection of misogynistic language online, while imperative, poses complicated challenges to data gathering, data annotation, and bias …
In this dataset paper we introduce PROCAT, a novel e-commerce dataset containing expertly designed product catalogues consisting of individual product offers grouped into complementary sections. We aim to address the scarcity of existing datasets in …
Training large language models can consume a large amount of energy. We hypothesize that the language model's configuration impacts its energy consumption, and that there is room for power consumption optimisation in modern large language models. To …
There is an acute need for large-scale help digesting scientific literature. In 2018, the total number of published scientific articles was estimated at 2.52 million and the number of scientific journals at around 30,000. With such vast amounts of …
This paper presents a framework of opportunities and barriers/risks between the two research fields Natural Language Processing (NLP) and Human-Computer Interaction (HCI). The framework is constructed by following an interdisciplinary research-model …
Automatic detection of false claims is a difficult task. Existing data to support this task has largely been limited to English. We present a dataset, DANFEVER, intended for claim verification in Danish. The dataset builds upon the task framing of …
Danish language technology has been hindered by a lack of broad-coverage corpora at the scale modern NLP prefers. This paper describes the Danish Gigaword Corpus, the result of a focused effort to provide a diverse and freely available one billion …
Abusive phenomena are commonplace in language on the web. The scope of recognizing abusive language is broad, covering many behaviours and forms of expression. This work addresses automatic detection of abusive language in Russian. The lexical, …
Automatic language identification is a challenging problem. Discriminating between closely related languages is especially difficult. This paper presents a machine learning approach for automatic language identification for the Nordic languages, …
The presence of offensive language on social media platforms and the implications this poses are becoming a major concern in modern society. Given the enormous amount of content created every day, automatic methods are required to detect and deal with …