Hyperparameter Power Impact in Transformer Language Model Training

Abstract

Training large language models can consume a large amount of energy. We hypothesize that the language model’s configuration impacts its energy consumption, and that there is room for power consumption optimisation in modern large language models. To investigate these claims, we introduce a power consumption factor to the objective function, and explore the range of models and hyperparameter configurations that affect power. We identify multiple configuration factors that can reduce power consumption during language model training while retaining model quality.
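
The abstract does not state the exact form of the power-aware objective. As a minimal sketch, assuming a simple weighted sum of validation loss and measured training energy, a hyperparameter search could rank configurations as shown below. The names `TrialResult`, `power_aware_score`, `best_config`, and `energy_weight` are hypothetical and are not taken from the paper, and the numbers in the usage note are illustrative, not results.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TrialResult:
    """One trained configuration (hypothetical record, not from the paper)."""
    name: str
    val_loss: float    # model quality; lower is better
    energy_kwh: float  # energy measured during training; lower is better


def power_aware_score(result: TrialResult, energy_weight: float = 0.1) -> float:
    """Assumed objective: validation loss plus a weighted energy penalty."""
    return result.val_loss + energy_weight * result.energy_kwh


def best_config(results: Iterable[TrialResult], energy_weight: float = 0.1) -> TrialResult:
    """Return the configuration with the lowest combined score."""
    return min(results, key=lambda r: power_aware_score(r, energy_weight))
```

With `energy_weight=0.0` the search reduces to plain loss minimisation. Raising the weight favours configurations that give up a little quality for lower energy use, for example preferring a trial with loss 3.1 and 5 kWh over one with loss 3.0 and 10 kWh at a weight of 0.1.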

Publication
Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing
Lucas Høyberg Puvis de Chavannes
M47 Labs

Lucas worked on energy efficiency factors in transformer language models.

Mads Guldborg Kjeldgaard Kongsbak
Researcher

Mads' work focuses on efficient neural language model architectures.

Timmie Mikkel Lagermann Nielsen
Alumni

Timmie worked on energy efficiency in transformer language models.

Leon Derczynski
Associate professor

My research interests include NLP for misinformation detection and verification, clinical record processing, online harms, and efficient AI.