Optimal Size-Performance Tradeoffs: Weighing PoS Tagger Models

Abstract

Improvements in machine learning-based NLP performance are often reported alongside bigger models and more complex code. This presents a trade-off: better scores come at the cost of larger tools, since bigger models tend to require more resources during training and more time at inference. We present multiple methods for measuring the size of a model and for comparing this with the model's performance. In a case study of part-of-speech tagging, we then apply these techniques to taggers for eight languages and present a novel analysis identifying which taggers are size-performance optimal. Results indicate that some classical taggers place on the size-performance skyline across languages. Further, although deep models achieve the highest performance on multiple scores, it is often not the most complex of these that reaches peak performance.
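To make the notion of a size-performance skyline concrete, here is a minimal sketch of how such a frontier can be computed: a tagger is on the skyline if no other tagger is both at most as large and strictly better-scoring. The `skyline` helper and the (name, size, accuracy) triples below are illustrative assumptions for exposition, not the paper's actual measurements or code.

```python
from typing import List, Tuple

Model = Tuple[str, float, float]  # (name, size in MB, accuracy)

def skyline(models: List[Model]) -> List[Model]:
    """Return the models on the size-performance skyline.

    Smaller size and higher accuracy are better. A model is kept
    if no other model is no larger yet strictly better-scoring.
    """
    # Sort by size ascending; break size ties by accuracy descending.
    ordered = sorted(models, key=lambda m: (m[1], -m[2]))
    frontier: List[Model] = []
    best_score = float("-inf")
    for name, size, score in ordered:
        # Keep a model only if it outperforms every smaller-or-equal model.
        if score > best_score:
            frontier.append((name, size, score))
            best_score = score
    return frontier

# Hypothetical taggers for illustration only.
taggers = [
    ("hmm", 2.0, 0.94),
    ("crf", 15.0, 0.96),
    ("bilstm", 120.0, 0.97),
    ("transformer", 800.0, 0.975),
    ("big-ensemble", 900.0, 0.96),  # dominated: larger and worse
]

for name, size, acc in skyline(taggers):
    print(f"{name}: {size} MB, accuracy {acc:.3f}")
```

Under these made-up numbers the dominated model is dropped and the remaining four trace the frontier, mirroring the paper's finding that small classical taggers can sit on the skyline alongside much larger deep models.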

Publication
arXiv preprint arXiv:2104.07951
Magnus Malthe Jacobsen
Grad student

Magnus works on efficient, small machine learning.

Mikkel Hooge Sørensen
Grad student

Mikkel works on efficient, small machine learning.

Leon Derczynski
Associate professor

My research interests include NLP for misinformation detection and verification, clinical record processing, online harms, and efficient AI.