Clara Meister will be giving a virtual talk on Wednesday, July 14th, from 10am to 11am. Zoom information will be distributed via the Berkeley NLP Seminar listserv. Please note that this differs from our usual time slot.

Title: Evidence for the uniform information density hypothesis in modern NLP models

Abstract: In this talk, I will review two recent works that operationalize the uniform information density (UID) hypothesis for use in natural language processing models. In machine translation, it has frequently been observed that text assigned high probability (i.e., low surprisal) is not necessarily what humans perceive to be high-quality language. In contrast, text decoded with beam search, a popular heuristic decoding method, often scores well under both human evaluation and automatic metrics such as BLEU. We show that beam search can be framed as a UID-enforcing decoding objective, and that there is a strong relationship between BLEU and the extent to which text adheres to UID.

In follow-up work, we explore the effects of incorporating an operationalization of UID directly into a language model’s training objective. Specifically, we augment the canonical maximum-likelihood estimation (MLE) objective with a regularizer that encodes UID. In experiments on ten languages spanning five language families, we find that UID regularization consistently improves language-model perplexity, with larger gains when training data is limited. Moreover, an analysis of generated sequences shows that UID-regularized language models have other desirable properties; for example, the text they generate is more lexically diverse.
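For readers new to the idea, here is a minimal sketch of what "augmenting MLE with a UID regularizer" can look like. It assumes one common operationalization of UID, the variance of per-token surprisal u_t = -log p(y_t | y_<t) across a sequence; the exact regularizer, normalization, and weight used in the work above may differ (see arxiv.org/abs/2105.07144). The function names and the weight `beta` are hypothetical, and the code works on plain per-token probabilities rather than on a trainable model.

```python
import math
from typing import List, Sequence


def surprisals(token_probs: Sequence[float]) -> List[float]:
    """Per-token surprisal u_t = -log p(y_t | y_<t), in nats."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(not (0.0 < p <= 1.0) for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    return [-math.log(p) for p in token_probs]


def mean_nll(token_probs: Sequence[float]) -> float:
    """Standard MLE loss for one sequence: average negative log-likelihood."""
    u = surprisals(token_probs)
    return sum(u) / len(u)


def uid_variance(token_probs: Sequence[float]) -> float:
    """Assumed UID measure: variance of surprisal (0 = perfectly even spread)."""
    u = surprisals(token_probs)
    mu = sum(u) / len(u)
    return sum((x - mu) ** 2 for x in u) / len(u)


def uid_regularized_loss(token_probs: Sequence[float], beta: float = 0.1) -> float:
    """Hypothetical objective: MLE loss plus beta times the UID penalty."""
    return mean_nll(token_probs) + beta * uid_variance(token_probs)


# Two 3-token sequences with the same total probability:
# one concentrates its information in a single token, the other spreads it evenly.
uneven = [0.9, 0.9, 0.01]
p = (0.9 * 0.9 * 0.01) ** (1 / 3)
even = [p, p, p]
print(mean_nll(uneven), mean_nll(even))      # equal up to rounding
print(uid_variance(uneven), uid_variance(even))
print(uid_regularized_loss(uneven), uid_regularized_loss(even))
```

Because both sequences have the same likelihood, plain MLE cannot tell them apart; the variance term is what penalizes the uneven one. In actual training, the same computation would be done on the model's log-probabilities inside an autodiff framework so that the penalty contributes gradients.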

arxiv.org/abs/2010.02650
arxiv.org/abs/2105.07144

Bio: Clara is a second-year PhD student in Computer Science, advised by Professor Ryan Cotterell at ETH Zürich. She received her Master’s and Bachelor’s degrees in Computational and Mathematical Engineering from Stanford University. Her research focuses on decoding methods for language generators, analysis techniques for language models, and the general application of statistical methods to NLP.