
Maria Antoniak will be giving a virtual talk on Friday, December 3, from 11am-noon. Zoom information will be distributed via the Berkeley NLP Seminar listserv.

Title: Modeling Personal Experiences Shared in Online Communities

Abstract: Written communications about personal experiences—and the emotions, narratives, and values that they contain—can be both rhetorically powerful and statistically difficult to model. My research uses natural language processing methods to represent complex personal experiences and self-disclosures communicated in online communities. Two fruitful sites for this research are online communities grounded in structured cultural experiences (books, games) and online communities grounded in healthcare experiences (childbirth, contraception, pain management). These communities situate personal opinions and stories in social contexts of reception, expectation, and judgment. In two case studies, I’ll show how quantifying textual patterns reveals community reframings and redefinitions of established narratives and hierarchies.

Bio: Maria Antoniak is a PhD candidate in Information Science at Cornell University. Her research focuses on unsupervised natural language processing methods and applications to computational social science and cultural analytics. Her work translates methods from natural language processing to insights about communities and self-disclosure by modeling personal experiences shared in online communities. She has a master’s degree in computational linguistics from the University of Washington and a bachelor’s degree in humanities from the University of Notre Dame, and she has completed research internships at Microsoft, Facebook, Twitter, and Pacific Northwest National Laboratory.


Ashique KhudaBukhsh will be giving a virtual talk on Friday, August 20th, from 11am — 12pm. Zoom information will be distributed via the Berkeley NLP Seminar listserv.

Title: Novel Frameworks for Quantifying Political Polarization and Mitigating Hate Speech

Abstract: The first part of the talk presents a new methodology that offers a fresh perspective on interpreting and understanding political polarization through machine translation. I begin with a novel proposition that two sub-communities viewing different US cable news networks are speaking in two different languages. Next, I demonstrate that with this assumption, modern machine translation methods can provide a simple yet powerful and interpretable framework to understand the differences between two (or more) large-scale social media discussion data sets at the granularity of words.

The second part of the talk presents a new direction for mitigating online hate. Much of the existing research geared toward making the internet a safer place involves identifying hate speech as the first step. However, little or no attention is given to the possibility that the not-hate-speech subset of the corpus may contain content with potentially positive societal impact. I introduce two new tasks: hope speech detection (detecting hostility-diffusing, peace-seeking content) and help speech detection (detecting content supportive of a disenfranchised minority). I illustrate applications of these two new tasks in the context of the most recent India-Pakistan conflict, triggered by the 2019 Pulwama terror attack, and the longstanding Rohingya refugee crisis that rendered more than 700,000 people homeless. Beyond the framework novelty of focusing on the positive content, this work addresses several practical challenges that arise from multilingual texts in a noisy, social media setting.

Bio: Ashique KhudaBukhsh is an assistant professor at the Golisano College of Computing and Information Sciences, Rochester Institute of Technology (RIT). His current research lies at the intersection of NLP and AI for Social Impact as applied to: (i) globally important events arising in linguistically diverse regions requiring methods to tackle practical challenges involving multilingual, noisy, social media texts; and (ii) polarization in the context of the current US political crisis. In addition to being accepted at top artificial intelligence conferences and journals, his work has received widespread international media attention, including coverage from BBC, Wired, Salon, The Independent, VentureBeat, and Digital Trends.

Prior to joining RIT, Ashique was a Project Scientist at the Language Technologies Institute, Carnegie Mellon University (CMU) mentored by Prof. Tom Mitchell. Prior to this, he was a postdoc mentored by Prof. Jaime Carbonell at CMU. His PhD thesis (Computer Science Department, CMU, also advised by Prof. Jaime Carbonell) focused on distributed active learning.

Clara Meister will be giving a virtual talk on Wednesday, July 14th, from 10am — 11am. Zoom information will be distributed via the Berkeley NLP Seminar listserv. Please note this differs from our usual time slot.

Title: Evidence for the uniform information density hypothesis in modern NLP models

Abstract: In this talk, I will review two recent works that have operationalized the uniform information density (UID) hypothesis for use in models of natural language processing. In machine translation, it has been frequently observed that texts assigned high probability (i.e., low surprisal) are not necessarily what humans perceive to be high-quality language. Conversely, text decoded using beam search, a popular heuristic decoding method, often scores well in terms of both qualitative and automatic evaluation metrics, such as BLEU. We show that beam search can be framed as a UID-enforcing decoding objective and that there exists a strong relationship between BLEU and the extent to which UID is adhered to in natural language text.

In follow-up work, we explore the effects of directly incorporating an operationalization of UID into a language model’s training objective. Specifically, we augment the canonical MLE objective with a regularizer that encodes UID. In experiments on ten languages spanning five language families, we find that using UID regularization consistently improves perplexity in language models, having a larger effect when training data is limited. Moreover, via an analysis of generated sequences, we find that UID-regularized language models have other desirable properties, e.g., they generate text that is more lexically diverse.
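
As a rough illustration of what a UID regularizer could look like, the sketch below adds a penalty on the variance of per-token surprisals to the average negative log-likelihood. The variance-based penalty, the function names, and the `beta` weight are assumptions made for this example; the announcement does not specify the exact operationalization used in the work.

```python
import math


def surprisals(token_probs):
    """Per-token surprisal, -log p, for probabilities a model assigned to the observed tokens."""
    return [-math.log(p) for p in token_probs]


def uid_regularized_loss(token_probs, beta=0.01):
    """Average negative log-likelihood plus beta times the variance of the surprisals.

    The variance term is an assumed stand-in for "a regularizer that encodes UID":
    it is zero when information is spread perfectly evenly across tokens.
    """
    s = surprisals(token_probs)
    mean_surprisal = sum(s) / len(s)  # the usual MLE (average NLL) term
    variance = sum((x - mean_surprisal) ** 2 for x in s) / len(s)
    return mean_surprisal + beta * variance


# Two hypothetical sequences with the same average surprisal (log 4):
even = uid_regularized_loss([0.25, 0.25], beta=1.0)
uneven = uid_regularized_loss([0.5, 0.125], beta=1.0)
```

With equal average surprisal, the sequence whose surprisals are spread unevenly receives the larger loss, which is the behavior a UID-encouraging penalty is meant to have.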

Bio: Clara is a second-year PhD student in Computer Science with Professor Ryan Cotterell at ETH Zürich. She received her Master’s and Bachelor’s degrees in Computational and Mathematical Engineering from Stanford University. Her research interests include decoding methods for language generators, analysis techniques for language models, and the general application of statistical methods to NLP.

Yanai Elazar will be giving a virtual talk on Tuesday, June 22nd, from 10am — 11am. Zoom information will be distributed via the Berkeley NLP Seminar listserv. Please note this differs from our usual time slot.

Title: Causal Attributions in Language Models

Abstract: The outstanding results of enormous language models are largely unexplained, and different methods in interpretability aim to interpret and analyze these models to understand their working mechanisms. Probing, one of these tools, suggests that properties that can be accurately predicted from these models’ representations are likely to explain some of the features or concepts that these models make use of in their predictions. In the first part of this talk, I’ll propose a new interpretability method that takes inspiration from counterfactuals (what would the prediction have been if the model had not accessed certain information?) and claim it is a more suitable method for asking causal questions about how certain attributes are used by models. In the second part, I’ll talk about a different kind of probing that treats the model as a black box and uses cloze patterns to query the model for world knowledge, under the LAMA framework. I will first describe a new framework that measures the consistency of language models with respect to knowledge, that is, the invariance of a model’s behavior under meaning-preserving alternations of its input, and show that current LMs are generally not consistent. Then, I will conclude with ongoing work where we develop a causal diagram and highlight different concepts, like co-occurrences, that cause the model’s predictions (as opposed to true and robust knowledge acquisition).
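
One simple way to quantify the consistency described above is the fraction of paraphrase pairs on which a model gives the same answer. The sketch below assumes the model's predictions for several paraphrased cloze patterns of one fact have already been collected in a list; the function name and the example strings are hypothetical, and the metric used in the talk may differ.

```python
from itertools import combinations


def pairwise_consistency(predictions):
    """Fraction of prediction pairs that agree across paraphrases of the same query.

    `predictions` holds one predicted answer per paraphrased cloze pattern.
    A single prediction (no pairs to compare) is treated as fully consistent.
    """
    pairs = list(combinations(predictions, 2))
    if not pairs:
        return 1.0
    agreeing = sum(1 for a, b in pairs if a == b)
    return agreeing / len(pairs)


# Hypothetical answers to three paraphrases of one cloze query:
score = pairwise_consistency(["Paris", "Paris", "Lyon"])
```

Here only one of the three pairs agrees, so the score is 1/3; a model that is invariant to paraphrase would score 1.0.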

Bio: Yanai Elazar is a third-year PhD student at Bar-Ilan University, working with Prof. Yoav Goldberg. His main interests involve model interpretation, analysis, biases in datasets and models, and commonsense reasoning. Yanai has been awarded multiple scholarships, including the PBC fellowship for outstanding PhD candidates in Data Science and the Google PhD Fellowship.

Sam Bowman will be giving a virtual talk on Friday, June 4th, from 11am — 12pm. Zoom information will be distributed via the Berkeley NLP Seminar listserv.

Title: What Will it Take to Fix Benchmarking in Natural Language Understanding?

Abstract: Evaluation for many natural language understanding (NLU) tasks is broken: Unreliable and biased systems score so highly on standard benchmarks that there is little room for researchers who develop better systems to demonstrate their improvements. The recent trend to abandon IID benchmarks in favor of adversarially-constructed, out-of-distribution test sets ensures that current models will perform poorly, but ultimately only obscures the abilities that we want our benchmarks to measure. In this talk, based primarily on a recent position paper with George Dahl, I lay out four criteria that I argue NLU benchmarks should meet. I claim most current benchmarks fail at these criteria, and that adversarial data collection does not meaningfully address the causes of these failures. Instead, restoring a healthy evaluation ecosystem will require significant progress in the design of benchmark datasets, the reliability with which they are annotated, their size, and the ways they handle social bias.

Bio: Sam Bowman has been on the faculty at NYU since 2016, when he completed his PhD with Chris Manning and Chris Potts at Stanford. At NYU, he is a member of the Center for Data Science, the Department of Linguistics, and Courant Institute’s Department of Computer Science. His research focuses on data, evaluation techniques, and modeling techniques for sentence and paragraph understanding in natural language processing, and on applications of machine learning to scientific questions in linguistic syntax and semantics. He is the senior organizer behind the GLUE and SuperGLUE benchmark competitions; he organized a twenty-three-person research team at JSALT 2018; and he received a 2015 EMNLP Best Resource Paper Award, a 2019 *SEM Best Paper Award, and a 2017 Google Faculty Research Award.

The Berkeley NLP Seminar will continue to be held virtually on Zoom for Summer 2021. Currently, we have three talks scheduled:

  • Friday, June 4th: Sam Bowman. New York University. 11am – 12pm (PDT). “What Will it Take to Fix Benchmarking in Natural Language Understanding?”
  • Tuesday, June 22nd: Yanai Elazar. Bar-Ilan University. 10am – 11am (PDT).
  • Wednesday, July 14th: Clara Meister. ETH Zürich. 10am – 11am (PDT).

If you are interested in joining our mailing list, please contact

Su Lin Blodgett will be giving a virtual talk on Friday, March 12th, from 11am — 12pm. Zoom information will be distributed via the Berkeley NLP Seminar listserv.

Title: Language and Justice: Reconsidering Harms in NLP Systems and Practices

Abstract: Natural language processing (NLP) systems are now ubiquitous. Yet the benefits of these language technologies do not accrue evenly to all users, and they can be harmful; NLP systems reproduce stereotypes, prevent speakers of “non-standard” language varieties from participating fully in public discourse, and reinscribe historical patterns of linguistic stigmatization and discrimination. In this talk, I will draw together literature from sociolinguistics, linguistic anthropology, education, and more to provide an account of some of the relationships between language and social justice, paying attention to how grounding ourselves in these relationships can help us understand what system behaviors and research practices are harmful, who is harmed, and how. I will argue that orienting ourselves to this literature opens up many directions for thinking about the social implications of NLP systems, and share some of my early thinking on some of these directions.

Bio: Su Lin Blodgett is a postdoctoral researcher in the Fairness, Accountability, Transparency, and Ethics (FATE) group at Microsoft Research Montréal. She is broadly interested in examining the social implications of NLP technologies, and in using NLP approaches to examine language variation and change (computational sociolinguistics). Previously, she completed her Ph.D. in computer science at the University of Massachusetts Amherst, where she was also supported by the NSF Graduate Research Fellowship.

Naomi Saphra will be giving a virtual talk on Friday, November 13th from 11am — 12pm. Zoom information will be distributed via the Berkeley NLP Seminar listserv.

Title: Learning Dynamics of Language Models

Abstract: When people ask why a neural network is so effective at solving some task, some researchers mean, “How does training impose bias towards effective representations?” This approach can lead to inspecting loss landscapes, analyzing convergence, or identifying critical learning periods. Other researchers mean, “How does this model represent linguistic structure?” This approach can lead to model probing, testing on challenge sets, or inspecting attention distributions. The work in this talk instead considers the question, “How does training impose bias towards linguistic structure?”

This question is of interest to NLP researchers as well as general machine learning researchers. Language has well-studied compositional behavior, offering a realistic but intuitive domain for studying the gradual development of structure over the course of training. Meanwhile, studying why existing models and training procedures are effective by relating them to language offers insights into improving them. I will discuss how models that target different linguistic properties diverge over the course of training, and relate their behavior to current practices and theoretical proposals. I will also propose a new method for analyzing hierarchical behavior in LSTMs, and apply it in synthetic experiments to illustrate how LSTMs learn like classical parsers.

Bio: Naomi Saphra is a current PhD student at the University of Edinburgh, studying the emergence of linguistic structure during neural network training: the intersection of linguistics, interpretability, formal grammars, and learning theory. They have a surprising number of awards and honors, but all are related to an accident of nature that has limited their ability to type and write. They have been called “the Bill Hicks of radical AI-critical comedy” exactly once.

Tom McCoy will be giving a virtual talk on Friday, October 16th from 11am — 12pm. Zoom information will be distributed via the Berkeley NLP Seminar listserv.

Title: Analyzing the Syntactic Inductive Biases of Sequence-to-Sequence Networks

Abstract: Current NLP models reach human-level performance on many benchmarks but are not very human-like. They require far more training data than humans, and they generalize much less robustly than humans. Both problems have the same cause: our models have poor inductive biases, which are the factors that guide learning and generalization. 

In this talk, I will discuss one inductive bias that is especially relevant for language, namely a bias for making generalizations based on hierarchical structure rather than linear order. I analyze this bias by training sequence-to-sequence models on two syntactic tasks. For both tasks, the training set is consistent with a generalization based on hierarchical structure and a generalization based on linear order. At test time, by evaluating on examples that disambiguate the two possible generalizations, we can see whether each model has a hierarchical bias. Using this methodology, I will show that a wide array of factors can qualitatively affect a model’s inductive biases, often in surprising ways. For example, adding parse information to the input fails to impart a hierarchical bias. The only factor that consistently contributes a hierarchical bias is the use of a tree-structured model, suggesting that human-like syntactic generalization requires architectural syntactic structure. I will close by discussing the implications for a longstanding debate in linguistics (the poverty-of-the-stimulus debate) about which innate biases guide human language acquisition.

Bio: Tom McCoy is a PhD student in the Department of Cognitive Science at Johns Hopkins University, advised by Tal Linzen and Paul Smolensky. He studies the linguistic abilities of neural networks and humans, focusing on inductive biases and representations of compositional structure. He also creates computational linguistic puzzles for NACLO, a contest that introduces high school students to linguistics.

Chris Kennedy will be giving a virtual talk on Friday, September 25th from 11am — 12pm. Zoom information will be distributed via the Berkeley NLP Seminar listserv. [Update: The preprint for this talk is now available.]

Title: Constructing Interval Variables via Faceted Rasch Measurement and Multitask Deep Learning: A Hate Speech Application

Abstract: We propose a general method for measuring complex variables on a continuous, interval spectrum by combining supervised deep learning with the Constructing Measures approach to faceted Rasch item response theory (IRT). We decompose the target construct, hate speech, into multiple constituent components that are labeled as ordinal survey items by human reviewers. Those survey responses are transformed via an IRT nonlinear activation into a debiased, continuous outcome measure. Our method estimates the survey interpretation bias of the human labelers and eliminates that influence on the generated continuous measure. We further estimate the response quality of individual labelers using faceted IRT, allowing low-quality labels to be removed or down-weighted.

Our faceted Rasch scaling procedure integrates naturally with a multitask, weight-sharing deep learning architecture for automated prediction on new data. The ratings on the theorized components of the target outcome are used as supervised, ordinal latent variables for the neural networks’ internal concept learning, improving adversarial robustness and promoting generalizability. We test the use of a neural activation function (ordinal softmax) and loss function (ordinal cross-entropy) designed to exploit the structure of ordinal outcome variables. Our multitask architecture leads to a new form of model interpretability because each continuous prediction can be directly explained by the constituent components in the penultimate layer.

We demonstrate this new method on a dataset of 50,000 social media comments sourced from YouTube, Twitter, and Reddit and labeled by 10,000 United States-based Amazon Mechanical Turk workers to measure a continuous spectrum from hate speech to counterspeech. We evaluate Universal Sentence Encoders, BERT, and RoBERTa as language representation models for the comment text, and compare our predictive accuracy to Google Jigsaw’s Perspective API models, showing significant improvement over this standard benchmark.

Bio: Chris Kennedy is a postdoctoral fellow in biomedical informatics at Harvard Medical School; he received his PhD in biostatistics from UC Berkeley in 2020. His research interests include targeted causal inference and medical AI. Chris chaired the TextXD: Text Analysis Across Domains conference in 2018 and 2019 as a Berkeley Institute for Data Science fellow. He remains a research associate of D-Lab, the Integrative Cancer Research Group, and Kaiser Permanente’s Division of Research. In 2018 he led data science for the election campaigns of Governor Gavin Newsom and Congresswoman Katie Porter.
