Su Lin Blodgett will be giving a virtual talk on Friday, March 12th, from 11am – 12pm. Zoom information will be distributed via the Berkeley NLP Seminar listserv.

Title: Language and Justice: Reconsidering Harms in NLP Systems and Practices

Abstract: Natural language processing (NLP) systems are now ubiquitous. Yet the benefits of these language technologies do not accrue evenly to all users, and they can be harmful: NLP systems reproduce stereotypes, prevent speakers of “non-standard” language varieties from participating fully in public discourse, and reinscribe historical patterns of linguistic stigmatization and discrimination. In this talk, I will draw together literature from sociolinguistics, linguistic anthropology, education, and more to provide an account of some of the relationships between language and social justice, paying attention to how grounding ourselves in these relationships can help us understand which system behaviors and research practices are harmful, who is harmed, and how. I will argue that orienting ourselves to this literature opens up many directions for thinking about the social implications of NLP systems, and I will share some of my early thinking on some of these directions.

Bio: Su Lin Blodgett is a postdoctoral researcher in the Fairness, Accountability, Transparency, and Ethics (FATE) group at Microsoft Research Montréal. She is broadly interested in examining the social implications of NLP technologies, and in using NLP approaches to examine language variation and change (computational sociolinguistics). Previously, she completed her Ph.D. in computer science at the University of Massachusetts Amherst, where she was also supported by the NSF Graduate Research Fellowship.

Naomi Saphra will be giving a virtual talk on Friday, November 13th from 11am – 12pm. Zoom information will be distributed via the Berkeley NLP Seminar listserv.

Title: Learning Dynamics of Language Models

Abstract: When people ask why a neural network is so effective at solving some task, some researchers mean, “How does training impose bias towards effective representations?” This approach can lead to inspecting loss landscapes, analyzing convergence, or identifying critical learning periods. Other researchers mean, “How does this model represent linguistic structure?” This approach can lead to model probing, testing on challenge sets, or inspecting attention distributions. The work in this talk instead considers the question, “How does training impose bias towards linguistic structure?”

This question is of interest to NLP researchers as well as general machine learning researchers. Language has well-studied compositional behavior, offering a realistic but intuitive domain for studying the gradual development of structure over the course of training. Meanwhile, studying why existing models and training procedures are effective by relating them to language offers insights into improving them. I will discuss how models that target different linguistic properties diverge over the course of training, and relate their behavior to current practices and theoretical proposals. I will also propose a new method for analyzing hierarchical behavior in LSTMs, and apply it in synthetic experiments to illustrate how LSTMs learn like classical parsers.

Bio: Naomi Saphra is a PhD student at the University of Edinburgh, studying the emergence of linguistic structure during neural network training: the intersection of linguistics, interpretability, formal grammars, and learning theory. They have a surprising number of awards and honors, but all are related to an accident of nature that has limited their ability to type and write. They have been called “the Bill Hicks of radical AI-critical comedy” exactly once.

Tom McCoy will be giving a virtual talk on Friday, October 16th from 11am – 12pm. Zoom information will be distributed via the Berkeley NLP Seminar listserv.

Title: Analyzing the Syntactic Inductive Biases of Sequence-to-Sequence Networks

Abstract: Current NLP models reach human-level performance on many benchmarks but are not very human-like. They require far more training data than humans, and they generalize much less robustly than humans. Both problems have the same cause: our models have poor inductive biases, which are the factors that guide learning and generalization. 

In this talk, I will discuss one inductive bias that is especially relevant for language, namely a bias for making generalizations based on hierarchical structure rather than linear order. I analyze this bias by training sequence-to-sequence models on two syntactic tasks. For both tasks, the training set is consistent with a generalization based on hierarchical structure and a generalization based on linear order. At test time, by evaluating on examples that disambiguate the two possible generalizations, we can see whether each model has a hierarchical bias. Using this methodology, I will show that a wide array of factors can qualitatively affect a model’s inductive biases, often in surprising ways. For example, adding parse information to the input fails to impart a hierarchical bias. The only factor that consistently contributes a hierarchical bias is the use of a tree-structured model, suggesting that human-like syntactic generalization requires architectural syntactic structure. I will close by discussing the implications for a longstanding debate in linguistics (the poverty-of-the-stimulus debate) about which innate biases guide human language acquisition.

Bio: Tom McCoy is a PhD student in the Department of Cognitive Science at Johns Hopkins University, advised by Tal Linzen and Paul Smolensky. He studies the linguistic abilities of neural networks and humans, focusing on inductive biases and representations of compositional structure. He also creates computational linguistic puzzles for NACLO, a contest that introduces high school students to linguistics.

Chris Kennedy will be giving a virtual talk on Friday, September 25th from 11am – 12pm. Zoom information will be distributed via the Berkeley NLP Seminar listserv. [Update: The preprint for this talk is now available.]

Title: Constructing Interval Variables via Faceted Rasch Measurement and Multitask Deep Learning: A Hate Speech Application

Abstract: We propose a general method for measuring complex variables on a continuous, interval spectrum by combining supervised deep learning with the Constructing Measures approach to faceted Rasch item response theory (IRT). We decompose the target construct, hate speech, into multiple constituent components that are labeled as ordinal survey items by human reviewers. Those survey responses are transformed via an IRT nonlinear activation into a debiased, continuous outcome measure. Our method estimates the survey interpretation bias of the human labelers and eliminates that influence on the generated continuous measure. We further estimate the response quality of individual labelers using faceted IRT, allowing low-quality labels to be removed or down-weighted.

Our faceted Rasch scaling procedure integrates naturally with a multitask, weight-sharing deep learning architecture for automated prediction on new data. The ratings on the theorized components of the target outcome are used as supervised, ordinal latent variables for the neural networks’ internal concept learning, improving adversarial robustness and promoting generalizability. We test the use of a neural activation function (ordinal softmax) and loss function (ordinal cross-entropy) designed to exploit the structure of ordinal outcome variables. Our multitask architecture leads to a new form of model interpretability because each continuous prediction can be directly explained by the constituent components in the penultimate layer.
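The ordinal activation and loss mentioned above can be illustrated with one common formulation of ordinal cross-entropy: recast a K-level ordinal label as K−1 cumulative binary targets ("is the rating above level k?"). This is a hedged stand-in for the talk's approach, not the authors' implementation, and all names here are illustrative:

```python
import torch

def ordinal_cross_entropy(logits, labels, num_levels):
    """A common ordinal loss: a K-level ordinal label becomes K-1
    cumulative binary targets, scored with binary cross-entropy.
    logits: (batch, K-1) raw scores; labels: (batch,) ints in [0, K-1]."""
    # targets[b, k] = 1.0 if labels[b] > k else 0.0
    thresholds = torch.arange(num_levels - 1).unsqueeze(0)
    targets = (labels.unsqueeze(1) > thresholds).float()
    return torch.nn.functional.binary_cross_entropy_with_logits(logits, targets)
```

Unlike an unordered softmax over K classes, this loss penalizes a prediction more the further it lands from the true level, which is the structure an ordinal survey item carries.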

We demonstrate this new method on a dataset of 50,000 social media comments sourced from YouTube, Twitter, and Reddit and labeled by 10,000 United States-based Amazon Mechanical Turk workers to measure a continuous spectrum from hate speech to counterspeech. We evaluate Universal Sentence Encoders, BERT, and RoBERTa as language representation models for the comment text, and compare our predictive accuracy to Google Jigsaw’s Perspective API models, showing significant improvement over this standard benchmark.

Bio: Chris Kennedy is a postdoctoral fellow in biomedical informatics at Harvard Medical School; he received his PhD in biostatistics from UC Berkeley in 2020. His research interests include targeted causal inference and medical AI. Chris chaired the TextXD: Text Analysis Across Domains conference in 2018 and 2019 as a Berkeley Institute for Data Science fellow. He remains a research associate of D-Lab, the Integrative Cancer Research Group, and Kaiser Permanente’s Division of Research. In 2018 he led data science for the election campaigns of Governor Gavin Newsom and Congresswoman Katie Porter.

Abigail See will be giving a virtual talk on Friday, September 18th from 11am – 12pm. Zoom information will be distributed via the Berkeley NLP Seminar listserv.

Title: Neural Generation Meets Real People: Towards Emotionally Engaging Mixed-Initiative Conversations

Abstract: In this talk I will present Chirpy Cardinal, an open-domain dialogue agent built by the Stanford NLP team for the 2019 Alexa Prize competition. Building an open-domain socialbot that talks to real people is challenging – such a system must meet multiple user expectations, such as broad world knowledge, conversational style, and emotional connection. Our socialbot engages users on their terms – prioritizing their interests, feelings, and autonomy. As a result, it provides a responsive, personalized user experience, capable of talking knowledgeably about a wide variety of topics as well as chatting empathetically about ordinary life. Neural generation plays a key role in achieving these goals, providing the backbone for our conversational and emotional tone. Chirpy Cardinal ultimately won 2nd place in the competition final, with a 3.6/5.0 average customer rating.

Bio: Abigail See is a PhD student in the Stanford Natural Language Processing group, advised by Professor Christopher Manning. Her research focuses on improving the controllability, interpretability and coherence of neural NLG in open-ended settings such as story generation and chitchat dialogue. At Stanford, she has been the co-instructor of CS224n: Natural Language Processing with Deep Learning, and the organizer of AI Salon, a discussion forum for AI. Twitter: @abigail_e_see.

The Berkeley NLP Seminar will be a virtual event for the Fall 2020 semester, hosted via Zoom on Fridays at 11am. Currently, we have four talks scheduled:

September 18th: Abigail See, Stanford University. 11am – 12pm (PDT).

September 25th: Chris Kennedy, Harvard Medical School. 11am – 12pm (PDT).

October 16th: Tom McCoy, Johns Hopkins University. 11am – 12pm (PDT).

November 13th: Naomi Saphra, University of Edinburgh. 11am – 12pm (PST).

If you are interested in joining our mailing list or giving a talk, please contact

Due to COVID-19, there are no seminars for the remainder of the spring semester. As in previous years, there are also no seminars over the summer, as many students will be interning off-campus.

No decision has been made yet on whether seminars will continue in the fall or when they will start. This information will be posted on this website in a timely manner once we have determined a plan.

Thank you!

Roy Schwartz is giving a talk on Friday, March 6 at 11am – 12pm in South Hall 210.

Title: Green NLP

Abstract: The computations required for deep learning research have been doubling every few months, resulting in an estimated 300,000x increase from 2012 to 2018. These computations have a surprisingly large carbon footprint. Moreover, the financial cost of the computations can make it difficult for researchers, in particular those from emerging economies, to engage in deep learning research, as well as for customers to use this technology for their applications.

In the first part of this talk I will demonstrate that test-set performance scores alone are insufficient for drawing accurate conclusions about which model performs best; I will show that simply pouring more resources into hyperparameter and/or random seed tuning can lead to massive improvements, e.g., making BERT performance comparable to models like XLNet and RoBERTa. I will then present a novel technique for improved reporting: expected validation performance as a function of computation budget (e.g., the number of hyperparameter search trials). Our approach supports a fairer comparison across models, and allows us to estimate the amount of computation required to obtain a given accuracy.
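Over a finite pool of completed trials, expected validation performance has a simple closed form. A minimal sketch of the estimator (function and variable names are mine, not from the talk):

```python
def expected_max_performance(scores, n):
    """Expected best validation score after n hyperparameter trials,
    drawn uniformly with replacement from a pool of completed trials
    whose validation scores are `scores` (an empirical estimator in
    the spirit of the reporting technique described above)."""
    pool = sorted(scores)
    total = len(pool)
    expectation = 0.0
    for i, score in enumerate(pool):
        # P(max of n draws <= pool[i]) - P(max of n draws <= pool[i-1])
        weight = ((i + 1) / total) ** n - (i / total) ** n
        expectation += score * weight
    return expectation

# The per-model curve: expected best score as a function of trial budget.
scores = [0.78, 0.81, 0.85, 0.83, 0.80]
curve = {n: expected_max_performance(scores, n) for n in (1, 2, 5, 20)}
```

Plotting this curve for each model, rather than a single best score, shows which model wins at small budgets versus large ones.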

In the second part I will present a method to substantially reduce the inference cost of NLP models. Our method modifies the BERT fine-tuning process, and allows, during inference, for early (and fast) “exit” from neural network calculations for simple instances and late (and accurate) exit for hard instances. Our method presents a favorable speed/accuracy tradeoff on several datasets, producing models which are up to four times faster than the state of the art, while preserving their accuracy. Moreover, our method requires no additional training resources (in either time or parameters) compared to the baseline BERT model.
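The early-exit control flow can be sketched as follows. This is the generic confidence-threshold pattern, not the exact calibrated criterion from the work; `layers`, `classifiers`, and the threshold value are all illustrative:

```python
import torch

def early_exit_predict(layers, classifiers, hidden, threshold=0.9):
    """Run encoder layers one at a time; after each layer, a small
    per-layer classifier predicts the label. Stop ('exit') as soon as
    prediction confidence clears the threshold, so easy instances use
    few layers and hard instances use all of them."""
    for depth, (layer, classifier) in enumerate(zip(layers, classifiers)):
        hidden = layer(hidden)
        probs = torch.softmax(classifier(hidden), dim=-1)
        confidence, label = probs.max(dim=-1)
        if confidence.item() >= threshold:  # confident enough: exit early
            break
    return label.item(), depth
```

Because the exit decision is made per instance at inference time, the average cost drops without retraining or adding parameters beyond the small per-layer classifiers.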

This is joint work with Dallas Card, Jesse Dodge, Ali Farhadi, Suchin Gururangan, Hannaneh Hajishirzi, Gabriel Ilharco, Oren Etzioni, Noah A. Smith, Gabi Stanovsky and Swabha Swayamdipta.

Bio: Roy Schwartz is a research scientist at the Allen Institute for AI and the University of Washington. Roy’s research focuses on improving deep learning models for natural language processing, as well as making them more efficient, by gaining mathematical and linguistic understanding of these models. He received his Ph.D. in Computer Science from the Hebrew University of Jerusalem. He will be rejoining the School of Computer Science at the Hebrew University as an assistant professor in the fall of 2020.

Marco Ribeiro is giving a talk on Friday, Feb 28 at 11am – 12pm in South Hall 210.

Title: What is wrong with my model? Detection and analysis of bugs in NLP models

Abstract: I will present two projects that deal with the evaluation and analysis of NLP models beyond cross-validation accuracy. First, I will talk about Errudite (ACL 2019), a tool and set of principles for model-agnostic error analysis that is scalable and reproducible. Instead of manually inspecting a small set of examples, we propose systematically grouping instances with filtering queries and, where possible, counterfactual analysis.

Then, I will talk about ongoing work in which we borrow insights from software engineering (unit tests, etc.) to propose a new testing methodology for NLP models. Our tests reveal a variety of critical failures in multiple tasks and models, and we show via a user study that the methodology can be used to easily detect previously unknown bugs.

Bio: Marco Tulio Ribeiro is a Senior Researcher at Microsoft Research and an Affiliate Assistant Professor at the University of Washington. His work is on facilitating the communication between humans and machine learning models, which includes interpretability, trust, debugging, feedback and robustness. He received his PhD from the University of Washington.

Maarten Sap is giving a talk on Friday, Feb 7 at 11am – 12pm in South Hall 210.

Title: Reasoning about Social Dynamics in Language

Abstract: Humans reason about social dynamics when navigating everyday situations. Due to the limited expressivity of existing NLP approaches, reasoning about the biased and harmful social dynamics in language remains a challenge, and NLP systems can backfire against certain populations.

In the first part of the talk, I will analyze a failure case of NLP systems, namely racial bias in automatic hate speech detection. We uncover severe racial skews in training corpora, and show that models trained on hate speech corpora acquire and propagate these racial biases. This results in tweets by self-identified African Americans being up to two times more likely to be labeled as offensive than tweets by others. We propose ways to reduce these biases by making a tweet’s dialect more explicit during the annotation process.

Then, I will introduce Social Bias Frames, a conceptual formalism that models the pragmatic frames in which people project social biases and stereotypes onto others to reason about biased or harmful implications in language. Using a new corpus of 150k structured annotations, we show that models can learn to reason about high-level offensiveness of statements, but struggle to explain why a statement might be harmful. I will conclude with future directions for better reasoning about biased social dynamics.

Bio: Maarten Sap is a 5th year PhD student at the University of Washington advised by Noah Smith and Yejin Choi. He is interested in natural language processing (NLP) for social understanding; specifically in understanding how NLP can help us understand human behavior, and how we can endow NLP systems with social intelligence, social commonsense, or theory of mind. He’s interned on project Mosaic at AI2, working on social commonsense for artificial intelligence systems, and at Microsoft Research working on long-term memory and storytelling with Eric Horvitz.

