
Chris Kennedy will be giving a virtual talk on Friday, September 25th from 11am – 12pm. Zoom information will be distributed via the Berkeley NLP Seminar listserv. [Update: The preprint for this talk is now available.]

Title: Constructing Interval Variables via Faceted Rasch Measurement and Multitask Deep Learning: A Hate Speech Application

Abstract: We propose a general method for measuring complex variables on a continuous, interval spectrum by combining supervised deep learning with the Constructing Measures approach to faceted Rasch item response theory (IRT). We decompose the target construct, hate speech, into multiple constituent components that are labeled as ordinal survey items by human reviewers. Those survey responses are transformed via an IRT nonlinear activation into a debiased, continuous outcome measure. Our method estimates the survey interpretation bias of the human labelers and eliminates that influence on the generated continuous measure. We further estimate the response quality of individual labelers using faceted IRT, allowing low-quality labels to be removed or down-weighted.

Our faceted Rasch scaling procedure integrates naturally with a multitask, weight-sharing deep learning architecture for automated prediction on new data. The ratings on the theorized components of the target outcome are used as supervised, ordinal latent variables for the neural networks’ internal concept learning, improving adversarial robustness and promoting generalizability. We test the use of a neural activation function (ordinal softmax) and loss function (ordinal cross-entropy) designed to exploit the structure of ordinal outcome variables. Our multitask architecture leads to a new form of model interpretability because each continuous prediction can be directly explained by the constituent components in the penultimate layer.

We demonstrate this new method on a dataset of 50,000 social media comments sourced from YouTube, Twitter, and Reddit and labeled by 10,000 United States-based Amazon Mechanical Turk workers to measure a continuous spectrum from hate speech to counterspeech. We evaluate Universal Sentence Encoders, BERT, and RoBERTa as language representation models for the comment text, and compare our predictive accuracy to Google Jigsaw’s Perspective API models, showing significant improvement over this standard benchmark.

Bio: Chris Kennedy is a postdoctoral fellow in biomedical informatics at Harvard Medical School; he received his PhD in biostatistics from UC Berkeley in 2020. His research interests include targeted causal inference and medical AI. Chris chaired the TextXD: Text Analysis Across Domains conference in 2018 and 2019 as a Berkeley Institute for Data Science fellow. He remains a research associate of D-Lab, the Integrative Cancer Research Group, and Kaiser Permanente’s Division of Research. In 2018 he led data science for the election campaigns of Governor Gavin Newsom and Congresswoman Katie Porter.

Abigail See will be giving a virtual talk on Friday, September 18th from 11am – 12pm. Zoom information will be distributed via the Berkeley NLP Seminar listserv.

Title: Neural Generation Meets Real People: Towards Emotionally Engaging Mixed-Initiative Conversations

Abstract: In this talk I will present Chirpy Cardinal, an open-domain dialogue agent built by the Stanford NLP team for the 2019 Alexa Prize competition. Building an open-domain socialbot that talks to real people is challenging – such a system must meet multiple user expectations such as broad world knowledge, conversational style, and emotional connection. Our socialbot engages users on their terms – prioritizing their interests, feelings and autonomy. As a result, our socialbot provides a responsive, personalized user experience, capable of talking knowledgeably about a wide variety of topics, as well as chatting empathetically about ordinary life. Neural generation plays a key role in achieving these goals, providing the backbone for our conversational and emotional tone. Chirpy Cardinal ultimately won 2nd place in the competition final, with a 3.6/5.0 average customer rating.

Bio: Abigail See is a PhD student in the Stanford Natural Language Processing group, advised by Professor Christopher Manning. Her research focuses on improving the controllability, interpretability and coherence of neural NLG in open-ended settings such as story generation and chitchat dialogue. At Stanford, she has been the co-instructor of CS224n: Natural Language Processing with Deep Learning, and the organizer of AI Salon, a discussion forum for AI. Twitter: @abigail_e_see.

The Berkeley NLP Seminar will be a virtual event for the Fall 2020 semester, hosted via Zoom on Fridays at 11am. Currently, we have four talks scheduled:

September 18th: Abigail See. Stanford University. 11am – 12pm (PDT).

September 25th: Chris Kennedy. Harvard Medical School. 11am – 12pm (PDT).

October 16th: Tom McCoy. Johns Hopkins University. 11am – 12pm (PDT).

November 13th: Naomi Saphra. University of Edinburgh. 11am – 12pm (PST).

If you are interested in joining our mailing list or giving a talk, please contact

Due to COVID-19, there are no seminars for the remainder of the spring semester. As in previous years, there are also no seminars over the summer, as many students will be interning off-campus.

No decision has been made yet on whether seminars will continue in the fall or when they will start. This information will be posted on this website in a timely manner once we have determined a plan.

Thank you!

Roy Schwartz is giving a talk on Friday, March 6 at 11am – 12pm in South Hall 210.

Title: Green NLP

Abstract: The computations required for deep learning research have been doubling every few months, resulting in an estimated 300,000x increase from 2012 to 2018. These computations have a surprisingly large carbon footprint. Moreover, the financial cost of the computations can make it difficult for researchers, in particular those from emerging economies, to engage in deep learning research, as well as for customers to use this technology for their applications.

In the first part of this talk I will demonstrate that test-set performance scores alone are insufficient for drawing accurate conclusions about which model performs best; I will show that simply pouring more resources into hyperparameter and/or random seed tuning can lead to massive improvements, e.g., making BERT performance comparable to models like XLNet and RoBERTa. I will then present a novel technique for improved reporting: expected validation performance as a function of computation budget (e.g., the number of hyperparameter search trials). Our approach supports a fairer comparison across models, and allows one to estimate the amount of computation required to obtain a given accuracy.

In the second part I will present a method to substantially reduce the inference cost of NLP models. Our method modifies the BERT fine-tuning process, and allows, during inference, for early (and fast) “exit” from neural network calculations for simple instances and late (and accurate) exit for hard instances. Our method presents a favorable speed/accuracy tradeoff on several datasets, producing models which are up to four times faster than the state of the art, while preserving their accuracy. Moreover, our method requires no additional training resources (in either time or parameters) compared to the baseline BERT model.

This is joint work with Dallas Card, Jesse Dodge, Ali Farhadi, Suchin Gururangan, Hannaneh Hajishirzi, Gabriel Ilharco, Oren Etzioni, Noah A. Smith, Gabi Stanovsky and Swabha Swayamdipta.

Bio: Roy Schwartz is a research scientist at the Allen Institute for AI and the University of Washington. Roy’s research focuses on improving deep learning models for natural language processing, as well as making them more efficient, by gaining mathematical and linguistic understanding of these models. He received his Ph.D. in Computer Science from the Hebrew University of Jerusalem. He will be rejoining the School of Computer Science at the Hebrew University as an assistant professor in the fall of 2020.

Marco Ribeiro is giving a talk on Friday, Feb 28 at 11am – 12pm in South Hall 210.

Title: What is wrong with my model? Detection and analysis of bugs in NLP models

Abstract: I will present two projects that deal with evaluation and analysis of NLP models beyond cross-validation accuracy. First, I will talk about Errudite (ACL 2019), a tool and set of principles for model-agnostic error analysis that is scalable and reproducible. Instead of manually inspecting a small set of examples, we propose systematic grouping of instances with filtering queries and counterfactual analysis (if possible).

Then, I will talk about ongoing work in which we borrow insights from software engineering (unit tests, etc.) to propose a new testing methodology for NLP models. Our tests reveal a variety of critical failures in multiple tasks and models, and we show via a user study that the methodology can be used to easily detect previously unknown bugs.

Bio: Marco Tulio Ribeiro is a Senior Researcher at Microsoft Research and an Affiliate Assistant Professor at the University of Washington. His work is on facilitating the communication between humans and machine learning models, which includes interpretability, trust, debugging, feedback and robustness. He received his PhD from the University of Washington.

Maarten Sap is giving a talk on Friday, Feb 7 at 11am – 12pm in South Hall 210.

Title: Reasoning about Social Dynamics in Language

Abstract: Humans reason about social dynamics when navigating everyday situations. Due to the limited expressivity of existing NLP approaches, reasoning about the biased and harmful social dynamics in language remains a challenge, and can backfire against certain populations.

In the first part of the talk, I will analyze a failure case of NLP systems, namely, racial bias in automatic hate speech detection. We uncover severe racial skews in training corpora, and show that models trained on hate speech corpora acquire and propagate these racial biases. This results in tweets by self-identified African Americans being up to two times more likely to be labelled as offensive compared to others. We propose ways to reduce these biases, by making a tweet’s dialect more explicit during the annotation process.

Then, I will introduce Social Bias Frames, a conceptual formalism that models the pragmatic frames in which people project social biases and stereotypes onto others to reason about biased or harmful implications in language. Using a new corpus of 150k structured annotations, we show that models can learn to reason about high-level offensiveness of statements, but struggle to explain why a statement might be harmful. I will conclude with future directions for better reasoning about biased social dynamics.

Bio: Maarten Sap is a 5th year PhD student at the University of Washington advised by Noah Smith and Yejin Choi. He is interested in natural language processing (NLP) for social understanding; specifically in understanding how NLP can help us understand human behavior, and how we can endow NLP systems with social intelligence, social commonsense, or theory of mind. He’s interned on project Mosaic at AI2, working on social commonsense for artificial intelligence systems, and at Microsoft Research working on long-term memory and storytelling with Eric Horvitz.


Yonatan Belinkov is giving a talk on Friday, Jan 31 at 11am – 12pm in South Hall 210.

Title: Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias

Abstract: The success of neural network models in various tasks, coupled with their opaque nature, has led to much interest in interpreting and analyzing such models. Common analysis methods for interpreting neural models in natural language processing typically examine either their structure (for example, probing classifiers) or their behavior (challenge sets, saliency methods), but not both. In this talk, I will propose a new methodology grounded in the theory of causal mediation analysis for interpreting which parts of a model are causally implicated in its behavior. This methodology enables us to analyze the mechanisms by which information flows from input to output through various model components, known as mediators. I will demonstrate an application of this methodology to analyzing gender bias in pre-trained Transformer language models. In particular, we study the role of individual neurons and attention heads in mediating gender bias across three datasets designed to gauge a model’s sensitivity to gender bias. Our mediation analysis reveals that gender bias effects are (i) sparse, concentrated in a small part of the network; (ii) synergistic, amplified or repressed by different components; and (iii) decomposable into effects flowing directly from the input and indirectly through the mediators. I will conclude by laying out a few ideas for future work on analyzing neural NLP models.

Bio: Yonatan Belinkov is a Postdoctoral Fellow at the Harvard School of Engineering and Applied Sciences (SEAS) and the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). His research focuses on interpretability and robustness of neural network models of human language. His research has been published at various NLP/ML venues. His PhD dissertation at MIT analyzed internal language representations in deep learning models, with applications to machine translation and speech recognition. He is a Harvard Mind, Brain, and Behavior Fellow. He will be joining the Technion Computer Science department in Fall 2020.

Professor Tal Linzen is giving a talk on Thursday, Jan 23 at 11am – 12pm in South Hall 202.

Title: Syntactic generalization in natural language inference

Abstract: Neural network models for natural language processing often perform very well on examples that are drawn from the same distribution as the training set. Do they accomplish such success by learning to solve the task as a human might solve it, or do they adopt heuristics that happen to work well on the data set in question, but do not reflect the normative definition of the task (how one “should” solve the task)? This question can be addressed effectively by testing how the system generalizes to examples constructed specifically to diagnose whether the system relies on such fallible heuristics. In my talk, I will discuss ongoing work applying this methodology to the natural language inference (NLI) task.

I will show that a standard neural model — BERT fine-tuned on the MNLI corpus — achieves high accuracy on the MNLI test set, but shows little sensitivity to syntactic structure when tested on our diagnostic data set (HANS); instead, the model relies on word overlap between the premise and the hypothesis, and concludes, for example, that “the doctor visited the lawyer” entails “the lawyer visited the doctor”. While accuracy on the test set is very stable across fine-tuning runs with different weight initializations, generalization behavior varies widely, with accuracy on some classes of examples ranging from 0% to 66%. Finally, augmenting the training set with a moderate number of examples that contradict the word overlap heuristic leads to a dramatic improvement in generalization accuracy. This improvement generalizes to constructions that were not included in the augmentation set. Overall, our results suggest that the syntactic deficiencies of the fine-tuned model do not arise primarily from poor abstract syntactic representations in the underlying BERT model; rather, because of its weak inductive bias, BERT requires a strong fine-tuning signal to favor those syntactic representations over simpler heuristics.

Bio: Tal Linzen is an Assistant Professor of Cognitive Science (with a joint appointment in Computer Science) at Johns Hopkins University, and affiliated faculty at the JHU Center for Language and Speech Processing. Before moving to Johns Hopkins, he was a postdoctoral researcher at the École Normale Supérieure in Paris, and before that he obtained his PhD from the Department of Linguistics at New York University. At Johns Hopkins, Tal directs the Computation and Psycholinguistics Lab, which develops computational models of human language comprehension and acquisition, as well as psycholinguistically-informed methods for interpreting, evaluating and extending neural network models for natural language processing. The lab’s work has appeared in venues such as ACL, CoNLL, EMNLP, ICLR, NAACL and TACL, as well as in journals such as Cognitive Science and Journal of Neuroscience. Tal co-organized the first two editions of the BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (EMNLP 2018, ACL 2019) and is a co-chair of CoNLL 2020.

The NLP Seminar in Spring 2020 will be held on Fridays at 11am. Like last semester, there will be some exceptions to accommodate the travel schedules of visitors.

This is our calendar so far; it will be updated as more speakers are added:

Jan 23: Tal Linzen, Johns Hopkins. 11am – 12pm, South Hall 202. (note this is a Thursday)

Jan 31: Yonatan Belinkov, Harvard. 11am – 12pm. South Hall 210.

Feb 7: Maarten Sap, UW. 11am – 12pm. South Hall 210.

Feb 28: Marco Ribeiro, Microsoft Research. 11am – 12pm.

March 6: Roy Schwartz, AI2. 11am – 12pm.

March 13: No seminar.

March 27: Spring break.

April 3: Postponed due to coronavirus.

April 10: Postponed due to coronavirus.

April 17: Postponed due to coronavirus.

If you are in or visiting the Bay Area and interested in giving a talk, please contact Lucy at
