
Marco Ribeiro is giving a talk on Friday, Feb 28 at 11am – 12pm in South Hall 210.

Title: What is wrong with my model? Detection and analysis of bugs in NLP models

Abstract: I will present two projects that deal with evaluation and analysis of NLP models beyond cross-validation accuracy. First, I will talk about Errudite (ACL 2019), a tool and set of principles for model-agnostic error analysis that is scalable and reproducible. Instead of manually inspecting a small set of examples, we propose systematically grouping instances with filtering queries and, where possible, counterfactual analysis.

Then, I will talk about ongoing work in which we borrow insights from software engineering (unit tests, etc.) to propose a new testing methodology for NLP models. Our tests reveal a variety of critical failures in multiple tasks and models, and we show via a user study that the methodology can be used to easily detect previously unknown bugs.

Bio: Marco Tulio Ribeiro is a Senior Researcher at Microsoft Research and an Affiliate Assistant Professor at the University of Washington. His work is on facilitating the communication between humans and machine learning models, which includes interpretability, trust, debugging, feedback and robustness. He received his PhD from the University of Washington.

Maarten Sap is giving a talk on Friday, Feb 7 at 11am – 12pm in South Hall 210.

Title: Reasoning about Social Dynamics in Language

Abstract: Humans reason about social dynamics when navigating everyday situations. Due to the limited expressivity of existing NLP approaches, reasoning about the biased and harmful social dynamics in language remains a challenge, and can backfire against certain populations.

In the first part of the talk, I will analyze a failure case of NLP systems, namely, racial bias in automatic hate speech detection. We uncover severe racial skews in training corpora, and show that models trained on hate speech corpora acquire and propagate these racial biases. This results in tweets by self-identified African Americans being up to two times more likely to be labelled as offensive compared to others. We propose ways to reduce these biases, by making a tweet’s dialect more explicit during the annotation process.

Then, I will introduce Social Bias Frames, a conceptual formalism that models the pragmatic frames in which people project social biases and stereotypes onto others to reason about biased or harmful implications in language. Using a new corpus of 150k structured annotations, we show that models can learn to reason about high-level offensiveness of statements, but struggle to explain why a statement might be harmful. I will conclude with future directions for better reasoning about biased social dynamics.

Bio: Maarten Sap is a 5th year PhD student at the University of Washington advised by Noah Smith and Yejin Choi. He is interested in natural language processing (NLP) for social understanding; specifically in understanding how NLP can help us understand human behavior, and how we can endow NLP systems with social intelligence, social commonsense, or theory of mind. He’s interned on project Mosaic at AI2, working on social commonsense for artificial intelligence systems, and at Microsoft Research working on long-term memory and storytelling with Eric Horvitz.


Yonatan Belinkov is giving a talk on Friday, Jan 31 at 11am – 12pm in South Hall 210.

Title: Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias

Abstract: The success of neural network models in various tasks, coupled with their opaque nature, has led to much interest in interpreting and analyzing such models. Common analysis methods for interpreting neural models in natural language processing typically examine either their structure (for example, probing classifiers) or their behavior (challenge sets, saliency methods), but not both. In this talk, I will propose a new methodology grounded in the theory of causal mediation analysis for interpreting which parts of a model are causally implicated in its behavior. This methodology enables us to analyze the mechanisms by which information flows from input to output through various model components, known as mediators. I will demonstrate an application of this methodology to analyzing gender bias in pre-trained Transformer language models. In particular, we study the role of individual neurons and attention heads in mediating gender bias across three datasets designed to gauge a model’s sensitivity to gender bias. Our mediation analysis reveals that gender bias effects are (i) sparse, concentrated in a small part of the network; (ii) synergistic, amplified or repressed by different components; and (iii) decomposable into effects flowing directly from the input and indirectly through the mediators. I will conclude by laying out a few ideas for future work on analyzing neural NLP models.

Bio: Yonatan Belinkov is a Postdoctoral Fellow at the Harvard School of Engineering and Applied Sciences (SEAS) and the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). His research focuses on interpretability and robustness of neural network models of human language. His research has been published at various NLP/ML venues. His PhD dissertation at MIT analyzed internal language representations in deep learning models, with applications to machine translation and speech recognition. He is a Harvard Mind, Brain, and Behavior Fellow. He will be joining the Technion Computer Science department in Fall 2020.

Professor Tal Linzen is giving a talk on Thursday, Jan 23 at 11am – 12pm in South Hall 202.

Title: Syntactic generalization in natural language inference

Abstract: Neural network models for natural language processing often perform very well on examples that are drawn from the same distribution as the training set. Do they accomplish such success by learning to solve the task as a human might solve it, or do they adopt heuristics that happen to work well on the data set in question, but do not reflect the normative definition of the task (how one “should” solve the task)? This question can be addressed effectively by testing how the system generalizes to examples constructed specifically to diagnose whether the system relies on such fallible heuristics. In my talk, I will discuss ongoing work applying this methodology to the natural language inference (NLI) task.

I will show that a standard neural model — BERT fine-tuned on the MNLI corpus — achieves high accuracy on the MNLI test set, but shows little sensitivity to syntactic structure when tested on our diagnostic data set (HANS); instead, the model relies on word overlap between the premise and the hypothesis, and concludes, for example, that “the doctor visited the lawyer” entails “the lawyer visited the doctor”. While accuracy on the test set is very stable across fine-tuning runs with different weight initializations, generalization behavior varies widely, with accuracy on some classes of examples ranging from 0% to 66%. Finally, augmenting the training set with a moderate number of examples that contradict the word overlap heuristic leads to a dramatic improvement in generalization accuracy. This improvement generalizes to constructions that were not included in the augmentation set. Overall, our results suggest that the syntactic deficiencies of the fine-tuned model do not arise primarily from poor abstract syntactic representations in the underlying BERT model; rather, because of its weak inductive bias, BERT requires a strong fine-tuning signal to favor those syntactic representations over simpler heuristics.
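
As a rough illustration of the word overlap heuristic described above, the sketch below predicts entailment whenever every word of the hypothesis also appears in the premise. It is a toy stand-in, not the speaker's model or the HANS code, and the function name `overlap_heuristic_predicts_entailment` is hypothetical.

```python
def overlap_heuristic_predicts_entailment(premise: str, hypothesis: str) -> bool:
    """Toy heuristic (assumption): say "entailment" when every hypothesis
    word also occurs somewhere in the premise, ignoring word order."""
    premise_words = set(premise.lower().split())
    return all(word in premise_words for word in hypothesis.lower().split())


premise = "the doctor visited the lawyer"
hypothesis = "the lawyer visited the doctor"

# The heuristic answers "entailment" here even though swapping the
# subject and object means the premise does not entail the hypothesis.
print(overlap_heuristic_predicts_entailment(premise, hypothesis))
```

Because the heuristic ignores word order, it cannot tell these two sentences apart, which is the kind of failure a diagnostic set is built to expose.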

Bio: Tal Linzen is an Assistant Professor of Cognitive Science (with a joint appointment in Computer Science) at Johns Hopkins University, and affiliated faculty at the JHU Center for Language and Speech Processing. Before moving to Johns Hopkins, he was a postdoctoral researcher at the École Normale Supérieure in Paris, and before that he obtained his PhD from the Department of Linguistics at New York University. At Johns Hopkins, Tal directs the Computation and Psycholinguistics Lab, which develops computational models of human language comprehension and acquisition, as well as psycholinguistically-informed methods for interpreting, evaluating and extending neural network models for natural language processing. The lab’s work has appeared in venues such as ACL, CoNLL, EMNLP, ICLR, NAACL and TACL, as well as in journals such as Cognitive Science and Journal of Neuroscience. Tal co-organized the first two editions of the BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (EMNLP 2018, ACL 2019) and is a co-chair of CoNLL 2020.

NLP Seminar in Spring 2020 will be on Fridays at 11 am. Like last semester, there will be some exceptions to accommodate travel schedules of visitors.

This is our calendar so far, and will be updated as more people are added:

Jan 23: Tal Linzen, Johns Hopkins. 11am – 12pm, South Hall 202. (note this is a Thursday)

Jan 31: Yonatan Belinkov, Harvard. 11am – 12pm. South Hall 210.

Feb 7: Maarten Sap, UW. 11am – 12pm. South Hall 210.

Feb 28: Marco Ribeiro, Microsoft Research. 11am – 12pm. South Hall 210.

March 6: Roy Schwartz, AI2. 11am – 12pm.

March 13: No seminar.

March 27: Spring break.

April 3: postponed due to coronavirus.

April 10: postponed due to coronavirus.

April 17: postponed due to coronavirus.

If you are in or visiting the Bay Area and interested in giving a talk, please contact Lucy at

Please join us for another NLP Seminar at 11:00am in 202 South Hall on Dec 6.

Speaker: Yoav Artzi (Cornell)

Title: Robot Control and Collaboration in Situated Instruction Following


Abstract: I will present two projects studying the problem of learning to follow natural language instructions. I will present new datasets, a class of interpretable models for instruction following, learning methods that combine the benefits of supervised and reinforcement learning, and new evaluation protocols. In the first part, I will discuss the task of executing natural language instructions with a robotic agent. In contrast to existing work, we do not engineer formal representations of language meaning or the robot environment. Instead, we learn to directly map raw observations and language to low-level continuous control of a quadcopter drone. In the second part, I will propose the task of learning to follow sequences of instructions in a collaborative scenario, where both the user and the system execute actions in the environment and the user controls the system using natural language. To study this problem, we build CerealBar, a multi-player 3D game where a leader instructs a follower, and both act in the environment together to accomplish complex goals. The two projects were led by Valts Blukis, Alane Suhr, and collaborators. Additional information about both projects is available here:


Bio: Yoav Artzi is an Assistant Professor in the Department of Computer Science and Cornell Tech at Cornell University. His research focuses on learning expressive models for natural language understanding, most recently in situated interactive scenarios. He received an NSF CAREER award, paper awards in EMNLP 2015, ACL 2017, and NAACL 2018, a Google Focused Research Award, and faculty awards from Google, Facebook, and Workday. Yoav holds a B.Sc. summa cum laude from Tel Aviv University and a Ph.D. from the University of Washington.

Please join us for another NLP Seminar at 4:00pm in 202 South Hall on Nov 18th. We will have two speakers visiting from Stanford.

Speaker 1: Urvashi Khandelwal

Title: Generalization through Memorization: Nearest Neighbor Language Models


Abstract: Neural language models (LMs) are typically trained on large amounts of data. However, generalizing to a larger corpus or to a different domain requires additional training, which is expensive. This raises an important question – how can LMs generalize better without additional training? In this talk, I will introduce kNN-LMs, which extend a pre-trained LM by linearly interpolating it with a k-nearest neighbors (kNN) model. Distances are computed in the pre-trained LM embedding space, and neighbors can be drawn from any text collection, including the original LM training set. Experiments show that using the original LM training data alone, without further training, can improve performance quite a bit. In addition, kNN-LM efficiently scales up to larger training sets and allows for effective domain adaptation, by simply varying the nearest neighbor datastore, again without further training. Qualitatively, the model is particularly helpful in predicting rare patterns, such as factual knowledge. Together, these results strongly suggest that learning similarity between sequences of text is easier than predicting the next word, and that nearest neighbor search can help LMs to effectively use data without having to train on it.
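
The linear interpolation mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the speaker's implementation: the function names, the interpolation weight `lam`, and the choice to turn neighbor distances into probabilities with a softmax over negative distances are all assumptions made for the example, and the neighbor list is assumed to be non-empty.

```python
import math
from collections import defaultdict


def knn_distribution(neighbors, temperature=1.0):
    """Turn retrieved neighbors into a next-word distribution.

    neighbors: non-empty list of (distance, next_token) pairs, where the
    distance is measured in the LM embedding space (assumed given).
    Closer neighbors get more weight via exp(-distance / temperature).
    """
    weights = defaultdict(float)
    for distance, token in neighbors:
        weights[token] += math.exp(-distance / temperature)
    total = sum(weights.values())
    return {token: weight / total for token, weight in weights.items()}


def interpolate(p_lm, p_knn, lam=0.25):
    """Linear interpolation: lam * p_knn + (1 - lam) * p_lm."""
    vocab = set(p_lm) | set(p_knn)
    return {
        token: lam * p_knn.get(token, 0.0) + (1 - lam) * p_lm.get(token, 0.0)
        for token in vocab
    }


# Hypothetical numbers: the LM prefers "London", the neighbors prefer "Paris".
p_lm = {"Paris": 0.2, "London": 0.5, "Rome": 0.3}
neighbors = [(0.0, "Paris"), (0.0, "Paris"), (0.0, "Rome")]
p_final = interpolate(p_lm, knn_distribution(neighbors), lam=0.5)
print(max(p_final, key=p_final.get))
```

Swapping in a different `neighbors` list changes the final distribution without touching `p_lm`, which mirrors the idea of adapting to a new domain by varying the datastore rather than retraining.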


Bio: Urvashi is a fifth year Computer Science PhD student at Stanford University. She works with the Stanford Natural Language Processing group, where she is advised by Prof. Dan Jurafsky. She works at the intersection of machine learning and natural language processing. More specifically, she is interested in analyzing and improving neural language models as well as sequence generation models.

Speaker 2: John Hewitt

Title: Probing Neural NLP: Ideas and Problems


Abstract: Recent work in NLP has attempted to explore the basic linguistic skills induced by neural NLP models. Probing methods ask these questions through supervised analyses of models’ representations of sentences. In this talk, I’ll cover a new way of thinking about how neural networks can implicitly encode discrete structures, and provide probing evidence that ELMo and BERT have internal representations of syntax. I’ll then introduce work challenging the premises of probing, demonstrating that the methodology can admit false positive results and showing how probes can be designed and interpreted to avoid this.


Bio: John is a second year PhD student at Stanford University co-advised by Chris Manning and Percy Liang. He works on understanding the basic properties, capabilities, and limitations of neural networks for processing human language. He aims to understand neural models for understanding’s sake, while also using the insights gained to develop models that learn and transfer more robustly from less data. He is the recipient of the EMNLP 2019 best paper runner-up award.

Please join us for another NLP Seminar at 4:00pm in 202 South Hall on Oct 21st.

Speaker: Ian Tenney (Google)

Title: Probing for Structure in Sentence Representations


Abstract: With the development of ELMo, BERT, and successors, pre-trained sentence encoders have become nearly ubiquitous in NLP. But what makes these models so powerful? What are they learning? A flurry of recent work – cheekily dubbed “BERTology” – seeks to analyze and explain these models, treating the encoder as an object of scientific inquiry.

In this talk, I’ll discuss a few of these analyses, focusing on our own “edge probing” work which looks at how linguistic structure is represented in deep models. Using tasks like tagging, parsing, and coreference as analysis tools, we show that language models learn strong representations of syntax but are less adept at semantic phenomena. Moreover, we find evidence of sequential reasoning, reminiscent of traditional pipelined NLP systems.

This work was jointly conducted with Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R. Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das, and Ellie Pavlick.


Bio: Ian Tenney is a software engineer on the Language team at Google Research in Mountain View. His research focuses on understanding and analysis of deep NLP models, particularly on how they encode linguistic structure, and how unsupervised or weakly-supervised learning can give rise to complex representations and reasoning. He was a Senior Researcher on the sentence representation team for the 2018 JSALT workshop, and from 2016 to 2018 taught in the MIDS program at UC Berkeley School of Information. He holds an M.S. in Computer Science and a B.S. in Physics from Stanford.

Please join us for another NLP Seminar at 11:00 am at Soda 380 on Tuesday, Oct 8.

Speaker: Alexander Rush (Cornell)

Title: Revisiting Grammar Induction


Abstract: Deep learning for NLP has become synonymous with global models trained with unlimited data. These models are incredible; however, they seem unlikely to tell us much about the way they (or language) work. Less heralded have been the ways in which deep methods have helped with inference in classical factored models. In this talk, I revisit the problem of grammar induction, an important benchmark task in NLP, using a variety of variational methods. Recent work shows that these methods greatly increase the performance of unsupervised learning methods. I argue that these approaches can be used in conjunction with global models to provide control in modern systems.


Bio: Alexander “Sasha” Rush is an Associate Professor at Cornell Tech. His group’s research is at the intersection of natural language processing, deep learning, and structured prediction, with applications in machine translation, summarization, and text generation. He also supports open-source development, including the OpenNMT project. His work has received several paper and demo awards at major NLP and visualization conferences, an NSF CAREER Award, and faculty awards. He is currently the general chair of ICLR.

Please join us for another NLP Seminar at 4:00pm in 202 South Hall on Sept 30th.

Speaker: Jinfeng Rao (Facebook)

Title: Structure-Aware Learning and Decoding for Neural NLG in Task-Oriented Dialog


Abstract: Generating fluent natural language responses from structured semantic representations is a critical step in task-oriented conversational systems. Previous work primarily uses Seq2Seq models on flat meaning representations (MR), e.g., in the E2E NLG Challenge, which lack controllability over the generated text. We propose a tree-structured MR for better discourse-level structuring and sentence-level planning. We also propose constrained decoding and a tree-to-sequence approach to incorporate structure constraints into model learning and decoding. Our experiments show that both approaches lead to better semantic correctness and that combining them achieves the best performance.

In addition, I will briefly talk about my recent work on bridging the gap between relevance matching and semantic matching for short text similarity modeling.


Bio: Jinfeng Rao is currently a research scientist at Facebook Conversational AI. Before that, he was a visiting researcher at Stanford University. He obtained his PhD with Prof. Jimmy Lin from the University of Maryland, College Park. Jinfeng’s research interests lie at the intersection of natural language processing, information retrieval and deep learning. At Facebook, he focuses on building and shipping a world-class NLG system in Assistant. He has published more than 20 articles in major NLP/ML conferences, including ACL, EMNLP, and KDD. His past research helped Comcast build their XFINITY voice search system, where his proposed multi-task system processed billions of voice queries from 20M+ voice remotes in 2019. His work also helped Comcast win the 69th Emmy Award (2017) for technical contributions in advancing television technologies.

(Slides, available to those with email)
