Please join us for our NLP Seminar next Monday, April 16, at 4:00pm in 202 South Hall.
Speaker: Amber Boydstun (Associate Professor of Political Science, UC Davis)
Title: How Surges in Dominant Media Narratives Move Public Opinion
Studies examining the potential effects of media coverage on public attitudes toward policy issues (e.g., abortion, capital punishment) have identified three variables that, depending on the issue, can wield significant influence: the tone of the coverage (positive/negative/neutral), the frames used (e.g., discussing the issue from an economic vs. a moral perspective), and the overall level of media attention to the issue. Yet, to date, no study has examined all three variables in combination. We fill this gap by building a theoretical argument for why, despite the important variance across different issues, in general a single measure should be able to predict significant shifts in public opinion: surges in media attention to “dominant media narratives,” or stories that consistently frame the issue the same way (e.g., economic) using the same tone (e.g., anti-immigration) relative to other competing narratives. We test this hypothesis in U.S. newspaper coverage of three very different policy issues (immigration, same-sex marriage, and gun control) from 1992 to 2012. We use manual content analysis linked with computational modeling, tracking tone (pro/anti/neutral), emphasis frames (e.g., economic, morality), and overall levels of attention. Using time series analysis of public opinion data, we show that, for all three issues, previous surges in dominant media narratives significantly shape opinion. In short, when media coverage converges around a unified way of describing a policy issue, the public tends to follow. Our study adds to the fields of political communication and public opinion and marks an advance in computational text analysis methods. (Joint work with Dallas Card and Noah Smith)
Please join us for the NLP Seminar on Monday, March 12 at 4:00pm in 202 South Hall. All are welcome!
Speaker: Rob Voigt (Stanford)
Title: Implicit Attitudes, NLP, and the “Real World”
While some forms of bias in language are explicit, such as overt references to stereotypes, much linguistic bias is far more subtle, where implicit attitudes towards social groups pervasively affect how we talk to and about members of those groups. As a result, such variation is often identifiable only in aggregate accounting for the contexts of language use. In this talk, I will present two projects from my dissertation which aim to complement NLP techniques with on-the-ground facts about the world to understand the joint linguistic and extralinguistic factors that contribute to social biases.
First, I’ll present the results of a study using body camera footage from the Oakland Police Department as interactional data for analyzing racial disparities in officer language. Applying a computational linguistic model of respect across a month of everyday traffic stops, we found that officers were less respectful to black than to white community members, even after controlling for social factors like officer race and contextual factors like the location of the stop and the severity of the offense. Second, I’ll present ongoing work exploring representations of immigrants in the US news media over historical time. Our results thus far suggest cyclic patterns of linguistic “othering” that recur with each immigrant group as they arrive and are directly connected to economic and demographic circumstances of those groups.
( Slides )
Please join us for the NLP Seminar on Monday, February 26 at 4:00pm in 202 South Hall. All are welcome!
Speaker: Jonathan Kummerfeld (U Michigan)
Title: Representing Online Conversation Structure with Graphs: A New Corpus and Model
When a group of people communicate online, their conversation is rarely linear, with each message responding only to the one immediately before it. To build systems that understand a group conversation we need a way to identify the discourse structure–what each message is responding to. I’ll speak about a new corpus we constructed with reply structure annotations for 19,924 messages across 58 hours of IRC discussion. Using our annotations we analyse strengths and weaknesses of a recent heuristically extracted set of conversations that have formed the basis of extensive work on dialogue systems (Lowe et al., 2015). Finally, I’ll present statistical models for the task, which improve thread extraction performance from 25.7 F (heuristic) to 60.3 F (our approach). Using our model we extract a new set of conversations that provide high quality data for use in downstream dialogue system development.
( Slides )
Please join us for the NLP Seminar on Monday, January 22, at 4:00pm in 202 South Hall. All are welcome!
Speaker: Jacob Andreas (Berkeley)
Title: Learning from Language
The named concepts and compositional operators present in natural language provide a rich source of information about the kinds of abstractions humans use to navigate the world. Can this information help us build better machine learning models? We’ll explore three different ways of using language to support learning: to provide structure to question answering models, fast training and improved generalization for reinforcement learners, and interpretability to general-purpose deep models.
( Slides )
The NLP Seminar continues in Spring 2018! We will continue meeting Mondays from 4:00-5:00pm, in room 202 South Hall. We’ll be meeting approximately once a month this semester. We are still filling out the schedule; here is the calendar so far:
Jan 22: Jacob Andreas, UC Berkeley
Feb 26: Jonathan Kummerfeld, U Michigan
Mar 12: Rob Voigt, Stanford
Apr 16: Amber Boydstun, UC Davis
Apr 30: Lyn Walker, UC Santa Cruz
For up-to-the-minute notifications, join the email list (UC Berkeley community only).
Please join us for the NLP Seminar on Monday, November 13, at 4:00pm in 202 South Hall. All are welcome!
Speaker: He He (Stanford)
Title: Learning agents that interact with humans
The future of virtual assistants, self-driving cars, and smart homes requires intelligent agents that work intimately with users. Instead of passively following orders given by users, an interactive agent must actively collaborate with people through communication, coordination, and user-adaptation. In this talk, I will present our recent work towards building agents that interact with humans. First, we propose a symmetric collaborative dialogue setting in which two agents, each with some private knowledge, must communicate in natural language to achieve a common goal. We present a human-human dialogue dataset that poses new challenges to existing models, and propose a neural model with dynamic knowledge graph embedding. Second, we study the user-adaptation problem in quizbowl – a competitive, incremental question-answering game. We show that explicitly modeling different human behaviors leads to more effective policies that exploit sub-optimal players. I will conclude by discussing opportunities and open questions in learning interactive agents.
Please join us for the NLP Seminar on Monday, October 30, at 4:00pm in 202 South Hall. All are welcome!
Speaker: Christopher Potts (Stanford Linguistics)
Title: Enriching distributional linguistic representations with structured resources
One of the most powerful ideas in natural language processing is that we can represent words and phrases using dense vectors learned from co-occurrence patterns in text. Such representations have proven themselves in many settings, and one might even argue that they make good on a common intuition among linguists: that words tend to be incredibly complex and related to each other in all sorts of subtle ways. However, co-occurrence patterns alone tend to yield only a blurry picture of the rich relationships that exist between concepts, which raises the question of how best to incorporate additional information from more structured resources. This talk will explore methods for achieving this synthesis, with special emphasis on the retrofitting method pioneered by Faruqui et al. (2015), in which existing representations are updated based on their position in a knowledge graph. I’ll describe and motivate a generalization of Faruqui et al.’s framework that explicitly models graph relations as functions (Lengerich et al. 2017), and I’ll discuss some potential pitfalls of retrofitting (Cases et al. 2017). My overall goal is to stimulate discussion about how to obtain semantically nuanced distributed representations that are useful in diverse tasks.
( Slides )
Cases, Ignacio; Minh-Thang Luong; and Christopher Potts. 2017. On the effective use of pretraining for natural language inference. Ms., Stanford University. https://arxiv.org/abs/1710.02076
Faruqui, Manaal; Jesse Dodge; Sujay K. Jauhar; Chris Dyer; Eduard Hovy; and Noah A. Smith. 2015. Retrofitting word vectors to semantic lexicons. NAACL. http://www.aclweb.org/anthology/N15-1184
Lengerich, Benjamin J.; Andrew L. Maas; and Christopher Potts. 2017. Retrofitting distributional embeddings to knowledge graphs with functional relations. Ms., Carnegie Mellon University, Stanford University, and Roam Analytics. https://arxiv.org/abs/1708.00112
Please join us for the next NLP Seminar on Monday, October 9, at 4:00pm in 202 South Hall.
Speaker: Siva Reddy (Stanford)
Title: Linguists-defined vs. Machine-induced Natural Language Structures for Executable Semantic Parsing
Querying a database to retrieve an answer, telling a robot to perform an action, or teaching a computer to play a game are tasks requiring communication with machines in a language interpretable by them. Here we consider the task of converting human languages to a knowledge-base (KB) language for question-answering. While human languages have latent structures, machine interpretable languages have explicit formal structures. The computational linguistics community has created several treebanks to understand the formal structures of human languages, e.g., universal dependencies. But are these useful for deriving machine interpretable formal structures?
In the first part of the talk, I will discuss how to convert universal dependencies in multiple languages to both general-purpose and KB-executable logical forms. In the second part, I will present a neural model for inducing task-specific natural language structures. I will discuss the similarities and differences between linguists-defined and machine-induced structures, and the pros and cons of each.
Siva Reddy is a postdoc at the Stanford NLP group working with Chris Manning. His research focuses on finding fundamental representations of language, mostly interpretable, which are useful for NLP applications, especially machine understanding. In this direction, he is currently exploring whether linguistic representations are necessary or whether end-to-end learning is all we need. His postdoc is partly funded by a Facebook AI Research grant. Prior to the postdoc, he was a Google PhD Fellow at the University of Edinburgh under the supervision of Mirella Lapata and Mark Steedman. He worked with the Google Parsing team as an intern during his PhD, and as a full-time employee for Adam Kilgarriff’s Sketch Engine before his PhD. His team won first place in the SemEval 2011 Compositionality Detection task and a best paper award at IJCNLP 2011. Apart from language, he loves nature and badminton.
Please join us for our first NLP Seminar of the Fall semester on Monday, September 25, at 4:00pm in 202 South Hall.
Speaker: David Smith (Northeastern University)
Title: Modeling Text Dependencies: Information Cascades, Translations, and Multi-Input Encoders
Dependencies among texts arise when speakers and writers copy manuscripts, cite the scholarly literature, speak from talking points, repost content on social networking platforms, or in other ways transform earlier texts. While in some cases these dependencies are observable—e.g., by citations or other links—we often need to infer them from the text alone. In our Viral Texts project, for example, we have built models of reprinting for noisily-OCR’d nineteenth-century newspapers to trace the flow of news, literature, jokes, and anecdotes throughout the United States. Our Oceanic Exchanges project is now extending that work to information propagation across language boundaries. Other projects in our group involve inferring and exploiting text dependencies to model the writing of legislation, the impact of scientific press releases, and changes in the syntax of language.
In this talk, I will discuss methods both for inferring these dependency structures and for exploiting them to improve other tasks. First, I will describe a new directed spanning tree model of information cascades and a new contrastive training procedure that exploits partial temporal ordering in lieu of labeled link data. This model outperforms previous approaches to network inference on blog datasets and, unlike those approaches, can evaluate individual links and cascades. Then, I will describe methods for extracting parallel passages from large multilingual, but not parallel, corpora by performing efficient search in the continuous document-topic simplex of a polylingual topic model. These extracted bilingual passages are sufficient to train translation systems with greater accuracy than some standard, smaller clean datasets. Finally, I will describe methods for automatically detecting multiple transcriptions of the same passage in a large corpus of noisy OCR and for exploiting these multiple witnesses to correct noisy text. These multi-input encoders provide an efficient and effective approximation to the intractable multi-sequence alignment approach to collation and allow us to produce transcripts with more than 75% reductions in error.
The NLP Seminar is back for Fall 2017! Our meeting time changes slightly to Mondays from 4:00-5:00pm, in almost the same location, room 202 South Hall. We’ll be meeting approximately once a month this semester.
Here are the speakers for this semester:
Sep 25: David Smith, Northeastern U
Oct 9: Siva Reddy, Stanford U
Oct 30: Christopher Potts, Stanford U
Nov 13: He He, Stanford U
Amber Boydstun, UC Davis: postponed to Spring 2018
For up-to-the-minute notifications, join the email list (UC Berkeley community only).