Please join us for the NLP Seminar Monday, Mar 6 at 4:00pm in 205 South Hall.
Speaker: Joel Tetreault, Grammarly
Title: Analyzing Formality in Online Communication
Full natural language understanding requires comprehending not only the content or meaning of a piece of text or speech, but also the stylistic way in which it is conveyed. To enable real advancements in dialog systems, information extraction, and human-computer interaction, computers need to understand the entirety of what humans say, both the literal and the non-literal. This talk presents an in-depth investigation of one particular stylistic aspect, formality. First, we provide an analysis of humans’ subjective perceptions of formality in four different genres of online communication. We highlight areas of high and low agreement and extract patterns that consistently differentiate formal from informal text. Next, we develop a statistical model for predicting formality at the sentence level, using rich NLP and deep learning features, and then evaluate the model’s performance against human judgments across genres. Finally, we apply our model to analyze language use in online debate forums. Our results provide new evidence in support of theories of linguistic coordination, underlining the importance of formality for language generation systems.
This work was done with Ellie Pavlick (UPenn) during her summer internship at Yahoo Labs.
Please join us for the NLP Seminar on Monday 2/27 at 3:30pm in 202 South Hall. All are welcome!
Speaker: Jayant Krishnamurthy (Allen Institute for AI)
Title: Semantic Parsing to Probabilistic Programs for Situated Question Answering
Situated question answering is the problem of answering questions about an environment such as an image or diagram. This problem is challenging because it requires jointly interpreting a question and an environment using background knowledge to select the correct answer. We present Parsing to Probabilistic Programs, a novel situated question answering model that can use background knowledge and global features of the question/environment interpretation while retaining efficient approximate inference. Our key insight is to treat semantic parses as probabilistic programs that execute nondeterministically and whose possible executions represent environmental uncertainty. We evaluate our approach on a new, publicly-released data set of 5000 science diagram questions, outperforming several competitive classical and neural baselines.
Please join us for the NLP Seminar on Monday 2/13 at 3:30pm in 202 South Hall. All are welcome!
Speaker: Stephan Meylan (UC Berkeley)
Title: Word forms are optimized for efficient communication
The inverse relationship between word length and use frequency, first identified by G.K. Zipf in 1935, is a classic empirical law that holds across a wide range of human languages. We demonstrate that length is one aspect of a much more general property of words: how distinctive they are with respect to other words in a language. Distinctiveness plays a critical role in recognizing words in fluent speech, in that it reflects the strength of potential competitors when selecting the best candidate for an ambiguous signal. Phonological information content, a measure of a word’s probability under a statistical model of a language’s sound or character sequences, concisely captures distinctiveness. Examining large-scale corpora from 13 languages, we find that distinctiveness significantly outperforms word length as a predictor of frequency. This finding provides evidence that listeners’ processing constraints shape fine-grained aspects of word forms across languages.
The NLP Seminar is back for Spring 2017! We will retain our meeting time of Mondays from 3:30-4:30pm, in the same location, room 202 South Hall.
Here is the speaker list:
Feb 13: Stephan Meylan, UC Berkeley
Feb 27: Jayant Krishnamurthy, Allen Institute for AI
March 6: Joel Tetreault, Grammarly
April 10: Danqi Chen, Stanford
April 24: Marta Recasens, Google
May 1: Pramod Viswanath, U Illinois
For up-to-the-minute notifications, join the email list (UC Berkeley community only).
Please join us for the NLP Seminar on Monday 11/14 at 3:30pm in 202 South Hall. All are welcome!
Speaker: David Jurgens (Stanford)
Title: Citation Classification for Behavioral Analysis of a Scientific Field
Citations are an important indicator of the state of a scientific field, reflecting how authors frame their work, and influencing uptake by future scholars. However, our understanding of citation behavior has been limited to small-scale manual citation analysis. We perform the largest behavioral study of citations to date, analyzing how citations are both framed and taken up by scholars in one entire field: natural language processing. We introduce a new dataset of nearly 2,000 citations annotated for function and centrality, and use it to develop a state-of-the-art classifier and label the entire ACL Reference Corpus. We then study how citations are framed by authors and use both papers and online traces to track how citations are followed by readers. We demonstrate that authors are sensitive to discourse structure and publication venue when citing, that online readers follow temporal links to previous and future work rather than methodological links, and that how a paper cites related work is predictive of its citation count. Finally, we use changes in citation roles to show that the field of NLP is undergoing a significant increase in consensus.
Please join us for the NLP Seminar this Monday, Oct 31 at 3:30pm in 202 South Hall. All are welcome!
Speaker: Jiwei Li (Stanford)
Title: Teaching Machines to Converse
Recent neural network models present both new opportunities and new challenges for developing conversational agents. In this talk, I will describe how we have advanced this line of research by addressing four different issues in neural dialogue generation: (1) overcoming the overwhelming prevalence of dull responses (e.g., “I don’t know”) generated from neural models; (2) enforcing speaker consistency; (3) applying reinforcement learning to foster sustained dialogue interactions; and (4) teaching a bot to interact with users and ask questions about things that it does not know.
Please join us for the NLP Seminar this Monday (Oct 17) at 3:30pm in 202 South Hall.
Speaker: Jacob Andreas (Berkeley)
Title: Reasoning about pragmatics with neural listeners and speakers
We present a model for contrastively describing scenes, in which context-specific behavior results from a combination of inference-driven pragmatics and learned semantics. Like previous learned approaches to language generation, our model uses a simple feature-driven architecture (here a pair of neural “listener” and “speaker” models) to ground language in the world. Like inference-driven approaches to pragmatics, our model actively reasons about listener behavior when selecting utterances. For training, our approach requires only ordinary captions, annotated without demonstration of the pragmatic behavior the model ultimately exhibits. In human evaluations on a referring expression game, our approach succeeds 81% of the time, compared to 69% using existing techniques.
Please join us for the NLP Seminar this Monday (Oct 3) at 3:30pm in 202 South Hall.
Speaker: Sida Wang (Stanford U)
Title: Interactive Language Learning
We introduce two parts of the interactive language learning setting. The first is learning from scratch and the second is learning from a community of goal-oriented language users, which is relevant to building adaptive natural language interfaces. The first part is inspired by Wittgenstein’s language games: a human wishes to accomplish some task (e.g., achieving a certain configuration of blocks), but can only communicate with a computer, which performs the actual actions (e.g., removing all red blocks). The computer initially knows nothing about language and therefore must learn it from scratch through interaction, while the human adapts to the computer’s capabilities. We created a game in a blocks world and collected interactions from 100 people playing it.
In the second part (about ongoing work), we explore the setting where a language is supported by a community of people (instead of private to each individual), and the computer has to learn from the aggregate knowledge of a community of goal-oriented language users. We explore how to use additional supervision such as definitions and demonstration
Please join us for the NLP Seminar this Monday (Sept. 19) at 3:30pm in 202 South Hall.
(This is a rescheduling of a talk that was postponed from last semester.)
Speaker: Percy Liang (Stanford)
Title: Learning from Zero
Can we learn if we start with zero examples, either labeled or unlabeled? This scenario arises in new user-facing systems (such as virtual assistants for new domains), where inputs should come from users, but no users exist until we have a working system, which depends on having training data. I discuss recent work that circumvents this circular dependence by interleaving user interaction and learning.
Please join us for the NLP Seminar on Monday Sept 12 at 3:30pm in 202 South Hall. All are welcome! (This semester we are posting preparatory readings, for those who are interested. Note also the room change to room 202 South Hall.)
Title: Natural Language Inference in the Real World
It is impossible to reason about natural language by memorizing all possible sentences. Rather, we rely on models of composition which allow the meanings of individual words to be combined to produce meanings of longer phrases. In natural language processing, we often employ models of composition that work well for carefully-curated datasets or toy examples, but prove to be very brittle when applied to the type of language that humans actually use.
This talk will discuss our work on applying natural language inference in the “real world.” I will describe observations from experimental studies of humans’ linguistic inferences, and describe the challenges they present for existing methods of automated natural language inference (with a particular focus on the case of adjective-noun composition). I will also outline our current work on extending models of compositional entailment in order to better handle the types of imprecise inferences we observe in human language.
Slides: (pdf of slides)