Jeff Wu from OpenAI will be giving a talk at the Berkeley NLP seminar.
Time: Oct 21 from 11am-12pm PST
Location: South Hall 210
Title: Training models to critique themselves
Abstract: We study the setting of large language models critiquing themselves in natural language. We find that:
- Critiques help humans find flaws in summaries that they would have otherwise missed.
- Larger models write more helpful critiques, and on most tasks are better at self-critiquing.
- Larger models can use their own self-critiques, refining their own summaries into better ones.
- We suggest a methodology for measuring whether our models’ critiques surface all of their relevant knowledge of flaws, and find evidence that they may not.
Bio: Jeff Wu is a research engineer at OpenAI working on language modeling (e.g. GPT-2) and alignment (InstructGPT).
Alex Tamkin will be giving a hybrid talk at the NLP Seminar on Friday, Oct 14 from 11am-12pm PST. This talk will be held in person in South Hall 210.
Title: Self-Supervised Learning for the Real World
Abstract: Spearheaded by advances in NLP, machine learning is undergoing a transformative shift towards large, generalist models trained with self-supervised learning (SSL). In this talk, I’ll discuss two challenges lying ahead for this paradigm, as well as some paths towards surmounting them. First, I’ll discuss the problem of task ambiguity. While the space of tasks that models can perform is expanding rapidly, the number of bits (e.g. examples) used to specify the task is shrinking. Given these two opposing forces, how do we ensure that models learn the tasks we intend? I’ll discuss how we can measure the effects of such task ambiguity on humans and language models, as well as work showing how two-way interaction between users and large models can make strides on this problem in NLP and computer vision. Second, I’ll discuss the challenge of domain-agnostic SSL, necessary for realizing the benefits of SSL in high-impact settings such as healthcare, the sciences, and engineering. I’ll present DABS, a novel kind of Domain-Agnostic Benchmark for SSL algorithms, covering data from 12 different fields (e.g. text, genomics, wearable sensors, and particle physics). With DABS, we develop and present the first SSL methods which succeed on such a broad range of modalities.
– Active Learning Helps Pretrained Models Learn the Intended Task
– DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning
– DABS 2.0: Improved Datasets and Algorithms for Universal Self Supervision
– Viewmaker Networks: Learning Views for Unsupervised Representation Learning
Bio: Alex Tamkin is a fifth-year PhD student in Computer Science at Stanford, advised by Noah Goodman and part of the Stanford NLP Group. His research focuses on self-supervised learning, especially in multimodal and domain-general settings. He is a recipient of the Open Philanthropy AI Fellowship.
Arya McCarthy gave a hybrid talk at the NLP Seminar on Friday, Sep 30 from 11am-12pm PST. This talk was held in person in South Hall 202.
Title: Kilolanguage Learning, Projection, and Translation
Abstract: The breadth of information digitized in the world’s languages offers opportunities for linguistic insights and computational tools with a pan-lingual perspective. We can achieve this by projecting lexical information across languages, either at the type or token level. First, we project information between thousands of languages at the type level to investigate the classic color word hypotheses of Berlin and Kay. Applying fourteen computational linguistic measures of color word basicness/secondariness, we find cross-linguistic support for these hypotheses and add further nuance. Second, we project information between thousands of languages at the token level to create fine-grained morphological analyzers and generators. We show applications to pronoun clusivity and multilingual MT. Finally, we produce morphological tools grounded in UniMorph that improve on strong initial models and generalize across languages.
Bio: Arya McCarthy is a Ph.D. candidate at Johns Hopkins University, working on massively multilingual natural language processing. He is advised by David Yarowsky in the Center for Language and Speech Processing; his work is funded by DARPA LORELEI, the International Olympic Committee, and the American Political Science Association (APSA). His work focuses on improving translation and computational modeling of rare languages. Primarily, he approaches this through weakly supervised natural language processing at the scale of thousands of languages. Previously, Arya has spent time at Google, Duolingo, Facebook, Harvard University, and the University of Edinburgh. Arya is the PI for an APSA grant geared toward better integrating computational and social sciences. In this effort, he is partnering with Tom Lippincott, Kathy McKeown, David Mimno, Philip Resnik, and Noah Smith.
Welcome back to campus!
Seminars are held on select Fridays from 11 am to 12 pm in South Hall Room 210. Throughout the semester, we will update this schedule as we invite additional speakers.
Here is the current schedule:
Sept 30: Arya McCarthy, Johns Hopkins University
Oct 14: Alex Tamkin, Stanford University
The Berkeley NLP seminar is organized by a small team of PhD students and faculty at the School of Information and EECS.
Ethan Perez will be giving a hybrid talk at the NLP seminar on Friday, April 29, from 11am-noon PST. This talk will be held in person in South Hall 202, and Zoom information will be distributed via the Berkeley NLP Seminar listserv for those wishing to attend remotely.
Title: Aligning Language Models with Human Preferences
Abstract: Self-supervised learning objectives are highly effective at pretraining language models (LMs) for various tasks. In this talk, we first show that self-supervised objectives are misaligned with human preferences in many important ways: LMs trained on internet text generate misinformation, offensive jokes, and personal contact information, and are highly sensitive to the conditioning text (“prompt”). Next, we show that LM-based classifiers are effective at predicting which texts humans prefer. As a result, it is possible to use such classifiers as a learning signal to automatically correct the LM. We showcase this approach by training a high-quality retrieval system, obtaining strong performance across a variety of tasks using Retrieval-Augmented Generation (RAG). Even after such training, some undesirable behaviors may remain undetected. We thus go a step further and use other LMs to generate inputs that elicit undesirable behaviors from the LM, to preemptively catch and fix such behaviors. Overall, we find that some of the most powerful tools for aligning LMs with human preferences are LMs themselves.
Bio: Ethan Perez is a fourth-year Ph.D. student in Natural Language Processing at New York University. He is advised by Kyunghyun Cho and Douwe Kiela and funded by NSF and Open Philanthropy. His research aims to develop learning algorithms that overcome human shortcomings, such as social biases, cognitive biases, and misconceptions. Previously, he has spent time at DeepMind, Facebook AI Research, Montreal Institute for Learning Algorithms, and Google. He earned a Bachelor’s from Rice University as the Engineering department’s Outstanding Senior.
Katie Stasaski will be giving a hybrid talk on Friday, May 6, from 11am-noon PST. This talk will be held in person in South Hall 202, and Zoom information will be distributed via the Berkeley NLP Seminar listserv for those wishing to attend remotely.
Title: Diversity in Dialogue Generation
Abstract: Conversational dialogue models struggle to produce diverse results, often over-producing typical utterances like “Yes” or “I don’t know.” This dissertation analyzes the diversity problem and proposes ways to improve dialogue agents in both the single- and multi-response setting. In the single-response setting, I propose a novel dataset collection algorithm that uses dynamically computed corpus statistics to determine which crowdworkers to collect more data from. This process results in significantly more diverse datasets and improves the diversity of downstream dialogue agents trained on the more diverse corpora.
In the multi-response setting, I propose a new way of measuring semantic diversity using a natural language inference model, which is highly correlated with human judgments of diversity. I also propose a decoding procedure that iteratively improves the diversity of a set of model responses, achieving higher diversity with minimal loss in relevancy. Finally, I examine the extent to which speech acts constrain the diversity of human-generated dialogue responses. I propose a new task in which creative writers rate the extent to which a conversation inspires the creation of multiple diverse responses, finding that their judgments align with speech act hypotheses.
Divyansh Kaushik will be giving a virtual talk on Friday, April 8, from 11am-noon PST. Zoom information will be distributed via the Berkeley NLP Seminar listserv.
Title: Robustifying NLP with Humans in the Loop
Abstract: Most machine learning methods address prediction problems under restrictive assumptions but are applied to drive decisions in environments where those assumptions are violated. This disconnect between what the methodological framework offers and the desired applications has caused confusion among researchers (who often lack the right formalism to tackle these problems coherently), practitioners (who have developed a folk tradition of ad hoc practices for deploying and monitoring systems), and regulators (who have applied frameworks designed for biomedical ethics to machine learning). In this talk I’ll discuss some of these issues affecting the application of machine learning and our fledgling efforts to bridge some of these gaps by injecting causal knowledge via humans in the loop. I’ll also discuss some critical disconnects between how humans are employed in ML research to perform various tasks and the regulatory framework around research ethics, along with their implications.
Bio: Divyansh Kaushik is a PhD Candidate at the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University, and a Science and Technology Policy Fellow at the Federation of American Scientists. He is advised by Dr. Eduard Hovy and Dr. Zachary Lipton and is a member of the Approximately Correct Machine Intelligence (ACMI) Lab. An Amazon Graduate Research Fellow, Divyansh’s interests lie in exploring human-AI interaction. Over the years, his work has been supported by Amazon AI, PricewaterhouseCoopers, and Facebook AI. He is also the President of CMU’s Graduate Student Assembly and has written on several science policy issues (recently appearing in Forbes, Institute for Progress, Issues in Science and Technology, and PublicSource).
Gašper Beguš will be giving a hybrid talk on Friday, April 1, from 11am-noon PST. This talk will be held in person in South Hall 202, and Zoom information will be distributed via the Berkeley NLP Seminar listserv for those wishing to attend remotely.
Title: Cognitive modeling, neural network interpretability, and GANs
Abstract: In this talk, I propose that language can be modeled from raw speech data in a fully unsupervised manner with Generative Adversarial Networks (GANs) and that such modeling has implications both for the understanding of language acquisition and for the understanding of how deep neural networks learn internal representations. I propose a technique that allows us to “wug-test” neural networks trained on raw speech, analyze intermediate convolutional layers, and test a causal relationship between meaningful units in the output and latent/intermediate representations. I further propose an extension of the GAN architecture that incorporates both the perception and production principles and in which learning of meaningful linguistic units emerges from a requirement that the networks output informative data. With this model, we can test what the networks can and cannot learn, how their biases match human learning biases in behavioral experiments, how speech processing in the brain compares to intermediate representations in deep neural networks (by comparing acoustic properties in intermediate convolutional layers and the brainstem), how symbolic, rule-like computation emerges in internal representations, and what GANs’ innovative outputs can teach us about productivity in human language. This talk also makes a more general case for probing deep neural networks with raw speech data, because dependencies in speech are often better understood than those in the visual domain and because behavioral data on speech (especially the production aspect) are relatively easily accessible.
Bio: Gašper Beguš is an Assistant Professor in the Department of Linguistics at UC Berkeley, where he directs the Berkeley Speech and Computation Lab. Before coming to Berkeley, he was an Assistant Professor at the University of Washington, and before that he received his Ph.D. from Harvard. His research focuses on developing deep learning models for speech data. More specifically, he trains models to learn representations of spoken words from raw audio inputs. He combines machine learning and statistical modeling with neuroimaging and behavioral experiments to better understand how neural networks learn internal representations in speech and how humans learn to speak.
Nasrin Mostafazadeh will be giving a hybrid talk on Friday, March 4, from 11am-noon PST. This talk will be held in person in South Hall 202, and Zoom information will be distributed via the Berkeley NLP Seminar listserv for those wishing to attend remotely.
Title: How far have we come in giving our NLU systems common sense?
Abstract: Commonsense reasoning has been an established area of AI for more than three decades. Although the area saw relatively little activity after the 1980s, the past few years have brought renewed interest in the AI community in giving machines common sense, with many acknowledging it as the holy grail of AI and one of the bottlenecks in deploying AI systems in the real world. With the tremendous recent progress in natural language understanding (NLU), the lack of commonsense reasoning capabilities in NLU systems is more evident than ever. In this talk, I’ll discuss the amazing recent progress made in tackling commonsense reasoning benchmarks using pre-trained neural models. I’ll talk about the role of benchmarks in measuring our progress and how we can move the goalposts towards constructing coherent mental models of narratives.
Bio: Nasrin is a co-founder of Verneek, a deep-tech startup in NYC (in stealth). Verneek’s mission is to enable anyone to make better and faster decisions anywhere, using intuitive modalities of interaction that are powered through innovative AI technologies. Before Verneek, Nasrin held research positions at AI startups and big tech companies ranging from BenevolentAI to Microsoft Research. She received her PhD from the University of Rochester, working in the conversational interaction and dialogue research group, with her PhD work focused on commonsense reasoning through the lens of narratives. She has started lines of research that push AI toward a deeper understanding of the world, which are currently being developed further into Verneek’s core technologies. She has been a keynote speaker, chair, organizer, and program committee member at various AI events. Nasrin was named to Forbes’ 30 Under 30 in Science in 2019 for her work in AI.
Dora Demszky will be giving a virtual talk on Friday, January 14, from 11am-noon PST. Zoom information will be distributed via the Berkeley NLP Seminar listserv.
Title: Using Natural Language Processing to Support Equitable and Student-Centered Education
Abstract: Providing consistent, individualized feedback to teachers is essential for improving instruction but can be prohibitively resource-intensive in most educational contexts. I demonstrate ways in which natural language processing (NLP) can be used to address this gap and provide teachers with feedback in a scalable and effective way. As part of a case study, I introduce an automated tool based on NLP that provides teachers with feedback on their uptake of student contributions, a high-leverage teaching practice that supports dialogic instruction and makes students feel heard. This tool is based on our fully automated measure of uptake that we validate extensively by analyzing the linguistic phenomena it captures, such as questioning and elaboration, and by demonstrating its correlation with positive educational outcomes across three datasets of student-teacher interaction. We evaluate the tool’s effectiveness at improving teachers’ uptake of student contributions through a randomized controlled trial in an online computer science course, Code in Place (n=1,136 instructors). We find that the tool improves instructors’ uptake of student contributions by 24% and present suggestive evidence that our tool also improves students’ satisfaction with the course. These results demonstrate the promise of our tool to complement existing efforts in teachers’ professional development.
Bio: Dora is a PhD candidate in Linguistics at Stanford, advised by Dan Jurafsky. She works on developing natural language processing methods to support equitable and student-centered education. Her recent publications focus on analyzing the representation of historically marginalized groups in US history textbooks and on measuring and giving feedback to teachers on their uptake of student contributions in classrooms. Prior to her PhD, Dora received a BA summa cum laude from Princeton University in Linguistics with a minor in Computer Science.