Month: September 2020

Chris Kennedy will be giving a virtual talk on Friday, September 25th from 11am – 12pm. Zoom information will be distributed via the Berkeley NLP Seminar listserv. [Update: The preprint for this talk is now available.]

Title: Constructing Interval Variables via Faceted Rasch Measurement and Multitask Deep Learning: A Hate Speech Application

Abstract: We propose a general method for measuring complex variables on a continuous, interval spectrum by combining supervised deep learning with the Constructing Measures approach to faceted Rasch item response theory (IRT). We decompose the target construct, hate speech, into multiple constituent components that are labeled as ordinal survey items by human reviewers. Those survey responses are transformed via an IRT nonlinear activation into a debiased, continuous outcome measure. Our method estimates the survey interpretation bias of the human labelers and eliminates that influence on the generated continuous measure. We further estimate the response quality of individual labelers using faceted IRT, allowing low-quality labels to be removed or down-weighted.

Our faceted Rasch scaling procedure integrates naturally with a multitask, weight-sharing deep learning architecture for automated prediction on new data. The ratings on the theorized components of the target outcome are used as supervised, ordinal latent variables for the neural networks’ internal concept learning, improving adversarial robustness and promoting generalizability. We test the use of a neural activation function (ordinal softmax) and loss function (ordinal cross-entropy) designed to exploit the structure of ordinal outcome variables. Our multitask architecture leads to a new form of model interpretability because each continuous prediction can be directly explained by the constituent components in the penultimate layer.

We demonstrate this new method on a dataset of 50,000 social media comments sourced from YouTube, Twitter, and Reddit and labeled by 10,000 United States-based Amazon Mechanical Turk workers to measure a continuous spectrum from hate speech to counterspeech. We evaluate Universal Sentence Encoders, BERT, and RoBERTa as language representation models for the comment text, and compare our predictive accuracy to Google Jigsaw’s Perspective API models, showing significant improvement over this standard benchmark.
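As a rough illustration of the faceted Rasch scaling mentioned in the abstract, the sketch below computes rating-category probabilities under a standard many-facet rating scale formulation, in which the log-odds of choosing category k over k-1 are modeled as comment location minus item difficulty minus labeler severity minus a category threshold. The abstract does not specify the exact parameterization used in the talk, so this form, the function names, and the numeric values are all assumptions chosen for illustration.

```python
import math


def category_probabilities(theta, item_difficulty, rater_severity, thresholds):
    """Return P(rating = k) for k = 0..len(thresholds).

    Assumed (hypothetical) many-facet rating scale form:
        log(P(k) / P(k-1)) = theta - item_difficulty - rater_severity - thresholds[k-1]
    """
    # Cumulative sums of the adjacent-category logits; category 0 has logit 0.
    logits = [0.0]
    for tau in thresholds:
        step = theta - item_difficulty - rater_severity - tau
        logits.append(logits[-1] + step)
    # Softmax over categories, subtracting the max for numerical stability.
    peak = max(logits)
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_rating(probs):
    """Expected ordinal rating under the category probabilities."""
    return sum(k * p for k, p in enumerate(probs))


# Illustrative values only: the same comment and item, rated by a lenient
# labeler (negative severity) and by a severe one (positive severity).
thresholds = [-1.0, 0.0, 1.0]
lenient = category_probabilities(0.5, 0.0, -1.0, thresholds)
severe = category_probabilities(0.5, 0.0, 1.0, thresholds)
```

In this formulation the labeler's severity is a separate term from the comment's location `theta`, which is the sense in which a labeler's interpretation bias can be estimated and kept out of the resulting continuous measure.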

Bio: Chris Kennedy is a postdoctoral fellow in biomedical informatics at Harvard Medical School; he received his PhD in biostatistics from UC Berkeley in 2020. His research interests include targeted causal inference and medical AI. Chris chaired the TextXD: Text Analysis Across Domains conference in 2018 and 2019 as a Berkeley Institute for Data Science fellow. He remains a research associate of D-Lab, the Integrative Cancer Research Group, and Kaiser Permanente’s Division of Research. In 2018 he led data science for the election campaigns of Governor Gavin Newsom and Congresswoman Katie Porter.

Abigail See will be giving a virtual talk on Friday, September 18th from 11am – 12pm. Zoom information will be distributed via the Berkeley NLP Seminar listserv.

Title: Neural Generation Meets Real People: Towards Emotionally Engaging Mixed-Initiative Conversations

Abstract: In this talk I will present Chirpy Cardinal, an open-domain dialogue agent built by the Stanford NLP team for the 2019 Alexa Prize competition. Building an open-domain socialbot that talks to real people is challenging – such a system must meet multiple user expectations such as broad world knowledge, conversational style, and emotional connection. Our socialbot engages users on their terms – prioritizing their interests, feelings and autonomy. As a result, our socialbot provides a responsive, personalized user experience, capable of talking knowledgeably about a wide variety of topics, as well as chatting empathetically about ordinary life. Neural generation plays a key role in achieving these goals, providing the backbone for our conversational and emotional tone. Chirpy Cardinal ultimately won 2nd place in the competition final, with a 3.6/5.0 average customer rating.

Bio: Abigail See is a PhD student in the Stanford Natural Language Processing group, advised by Professor Christopher Manning. Her research focuses on improving the controllability, interpretability and coherence of neural NLG in open-ended settings such as story generation and chitchat dialogue. At Stanford, she has been the co-instructor of CS224n: Natural Language Processing with Deep Learning, and the organizer of AI Salon, a discussion forum for AI. Twitter: @abigail_e_see.

The Berkeley NLP Seminar will be a virtual event for the Fall 2020 semester, hosted via Zoom on Fridays at 11am. Currently, we have four talks scheduled:

September 18th: Abigail See. Stanford University. 11am – 12pm (PDT).

September 25th: Chris Kennedy. Harvard Medical School. 11am – 12pm (PDT).

October 16th: Tom McCoy. Johns Hopkins University. 11am – 12pm (PDT).

November 13th: Naomi Saphra. University of Edinburgh. 11am – 12pm (PST).

If you are interested in joining our mailing list or giving a talk, please contact