Tom McCoy will be giving a virtual talk on Friday, October 16th from 11am — 12pm. Zoom information will be distributed via the Berkeley NLP Seminar listserv.
Title: Analyzing the Syntactic Inductive Biases of Sequence-to-Sequence Networks
Abstract: Current NLP models reach human-level performance on many benchmarks but are not very human-like. They require far more training data than humans, and they generalize much less robustly than humans. Both problems have the same cause: our models have poor inductive biases, which are the factors that guide learning and generalization.
In this talk, I will discuss one inductive bias that is especially relevant for language, namely a bias for making generalizations based on hierarchical structure rather than linear order. I analyze this bias by training sequence-to-sequence models on two syntactic tasks. For both tasks, the training set is consistent with a generalization based on hierarchical structure and a generalization based on linear order. At test time, by evaluating on examples that disambiguate the two possible generalizations, we can see whether each model has a hierarchical bias. Using this methodology, I will show that a wide array of factors can qualitatively affect a model’s inductive biases, often in surprising ways. For example, adding parse information to the input fails to impart a hierarchical bias. The only factor that consistently contributes a hierarchical bias is the use of a tree-structured model, suggesting that human-like syntactic generalization requires architectural syntactic structure. I will close by discussing the implications for a longstanding debate in linguistics (the poverty-of-the-stimulus debate) about which innate biases guide human language acquisition.
Bio: Tom McCoy is a PhD student in the Department of Cognitive Science at Johns Hopkins University, advised by Tal Linzen and Paul Smolensky. He studies the linguistic abilities of neural networks and humans, focusing on inductive biases and representations of compositional structure. He also creates computational linguistic puzzles for NACLO, a contest that introduces high school students to linguistics.