Jeff Wu from OpenAI will be giving a talk at the Berkeley NLP seminar.
Time: Oct 21, 11am-12pm PT
Location: South Hall 210
Title: Training models to critique themselves
Abstract: We study the setting of large language models critiquing themselves in natural language. We find that:
- Critiques help humans find flaws in summaries that they would have otherwise missed.
- Larger models write more helpful critiques, and on most tasks are better at self-critiquing.
- Larger models can use their own self-critiques, refining their own summaries into better ones.
- We suggest a methodology for testing whether our models’ critiques surface all of the models’ relevant knowledge of flaws, and find evidence that they may not.
Bio: Jeff Wu is a research engineer at OpenAI working on language modeling (e.g. GPT-2) and alignment (InstructGPT).