Please join us for the NLP Seminar this Thursday (April 21) at 4pm in 205 South Hall. All are welcome!

Speaker: Dan Gillick (Google)

Title: Multilingual language processing from bytes

Abstract:

I’ll describe my recent work on standard language processing tasks like part-of-speech tagging and named entity recognition, in which I replace the traditional pipeline of models with a single recurrent neural network. In particular, the model reads one byte at a time (it doesn’t know anything about tokens or sentences) and produces output over byte spans. This allows for very compact, multilingual models that improve over models trained on a single language. I’ll show lots of results, and we can discuss the merits of and problems with this approach.
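
For readers unfamiliar with byte-level input, here is a minimal sketch of what "reading one byte at a time" and "output over byte spans" could look like. It is only an illustration, not the speaker's model: the helper `byte_span`, the example sentence, and the (start, length) span format are all assumptions made for this note.

```python
def byte_span(text, substring):
    """Return (start, length) of `substring` measured in UTF-8 bytes of `text`.

    Hypothetical helper for illustration; the (start, length) format is an
    assumption, not necessarily the output format used in the talk.
    """
    char_start = text.index(substring)
    # Bytes before the substring give the byte offset of its first byte.
    start = len(text[:char_start].encode("utf-8"))
    length = len(substring.encode("utf-8"))
    return start, length


text = "Zoë visited München"
data = text.encode("utf-8")

# The input sequence a byte-level model would see: integers in 0..255,
# with no tokenization or sentence splitting applied.
inputs = list(data)

# A named-entity annotation expressed as a span over bytes, not characters.
start, length = byte_span(text, "München")
entity = data[start:start + length].decode("utf-8")
```

Because "ë" and "ü" each take two bytes in UTF-8, byte offsets differ from character offsets here, and the same 256-value input vocabulary covers every language.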