CCC Colloquium: Mark Dredze (February 17, 2010)
From University of Maryland Cloud Computing Center
Challenges of Cloud Scale Natural Language Processing
Mark Dredze (Johns Hopkins University)
Wednesday, February 17, 2010
10am, Hornbake 2119
Talk slides: PDF (7.36 MB)
The information revolution has produced huge quantities of knowledge in natural human language. Every day, more information sources move to the cloud, leading to large-scale knowledge collections and tremendous potential for natural language learning. Recent focus on statistical methods has produced numerous high-quality tools for language processing tasks, including knowledge extraction, information organization and automated language translation. With more data and better statistical methods, the state of the art advances. However, the confluence of so much diverse data poses significant challenges to common statistical methods, which are often ill-suited for large-scale learning on diverse genres, topic domains and language dialects.
This talk will present techniques aimed at processing large and diverse natural language collections. I will present Confidence Weighted Learning, a streaming machine learning algorithm designed for the types of data distributions common in language tasks. I will show how Confidence Weighted Learning addresses domain change, in which we seek to apply a statistical system to diverse topic domains. These methods improve learning in natural language processing tasks and can be applied to confront the challenges associated with learning in diverse language settings.
About the Speaker
Mark Dredze is an Assistant Research Professor in the Department of Computer Science and a Senior Research Scientist at the Human Language Technology Center of Excellence at The Johns Hopkins University. His research interests include machine learning, natural language processing and intelligent user interfaces. His focus is on novel applications of machine learning to solve language processing challenges as well as applications of machine learning and natural language processing to support intelligent user interfaces for information management. He earned his PhD from the University of Pennsylvania and has worked at Google, IBM and Microsoft.
This talk is open to the public and will take place in the Hornbake Building, South Wing, at the University of Maryland, College Park. Directions to campus can be found here and campus maps can be found here.