Events

Computational Linguistics and Information Processing

Revision as of 21:52, 23 October 2012 by Jimmylin (talk | contribs)

The CLIP Colloquium is a weekly speaker series organized and hosted by the CLIP Lab. The talks are open to everyone. Talks are held at 11 AM in AV Williams 3258 unless otherwise noted. Typically, external speakers have slots for one-on-one meetings with Maryland researchers before and after the talks; contact the host if you'd like to have a meeting.

To join the cl-colloquium@umiacs.umd.edu list, or for other questions about the colloquium series, e-mail Jimmy Lin, the current organizer.


{{#widget:Google Calendar |id=lqah25nfftkqi2msv25trab8pk@group.calendar.google.com |color=B1440E |title=Upcoming Talks |view=AGENDA |height=300 }}

10/23/2012: Bootstrapping via Graph Propagation

Speaker: Anoop Sarkar, Simon Fraser University
Time: Tuesday, October 23, 2012, 2:00 PM
Venue: AVW 4172

Note the special time and place!

In natural language processing, the bootstrapping algorithm introduced by David Yarowsky 15 years ago is a discriminative unsupervised learning algorithm that uses a few seed rules to bootstrap a classifier (this is the ordinary sense of bootstrapping, which is distinct from the Bootstrap in statistics). The Yarowsky algorithm works remarkably well on a wide variety of NLP classification tasks, such as distinguishing between word senses and deciding whether a noun phrase is an organization, location, or person.

Extending previous attempts at providing an objective function optimization view of Yarowsky, we show that bootstrapping a classifier from a small set of seed rules can be viewed as the propagation of labels between examples via the features they share. This paper introduces a novel variant of the Yarowsky algorithm based on this view. It is a bootstrapping learning method that uses a graph propagation algorithm with a well-defined per-iteration objective function that incorporates the cautious behaviour of the original Yarowsky algorithm.

The experimental results show that our proposed bootstrapping algorithm achieves state-of-the-art performance or better on several different natural language data sets, outperforming other unsupervised methods such as the EM algorithm. We show that cautious learning is an important principle in unsupervised learning, although we do not understand it well, and that the Yarowsky algorithm can outperform or match co-training without any reliance on multiple views.

About the Speaker: Anoop Sarkar is an Associate Professor at Simon Fraser University in British Columbia, Canada, where he co-directs the Natural Language Laboratory. He received his Ph.D. from the Department of Computer and Information Sciences at the University of Pennsylvania under Prof. Aravind Joshi for his work on semi-supervised statistical parsing using tree-adjoining grammars.

His research is focused on statistical parsing and machine translation (exploiting syntax or morphology, semi-supervised learning, and domain adaptation). His interests also include formal language theory and stochastic grammars, in particular tree automata and tree-adjoining grammars.

10/24/2012: Recent Advances in Open Information Extraction

Speaker: Mausam, University of Washington
Time: Wednesday, October 24, 2012, 11:00 AM
Venue: AVW 3258

Open Information Extraction is an attractive paradigm for extracting large numbers of relational facts from natural language text in a domain-independent manner. In this talk, I describe our recent progress using this model, including our latest open extractors, ReVerb and OLLIE, which substantially improve on the previous state of the art. I will end with our ongoing work that uses open extractions for various end tasks, including multi-document summarization and unsupervised event extraction.

About the Speaker: Mausam is a Research Assistant Professor at the Turing Center in the Department of Computer Science at the University of Washington, Seattle. His research interests span various sub-fields of artificial intelligence, including sequential decision making under uncertainty, large-scale natural language processing, and AI applications to crowd-sourcing. Mausam obtained a PhD from the University of Washington in 2007 and a Bachelor of Technology from IIT Delhi in 2001.

10/31/2012: Kilian Weinberger

11/07/2012: Using Syntactic Head Information in Hierarchical Phrase-Based Translation

Speaker: Junhui Li
Time: Wednesday, November 7, 2012, 11:00 AM
Venue: AVW 3258

The traditional hierarchical phrase-based (HPB) model is prone to overgeneration due to a lack of linguistic knowledge: the grammar may suggest more derivations than appropriate, many of which may lead to ungrammatical translations. On the other hand, limitations of the glue grammar rules in the HPB model may actually prevent systems from considering some reasonable derivations. This talk presents a simple but effective translation model, called the Head-Driven HPB (HD-HPB) model, which incorporates head information in translation rules to better capture syntax-driven information in a derivation. In addition, unlike the original glue rules, the HD-HPB model allows improved reordering between any two neighboring non-terminals to explore a larger reordering search space. In experiments, we examined different head label sets to refine the non-terminal X, including part-of-speech (POS) tags, coarse-grained POS tags, and dependency labels.

About the Speaker: Junhui Li joined the CLIP lab as a post-doc researcher in Aug 2012. He was previously a post-doc researcher at the Centre for Next Generation Localisation (CNGL) at Dublin City University from Feb 2011 to Jul 2012. Before that, he was a student at the NLP Lab of Soochow University, China.

Previous Talks