# Difference between revisions of "CLIP Colloquium (Fall 2012)"

### Computational Linguistics and Information Processing

Line 240: | Line 240: | ||

'''About the Speaker:''' Junhui Li joined CLIP lab as a post-doc researcher from Aug 2012. He was previously a post-doc researcher in the Centre for Next Generation Localisation (CNGL), at Dublin City University from Feb 2011 to Jul 2012. Before that, he was a student at NLP Lab of Soochow University, China. | '''About the Speaker:''' Junhui Li joined CLIP lab as a post-doc researcher from Aug 2012. He was previously a post-doc researcher in the Centre for Next Generation Localisation (CNGL), at Dublin City University from Feb 2011 to Jul 2012. Before that, he was a student at NLP Lab of Soochow University, China. | ||

+ | |||

+ | == 11/28/2012: New Machine Learning Tools for Structured Prediction == | ||

+ | |||

+ | '''Speaker:''' [http://old-site.clsp.jhu.edu/~ves/ Veselin Stoyanov], Johns Hopkins University<br/> | ||

+ | '''Time:''' Wednesday, November 28, 2012, 2012, 11:00 AM<br/> | ||

+ | '''Venue:''' AVW 3258<br/> | ||

+ | |||

+ | I am motivated by structured prediction problems in NLP and social network analysis. Markov Random Fields (MRFs) and other Probabilistic Graphical Models (PGMs) are suitable for representing structured prediction: they can model joint distributions and utilize standard inference procedures. MRFs also provide a principled ways for incorporating background knowledge and combining multiple systems. | ||

+ | |||

+ | Two properties of structured prediction problems make learning challenging. First, structured prediction almost inevitably requires approximation to inference, decoding or model structure. Second, unlike the traditional ML setting that assumes i.i.d. training and test data, structured learning problems often consist of a single example used both for training and prediction. | ||

+ | |||

+ | We address the two issues above. First, we argue that the presence of approximations in MRF-based systems requires a novel perspective on training. Instead of maximizing data likelihood, one should seek the parameters that minimize the empirical risk of the entire imperfect system. We show how to locally optimize this risk using error back-propagation and local optimization. On four NLP problems our approach significantly reduces loss on test data compared to choosing approximate MAP parameters. | ||

+ | |||

+ | Second, we utilize data imputation in the limited data setting. At test time we use sampling to impute data that is a more accurate approximation of the data distribution. We use our risk minimization techniques to train fast discriminative models on the imputed data. This we can: (i) train discriminative models given a single training and test example; (ii) train generative/discriminative hybrids that can incorporate useful priors and learn from semi-supervised data. | ||

+ | |||

+ | '''About the Speaker:''' Veselin Stoyanov is currently a postdoctoral researcher at the Human Language Technology Center of Excellence (HLT-COE) at Johns Hopkins University (JHU). He will be joining Facebook as a Research Scientist starting in January 2013. Previously he spent two years working with Prof. Jason Eisner at JHU's Center for Language and Speech Processing supported by a Computing Innovation Postdoctoral Fellowship. He received the Ph.D. degree from Cornell University under the supervision of Prof. Claire Cardie in 2009 and the Honors B.Sc. from the University of Delaware in 2002. His research interests reside in the intersection of Machine Learning and Computational Linguistics. More precisely, he is interested in using probabilistic models for complex structured problems with applications to knowledge base population, modeling social networks, extracting information from text and coreference resolution. In addition to the CIFellowship, Ves Stoyanov is the recipient of an NSF Graduate Research Fellowship and other academic honors. | ||

+ | |||

+ | == 12/05/2012: Combining Statistical Translation Techniques for Cross-Language Information Retrieval == | ||

+ | |||

+ | '''Speaker:''' [http://www.cs.umd.edu/~fture/Home.html Ferhan Ture], University of Maryland<br/> | ||

+ | '''Time:''' Wednesday, December 5, 2012, 11:00 AM<br/> | ||

+ | '''Venue:''' AVW 3258<br/> | ||

+ | |||

+ | Cross-language information retrieval today is dominated by techniques | ||

+ | that rely principally on context-independent token-to-token mappings | ||

+ | despite the fact that state-of-the-art statistical machine translation | ||

+ | systems now have far richer translation models available in their | ||

+ | internal representations. This paper explores combination-of-evidence | ||

+ | techniques using three types of statistical translation models: | ||

+ | context-independent token translation, token translation using | ||

+ | phrase-dependent contexts, and token translation using | ||

+ | sentence-dependent contexts. Context-independent translation is | ||

+ | performed using statistically-aligned tokens in parallel text, | ||

+ | phrase-dependent translation is performed using aligned statistical | ||

+ | phrases, and sentence-dependent translation is performed using those | ||

+ | same aligned phrases together with an n-gram language model. | ||

+ | Experiments on retrieval of Arabic, Chinese, and French documents | ||

+ | using English queries show that no one technique is optimal for all | ||

+ | queries, but that statistically significant improvements in mean | ||

+ | average precision over strong baselines can be achieved by combining | ||

+ | translation evidence from all three techniques. The optimal | ||

+ | combination is, however, found to be resource-dependent, indicating | ||

+ | a need for future work on robust tuning to the characteristics of | ||

+ | individual collections. | ||

+ | |||

+ | This is a practice talk for COLING 2012. |

## Latest revision as of 19:43, 11 January 2013

## 08/20/2012: TopSig – Signature Files Revisited

**Speaker:** Shlomo Geva, Queensland University of Technology, Australia

**Time:** Monday, August 20, 2012, 11:00 AM

**Venue:** AVW 2120

**Abstract:** Performance comparisons between File Signatures and Inverted
Files for text retrieval have previously shown several
significant shortcomings of file signatures relative to inverted
files. The inverted file approach underpins most state-of-the-art
search engine algorithms, such as Language and Probabilistic
models. It has been widely accepted that traditional
file signatures are inferior alternatives to inverted files. This
paper describes TopSig, a modern approach to the construction
of file signatures - many advances in semantic hashing and
dimensionality reduction have been made in recent times,
but these were not so far linked to general purpose, signature
file based, search engines. This paper introduces a different
signature file approach that builds upon and extends
these recent advances. We are able to demonstrate significant
improvements in the performance of signature file based
indexing and retrieval, performance that is comparable to
that of state of the art inverted file based systems, including
Language models and BM25. These findings suggest that
file signatures offer a viable alternative to inverted files in
suitable settings and position the file signature model in
the class of Vector Space retrieval models.
TopSig is an open-source search engine from QUT and it can be discussed too if there is an interest.

**About the Speaker:** Associate Professor Shlomo Geva is the discipline leader for
Computational Intelligence and Signal Processing in the Computer
Science Department at the Queensland University of Technology in
Brisbane, Australia. His research interests include clustering,
cross-language information retrieval, focused information retrieval,
link discovery, and xml indexing.

**Host:** Doug Oard, oard@umd.edu

## 09/05/2012: 5 Minute Madness (Part I)

Internal 5-minute lightning talks.

## 09/12/2012: 5 Minute Madness (Part II)

Internal 5-minute lightning talks.

## 09/19/2012: CoB: Pairwise Similarity on Large Text Collections with MapReduce

**Speaker:** Earl Wagner, University of Maryland

**Time:** Wednesday, September 19, 2012, 11:00 AM

**Venue:** AVW 3258

Faced with high-volume information streams, intelligence analysts often rely on standing queries to retrieve materials that they need to see. Results of these queries are currently extended by effective and efficient probabilistic techniques that find similar, non-matching content. We discuss research looking further afield to find additional useful documents via MapReduce techniques performing rapid clustering of documents. This approach is intended to provide an improved “peripheral vision” to overcome some blind spots, yielding both immediate utility (detection of documents that otherwise would not have been found) and the potential for improvements to specific standing queries.

**About the Speaker:** Earl J. Wagner is a Postdoctoral Research Associate at the University of Maryland, College Park in the College of Information Studies (Maryland's iSchool). He was previously a Research Assistant at Northwestern University where he earned his Ph.D. in Computer Science.

## 09/26/2012: Better! Faster! Stronger (theorems)! Learning to Balance Accuracy and Efficiency when Predicting Linguistic Structures

**Speaker:** Hal Daume III, University of Maryland

**Time:** Wednesday, September 26, 2012, 11:00 AM

**Venue:** AVW 3258

Viewed abstractly, many classic problems in natural language processing can be cast as trying to map a complex input (eg., a sequence of words) to a complex output (eg., a syntax tree or semantic graph). This task is challenging both because language is ambiguous (learning difficulties) and represented with discrete combinatorial structures (computational difficulties). I will describe my multi-pronged research effort to develope learning algorithms that explicitly learn to trade-off accuracy and efficiency, applied to a variety of language processing phenomena. Moreover, I will show that in some cases, we can actually obtain model that is faster and more accurate by exploiting smarter learning algorithms. And yes, those algorithms come with stronger theoretical guarantees too.

The key insight that makes this possible is a connection between the task of predicting structured objects (what I care about) and imitation learning (a subfield in robotics). This insight came about as a result of my work a few years ago, and has formed the backbone of much of my work since then. These connections have led other NLP and robotics researchers to make their own independent advances using many of these ideas.

At the end of the talk, I'll briefly survey some of my other contributions in the areas of domain adaptation and multilingual modeling, both of which also fall under the general rubric of "what goes wrong when I try to apply off-the-shelf machine learning models to real language processing problems?"

## 10/03/2012: Consistent and Efficient Algorithms for Latent-Variable PCFGs

**Speaker:** Shay Cohen, Columbia University

**Time:** Wednesday, October 3, 2012, 11:00 AM

**Venue:** AVW 3258

In the past few years, there has been an increased interest in the machine learning community in spectral algorithms for estimating models with latent variables. Examples include algorithms for estimating mixture of Gaussians or for estimating the parameters of a hidden Markov model.

The EM algorithm has been the mainstay for estimation with latent variables, but because it is not guaranteed to converge to a global maximum of the likelihood, it is not a consistent estimator. Spectral algorithms, on the other hand, are often shown to be consistent.

In this talk, I am interested in presenting a spectral algorithm for latent-variable PCFGs, a model widely used in the NLP community for parsing. This model, originally introduced by Matsuzaki et al. (2005), augments with a latent state the nonterminals in an underlying PCFG grammar. These latent states re-fine the nonterminal category in order to capture subtle syntactic nuances in the data. This model has been successfully implemented in state-of-the-art parsers such as the Berkeley parser (Petrov et al., 2006).

Our spectral algorithm for latent-variable PCFGs is based on a novel tensor formulation designed for inference with PCFGs. This tensor formulation yields an "observable operator model" for PCFGs which can be readily used for spectral estimation.

The algorithm we developed is considerably faster than EM, and makes only one pass over the data. Statistics are collected from the data in this pass, and singular value decomposition is performed on matrices containing these statistics. Our algorithm is also provably consistent in the sense that, given enough samples, it will estimate probabilities for test trees close to their true probabilities under the latent-variable PCFG model.

If time permits, I will also present a method to improve the efficiency of parsing with latent-variable PCFGs. This method relies on tensor decomposition of the latent-variable PCFG. The tensor decomposition is approximate, and therefore the new parser is an approximate parser as well. Still, the quality of approximation can be guaranteed theoretically by inspecting how errors from the approximation propagate in the parse trees.

## 10/10/2012: Beyond MaltParser - Advances in Transition-Based Dependency Parsing

**Speaker:** Joakim Nivre, Uppsala University / Google

**Time:** Wednesday, October 10, 2012, 11:00 AM

**Venue:** AVW 3258

The transition-based approach to dependency parsing has become popular thanks to its simplicity and efficiency. Systems like MaltParser achieve linear-time parsing with projective dependency trees using locally trained classifiers to predict the next parsing action and greedy best-first search to retrieve the optimal parse tree, assuming that the input sentence has been morphologically disambiguated using a part-of-speech tagger. In this talk, I survey recent developments in transition-based dependency parsing that address some of the limitations of the basic transition-based approach. First, I show how globally trained classifiers and beam search can be used to mitigate error propagation and enable richer feature representations. Secondly, I discuss different methods for extending the coverage to non-projective trees, which are required for linguistic adequacy in many languages.Finally, I present a model for joint tagging and parsing that leads to improvements in both tagging and parsing accuracy as compared to the standard pipeline approach.

**About the Speaker:** Joakim Nivre is Professor of Computational Linguistics at Uppsala
University and currently visiting scientist at Google, New York. He holds a
Ph.D. in General Linguistics from the University of Gothenburg and a Ph.D. in
Computer Science from Växjö University. Joakim's research focuses on data-driven
methods for natural language processing, in particular for syntactic and semantic analysis. He is one of the main developers of the transition-based
approach to syntactic dependency parsing, described in his 2006 book Inductive
Dependency Parsing and implemented in the MaltParser system. Joakim's current
research interests include the analysis of mildly non-projective dependency
structures, the integration of morphological and syntactic processing for richly
inflected languages, and methods for cross-framework parser evaluation. He has
produced over 150 scientific publications, including 3 books, and has given
nearly 70 invited talks at conferences and institutions around the world. He is
the current secretary of the European Chapter of the Association for
Computational Linguistics.

**Host:** Hal Daume III, hal@umd.edu

## 10/23/2012: Bootstrapping via Graph Propagation

**Speaker:** Anoop Sarkar, Simon Fraser University

**Time:** Tuesday, October 23, 2012, 2:00 PM

**Venue:** AVW 4172

**Note special time and place!!!**

In natural language processing, the bootstrapping algorithm introduced by David Yarowsky (15 years ago) is a discriminative unsupervised learning algorithm that uses some seed rules to bootstrap a classifier (this is the ordinary sense of bootstrapping which is distinct from the Bootstrap in statistics). The Yarowsky algorithm works remarkably well on a wide variety of NLP classification tasks such as distinguishing between word senses and deciding if a noun phrase is an organization, location, or person.

Extending previous attempts at providing an objective function optimization view of Yarowsky, we show that bootstrapping a classifier from a small set of seed rules can be viewed as the propagation of labels between examples via features shared between them. This paper introduces a novel variant of the Yarowsky algorithm based on this view. It is a bootstrapping learning method which uses a graph propagation algorithm with a well defined per-iteration objective function that incorporates the cautious behaviour of the original Yarowsky algorithm.

The experimental results show that our proposed bootstrapping algorithm achieves state of the art performance or better on several different natural language data sets, outperforming other unsupervised methods such as the EM algorithm. We show that cautious learning is an important principle in unsupervised learning, however we do not understand it well, and we show that the Yarowsky algorithm can outperform or match co-training without any reliance on multiple views.

**About the Speaker:** Anoop Sarkar is an Associate Professor at Simon Fraser University in
British Columbia, Canada where he co-directs the Natural Language Laboratory. He received his Ph.D. from the
Department of Computer and Information Sciences at the University of
Pennsylvania under Prof. Aravind Joshi for his work on semi-supervised
statistical parsing using tree-adjoining grammars.

His research is focused on statistical parsing and machine translation (exploiting syntax or morphology, semi-supervised learning, and domain adaptation). His interests also include formal language theory and stochastic grammars, in particular tree automata and tree-adjoining grammars.

## 10/24/2012: Recent Advances in Open Information Extraction

**Speaker:** Mausam, University of Washington

**Time:** Wednesday, October 24, 2012, 11:00 AM

**Venue:** AVW 3258

Open Information Extraction is an attractive paradigm for extracting large amounts of relational facts from natural language text in a domain-independent manner. In this talk I describe our recent progress using this model, including our latest open extractors, ReVerb and OLLIE, which substantially improve on the previous state of the art. I will end with our ongoing work that uses open extractions for various end tasks, including multi-document summarization and unsupervised event extraction.

**About the Speaker:** Mausam is a Research Assistant Professor at the Turing Center in the
Department of Computer Science at the University of Washington, Seattle. His
research interests span various sub-fields of artificial intelligence,
including sequential decision making under uncertainty, large scale natural
language processing, and AI applications to crowd-sourcing. Mausam obtained
a PhD from University of Washington in 2007 and a Bachelor of Technology
from IIT Delhi in 2001.

## 11/07/2012: Using Syntactic Head Information in Hierarchical Phrase-Based Translation

**Speaker:** Junhui Li

**Time:** Wednesday, November 7, 2012, 11:00 AM

**Venue:** AVW 3258

The traditional hierarchical phrase-based (HPB) model is prone to overgeneration due to lack of linguistic knowledge: the grammar may suggest more derivations than appropriate, many of which may lead to ungrammatical translations. On the other hand, limitations of glue grammar rules in HPB model may actually prevent systems from considering some reasonable derivations. This talk presents a simple but effective translation model, called the Head-Driven HPB (HD-HPB) model, which incorporates head information in translation rules to better capture syntax-driven information in a derivation. In addition, unlike the original glue rules, the HD-HPB model allows improved reordering between any two neighboring non-terminals to explore a larger reordering search space. In experiments, we examined different head label sets to refine non-terminal X, including part-of-speech (POS) tags, coarsed POS tags, dependency labels.

**About the Speaker:** Junhui Li joined CLIP lab as a post-doc researcher from Aug 2012. He was previously a post-doc researcher in the Centre for Next Generation Localisation (CNGL), at Dublin City University from Feb 2011 to Jul 2012. Before that, he was a student at NLP Lab of Soochow University, China.

## 11/28/2012: New Machine Learning Tools for Structured Prediction

**Speaker:** Veselin Stoyanov, Johns Hopkins University

**Time:** Wednesday, November 28, 2012, 2012, 11:00 AM

**Venue:** AVW 3258

I am motivated by structured prediction problems in NLP and social network analysis. Markov Random Fields (MRFs) and other Probabilistic Graphical Models (PGMs) are suitable for representing structured prediction: they can model joint distributions and utilize standard inference procedures. MRFs also provide a principled ways for incorporating background knowledge and combining multiple systems.

Two properties of structured prediction problems make learning challenging. First, structured prediction almost inevitably requires approximation to inference, decoding or model structure. Second, unlike the traditional ML setting that assumes i.i.d. training and test data, structured learning problems often consist of a single example used both for training and prediction.

We address the two issues above. First, we argue that the presence of approximations in MRF-based systems requires a novel perspective on training. Instead of maximizing data likelihood, one should seek the parameters that minimize the empirical risk of the entire imperfect system. We show how to locally optimize this risk using error back-propagation and local optimization. On four NLP problems our approach significantly reduces loss on test data compared to choosing approximate MAP parameters.

Second, we utilize data imputation in the limited data setting. At test time we use sampling to impute data that is a more accurate approximation of the data distribution. We use our risk minimization techniques to train fast discriminative models on the imputed data. This we can: (i) train discriminative models given a single training and test example; (ii) train generative/discriminative hybrids that can incorporate useful priors and learn from semi-supervised data.

**About the Speaker:** Veselin Stoyanov is currently a postdoctoral researcher at the Human Language Technology Center of Excellence (HLT-COE) at Johns Hopkins University (JHU). He will be joining Facebook as a Research Scientist starting in January 2013. Previously he spent two years working with Prof. Jason Eisner at JHU's Center for Language and Speech Processing supported by a Computing Innovation Postdoctoral Fellowship. He received the Ph.D. degree from Cornell University under the supervision of Prof. Claire Cardie in 2009 and the Honors B.Sc. from the University of Delaware in 2002. His research interests reside in the intersection of Machine Learning and Computational Linguistics. More precisely, he is interested in using probabilistic models for complex structured problems with applications to knowledge base population, modeling social networks, extracting information from text and coreference resolution. In addition to the CIFellowship, Ves Stoyanov is the recipient of an NSF Graduate Research Fellowship and other academic honors.

## 12/05/2012: Combining Statistical Translation Techniques for Cross-Language Information Retrieval

**Speaker:** Ferhan Ture, University of Maryland

**Time:** Wednesday, December 5, 2012, 11:00 AM

**Venue:** AVW 3258

Cross-language information retrieval today is dominated by techniques that rely principally on context-independent token-to-token mappings despite the fact that state-of-the-art statistical machine translation systems now have far richer translation models available in their internal representations. This paper explores combination-of-evidence techniques using three types of statistical translation models: context-independent token translation, token translation using phrase-dependent contexts, and token translation using sentence-dependent contexts. Context-independent translation is performed using statistically-aligned tokens in parallel text, phrase-dependent translation is performed using aligned statistical phrases, and sentence-dependent translation is performed using those same aligned phrases together with an n-gram language model. Experiments on retrieval of Arabic, Chinese, and French documents using English queries show that no one technique is optimal for all queries, but that statistically significant improvements in mean average precision over strong baselines can be achieved by combining translation evidence from all three techniques. The optimal combination is, however, found to be resource-dependent, indicating a need for future work on robust tuning to the characteristics of individual collections.

This is a practice talk for COLING 2012.