CLIP Colloquium (Fall 2011)
Computational Linguistics and Information Processing
Revision as of 22:07, 17 August 2012 by Jimmylin
Fall 2011 Speakers
Sept 7, 14: 5-Minute Madness (AVW 2460)
See what everybody has been working on and get to know who's in the lab!
Sept 21: Youngjoong Ko, Comparison Mining (AVW 2460)
Almost every day, people face situations in which they must choose one thing over another. To make better decisions, they often try to compare the entities they are interested in. These days, many web search engines help people find entities of interest. Getting information from the large amount of web data retrieved by search engines is clearly better and easier than traditional survey methods. However, it is also clear that directly reading each document is not a perfect solution. If people only have access to a small amount of data, they may get a biased point of view. On the other hand, investigating large amounts of data is time-consuming. Therefore, a comparison mining system, which can automatically provide a summary of comparisons between two (or more) entities from a large quantity of web documents, would be very useful in many areas such as marketing.
In this talk, I will describe how to build a Korean comparison mining system. Our work is composed of two consecutive tasks: 1) classifying comparative sentences into different types, and 2) mining comparative entities and predicates. We performed various experiments to find relevant features and learning techniques. As a result, we achieved performance good enough for practical use.
Bio: Youngjoong Ko is an associate professor of Computer Engineering at Dong-A University in Korea. He received his PhD in 2003 at Sogang University. His research focuses on text mining (opinion mining, text classification/summarization), information retrieval, and dialogue systems (speech-act analysis, dialogue modeling). He is currently a visiting scholar at the CLIP laboratory of UMIACS at the University of Maryland. Homepage: http://web.donga.ac.kr/yjko/
Sept 28: No Speaker (Rosh Hashana)
Abstract: In this talk, I will describe the research we are undertaking at the Naval Research Laboratory which revolves around chat (such as Internet Relay Chat) and the problems it causes in the military domain. Chat has become a primary means for command and control communications in the US Navy. Unfortunately, its popularity has contributed to the classic problem of information overload. For example, Navy watchstanders monitor multiple chat rooms while simultaneously performing their other monitoring duties (e.g., tactical situation screens and radio communications). Some researchers have suggested how automated techniques could help alleviate these problems, but very little research has actually addressed them.
I will give an overview of the three primary tasks that are the current focus of our research. The first is urgency detection, which involves detecting important chat messages within a dynamic chat stream. The second is summarization, which involves summarizing chat conversations and temporally summarizing sets of chat messages. The third is human-subject studies, which involve simulating a watchstander environment and testing whether our urgency detection and summarization ideas, along with 3D-audio cueing, can aid watchstanders in conducting their duties.
Short Bio: David Uthus is a National Research Council Postdoctoral Fellow hosted at the Naval Research Laboratory, where he is currently undertaking research focusing on analyzing multiparticipant chat. He received his PhD (2010) and MSc (2006) from the University of Auckland in New Zealand and his BSc (2004) from the University of California, Davis. His research interests include microtext analysis, machine learning, metaheuristics, heuristic search, and sport scheduling.
Oct 12: AISTATS Paper Clinic (CLIP Lab)
Oct 13: Nate Chambers, Learning General Events for Specific Event Extraction (AVW 3258)
Abstract: There is a wealth of knowledge about the world encoded in written text. How much of this knowledge is accessible to today's unsupervised learning systems, and in what form? There are two primary views that most systems take on interpreting documents: (1) the document primarily describes specific facts, and (2) the document describes general knowledge about "how the world works" through specific descriptions. These two views are largely separated into two subfields within NLP: Information Extraction, and Knowledge Representation/Induction. Information Extraction is mostly concerned with extracting atomic factoids about the world (e.g., Andrew Luck threw three touchdown passes). Knowledge Induction seeks generalized inferences about the world (e.g., Quarterbacks throw footballs). Although the two operate on similar datasets, most systems focus on only one of the two tasks. This talk will describe my efforts over the past few years to merge the goals of both views, performing unsupervised knowledge induction and information extraction in tandem. I describe a model of event schemas that represents common events and their participants (Knowledge Induction), as well as an algorithm that applies this model to extract specific instances of events from newspaper articles (Information Extraction). I will describe my unique learning approach that relies on coreference resolution to learn event schemas, and then will present the first work that performs template-based IE without labeled datasets or prior knowledge.
If time allows, I will also briefly describe my interests in event ordering and temporal reasoning.
Bio: Nate Chambers is an Assistant Professor in Computer Science at the US Naval Academy. He recently graduated with his Ph.D. in CS from Stanford University. His research interests focus on Natural Language Understanding and Knowledge Acquisition from large amounts of text with minimal human supervision. Before attending Stanford, he worked as a Research Associate at the Florida Institute for Human and Machine Cognition, focusing on human-computer interfaces, dialogue systems, and knowledge representation. He received his M.S. in Computer Science from the University of Rochester in 2003, and has published over 20 peer-reviewed articles.
Oct 17: Michael Collins
There has been a long history in combinatorial optimization of methods that exploit structure in complex problems, using methods such as dual decomposition or Lagrangian relaxation. These methods leverage the observation that complex inference problems can often be decomposed into efficiently solvable sub-problems. Thus far, however, these methods are not widely used in NLP.
In this talk I'll describe recent work on inference algorithms for NLP based on Lagrangian relaxation. In the first part of the talk I'll describe work on non-projective parsing. In the second part of the talk I'll describe an exact decoding algorithm for syntax-based statistical translation. If time permits, I'll also briefly describe algorithms for dynamic programming intersections (e.g., the intersection of a PCFG and an HMM), and for phrase-based translation.
For all of the problems that we consider, the resulting algorithms produce exact solutions, with certificates of optimality, on the vast majority of examples. The algorithms are efficient both for problems that are NP-hard (as is the case for non-projective parsing and for phrase-based translation) and for problems that are solvable in polynomial time using dynamic programming, but where the traditional exact algorithms are far too expensive to be practical.
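As a toy illustration of the Lagrangian relaxation idea described above (this is not one of the parsing or translation algorithms from the talk), the Python sketch below solves a made-up problem: pick a single index i that maximizes a[i] + b[i], pretending that the two score vectors can only be maximized separately. The function names, the score vectors, and the 1/(t+1) step size are all assumptions chosen for illustration. Each sub-problem is solved independently, and Lagrange multipliers u are adjusted by subgradient steps until the two sub-problems agree. Agreement is a certificate of optimality, because the dual value upper-bounds the true optimum and is attained by a feasible solution.

```python
def argmax(scores):
    """Return the index of the largest score (the first one on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def dual_decomposition(a, b, max_iters=100):
    """Maximize a[i] + b[i] over i by solving two sub-problems separately.

    Returns (index, certified). certified is True when both sub-problems
    chose the same index, which proves the index is optimal.
    """
    n = len(a)
    u = [0.0] * n  # one Lagrange multiplier per agreement constraint
    y = argmax(a)
    for t in range(max_iters):
        # Each sub-problem sees its own scores shifted by the multipliers.
        y = argmax([a[i] + u[i] for i in range(n)])
        z = argmax([b[i] - u[i] for i in range(n)])
        if y == z:
            return y, True  # agreement: certificate of optimality
        # Subgradient step: penalize y's choice and reward z's choice
        # in the first sub-problem (and the reverse in the second).
        step = 1.0 / (t + 1)
        u[y] -= step
        u[z] += step
    return y, False  # no agreement within the iteration budget


if __name__ == "__main__":
    index, certified = dual_decomposition([3.0, 1.0, 2.0], [0.0, 2.5, 2.0])
    print(index, certified)
```

On this small input the two sub-problems initially disagree and reach agreement after a few multiplier updates. In general the method is not guaranteed to reach agreement, which is why the sketch also reports whether a certificate was obtained.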
While the focus of this talk is on NLP problems, there are close connections to inference methods, in particular belief propagation, for graphical models. Our work was inspired by recent work that has used dual decomposition as an alternative to belief propagation in Markov random fields.
This is joint work with Yin-Wen Chang, Tommi Jaakkola, Terry Koo, Sasha Rush, and David Sontag.
Bio: Michael Collins is the Vikram S. Pandit Professor of Computer Science at Columbia University. His research interests are in natural language processing and machine learning. He completed a PhD in computer science from the University of Pennsylvania in December 1998. From January 1999 to November 2002 he was a researcher at AT&T Labs-Research, and from January 2003 until December 2010 he was an assistant/associate professor at MIT. He joined Columbia University in January 2011. Prof. Collins's research has focused on topics including statistical parsing, structured prediction problems in machine learning, and NLP applications including machine translation, dialog systems, and speech recognition. His awards include a Sloan fellowship, an NSF career award, and best paper awards at several conferences: EMNLP (2002 and 2004), UAI (2004 and 2005), CoNLL 2008, and EMNLP 2010.
Oct 19: Taesun Moon, Pull your head out of your task: broader context in unsupervised models (AVW 2328)
Abstract: I discuss unsupervised models and how broader context helps in the resolution of unsupervised or distantly supervised approaches. In the first section, I discuss how document boundaries help in two low-level unsupervised tasks that aren't traditionally resolved in terms of documents: unsupervised morphological segmentation/clustering and unsupervised part-of-speech tagging. For unsupervised morphology, I describe an intuitive model that uses document boundaries to strongly constrain how stems may be clustered and segmented with minimal parameter tuning. For unsupervised part-of-speech tagging, I discuss the crouching Dirichlet, hidden Markov model, an unsupervised POS-tagging model which takes advantage of the difference in the statistical variance of content word and function word POS-tags across documents. Next, I discuss a model of inferring probabilistic word meaning as a distribution over potential paraphrases within context. As opposed to many current approaches in lexical semantics which consider a limited subset of words in a sentence to infer meaning in isolation, this model is able to jointly conduct inference over all words in a sentence. Finally, I describe an approach for connecting language and geography that anchors natural language expressions to specific regions of the Earth. The core of the system is a region-topic model, which is used to learn word distributions for each region discussed in a given corpus. This model performs toponym resolution as a by-product, and additionally enables us to characterize a geographic distribution for corpora, individual texts, or even individual words. The last is joint work with Jason Baldridge, Travis Brown, Katrin Erk, and Mike Speriosu at the University of Texas, Austin.
Bio: Taesun Moon received an MA (2009) and PhD (2011) in linguistics from the University of Texas at Austin under the supervision of Katrin Erk. He received a BA (2002) in English literature from Seoul National University in South Korea.
Oct 27: Tom Griffiths (3:30 p.m., Bioscience Research Building, 1103)
People are remarkably good at acquiring complex knowledge from limited data, as is required in learning causal relationships, categories, or aspects of language. Successfully solving inductive problems of this kind requires having good "inductive biases" -- constraints that guide inductive inference. Viewed abstractly, understanding human learning requires identifying these inductive biases and exploring their origins. I will argue that probabilistic models of cognition provide a framework that can facilitate this project, giving a transparent characterization of the inductive biases of ideal learners. I will outline how probabilistic models are traditionally used to solve this problem, and then present a new approach that uses a mathematical analysis of the effects of cultural transmission as the basis for an experimental method that magnifies the effects of inductive biases.
Nov 2: Jason Eisner, A Non-Parametric Bayesian Approach to Inflectional Morphology (AVW 2460)
We learn how the words of a language are inflected, given a plain text corpus plus a small supervised set of known paradigms. The approach is principled, simply performing empirical Bayesian inference under a straightforward generative model that explicitly describes the generation of:
1. The grammar and subregularities of the language (via many finite-state transducers coordinated in a Markov Random Field).
2. The infinite inventory of types and their inflectional paradigms (via a Dirichlet Process Mixture Model based on the above grammar).
3. The corpus of tokens (by sampling inflected words from the above inventory).
Our inference algorithm cleanly integrates several techniques that handle the different levels of the model: classical dynamic programming operations on the finite-state transducers, loopy belief propagation in the Markov Random Field, and MCMC and MCEM for the non-parametric Dirichlet Process Mixture Model.
We will build up the various components of the model in turn, showing experimental results along the way for several intermediate tasks such as lemmatization, transliteration, and inflection. Finally, we show that modeling paradigms jointly with the Markov Random Field, and learning from unannotated text corpora via the non-parametric model, significantly improves the quality of predicted word inflections.
This is joint work with Markus Dreyer.
Bio: Jason Eisner is Associate Professor of Computer Science at Johns Hopkins University, where he is also affiliated with the Center for Language and Speech Processing, the Cognitive Science Department, and the national Center of Excellence in Human Language Technology. He is particularly interested in designing algorithms that statistically exploit linguistic structure. His 80 or so papers have presented a number of algorithms for parsing and machine translation; algorithms for constructing and training weighted finite-state machines; formalizations, algorithms, theorems and empirical results in computational phonology; and unsupervised or semi-supervised learning methods for domains such as syntax, morphology, and word-sense disambiguation.
Nov 3: EACL / WWW Paper Clinic (11AM, CLIP Lab)
Nov 15: Sergei Nirenburg & Marge McShane, Reference Resolution (10AM, AVW 3258)
Most work on reference resolution in natural language processing has been marked by three features: (1) it has concentrated on textual co-reference resolution, which is the linking of text strings with their coreferential, textual antecedents; (2) only a small subset of reference phenomena have been covered – namely, those that are most easily treated by a “corpus annotation + machine learning” development strategy; and (3) the methods used to treat the selected subset do not hold much promise of being extensible to a broader range of more difficult reference phenomena.
Within the theory of Ontological Semantics, we view reference resolution completely differently. For us, resolving reference means linking references of objects and events in a text to their anchors in the fact repository of the system processing the text – or, to use the terminology of intelligent agents, the memory of the agent processing the text. Furthermore, reference relations extend beyond coreference to meronymy, set-member relations, type-instance relations, so-called “bridging” constructions, etc. The result of reference resolution is the appropriate memory modification of the text processing agent.
In this talk we will briefly introduce OntoSem, our semantically-oriented text processing system and then describe the approach to reference resolution used in OntoSem. We will motivate a semantically oriented approach to reference resolution and show how and why it is currently feasible to develop a new generation of reference resolution engines.
Bio: Dr. Sergei Nirenburg is Professor in the CSEE Department at UMBC and the Director of its Institute for Language and Information Technologies. Before coming to UMBC, Dr. Nirenburg was Director of the Computing Research Laboratory and Professor of Computer Science at New Mexico State University. He received his Ph.D. in Linguistics from the Hebrew University of Jerusalem, Israel, and his M.Sc. in Computational Linguistics from Kharkov State University, USSR. Dr. Nirenburg has written or edited seven books and has published over 130 articles in various areas of computational linguistics and artificial intelligence. Dr. Nirenburg has directed a number of large-scale research and development projects in the areas of natural language processing, knowledge representation, reasoning, knowledge acquisition and cognitive modeling.
Marge McShane is a Research Associate Professor in the Department of Computer Science and Electrical Engineering of UMBC. She received her Ph.D. from Princeton University, with a specialization in linguistics. She works on theoretical and knowledge-oriented aspects of developing language-enabled intelligent agents. She has led several knowledge acquisition and annotation projects, including the development of a general-purpose workbench for developing computationally-tractable descriptions of lesser-studied languages. A special area of Dr. McShane’s interest is reference resolution, particularly its more difficult aspects, such as ellipsis and referential vagueness. She has published two books and over 60 scientific papers.
Nov 30: Claire Monteleoni, Clustering Algorithms for Streaming and Online Settings (AVW 2328)
Abstract: Clustering techniques are widely used to summarize large quantities of data (e.g., aggregating similar news stories); however, their outputs can be hard to evaluate. While a domain expert could judge the quality of a clustering, having a human in the loop is often impractical. Probabilistic assumptions have been used to analyze clustering algorithms, for example i.i.d. data, or even data generated by a well-separated mixture of Gaussians. Without any distributional assumptions, one can analyze clustering algorithms by formulating some objective function, and proving that a clustering algorithm either optimizes or approximates it. The k-means clustering objective, for Euclidean data, is simple, intuitive, and widely cited; however, it is NP-hard to optimize, and few algorithms approximate it, even in the batch setting (the algorithm known as "k-means" does not have an approximation guarantee). Dasgupta (2008) posed open problems for approximating it on data streams.
In this talk, I will discuss my ongoing work on designing clustering algorithms for streaming and online settings. First I will present a one-pass, streaming clustering algorithm which approximates the k-means objective on finite data streams. This involves analyzing a variant of the k-means++ algorithm, and extending a divide-and-conquer streaming clustering algorithm from the k-medoid objective. Then I will turn to endless data streams, and introduce a family of algorithms for online clustering with experts. We extend algorithms for online learning with experts, to the unsupervised setting, using intermediate k-means costs, instead of prediction errors, to re-weight experts. When the experts are instantiated as k-means approximate (batch) clustering algorithms run on a sliding window of the data stream, we provide novel online approximation bounds that combine regret bounds extended from supervised online learning, with k-means approximation guarantees. Notably, the resulting bounds are with respect to the optimal k-means cost on the entire data stream seen so far, even though the algorithm is online. I will also present encouraging experimental results.
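For readers unfamiliar with the terms above, the following Python sketch shows the k-means objective (the sum of squared Euclidean distances from each point to its nearest center) together with the standard batch k-means++ seeding rule, in which each new center is sampled with probability proportional to a point's squared distance from the centers chosen so far. This is only the textbook batch procedure, not the streaming variant or the online algorithms from the talk. The function names and the sample points are hypothetical, and the sketch assumes the data contain at least k distinct points.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def kmeans_cost(points, centers):
    """k-means objective: each point pays its squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmeanspp_seed(points, k, rng):
    """Choose k initial centers by standard k-means++ (D^2) sampling.

    Assumes points contains at least k distinct points; otherwise all
    sampling weights can become zero and rng.choices raises an error.
    """
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center,
        # so points that are already centers have weight zero.
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    seeds = kmeanspp_seed(data, 2, random.Random(0))
    print(seeds, kmeans_cost(data, seeds))
```

Because far-away points receive large weights, the second center is very likely to land in the cluster that the first center missed, which is the intuition behind the approximation guarantee of the seeding step.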
This talk is based on joint work with Nir Ailon, Ragesh Jaiswal, and Anna Choromanska.
Bio: Claire Monteleoni is an assistant professor of Computer Science at George Washington University, and adjunct research faculty at the Center for Computational Learning Systems at Columbia University, where she was previously research faculty. She did a postdoc in Computer Science and Engineering at the University of California, San Diego, and completed her PhD and Masters in Computer Science at MIT. Her research focus is on machine learning algorithms and theory for problems including learning from data streams, learning from raw (unlabeled) data, learning from private data, and Climate Informatics: accelerating discovery in Climate Science with machine learning. Her papers have received several awards, and she currently serves on the Senior Program Committee of the International Conference on Machine Learning, and the Editorial Board of the Machine Learning Journal.
Dec 7: Bill Rand, Authority, Trust and Influence: The Complex Network of Social Media (AVW 2328)
The dramatic feature of social media is that it gives everyone a voice; anyone can speak out and express their opinion to a crowd of followers with little or no cost or effort, which creates a loud and potentially overwhelming marketplace of ideas. Given this egalitarian competition, how do users of social media identify authorities in this crowded space? Who do they trust to provide them with the information and the recommendations that they want? Which tastemakers have the greatest influence on social media users? Using agent-based modeling, machine learning and network analysis we begin to examine and shed light on these questions and develop a deeper understanding of the complex system of social media.
Bio: Bill Rand received his doctorate in Computer Science from the University of Michigan in 2005, where he worked on the application of evolutionary computation techniques to dynamic environments and was a regular member of the Center for the Study of Complex Systems, where he built a large-scale agent-based model of suburban sprawl. Before coming to Maryland, he was awarded a postdoctoral research fellowship at Northwestern University in the Northwestern Institute on Complex Systems (NICO), where he worked with the NetLogo development team studying agent-based modeling, evolutionary computation and network science.