Computational Linguistics and Information Processing



2010 Past Speakers

  • Roger Levy
  • Earl Wagner
  • Eugene Charniak
  • Dave Newman
  • Ray Mooney

October 20: Kristy Hollingshead: Search Errors and Model Errors in Pipeline Systems

Pipeline systems, in which data is sequentially processed in stages with the output of one stage providing input to the next, are ubiquitous in the field of natural language processing (NLP) as well as many other research areas. The popularity of the pipeline system architecture may be attributed to the utility of pipelines in improving scalability by reducing search complexity and increasing efficiency of the system. However, pipelines can suffer from the well-known problem of "cascading errors," where errors earlier in the pipeline propagate to later stages in the pipeline. In this talk I will make a distinction between two different types of cascading errors in pipeline systems. The first I will term "search errors," where there exists a higher-scoring candidate (according to the model), but that candidate has been excluded from the search space. The second type of error that I will address might be termed "model errors," where the highest-scoring candidate (according to the model) is not the best candidate (according to some gold standard). Statistical NLP models are imperfect by nature, resulting in model errors. Interestingly, the same pipeline framework that causes search errors can also resolve (or work around) model errors; in this talk I will demonstrate several techniques for detecting and resolving search and model errors, which can result in improved efficiency with no loss in accuracy. I will briefly mention the technique of pipeline iteration, introduced in my ACL'07 paper, and introduce some related results from my dissertation. I will then focus on work done with my PhD advisor Brian Roark on chart cell constraints, as published in our COLING'08 and NAACL'09 papers; this work provably reduces the complexity of a context-free parser to quadratic in the worst case (observably linear), with a slight gain in accuracy using the Charniak parser.
While much of this talk will be on parsing pipelines, I am currently extending some of this work to MT pipelines and would welcome discussion along those lines.

Kristy Hollingshead earned her PhD in Computer Science and Engineering this year from the Center for Spoken Language Understanding (CSLU) at the Oregon Health & Science University (OHSU). She received her B.A. in English-Creative Writing from the University of Colorado in 2000 and her M.S. in Computer Science from OHSU in 2004. Her research interests in natural language processing include parsing, machine translation, evaluation metrics, and assistive technologies. She is also interested in general techniques for improving system efficiency, to allow for richer contextual information to be extracted for use in downstream stages of a pipeline system. Kristy was a National Science Foundation Graduate Research Fellow from 2004 to 2007.

October 27: Matthias Bröcheler

November 3: Stanley Kok: Structure Learning in Markov Logic Networks

Statistical learning handles uncertainty in a robust and principled way. Relational learning (also known as inductive logic programming) models domains involving multiple relations. Recent years have seen a surge of interest in the statistical relational learning (SRL) community in combining the two, driven by the realization that many (if not most) applications require both and by the growing maturity of the two fields.

A Markov logic network (MLN) is a statistical relational model that has gained traction within the AI community in recent years because of its robustness to noise and its ability to compactly model complex domains. MLNs combine probability and logic by attaching weights to first-order formulas, and viewing these as templates for features of Markov networks. Learning the structure of an MLN consists of learning both formulas and their weights.

To obtain weighted MLN formulas, we could rely on human experts to specify them. However, this approach is error-prone and requires painstaking knowledge engineering. Further, it will not work in domains where there is no human expert. The ideal solution is to automatically learn MLN structure from data. However, this is a challenging task because of its super-exponential search space. In this talk, we present a series of algorithms that efficiently and accurately learn MLN structure.

November 10: Bob Carpenter

November 24: Ned Talley