
Events: Difference between revisions

Computational Linguistics and Information Processing

 
(212 intermediate revisions by 10 users not shown)
== Colloquia ==
<center>[[Image:colloq.jpg|center|504px|x]]</center>


''Titles and abstracts appear after the calendar.''
== CLIP Colloquium ==


=== Google Calendar for CLIP Speakers ===
The CLIP Colloquium is a weekly speaker series organized and hosted by the CLIP Lab. The talks are open to everyone. Most talks are held online on Wednesdays at 11 AM unless otherwise noted. Typically, external speakers have slots for one-on-one meetings with Maryland researchers.


{{#widget:Google Calendar
To join the clip-talks@umiacs.umd.edu list, or for other questions about the colloquium series, e-mail [mailto:rudinger@umd.edu Rachel Rudinger], the current organizer.
|id=lqah25nfftkqi2msv25trab8pk@group.calendar.google.com
|color=B1440E
|title=CLIP Events
|view=AGENDA
|height=300
}}


=== 2010 Past Speakers (prior to this page going live) ===
For up-to-date information, see the [https://talks.cs.umd.edu/lists/7 UMD CS Talks page].  (You can also subscribe to the calendar there.)


* Roger Levy
=== Colloquium Recordings ===
* Earl Wagner
* [[Colloqium Recording (Fall 2020)|Fall 2020]]
* Eugene Charniak
* [[Colloqium Recording (Spring 2021)|Spring 2021]]
* Dave Newman
* [[Colloqium Recording (Fall 2021)|Fall 2021]]
* Ray Mooney
* [[Colloqium Recording (Spring 2022)|Spring 2022]]


=== October 20, Kristy Hollingshead: Search Errors and Model Errors in Pipeline Systems ===
=== Previous Talks ===
* [https://talks.cs.umd.edu/lists/7?range=past Past talks, 2013 - present]
* [[CLIP Colloquium (Spring 2012)|Spring 2012]]  [[CLIP Colloquium (Fall 2011)|Fall 2011]]  [[CLIP Colloquium (Spring 2011)|Spring 2011]]  [[CLIP Colloquium (Fall 2010)|Fall 2010]]


Pipeline systems, in which data is sequentially processed in stages with the output of one stage providing input to the next, are ubiquitous in the field of natural language processing (NLP) as well as many other research areas. The popularity of the pipeline system architecture may be attributed to the utility of pipelines in improving scalability by reducing search complexity and increasing the efficiency of the system. However, pipelines can suffer from the well-known problem of "cascading errors," where errors earlier in the pipeline propagate to later stages in the pipeline. In this talk I will make a distinction between two different types of cascading errors in pipeline systems. The first I will term "search errors," where there exists a higher-scoring candidate (according to the model), but that candidate has been excluded from the search space. The second type of error that I will address might be termed "model errors," where the highest-scoring candidate (according to the model) is not the best candidate (according to some gold standard). Statistical NLP models are imperfect by nature, resulting in model errors. Interestingly, the same pipeline framework that causes search errors can also resolve (or work around) model errors; in this talk I will demonstrate several techniques for detecting and resolving search and model errors, which can result in improved efficiency with no loss in accuracy. I will briefly mention the technique of pipeline iteration, introduced in my ACL'07 paper, and introduce some related results from my dissertation. I will then focus on work done with my PhD advisor Brian Roark on chart cell constraints, as published in our COLING'08 and NAACL'09 papers; this work provably reduces the complexity of a context-free parser to quadratic performance in the worst case (observably linear) with a slight gain in accuracy using the Charniak parser.
While much of this talk will be on parsing pipelines, I am currently extending some of this work to MT pipelines and would welcome discussion along those lines.
== CLIP NEWS  ==


Kristy Hollingshead earned her PhD in Computer Science and Engineering this year, from the Center for Spoken Language Understanding (CSLU) at the Oregon Health & Science University (OHSU). She received her B.A. in English-Creative Writing from the University of Colorado in 2000 and her M.S. in Computer Science from OHSU in 2004. Her research interests in natural language processing include parsing, machine translation, evaluation metrics, and assistive technologies. She is also interested in general techniques for improving system efficiency, to allow for richer contextual information to be extracted for use in downstream stages of a pipeline system. Kristy was a National Science Foundation Graduate Research Fellow from 2004 to 2007.
* News about CLIP researchers on the UMIACS website [http://www.umiacs.umd.edu/about-us/news]
 
* Please follow us on Twitter @ClipUmd[https://twitter.com/ClipUmd?lang=en]
=== October 27, Stanley Kok: Structure Learning in Markov Logic Networks ===
 
Statistical learning handles uncertainty in a robust and principled way.
Relational learning (also known as inductive logic programming)
models domains involving multiple relations. Recent years have seen a
surge of interest in the statistical relational learning (SRL) community
in combining the two, driven by the realization that many (if not most)
applications require both and by the growing maturity of the two fields.
 
Markov logic networks (MLNs) are a statistical relational model that has
gained traction within the AI community in recent years because of their
robustness to noise and their ability to compactly model complex domains.
MLNs combine probability and logic by attaching weights to first-order
formulas, and viewing these as templates for features of Markov networks.
Learning the structure of an MLN consists of learning both formulas and
their weights.
 
To obtain weighted MLN formulas, we could rely on human experts
to specify them. However, this approach is error-prone and requires
painstaking knowledge engineering. Further, it will not work on domains
where there is no human expert. The ideal solution is to automatically
learn MLN structure from data. However, this is a challenging task because
of its super-exponential search space. In this talk, we present a series of
algorithms that efficiently and accurately learn MLN structure.
 
=== November 1, Owen Rambow: Relating Language to Cognitive State ===
 
In the 80s and 90s of the last century, in subdisciplines such as planning,
text generation, and dialog systems, there was considerable interest in
modeling the cognitive states of interacting autonomous agents.  Theories
such as Speech Act Theory (Austin 1962), the belief-desire-intentions model
of Bratman (1987), and Rhetorical Structure Theory (Mann and Thompson 1988)
together provide a framework in which to link cognitive state with language
use. However, in general natural language processing (NLP), little use was
made of such theories, presumably because of the difficulty at the time of
some underlying tasks (such as syntactic parsing). In this talk, I propose
that it is time to again think about the explicit modeling of cognitive
state for participants in discourse. In fact, that is the natural way to
formulate what NLP is all about.  The perspective of cognitive state can
provide a context in which many disparate NLP tasks can be classified and
related.  I will present two NLP projects at Columbia which relate to the
modeling of cognitive state:
 
Discourse participants need to model each other's cognitive states, and
language makes this possible by providing special morphological, syntactic,
and lexical markers.  I present results in automatically determining the
degree of belief of a speaker in the propositions in his or her utterance.
 
Bio: PhD from University of Pennsylvania, 1994, working on German syntax.
My office mate was Philip Resnik.  I worked at CoGentex, Inc (a small
company) and AT&T Labs -- Research until 2002, and since then at Columbia as
a Research Scientist.  My research interests cover both the nuts-and-bolts
of languages, specifically syntax, and how language is used in context.
 
=== November 10, Bob Carpenter: Whence Linguistic Data?  ===
 
The empirical approach to linguistic theory involves collecting
data and annotating it according to a coding standard.  The
ability of multiple annotators to consistently annotate new
data reflects the applicability of the theory.    In this
talk, I'll introduce a generative probabilistic model of the
annotation process for categorical data.  Given a collection of
annotated data, we can infer the true labels of items, the prevalence
of some phenomenon (e.g. a given intonation or syntactic alternation),
the accuracy and category bias of each annotator, and the codability
of the theory as measured by the mean accuracy and bias of annotators
and their variability.  Hierarchical model extensions allow us to
model item labeling difficulty and take into account annotator
background and experience.  I'll demonstrate the efficacy of the
approach using expert and non-expert pools of annotators for simple
linguistic labeling tasks such as textual inference, morphological
tagging, and named-entity extraction.  I'll discuss applications
such as monitoring an annotation effort, selecting items with active
learning, and generating a probabilistic gold standard for machine
learning training and evaluation.
 
=== November 15, William Webber: Information retrieval effectiveness: measurably going nowhere? ===
 
Information retrieval works by heuristics; correctness cannot be
formally proved, but must be empirically assessed.  Test
collections make this evaluation automated and repeatable.
Collection-based evaluation has been standard for half a century.
The IR community prides itself on the rigour of the
experimental tradition that has been built upon this
foundation;  it is notoriously difficult to publish in the
field without a thorough experimental validation.  No
attention, however, has been paid to the question of whether
methodological rigour in evaluation has led to verifiable improvement.  In
this talk, we present a survey of retrieval results published
over the past decade, which fails to find evidence that
retrieval effectiveness is in fact improving.  Rather, each
experiment's impressive leap forward is preceded by a few
careful steps back.
 
Bio:
 
William Webber is a Research Associate in the Department of Computer
Science and Software Engineering at the University of Melbourne,
Australia. He has recently completed his PhD thesis, "Measurement in
Information Retrieval Evaluation", under the supervision of Professors
Alistair Moffat and Justin Zobel.
 
=== November 24: Ned Talley ===

Latest revision as of 18:22, 3 November 2023


CLIP Colloquium

The CLIP Colloquium is a weekly speaker series organized and hosted by the CLIP Lab. The talks are open to everyone. Most talks are held online on Wednesdays at 11 AM unless otherwise noted. Typically, external speakers have slots for one-on-one meetings with Maryland researchers.

To join the clip-talks@umiacs.umd.edu list, or for other questions about the colloquium series, e-mail Rachel Rudinger, the current organizer.

For up-to-date information, see the UMD CS Talks page. (You can also subscribe to the calendar there.)

Colloquium Recordings

Previous Talks

CLIP NEWS

  • News about CLIP researchers on the UMIACS website [1]
  • Please follow us on Twitter @ClipUmd[2]