Research: Difference between revisions

Revision as of 00:58, 28 August 2010

Bayesian Modeling

Faculty

Jordan Boyd-Graber
Jimmy Lin
Hal Daume III
Philip Resnik

Postdocs

Graduate Students

Eric Hardisty, Yuening Hu, Ke Zhai

Bayesian modeling is a rigorous mathematical formalism that allows us to build systems that reflect our uncertainty about the world. Applied to language, this formalism allows us to build models that reflect the "latent" aspects of communication, such as topic, part of speech, syntax, or sentiment. Using posterior inference, we can use these models to discover the latent features that best explain observed language.
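
As a rough illustration of what posterior inference computes, the identity below is Bayes' rule written for a generic latent-variable model. The symbols are assumptions introduced here, not notation used by the lab: w stands for the observed language (for example, the words of a document) and z for a latent assignment such as a topic or a part-of-speech tag.

```latex
p(z \mid w) \;=\; \frac{p(w \mid z)\, p(z)}{\sum_{z'} p(w \mid z')\, p(z')}
```

The prior p(z) encodes what we believe before seeing any text, the likelihood p(w | z) says how well a latent assignment explains the observations, and the denominator sums over all alternative assignments z'. For realistic models that sum is typically intractable, which is why approximate methods such as Gibbs sampling are commonly used.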

In the CLIP lab, we are interested in

building tools that make it easier for people to work with Bayesian models
scaling inference for Bayesian models to the web scale
understanding how humans interpret and understand the latent variables in Bayesian models

Representative Publications and Project Pages:

Machine Translation and Paraphrasing

Faculty

Bonnie Dorr	interlingual and hybrid MT, semantically-informed syntactic MT
Mary Harper	multilingual parsing, language modeling
Philip Resnik	linguistically informed translation modeling, paraphrase, crowdsourcing and translation

Postdocs

Graduate Students

Vlad Eidelman

The CLIP Laboratory's work in machine translation continues the lab's long tradition of research in translation. Like most of the field, we work within the framework of statistical MT, but with an emphasis on taking appropriate advantage of knowledge-driven or linguistically informed model structures, features, and priors. Some current areas of research include syntactically informed language models, linguistically informed translation model features, the use of unsupervised methods in translation modeling, exploitation of large-scale "cloud computing" methods, and human-machine collaborative translation via crowdsourcing.

Paraphrase, the ability to express the same meaning in multiple ways, is an active area of research within the NLP community and here in the CLIP Laboratory. Our work in paraphrase includes the use of paraphrase in MT evaluation and parameter estimation, lattice and forest translation, and collaborative translation, as well as research on lexical and phrasal semantic similarity measures, meaning preservation in machine translation and summarization, and large-scale document similarity computation via cloud computing methods.

Representative Publications and Project Pages:

Greene and Resnik, NAACL 2009: More Than Words: Syntactic Packaging and Implicit Sentiment

Summarization

Parsing and Tagging

Computational Social Science

Faculty

Jordan Boyd-Graber	scientific literature analysis, persuasion
Bonnie Dorr	sentiment analysis, scientific literature analysis
Jimmy Lin	social media
Douglas W. Oard	topical relation detection
Louiqa Raschid	diffusion, prediction, event detection
Philip Resnik	sentiment, persuasion
Amy Weinberg	sentiment, persuasion

Postdocs

Graduate Students

Eric Hardisty, Asad Sayed, Hassan Sayyadi, Shanchan Wu

Computational social science involves the use of computational methods and models to leverage "the capacity to collect and analyze data at a scale that may reveal patterns of individual and group behaviors". Research in the CLIP Laboratory is at the forefront of this emerging area, and includes sentiment analysis (computational modeling and prediction of opinions, perspective, and other private states), automatic analysis and visualization of the scientific literature, modeling the diffusion of technological innovations, and modeling and prediction of social goals and actions such as persuasion.

Representative Publications and Project Pages:

Greene and Resnik, NAACL 2009: More Than Words: Syntactic Packaging and Implicit Sentiment

Information Retrieval

Faculty

Jimmy Lin
Doug Oard

Postdocs

Earl Wagner

Graduate Students

Tan Xu

The goal of information retrieval is to help people find what they are looking for. Information retrieval research in the CLIP lab focuses principally on retrieval based on the language contained in text, in speech, and in document images. We work across a broad range of content types, from tweets to tomes, from talking to texting, and from Cebuano to Chinese. Three perspectives inform our work:

we integrate a broad range of computational linguistics techniques,
we focus on scalable techniques that can accommodate very large collections, and
we sometimes draw the boundaries of our “systems” very broadly to include both the automated tools that we create and the process by which users can best employ those tools.

One example that illustrates these perspectives is our work with “cross-language information retrieval,” in which close coupling of machine translation and information retrieval techniques makes it possible for people to find and use information written in languages that they can neither read nor write. Another example is our work on the design and evaluation of “question answering” systems that can automatically find and present answers to complex questions, which serves as a bridge between our work on information retrieval and summarization.

Representative Publications and Project Pages:

Douglas W. Oard, "Multilingual Information Access," in Encyclopedia of Library and Information Sciences, 3rd Ed., 2009.

Disambiguation

Faculty

Jordan Boyd-Graber
Judith Klavans
Philip Resnik

Postdocs

Graduate Students

Disambiguation is the process of determining the meaning, or sense, of a word in its context. It remains one of the most challenging NLP problems, since discovering word senses involves syntactic, semantic, and pragmatic contextual inference, along with a rich knowledge base on which to base the selection. For example, the word "wing" means one thing in the theater and another on an airplane, and yet another sense applies to furniture ("wing chair"). Disambiguation can often be based on windows of two or three words, but it usually involves more computation. Techniques for disambiguation range from the use of large-scale thesaural resources (such as WordNet) to purely statistical methods.
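
To make the window-based idea concrete, here is a minimal sketch in Python of overlap-based disambiguation (a simplified Lesk-style heuristic). It is an illustration only, not the lab's method: the sense inventory `SENSES`, its signature words, and the function name `disambiguate` are all hypothetical, and a real system would draw signatures from a resource such as WordNet or learn them statistically.

```python
# Hypothetical toy sense inventory for the word "wing": each sense maps to a
# few hand-picked signature words. These are assumptions for illustration.
SENSES = {
    "theater": {"stage", "actor", "curtain", "audience", "theater"},
    "airplane": {"aircraft", "airplane", "engine", "flight", "pilot"},
    "furniture": {"chair", "sofa", "upholstered", "armchair", "cushion"},
}


def disambiguate(tokens, index, window=3):
    """Pick the sense whose signature overlaps most with the context window.

    tokens: list of word strings; index: position of the ambiguous word;
    window: number of words considered on each side.
    Returns a sense label, or None if no signature word appears nearby.
    """
    start = max(0, index - window)
    # Words to the left and right of the target, excluding the target itself.
    context = tokens[start:index] + tokens[index + 1:index + 1 + window]
    context = {t.lower() for t in context}
    # Score each sense by the number of signature words found in the window.
    scores = {sense: len(sig & context) for sense, sig in SENSES.items()}
    best = max(scores, key=scores.get)  # ties go to the first sense listed
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    sentence = "the actor waited in the wing beside the stage".split()
    print(disambiguate(sentence, sentence.index("wing")))
```

With a window of three words on each side, the only signature word near "wing" in the example sentence is "stage", so the sketch selects the theater sense. The heuristic fails whenever no signature word falls inside the window, which is one reason practical systems need larger contexts and richer knowledge sources.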

Representative Publications and Project Pages:

@@ Line 36: / Line 36: @@
 * [http://drum.lib.umd.edu/handle/1903/10058 Gibbs Sampling for the Uninitiated]
 |}
 ==Machine Translation and Paraphrasing==

Computational Linguistics and Information Processing
