
Research: Difference between revisions

Computational Linguistics and Information Processing



The Web promises unprecedented access to the perspectives of an
enormous number of people on a wide range of issues.  Turning that
still untamed cacophony into meaningful insights requires dealing with
the linguistic diversity and scale of the Web.  Most current research
focuses on specialized tasks such as tracking consumer opinions, and
virtually all current research treats the Web as both monolithic and
monolingual, ignoring the variety of languages represented and the
rich interplay between topics and issues under discussion.


This project moves the state of the art forward by focusing on two key
challenges. First, highly-scalable MapReduce algorithms for
linguistic modeling within a Bayesian framework, making use of
variational inference to achieve a high degree of parallelization on
Web-scale datasets. Second, novel Bayesian models that learn
consistent interpretations of text across languages and a wide range
of response variables of interest (for example, views on an issue,
strength of emotion relative to an event, and focus of attention).


The techniques developed in this project will be demonstrated on large
crawls of Web pages and blogs. Potential applications for these
technologies include helping a schoolchild learn that people in
different countries may view some issues very differently, helping a
politician understand how constituents are reacting to proposed
legislation, or helping an intelligence analyst understand how public
opinion is evolving in a hostile country.



Revision as of 01:31, 13 August 2010

Machine Translation

Summarization

Parsing and Tagging

Sentiment Analysis

Bayesian Modeling

Cross‐language Bayesian models for Web‐scale text analysis using MapReduce
PI: Jimmy Lin
Other Faculty: Jordan Boyd-Graber, Philip Resnik
Graduate Students:
Funding: National Science Foundation 1018625


The goal of information retrieval is to help people find what they are looking for. Information retrieval research in the CLIP lab focuses principally on retrieval based on the language contained in text, in speech, and in document images. We work across a broad range of content types, from tweets to tomes, from talking to texting, and from Cebuano to Chinese. Three perspectives inform our work:

  • we integrate a broad range of computational linguistics techniques,
  • we focus on scalable techniques that can accommodate very large collections, and
  • we sometimes draw the boundaries of our “systems” very broadly to include both the automated tools that we create and the process by which users can best employ those tools.

One example that illustrates these perspectives is our work with “cross-language information retrieval,” in which close coupling of machine translation and information retrieval techniques makes it possible for people to find and use information written in languages that they can neither read nor write. Another example is our work on the design and evaluation of “question answering” systems that can automatically find and present answers to complex questions, which serves as a bridge between our work on information retrieval and summarization.

Project Webpage Publications