Cross-language Bayesian models for Web-scale text analysis using MapReduce
PI: Jimmy Lin
Other Faculty: Jordan Boyd-Graber, Philip Resnik
Students:
Funding: NSF 1018625

The Web promises unprecedented access to the perspectives of an enormous number of people on a wide range of issues. Turning that still-untamed cacophony into meaningful insights requires dealing with the linguistic diversity and scale of the Web. Most current research focuses on specialized tasks such as tracking consumer opinions, and virtually all of it treats the Web as both monolithic and monolingual, ignoring the variety of languages represented and the rich interplay among the topics and issues under discussion.

This project moves the state of the art forward by focusing on two key challenges. The first is to develop highly scalable MapReduce algorithms for linguistic modeling within a Bayesian framework, using variational inference to achieve a high degree of parallelization on Web-scale datasets. The second is to develop novel Bayesian models that learn consistent interpretations of text across languages and across a wide range of response variables of interest (for example, views on an issue, strength of emotion relative to an event, and focus of attention).

The techniques developed in this project will be demonstrated on large crawls of Web pages and blogs. Potential applications for these technologies include helping a schoolchild learn that people in different countries may view some issues very differently, helping a politician understand how constituents are reacting to proposed legislation, or helping an intelligence analyst understand how public opinion is evolving in a hostile country.