CCC Colloquium: Jordan Boyd-Graber (February 26, 2010)

From University of Maryland Cloud Computing Center

Inference and Validation of Probabilistic Models of Language in the Cloud

Jordan Boyd-Graber (University of Maryland)

Friday, February 26, 2010
10am, Hornbake 2119

Talk slides: PDF (6.50 MB)


Probabilistic models of language provide a statistically sound formalism for describing the patterns that appear in real-world text. In this talk, I will discuss ways of bringing humans more tightly into the loop in the development of these models, and how to build richer models that better capture the properties of language.

First, I discuss a procedure for collecting empirical, human-based judgments of how semantically similar concepts are. We call this measurement "evocation," compare it with other similarity measures, and use it to improve assistive devices created for people suffering from aphasia, a debilitating neurological disorder.

Second, I discuss techniques for building human-centered measurements of topic models (e.g., document-centric models such as latent Dirichlet allocation and probabilistic latent semantic indexing, popular in information retrieval). After performing large-scale evaluations of these models, we found that human judgments of quality don't necessarily correlate with traditional evaluations such as held-out likelihood.

Finally, I demonstrate a syntactic topic model, a Bayesian nonparametric model of language that combines local syntactic context features with document-wide properties using a product of experts formalism. We derive efficient variational inference that can be distributed over a large number of machines and show that it offers the "best of both worlds" over syntax-only and document-only models.

About the Speaker

Jordan Boyd-Graber is a postdoc at the University of Maryland working with Philip Resnik and was a PhD student at Princeton University under David Blei. Before Princeton, Jordan obtained two bachelor's degrees (history and computer science) from the California Institute of Technology. Originally from Iowa, Jordan enjoys writing and participating in trivia contests.


This talk is open to the public and will take place in the Hornbake Building, South Wing, at the University of Maryland, College Park. Directions to campus can be found here and campus maps can be found here.