Computational Linguistics and Information Processing

Revision as of 15:28, 19 September 2013 by Jimmylin

The CLIP Colloquium is a weekly speaker series organized and hosted by the CLIP Lab. The talks are open to everyone. Talks are held at 11 AM in AV Williams 3258 unless otherwise noted. Typically, external speakers have slots for one-on-one meetings with Maryland researchers before and after the talks; contact the host if you'd like to have a meeting.

If you would like to get on the list or have other questions about the colloquium series, e-mail Jimmy Lin, the current organizer.

9/4/2013 and 9/11/2013: N-Minute Madness

The people of CLIP talk about what's going on in N minutes.

Special location note: on 9/4/2013, we'll be in AVW 4172.

9/18/2013: CLIP Lab Meeting

Phillip will set the agenda.

9/25/2013: Spatio-Temporal Crime Prediction using GPS- and Time-Tagged Tweets

Speaker: Matthew Gerber, University of Virginia
Time: Wednesday, September 25, 2013, 11:00 AM
Venue: AVW 3258

Recent research has shown that social media messages (e.g., tweets) can be used to predict various large-scale events like elections (Bermingham and Smeaton, 2011), infectious disease outbreaks (St. Louis and Zorlu, 2012), and even national revolutions (Howard et al., 2011). The essential hypothesis is that the timing, location, and content of these messages are informative with regard to such future events. For many years, the Predictive Technology Laboratory at the University of Virginia has been constructing statistical prediction models of criminal incidents (e.g., robberies and assaults), and we have recently found preliminary evidence of Twitter’s predictive power in this domain (Wang, Brown, and Gerber, 2012). In my talk, I will present an overview of our crime prediction research with a specific focus on current Twitter-based approaches. I will discuss (1) how precise locations and times of tweets have been integrated into the crime prediction model, and (2) how the textual content of tweets has been integrated into the model via latent Dirichlet allocation. I will present current results of our research in this area and discuss future areas of investigation.

About the Speaker: Matthew Gerber joined the University of Virginia faculty in 2011 and is currently a Research Assistant Professor in the Department of Systems and Information Engineering. Prior to joining the University of Virginia, Matthew was a Ph.D. candidate in the Department of Computer Science and Engineering at Michigan State University and a Visiting Instructor in the School of Computing and Information Systems at Grand Valley State University. In 2010, he received (jointly with Joyce Chai) the ACL Best Long Paper Award for his work on recovering null-instantiated arguments for semantic role labeling. His current research focuses on the semantic analysis of natural language text and its application to various prediction and informatics problems.

10/2/2013: Title TBA

Speaker: Miles Osborne, University of Edinburgh
Time: Wednesday, October 2, 2013, 11:00 AM
Venue: AVW 3258

10/4/2013: Recent Advances in Automatic Summarization

Speaker: Yang Liu, University of Texas, Dallas
Time: Friday, October 4, 2013, 11:00 AM
Venue: AVW 4172

There has been great progress on automatic summarization over the past two decades, most notably approaches based on integer linear programming (ILP). In this talk I’ll present some recent work on summarization, focusing first on extractive summarization. I will describe work with my students on the use of a supervised regression model for bigram weight estimation and the application of an ILP model to maximize the bigram coverage in the summaries.

In the second part of the talk, I will discuss new approaches to compressive summarization, moving a step closer to abstractive summarization. Using a pipeline compression and summarization framework, I will show how to create a summary-guided compression module and use it to generate multiple compression candidates. Compressed summary sentences are then selected from these candidates by a subsequent ILP system.

Finally, I will introduce a graph-cut-based method for joint compression and summarization. This is more efficient than the standard ILP-based joint compressive summarization method, and has the flexibility to incorporate grammar constraints in order to generate summaries with better readability. I will present various experimental results to demonstrate the effectiveness of our approaches.

About the Speaker: Dr. Yang Liu is currently an Associate Professor in the Computer Science Department at the University of Texas at Dallas (UTD). She received her B.S. and M.S. degrees from Tsinghua University, and her Ph.D. from Purdue University in 2004. She was a researcher at the International Computer Science Institute (ICSI) at Berkeley for 3 years before she joined UTD as an assistant professor in 2005. Dr. Liu's research interest is in speech and natural language processing. She has published over 100 papers in this field. Dr. Liu received the NSF CAREER award in 2009 and the Air Force Young Investigator award in 2010. She is currently an associate editor of IEEE Transactions on Audio, Speech, and Language Processing; ACM Transactions on Speech and Language Processing; ACM Transactions on Asian Language Information Processing; and Speech Communication.

10/9/2013: Semantics and Social Science: Learning to Extract International Relations from Political Context

Speaker: Brendan O'Connor, Carnegie Mellon University
Time: Wednesday, October 9, 2013, 11:00 AM
Venue: AVW 3258

10/23/2013: Towards Minimizing the Annotation Cost of Certified Text Classification

Speaker: Mossaab Bagdouri, University of Maryland
Time: Wednesday, October 23, 2013, 11:00 AM
Venue: AVW 3258

The common practice of testing a sequence of text classifiers learned on a growing training set, and stopping when a target value of estimated effectiveness is first met, introduces a sequential testing bias. In settings where the effectiveness of a text classifier must be certified (perhaps to a court of law), this bias may be unacceptable. The choice of when to stop training is made even more complex when, as is common, the annotation of training and test data must be paid for from a common budget: each new labeled training example is a lost test example. Drawing on ideas from statistical power analysis, we present a framework for joint minimization of training and test annotation that maintains the statistical validity of effectiveness estimates, and yields a natural definition of an optimal allocation of annotations to training and test data. We identify the development of allocation policies that can approximate this optimum as a central question for research. We then develop simulation-based power analysis methods for van Rijsbergen's F-measure, and incorporate them in four baseline allocation policies which we study empirically. In support of our studies, we develop a new analytic approximation of confidence intervals for the F-measure that is of independent interest.

10/30/2013: Teaching machines to read for fun and profit

Speaker: Gary Kazantsev, Bloomberg LP
Time: Wednesday, October 30, 2013, 11:00 AM
Venue: AVW 3258

11/08/2013: Title TBA

Speaker: Ani Nenkova, University of Pennsylvania
Time: Friday, November 8, 2013, 11:00 AM
Venue: AVW 4172

12/04/2013: Title TBA

Speaker: Karen Livescu, Toyota Technological Institute at Chicago
Time: Wednesday, December 4, 2013, 11:00 AM
Venue: AVW 3258

12/06/2013: Title TBA

Speaker: Yejin Choi, Stony Brook University
Time: Friday, December 6, 2013, 11:00 AM
Venue: AVW 4172

Previous Talks