Events

The CLIP Colloquium is a weekly speaker series organized and hosted by CLIP Lab. The talks are open to everyone. Talks are held at 11 AM in AV Williams 3258 unless otherwise noted. Typically, external speakers have slots for one-on-one meetings with Maryland researchers before and after the talks; contact the host if you'd like to have a meeting.

To join the cl-colloquium@umiacs.umd.edu list, or for other questions about the colloquium series, e-mail Jimmy Lin, the current organizer.

{{#widget:Google Calendar |id=lqah25nfftkqi2msv25trab8pk@group.calendar.google.com |color=B1440E |title=Upcoming Talks |view=AGENDA |height=300 }}

9/4/2013 and 9/11/2013: N-Minute Madness

The people of CLIP talk about what's going on in N minutes.

Special location note: on 9/4/2013, we'll be in AVW 4172.

9/18/2013: CLIP Lab Meeting

Phillip will set the agenda.

9/25/2013: Spatio-Temporal Crime Prediction using GPS- and Time-Tagged Tweets

Speaker: Matthew Gerber, University of Virginia
Time: Wednesday, September 25, 2013, 11:00 AM
Venue: AVW 3258

Recent research has shown that social media messages (e.g., tweets) can be used to predict various large-scale events like elections (Bermingham and Smeaton, 2011), infectious disease outbreaks (St. Louis and Zorlu, 2012), and even national revolutions (Howard et al., 2011). The essential hypothesis is that the timing, location, and content of these messages are informative with regard to such future events. For many years, the Predictive Technology Laboratory at the University of Virginia has been constructing statistical prediction models of criminal incidents (e.g., robberies and assaults), and we have recently found preliminary evidence of Twitter’s predictive power in this domain (Wang, Brown, and Gerber, 2012). In my talk, I will present an overview of our crime prediction research with a specific focus on current Twitter-based approaches. I will discuss (1) how precise locations and times of tweets have been integrated into the crime prediction model, and (2) how the textual content of tweets has been integrated into the model via latent Dirichlet allocation. I will present current results of our research in this area and discuss future areas of investigation.

About the Speaker: Matthew Gerber joined the University of Virginia faculty in 2011 and is currently a Research Assistant Professor in the Department of Systems and Information Engineering. Prior to joining the University of Virginia, Matthew was a Ph.D. candidate in the Department of Computer Science and Engineering at Michigan State University and a Visiting Instructor in the School of Computing and Information Systems at Grand Valley State University. In 2010, he received (jointly with Joyce Chai) the ACL Best Long Paper Award for his work on recovering null-instantiated arguments for semantic role labeling. His current research focuses on the semantic analysis of natural language text and its application to various prediction and informatics problems.

10/2/2013: Development of a Term Weighting Formula for Search Result Ranking

Speaker: Jiaul Paik, University of Maryland
Time: Wednesday, October 2, 2013, 11:00 AM
Venue: AVW 3258

The effectiveness of all well-known search engines crucially depends on the quality of the underlying term weighting mechanism. In this talk, I will first briefly discuss the grand hypotheses that form the foundation for effective term weighting, followed by the limitations of the state-of-the-art methods. I will then describe the development of a novel TF-IDF term weighting scheme. Finally, I will show the experimental results and compare them with the state-of-the-art term weighting schemes. The talk will conclude with some potential future directions.

About the Speaker: Jiaul Paik is a new CLIP postdoc. He earned his PhD in Computer Science from the Indian Statistical Institute, Kolkata, India. He has published a number of papers in ACM TOIS, ACM TALIP and ACM SIGIR. His research mainly focuses on challenges in information retrieval.

10/4/2013: Recent Advances in Automatic Summarization

Speaker: Yang Liu, University of Texas, Dallas
Time: Friday, October 4, 2013, 11:00 AM
Venue: AVW 4172

There has been great progress on automatic summarization over the past two decades, most notably approaches based on integer linear programming (ILP). In this talk I’ll present some recent work on summarization, focusing first on extractive summarization. I will describe work with my students on the use of a supervised regression model for bigram weight estimation and the application of an ILP model to maximize the bigram coverage in the summaries.

In the second part of the talk, I will discuss new approaches to compressive summarization, moving a step closer to abstractive summarization. Using a pipeline compression and summarization framework, I will show how to create a summary guided compression module and to use it for generating multiple compression candidates. Compressed summary sentences are selected from these candidates by the application of a subsequent ILP system.

Finally, I will introduce a graph-cut based method for joint compression and summarization. This is more efficient than the standard ILP-based joint compressive summarization method, and has the flexibility to incorporate grammar constraints in order to generate summaries with better readability. I will present various experimental results to demonstrate the effectiveness of our approaches.

About the Speaker: Dr. Yang Liu is currently an Associate Professor in the Computer Science Department at the University of Texas at Dallas (UTD). She received her B.S. and M.S. degrees from Tsinghua University, and her Ph.D. from Purdue University in 2004. She was a researcher at the International Computer Science Institute (ICSI) at Berkeley for 3 years before she joined UTD as an assistant professor in 2005. Dr. Liu's research interests are in speech and natural language processing. She has published over 100 papers in this field. Dr. Liu received the NSF CAREER award in 2009 and the Air Force Young Investigator award in 2010. She is currently an associate editor of IEEE Transactions on Audio, Speech, and Language Processing; ACM Transactions on Speech and Language Processing; ACM Transactions on Asian Language Information Processing; and Speech Communication.

10/9/2013: Text Analysis and Social Science: Learning to Extract International Relations from the News

Speaker: Brendan O'Connor, Carnegie Mellon University
Time: Wednesday, October 9, 2013, 11:00 AM
Venue: AVW 3258

What can text analysis tell us about society? Enormous corpora of news, social media, and historical documents record events, beliefs, and culture. Automated text analysis is interesting since it scales to large data sets, and can assist in discovering patterns and themes. My research develops practical and scientifically rigorous text analysis methods that can help answer research questions in sociolinguistics and political science.

For this talk I'll focus on our work on events and international politics. Political scientists are interested in studying international relations through *event data*: time series records of who did what to whom, as described in news articles. Rule-based information extraction systems have been used for 20 years to study these phenomena. We develop a dynamic logistic normal statistical model for unsupervised learning of event classes and political dynamics from news text. It learns what verbs and textual descriptions correspond to different types of diplomatic and military interactions between countries, and simultaneously infers the time-series of interactions between countries. Unlike a topic model, it leverages syntactic parsing and argument structure, which is critical in this domain. Using a parsed corpus of several million news articles over 15 years, we evaluate how well its learned event classes match ones defined by experts in previous work, how well its inferences about countries correspond to real-world conflict, and conduct a qualitative case study illustrating its inferences for the recent history of Israeli-Palestinian relations.

This is joint work with Brandon M. Stewart (Harvard University) and Noah A. Smith (CMU). Publication (ACL 2013) and more information here: http://brenocon.com/irevents/

About the Speaker: Brendan O'Connor (http://brenocon.com/) is a 5th year Ph.D. Candidate in Carnegie Mellon University's Machine Learning Department. He is interested in machine learning and natural language processing, especially when informed by or applied to the social sciences. In the past he has interned in the Facebook Data Science group, and worked on crowdsourcing (Crowdflower / Dolores Labs) and "semantic" search (Powerset). His undergraduate degree was in Symbolic Systems.

10/16/2013: Using Semantics to help learn Phonetic Categories

Speaker: Stella Frank, University of Edinburgh
Time: Wednesday, October 16, 2013, 11:00 AM
Venue: AVW 3258

Computational models of language acquisition seek to replicate human linguistic learning capabilities, such as an infant's ability to identify the relevant sound categories in a language, given similar inputs. In this talk I will present some ongoing work which extends a Bayesian model of phonetic categorisation (Feldman et al., 2013). The original model learns a lexicon as well as phonetic categories, incorporating the constraint that phonemes appear in word contexts. However, it has trouble separating minimal pairs (such as 'cat'/'caught'/'kite'). The proposed extension adds situational context, a form of weak semantics or world knowledge, to disambiguate potential minimal pairs. I will present our current results and discuss potential next steps.

About the Speaker: Stella Frank is currently a postdoc at the University of Edinburgh, where she received a PhD in Informatics in 2013. Her research interests lie in computational modelling of language acquisition using unsupervised Bayesian modelling techniques.

10/23/2013: Towards Minimizing the Annotation Cost of Certified Text Classification

Speaker: Mossaab Bagdouri, University of Maryland
Time: Wednesday, October 23, 2013, 11:00 AM
Venue: AVW 3258

The common practice of testing a sequence of text classifiers learned on a growing training set, and stopping when a target value of estimated effectiveness is first met, introduces a sequential testing bias. In settings where the effectiveness of a text classifier must be certified (perhaps to a court of law), this bias may be unacceptable. The choice of when to stop training is made even more complex when, as is common, the annotation of training and test data must be paid for from a common budget: each new labeled training example is a lost test example. Drawing on ideas from statistical power analysis, we present a framework for joint minimization of training and test annotation that maintains the statistical validity of effectiveness estimates, and yields a natural definition of an optimal allocation of annotations to training and test data. We identify the development of allocation policies that can approximate this optimum as a central question for research. We then develop simulation-based power analysis methods for van Rijsbergen's F-measure, and incorporate them in four baseline allocation policies which we study empirically. In support of our studies, we develop a new analytic approximation of confidence intervals for the F-measure that is of independent interest.

10/30/2013: Teaching machines to read for fun and profit

Speaker: Gary Kazantsev, Bloomberg LP
Time: Wednesday, October 30, 2013, 11:00 AM
Venue: AVW 4172

11/08/2013: Computational style analysis, with practical applications to automatic summarization

Speaker: Ani Nenkova, University of Pennsylvania
Time: Friday, November 8, 2013, 11:00 AM
Venue: AVW 4172

Natural language research is often equated with attempts to derive structure and meaning from unstructured data. Given this focus, computational treatment of style has largely remained unexplored. In this talk I will argue that elements of style such as the redundancy in text, the level of specificity or its entertaining effect, affect the performance of standard systems and that good approaches to computational style analysis will be beneficial for such systems.

I will first briefly present our findings from a corpus study of local coherence motivating the need for style analysis. Theories of discourse relations and entity coherence fail to explain local coherence for a large portion of newspaper text, with stylistic factors often involved in the cases where standard theories do not apply.

Next I will present our work on developing measures of one particular element of style, text specificity. I will discuss how we successfully developed a classifier for sentence specificity. The classifier allows us to analyze specificity in summaries produced by people and machines and reveals that machine summaries are overly specific. Furthermore, analysis of sentence compression data shows that, when summarizing, people often edit a specific sentence in the source into a general one for the summary, indicating that specificity is a suitable objective for compression systems that will naturally lead to the need for compression.

I will conclude with a brief overview of other style-related tasks which affect the performance of summarization systems.

About the Speaker: Ani Nenkova is an Assistant Professor of Computer and Information Science at the University of Pennsylvania. Her main areas of research are automatic text summarisation, affect recognition and text quality. She obtained her PhD degree in Computer Science from Columbia University in 2006. She also spent a year and a half as a postdoctoral fellow at Stanford University before joining Penn in Fall 2007. Ani and her collaborators are recipients of the best student paper award at SIGDial in 2010 and best paper award at EMNLP in 2012. She received an NSF CAREER award in 2010. The Penn team co-led by Ani won the audio-visual emotion recognition challenge (AVEC) for word-level prediction in 2012. Ani was a member of the editorial board of Computational Linguistics (2009-2011) and has served as an area chair/senior program committee member for ACL (2013, 2012, 2010), NAACL (2010, 2007), AAAI (2013) and IJCAI (2011).

11/13/2013: Title TBA

Speaker: Miles Osborne, University of Edinburgh
Time: Wednesday, November 13, 2013, 11:00 AM
Venue: AVW 3258

12/04/2013: Title TBA

Speaker: Karen Livescu, Toyota Technological Institute at Chicago
Time: Wednesday, December 4, 2013, 11:00 AM
Venue: AVW 4172

12/06/2013: Title TBA

Speaker: Yejin Choi, Stony Brook University
Time: Friday, December 6, 2013, 11:00 AM
Venue: AVW 4172

Computational Linguistics and Information Processing

Revision as of 23:47, 25 October 2013 by Jimmylin (talk | contribs)

Previous Talks
