
Events: Difference between revisions

Computational Linguistics and Information Processing

(57 intermediate revisions by 8 users not shown)
<center>[[Image:colloq.jpg|center|504px|x]]</center>


== CLIP Colloquium ==


The CLIP Colloquium is a weekly speaker series organized and hosted by CLIP Lab. The talks are open to everyone. Most talks are held at 11AM in AV Williams 3258 unless otherwise noted. Typically, external speakers have slots for one-on-one meetings with Maryland researchers before and after the talks; contact the host if you'd like to have a meeting.
The CLIP Colloquium is a weekly speaker series organized and hosted by CLIP Lab. The talks are open to everyone. Most talks are held on Wednesday at 11AM in AV Williams 3258 unless otherwise noted. Typically, external speakers have slots for one-on-one meetings with Maryland researchers.


If you would like to get on the cl-colloquium@umiacs.umd.edu list or for other questions about the colloquium series, e-mail [mailto:jimmylin@umd.edu Jimmy Lin], the current organizer.
If you would like to get on the clip-talks@umiacs.umd.edu list or for other questions about the colloquium series, e-mail [mailto:oard@umiacs.umd.edu Doug Oard], the current organizer.


For up-to-date information, see the [https://talks.cs.umd.edu/lists/7 UMD CS Talks page].  (You can also subscribe to the calendar there.)


{{#widget:Google Calendar
|id=lqah25nfftkqi2msv25trab8pk@group.calendar.google.com
|color=B1440E
|title=Upcoming Talks
|view=AGENDA
|height=300
}}

=== Colloquium Recordings ===
* [[Colloqium Recording (Fall 2020)|Fall 2020]]
* [[Colloqium Recording (Spring 2021)|Spring 2021]]


=== Previous Talks ===
* [https://talks.cs.umd.edu/lists/7?range=past Past talks, 2013 - present]
* [[CLIP Colloquium (Spring 2012)|Spring 2012]]  [[CLIP Colloquium (Fall 2011)|Fall 2011]]  [[CLIP Colloquium (Spring 2011)|Spring 2011]]  [[CLIP Colloquium (Fall 2010)|Fall 2010]]


== CLIP NEWS ==

* News about CLIP researchers on the UMIACS website [http://www.umiacs.umd.edu/about-us/news]
* Please follow us on Twitter @umdclip [https://twitter.com/umdclip?lang=en]

== 01/30/2013: Human Translation and Machine Translation ==

'''Speaker:''' [http://homepages.inf.ed.ac.uk/pkoehn/ Philipp Koehn], University of Edinburgh<br/>
'''Time:''' Wednesday, January 30, 2013, 11:00 AM<br/>
'''Venue:''' AVW 3258<br/>

Despite all the recent successes of machine translation, when it
comes to high quality publishable translation, human translators
are still unchallenged. Since we can't beat them, can we help
them to become more productive? I will talk about some recent
work on developing assistance tools for human translators.
You can also check out a prototype [http://www.caitra.org/ here]
and learn about our ongoing European projects [http://www.casmacat.eu/ CASMACAT]
and [http://www.matecat.com/ MATECAT].
 
'''About the Speaker:''' Philipp Koehn is Professor of Machine Translation at the
School of Informatics at the University of Edinburgh, Scotland.
He received his PhD at the University of Southern California
and spent a year as postdoctoral researcher at MIT.
He is well-known in the field of statistical machine translation
for the leading open source toolkit Moses, the organization
of the annual Workshop on Statistical Machine Translation
and its evaluation campaign as well as the Machine Translation
Marathon. He is founding president of the ACL SIG MT and
currently serves as vice president-elect of the ACL SIG DAT.
He has published over 80 papers and the textbook in the
field. He manages a number of EU and DARPA funded
research projects aimed at morpho-syntactic models, machine
learning methods and computer assisted translation tools.
 
== 02/06/2013: A New Recommender System for Large-scale Document Exploration ==
 
'''Speaker:''' [http://www.cs.cmu.edu/~chongw/ Chong Wang],  Carnegie Mellon University<br/>
'''Time:''' Wednesday, February 6, 2013, 11:00 AM<br/>
'''Venue:''' AVW 3258<br/>
 
How can we help people quickly navigate the vast amount of data
and acquire useful knowledge from it? Recommender systems provide
a promising solution to this problem. They narrow down the search
space by providing a few recommendations that are tailored to
users' personal preferences. However, these systems usually work
like a black box, limiting further opportunities to provide more
exploratory experiences to their users.
 
In this talk, I will describe how we build a new recommender
system for document exploration. Specifically, I will talk about two
building blocks of the system in detail. The first is about a new
probabilistic model for document recommendation that is both
predictive and interpretable. It not only gives better predictive
performance, but also provides better transparency than
traditional approaches. This transparency creates many new
opportunities for exploratory analysis. For example, a user can
manually adjust her preferences and the system responds to this
by changing its recommendations. Second, building a recommender
system like this requires learning the probabilistic model from
large-scale empirical data. I will describe a scalable approach
for learning a wide class of probabilistic models that include
our recommendation model as a special case.
 
'''About the Speaker:''' Chong is a Project Scientist in Eric Xing's group, Machine Learning Department, Carnegie Mellon University.  His PhD advisor was David M. Blei from Princeton University.
 
== 02/13/2013: Computational Modeling of Sociopragmatic Language Use in Arabic and English Social Media ==
 
'''Speaker:''' [http://www1.ccls.columbia.edu/~mdiab/ Mona Diab], Columbia University<br/>
'''Time:''' Wednesday, February 13, 2013, 11:00 AM<br/>
'''Venue:''' AVW 3258<br/>
 
Social media language is a treasure trove for mining and understanding human interactions. In discussion fora, people naturally form groups and subgroups aligning along points of consensus and contention. These subgroup formations are quite nuanced: people may agree on some topic, such as liking the movie The Matrix, while some within that group disagree on rating the acting skills of Keanu Reeves. Languages manifest these alignments by exploiting interesting sociolinguistic devices in different ways. In this talk, I will present our work on subgroup modeling and detection in both Arabic and English social media language. I will share with you our experiences with modeling both explicit and implicit attitude using high and low dimensional feature modeling. This work is the beginning of an interesting exploration into the realm of building computational models of some aspects of the sociopragmatics of human language, in the hope that this research could lead to a better understanding of human interaction.
 
'''About the Speaker:''' Mona Diab is an Associate Professor of Computer Science at the George Washington University. She is also a cofounder of the CADIM (Columbia Arabic Dialect Modeling) group at Columbia University. Mona earned her PhD in Computational Linguistics from the University of Maryland, College Park, with Philip Resnik in 2003 and then did her postdoctoral training with Daniel Jurafsky at Stanford University, where she was part of the NLP group. From 2005 until 2012, before joining GWU in January 2013, Mona held the position of Research Scientist/Principal Investigator at the Columbia University Center for Computational Learning Systems (CCLS). Mona's research interests span computational lexical semantics, multilingual processing (with a special interest in Arabic and low resource languages), unsupervised learning for NLP, computational sociopragmatic modeling, information extraction and machine translation. Over the past 9 years, Mona has developed significant expertise in modeling low resource languages with a focus on Arabic dialect processing. She is especially interested in ways to leverage existing rich resources to inform algorithms for processing low resource languages. Her research has been published in over 90 papers in various internationally recognized scientific venues. Mona serves as the current elected President of the ACL SIG on Semitic Language Processing and is also the elected Secretary for the ACL SIG on issues in the Lexicon (SIGLEX). She also serves on the NAACL board as an elected member. In 2012, Mona co-founded the yearly *SEM conference, which attempts to bring together all aspects of semantic processing under the same umbrella venue.
 
== 02/14/2013: Efficient Probabilistic Models for Rankings and Orderings ==
 
'''Speaker:''' [http://stanford.edu/~jhuang11/ Jon Huang], Stanford University<br/>
'''Time:''' Thursday, February 14, 2013, 11:00 AM<br/>
'''Venue:''' AVW 3258<br/>
 
The need to reason probabilistically with rankings and orderings arises
in a number of real world problems.  Probability distributions over
rankings and orderings arise naturally, for example, in preference data,
and political election data, as well as a number of less obvious
settings such as topic analysis and neurodegenerative disease
progression modeling. Representing distributions over the space of all
rankings is challenging, however, due to the factorial number of ways to
rank a collection of items.  The focus of my talk is to discuss methods
for combatting this factorial explosion in probabilistic representation
and inference.
 
Ordinarily, a typical machine learning method for dealing with
combinatorial complexity might be to exploit conditional independence
relations in order to decompose a distribution into compact factors of a
graphical model.  For ranked data, however, a far more natural and
useful probabilistic relation is that of `riffled independence'.  I will
introduce the concept of riffled independence and discuss how these
riffle independent relations can be used to decompose a distribution
over rankings into a product of compactly represented factors.  These
so-called hierarchical riffle-independent distributions are particularly
amenable to efficient inference and learning algorithms and in many
cases lead to intuitively interpretable probabilistic models. To
illustrate the power of exploiting riffled independence, I will discuss
a few applications, including Irish political election analysis,
visualizing Japanese preferences for sushi types, and modeling the
progression of Alzheimer's disease, showing results on real datasets in
each problem.
 
This is joint work with Carlos Guestrin (University of Washington),
Ashish Kapoor (Microsoft Research) and Daniel Alexander (University
College London).
 
== 02/27/2013: Building Scholarly Methodologies with Large-Scale Topic Analysis ==
 
'''Speaker:''' [http://www.cs.princeton.edu/~mimno/ David Mimno], Princeton University<br/>
'''Time:''' Wednesday, February 27, 2013, 9:00 AM<br/>
'''Venue:''' Hornbake (South Wing) Room 2119<br/>
 
'''NOTE SPECIAL TIME AND LOCATION!!!'''
 
In the last ten years we have seen the creation of massive digital text collections, from Twitter feeds to million-book libraries, all in dozens of languages. At the same time, researchers have developed text mining methods that go beyond simple word frequency analysis to uncover thematic patterns. When we combine big data with powerful algorithms, we enable analysts in many different fields to enhance qualitative perspectives with quantitative measurements. But these methods are only useful if we can apply them at massive scale and distinguish consistent patterns from random variations. In this talk I will describe my work building reliable topic modeling methodologies for humanists, social scientists and science policy officers.
 
'''About the Speaker:''' David Mimno is a postdoctoral researcher in the Computer Science department at Princeton University. He received his PhD from the University of Massachusetts, Amherst. Before graduate school, he served as Head Programmer at the Perseus Project, a digital library for cultural heritage materials, at Tufts University. He is supported by a CRA Computing Innovation fellowship.
 
== 03/13/2013: Is Any Politics Local? An Automated Analysis of Mayoral and Gubernatorial Addresses ==
 
'''Speaker:''' [http://explore.georgetown.edu/people/dh335/ Dan Hopkins],  Georgetown University<br/>
'''Time:''' Wednesday, March 13, 2013, 11:00 AM<br/>
'''Venue:''' AVW 3258<br/>
 
Dubbed "laboratories of democracy," America's states and its large cities face a wide variety of public policy challenges.  But in a period of expanding federal authority and increased long-distance communication, the extent to which U.S. states and large cities pursue varying policy agendas is at once important and unknown.  This paper draws on techniques from automated content analysis to measure the major topics in more than 500 "State of the State" and "State of the City" addresses given by American executive officials since 2000.  Drawing on the Correlated Topic Model (Blei and Lafferty 2006) and other approaches to topic modeling, it demonstrates that big-city mayors do address a distinctive set of topics from their counterparts in state capitols, but one that is surprisingly consistent across cities.  Knowing a mayor's political party provides little leverage on the topics he or she is likely to highlight, while the same is true for objective indicators such as economic conditions or the city's crime rate.  At the state level, partisanship proves more predictive of the topics addressed by Governors, but there, too, institutional responsibilities constrain leaders to emphasize a broad and similar set of issues.  American political institutions inscribe a substantial role for geographic and institutional differences, but the policy agendas of America's states and largest cities are homogeneous and overlapping.     
 
'''About the Speaker:''' Daniel J. Hopkins is an Assistant Professor of Government at Georgetown University whose research focuses on American politics, with a special emphasis on political behavior, urban and local politics, racial and ethnic politics, and statistical methods.  Specifically, his research has addressed issues including the role of rhetoric and of local contexts in shaping political behavior.  It has also involved the development and application of automated techniques for analyzing political rhetoric.  Professor Hopkins' work has appeared in a variety of scholarly and popular outlets, including the American Political Science Review, the American Journal of Political Science, the Journal of Politics, and The Washington Post.  Professor Hopkins received his Ph.D. from Harvard University in 2007.
 
== 03/27/2013: Corpora and Statistical Analysis of Non-Linguistic Symbol Systems ==
 
'''Speaker:''' Richard Sproat, Google New York<br/>
'''Time:''' Wednesday, March 27, 2013, 11:00 AM<br/>
'''Venue:''' AVW 3258<br/>
 
We report on the creation and analysis of a set of corpora of non-linguistic symbol systems.
The resource, the first of its kind, consists of data from seven systems, both ancient and modern,
with four further systems under development, and several others planned. The systems represent
a range of types, including heraldic systems, formal systems, and systems that are mostly or purely
decorative. We also compare these systems statistically with a large set of linguistic systems, which
also range over both time and type.
 
We show that none of the measures proposed in published work by Rao and colleagues (Rao et al., 2009a; Rao, 2010)
or Lee and colleagues (Lee et al., 2010a) works. In particular, Rao’s entropic measures are evidently useless when
one considers a wider range of examples of real non-linguistic symbol systems. And Lee’s measures, with the cutoff
values they propose, misclassify nearly all of our non-linguistic systems. However, we also show that one of Lee’s
measures, with different cutoff values, as well as another measure we develop here, do seem useful. We further
demonstrate that they are useful largely because they are both highly correlated with a rather trivial feature:
mean text length.
 
'''About the Speaker:''' Richard Sproat received his Ph.D. in Linguistics from the Massachusetts
Institute of Technology in 1985. He has worked at AT&T Bell Labs, at
Lucent's Bell Labs and at AT&T Labs -- Research, before joining the faculty of
the University of Illinois. From there he moved to the Center for Spoken
Language Understanding at the Oregon Health & Science University. In the Fall of
2012 he moved to Google, New York as a Research Scientist.
 
Sproat has worked in numerous areas relating to language and computational
linguistics, including syntax, morphology, computational morphology,
articulatory and acoustic phonetics, text processing, text-to-speech synthesis,
and text-to-scene conversion. Some of his recent work includes multilingual
named entity transliteration, the effects of script layout on readers'
phonological awareness, and tools for automated assessment of child language. At
Google he works on multilingual text normalization and finite-state methods for
language processing. He also has a long-standing interest in writing systems and
symbol systems more generally.
 
 
== 04/10/2013: Learning with Marginalized Corrupted Features ==
 
'''Speaker:''' [http://www.cse.wustl.edu/~kilian/ Kilian Weinberger],  Washington University in St. Louis<br/>
'''Time:''' Wednesday, April 10, 2013, 11:00 AM<br/>
'''Venue:''' AVW 3258<br/>
 
If infinite amounts of labeled data are provided, many machine learning algorithms become perfect. With finite amounts of data, regularization or priors have to be used to introduce bias into a classifier. We propose a third option: learning with marginalized corrupted features. We corrupt existing data as a means to generate infinitely many additional training samples from a slightly different data distribution -- explicitly in a way that the corruption can be marginalized out in closed form. This leads to machine learning algorithms that are fast, effective and naturally scale to very large data sets. We showcase this technology in two settings: 1. to learn text document representations from unlabeled data and 2. to perform supervised learning with closed form gradient updates for empirical risk minimization.
 
Text documents (and often images) are traditionally expressed as bag-of-words feature vectors (e.g. as tf-idf). By training linear denoisers that recover unlabeled data from partial corruption, we can learn new data-specific representations. With these, we can match the world-record accuracy on the Amazon transfer learning benchmark with a simple linear classifier. In comparison with the record holder (stacked denoising autoencoders) our approach shrinks the training time from several days to a few minutes.
 
Finally, we present a variety of loss functions and corrupting distributions, which can be applied out-of-the-box with empirical risk minimization. We show that our formulation leads to significant improvements in document classification tasks over the typically used l_p norm regularization. The new learning framework is extremely versatile, generalizes better, is more stable during test-time (towards distribution drift) and only adds a few lines of code to typical risk minimization. 
 
'''About the Speaker:''' Kilian Q. Weinberger is an Assistant Professor in the Department of Computer Science & Engineering at Washington University in St. Louis. He received his Ph.D. from the University of Pennsylvania in Machine Learning under the supervision of Lawrence Saul. Prior to this, he obtained his undergraduate degree in Mathematics and Computer Science at the University of Oxford. During his career he has won several best paper awards at ICML, CVPR and AISTATS. In 2011 he was awarded the AAAI senior program chair award and in 2012 he received the NSF CAREER award. Kilian Weinberger's research is in Machine Learning and its applications. In particular, he focuses on high dimensional data analysis, metric learning, machine learned web-search ranking, transfer- and multi-task learning, as well as biomedical applications.
 
== 04/24/2013: Matthew Gerber ==
 
TBA
 
 
== Previous Talks ==
* [[CLIP Colloquium (Fall 2012)|Fall 2012]]
* [[CLIP Colloquium (Spring 2012)|Spring 2012]]
* [[CLIP Colloquium (Fall 2011)|Fall 2011]]
* [[CLIP Colloquium (Spring 2011)|Spring 2011]]
* [[CLIP Colloquium (Fall 2010)|Fall 2010]]

Revision as of 18:21, 6 June 2021


CLIP Colloquium

The CLIP Colloquium is a weekly speaker series organized and hosted by CLIP Lab. The talks are open to everyone. Most talks are held on Wednesday at 11AM in AV Williams 3258 unless otherwise noted. Typically, external speakers have slots for one-on-one meetings with Maryland researchers.

If you would like to get on the clip-talks@umiacs.umd.edu list or for other questions about the colloquium series, e-mail Doug Oard, the current organizer.

For up-to-date information, see the UMD CS Talks page. (You can also subscribe to the calendar there.)

Colloquium Recordings

Previous Talks

CLIP NEWS

  • News about CLIP researchers on the UMIACS website [1]
  • Please follow us on Twitter @umdclip [2]