CCC Colloquium: Yasin Silva (February 22, 2010)
Similarity-aware Query Processing and Optimization
Yasin N. Silva (Purdue University)
Monday, February 22, 2010
10am, Hornbake 2119
Talk slides: PDF (1.56 MB)
Many application scenarios, e.g., marketing analysis, sensor networks, and medical and biological applications, require or can significantly benefit from the identification and processing of similarities in the data. Even though some work has been done to extend the semantics of some operations, e.g., join and selection, to be aware of data similarities, there has not been much study of the role, interaction, and implementation of similarity-aware operations as first-class database operators. The focus of the work presented in this talk is the proposal and study of several similarity-aware database operators and a systematic analysis of their role as physical operators, their interactions, their optimizations, and their implementation techniques.
In this talk, we will present the core research questions driving our work and describe in detail two classes of similarity-aware operators: Similarity Group-by and Similarity Join. We will describe multiple optimization techniques for the introduced operators. Specifically, we present: (1) multiple non-trivial equivalence rules that enable similarity query transformations, (2) Eager and Lazy aggregation transformations for Similarity Group-by and Similarity Join that allow pre-aggregation before potentially expensive joins, and (3) techniques that use materialized views to answer similarity-based queries.
We will also present the main guidelines for implementing the presented operators as integral components of a DBMS query engine, along with some of the key performance evaluation results of this implementation in an open-source DBMS (PostgreSQL). In addition, we will show how the proposed similarity-aware operators can be used efficiently to answer more useful and complex business questions in a decision support system.
About the Speaker
Yasin N. Silva is a Ph.D. candidate in the Computer Science Department at Purdue University. He received his B.S. in Computer Science from the Pontificia Universidad Catolica, Peru (2000) and his M.S. in Computer Science from Purdue University (2006). He also completed the Applied Management Principles program at the Krannert School of Management (2008). Yasin's research areas are data management systems and privacy preservation. More specifically, he has been working in the areas of query processing and optimization, privacy assurance in database systems, web-scale data management systems (cloud computing), and scientific database systems. Yasin is currently working on his Ph.D. dissertation, entitled "Similarity-aware Query Processing and Optimization". His previous work experience includes internships at IBM Research and Microsoft Research. Yasin is a student representative of AMIGOS, a Purdue University association that supports Latinos and other under-represented minorities in computer science. He served as the Purdue University coordinator of the Latinos in Academic Advancement Program and co-owns a company that offers several cultural programs aimed at creating awareness and acceptance of the Hispanic community. Yasin received the Graduate Student Award for Outstanding Teaching, the Siemens Corporation Scholarship, and the Motorola Scholarship for Entrepreneurship. He was also inducted into Upsilon Pi Epsilon, the International Honor Society for the Computer Sciences.
This talk is open to the public and will take place in the Hornbake Building, South Wing, at the University of Maryland, College Park. Directions to campus can be found here, and campus maps can be found here.