DSMM: Data Science for Macro-Modeling with Financial and Economic Datasets



== DSMM 2017 ==
[[File:Sponsor_images2.png]] 


=== Schedule ===
===COVID-19 UPDATE===
 
Due to the COVID-19 outbreak, SIGMOD 2020 will be an online event. In response, DSMM will be scheduled as an online event on June 14th starting at 11 a.m. EASTERN TIME. Call-in details and the schedule are available [https://docs.google.com/presentation/d/16F4fNv7TDMZfamIEBD1Lz8mjwD9gnt9TyqHD1A4MRV0/edit#slide=id.p here].
'''9:00 a.m. to 10:30 a.m. - WELCOME and KEYNOTE and FEIII 2017 REPORT'''
 
'''11:00 a.m. to 12:30 p.m. - DSMM Long and Short Papers'''
Papers in [[2017-Session-1]]
 
'''12:30 p.m. to 2:00 p.m. - LUNCH'''
Enjoy lunch and the FEIII participant posters!
 
'''2:00 p.m. to 3:30 p.m. - FEIII Participant Reports'''
Papers in [[2017-Session-2]]

'''4:00 p.m. to 6:00 p.m. - FEIII 2018 Planning'''
 
'''Accepted DSMM Papers'''
[[]]
 
'''Accepted FEIII Participant Reports'''
[[]]
 
===Overview===
 
The promise of Big Data, linked data and social data is the availability of large scale yet granular datasets to support modeling of complex ecosystems reflecting cyber-human decision making.  While complex data-driven models have emerged for climate modeling or systems biology, there has been less activity in macro-modeling with multiple heterogeneous economic or financial datasets. Such analytics requires dealing with multiple heterogeneous streams of data, each of which can be high in volume and variety and reflect varying degrees of veracity. The advent of Big Data infrastructures and analytical tools can support the required integration across these data sources, as well as macro-modeling with diverse datasets, and can potentially lead to the exploration of complex financial and economic ecosystems.
 
The financial world is a closely interlinked Web of financial entities and networks, supply chains and financial ecosystems. Financial analysts, regulators and academic researchers recognize they must address the unprecedented and unfamiliar challenges of monitoring, integrating, and analyzing such networks and ecosystems at scale. A researcher would have to process multiple heterogeneous data streams, extract relevant information, clean it, integrate information from distinct streams, perform entity resolution, and aggregate data before they can even begin their analysis.  Doing all of this creates a high barrier for financial and economic data science at scale. The benefits of addressing these challenges are immense and may result in improved tools for regulators to monitor financial systems or to set economic or fiscal policy. Additional benefits may include fundamentally new designs of market mechanisms, new ways to reach consumers, and new ways to exploit the wisdom of the crowds.
 
The DSMM 2017 workshop will explore the challenges of data science for macro-modeling with financial and/or economic datasets. The workshop will also showcase the '''Financial Entity Identification and Information Integration (FEIII) at Scale Challenge'''.  [https://ir.nist.gov/dsfin/]
 
Proceedings of ACM DSMM 2014 are available here: http://dl.acm.org/citation.cfm?id=2630729. The Proceedings of ACM DSMM 2016 are available here: http://dl.acm.org/citation.cfm?id=2951894
 
====Important Dates ====
 
{|
||
|-
|'''Submission deadline:'''  ||  '''Friday, March 3, 2017'''
|-
|Notification to authors: || Sunday, March 26, 2017
|-
|Camera-ready:  ||    Friday, April 14, 2017
|-
|Registration deadline: ||
|-
|Workshop:    ||            Sunday, May 14, 2017
|}
 
====Submission====
 
Authors are invited to submit original, unpublished research papers that are not being considered for publication in any other forum.
We will accept the following types of papers:
* Regular papers that are a maximum of 6 pages will have a presentation slot.
* Extended abstracts of up to 2 pages will have a poster presentation and a short presentation slot.
 
Manuscripts should be submitted electronically as PDF files and be formatted using the SIGMOD camera-ready [http://www.acm.org/sigs/publications/proceedings-templates templates].
Authors are allowed to include extra material beyond the six pages as a clearly marked appendix, which reviewers are not obliged to read.
 
'''Submission Site'''
[https://cmt3.research.microsoft.com/DSMM2017/]
 
===Organization===
 
====Program Chairs====
{|
|| ||
|-
|Doug Burdick  || IBM Research || drburdic@us.ibm.com
|-
|Rajasekar Krishnamurthy    ||  IBM Research||                        rajase@us.ibm.com
|-
|Louiqa Raschid||            University of Maryland||              louiqa@umiacs.umd.edu
|-
|}
 
====Steering Committee====
{|
|| ||
|-
|Laura Haas||   IBM Research|| lmhaas@us.ibm.com
|-
|H.V. Jagadish||   University of Michigan|| jag@umich.edu
|-
|Shiv Vaithyanathan|| IBM Research|| vaithyan@us.ibm.com
|}
 
====Program Committee====
{|
|-
|Elena Baralis||            Politecnico di Torino||                elena.baralis@polito.it
|-
|Don Berndt||              University of South Florida|| dberndt@usf.edu
|-
|Jefferson Braswell ||    Tahoe Blue||                                  ljb@TahoeBlue.com
|-
|Sanjiv Das||   Santa Clara University|| srdas@scu.edu
|-
|Amol Deshpande||            University of Maryland|| amol@cs.umd.edu
|-
|Mark Dredze|| Johns Hopkins University|| mdredze@cs.jhu.edu
|-
|Gerard Hoberg|| USC|| hoberg@marshall.usc.edu
|-
|Mark Flood|| Office of Financial Research|| mark.flood@ofr.treasury.gov
|-
|Juliana Freire|| New York University|| juliana.freire@nyu.edu
|-
|Vasant Honavar||            Pennsylvania State University||        vhonavar@ist.psu.edu
|-
|Panos Ipeirotis||              New York University|| panos@stern.nyu.edu
|-
|Joe Langsam||   University of Maryland|| jlangsam@rhsmith.umd.edu
|-
|Shawn Mankad||              Cornell University||                    spm263@cornell.edu
|-
|Felix Naumann || University of Potsdam|| felix.naumann@hpi.uni-potsdam.de
|-
|Frank Olken||              ||          frankolken@gmail.com
|-
|Kevin Sheppard|| Office of Financial Research|| kevin.sheppard@ofr.treasury.gov
|-
|Ian Soboroff || NIST || ian.soboroff@nist.gov
|-
|Roger Stein || CSRA and MIT || steinr@mit.edu
|-
|Kunpeng Zhang  ||            University of Maryland||              kzhang@rhsmith.umd.edu 
|-
|Jian Wu|| Pennsylvania State University||    jxw394@psu.edu
|}
 
== DSMM 2016 ==
 
=== Schedule ===
 
'''8:30 a.m. to 10 a.m. - WELCOME and FEIII Challenge Year One Report and Financial News'''
Papers in [[2016-Session1]]
 
'''10:30 a.m. to 12 noon - Networks, Agents and Ontologies'''
Papers in [[2016-Session 2]]

'''12 noon to 1:30 p.m. - LUNCH''' [[2016-list-of-posters]]
Enjoy lunch and the FEIII participant posters!

'''1:30 p.m. to 3 p.m. - SHORT papers and FEIII Participant Reports'''
Papers in [[2016-Session 3]]
 
'''3:30 p.m. to 5 p.m. - FEIII Year Two Planning'''
 
'''Accepted DSMM Papers'''
[[2016-list-of-papers]]
 
'''Accepted FEIII Participant Reports'''
[[2016-list-of-posters]]
 
===Overview===
 
The DSMM 2016 workshop will explore the challenges of data science for macro-modeling with financial and/or economic datasets. The workshop will also showcase a planned multi-year '''Financial Entity Identification and Information Integration (FEIII) at Scale Challenge'''.
 
The promise of Big Data, linked data and social data is the availability of large scale yet granular datasets to support modeling of complex ecosystems reflecting cyber-human decision making.  However, fully realizing this promise requires successful integration of heterogeneous data from a wide variety of data sources. While complex data-driven models have emerged for climate modeling or systems biology, there has been less activity in macro-modeling with multiple heterogeneous economic or financial datasets. Two trends are increasing opportunities for such macro-modeling of financial and economic ecosystems. First, public financial data is becoming increasingly available from a variety of sources, including WRDS’s CRSP, SEC EDGAR, and the Federal Reserve’s FRED.  Second, Big Data infrastructures and analytical tools to support the required integration across these data sources are becoming increasingly available.  Thus, an exploration of the data science challenges involved in such macro-modeling with financial and economic data is timely. Economists have had a successful history of using longitudinal datasets (US Census Bureau, Department of Labor, World Bank, etc.) to drive econometric and statistical research in finance and economics.  However, such analyses fail to completely address the compelling need to analyze complex ecosystems and supply chains in their entirety. Such analytics requires dealing with multiple heterogeneous streams of data, each of which can be high in volume and variety and reflect varying degrees of veracity.  Clearly, we have a classic big data challenge.
 
Although integrating datasets may pose technical and policy/privacy challenges, the potential benefits are immense. For example, social media data often contains features that could enhance macroeconomic statistics derived from traditional survey-driven datasets. The resulting enriched datasets could explore hypotheses with a different focus or level of granularity. The financial world is a closely interlinked Web of financial entities and networks, supply chains and financial ecosystems. Financial analysts, regulators and academic researchers recognize they must address the unprecedented and unfamiliar challenges of monitoring, integrating, and analyzing such networks and ecosystems at scale. A researcher would have to process multiple heterogeneous data streams, extract relevant information, clean it, integrate information from distinct streams, perform entity resolution, and aggregate data before they can even begin their analysis.  Doing all of this creates a high barrier for financial and economic data science at scale. The benefits of addressing these challenges are immense and may result in improved tools for regulators to monitor financial systems or to set economic or fiscal policy. Additional benefits may include fundamentally new designs of market mechanisms, new ways to reach consumers, and new ways to exploit the wisdom of the crowds.
 
'''Targeted Audience''': We expect attendees with an interest in information integration, data mining, knowledge representation, network and visual analytics, stream data processing, etc. The DSMM 2014 workshop, in conjunction with SIGMOD 2014, attracted a diverse group of researchers from databases, data modeling, finance, math/stat and economics. Proceedings of ACM DSMM 2014 are available here:
http://dl.acm.org/citation.cfm?id=2630729
 
====Important Dates====
{|
||
|-
|'''Submission deadline:'''  ||  '''Friday April 1, 2016'''
|-
|Notification to authors: ||Sunday May 1, 2016
|-
|Camera-ready:  ||    Friday May 13, 2016
|-
|Registration deadline: || Sunday May 15, 2016
|-
|Workshop:    ||            Friday July 1, 2016
|}
 
====Submission====
 
Authors are invited to submit original, unpublished research papers that are not being considered for publication in any other forum.
We will accept the following types of papers:
* Regular papers that are a maximum of 6 pages will have a presentation slot.
* Extended abstracts of up to 2 pages will have a poster presentation and a short presentation slot.
 
Manuscripts should be submitted electronically as PDF files and be formatted using the SIGMOD camera-ready [http://www.acm.org/sigs/publications/proceedings-templates templates].
Authors are allowed to include extra material beyond the six pages as a clearly marked appendix, which reviewers are not obliged to read.
 
'''Submission Site'''
https://cmt3.research.microsoft.com/DSMM2016/
 
===Organization===
 
====Program Chairs====
{|
|| ||
|-
|Doug Burdick  || IBM Research || drburdic@us.ibm.com
|-
|Rajasekar Krishnamurthy    ||  IBM Research||                        rajase@us.ibm.com
|-
|Louiqa Raschid||            University of Maryland||              louiqa@umiacs.umd.edu
|-
|}
 
====Steering Committee====
{|
|| ||
|-
|Laura Haas||   IBM Research|| lmhaas@us.ibm.com
|-
|H.V. Jagadish||   University of Michigan|| jag@umich.edu
|-
|Shiv Vaithyanathan|| IBM Research|| vaithyan@us.ibm.com
|}
 
====Program Committee====
{|
|-
|Sanjiv Das||   Santa Clara University|| srdas@scu.edu
|-
|Amol Deshpande||            University of Maryland|| amol@cs.umd.edu
|-
|Mark Flood||   Office of Financial Research||         mark.flood@ofr.treasury.gov
|-
|Gerard Hoberg||     USC                    ||  hoberg@marshall.usc.edu
|-
|Vasant Honavar||            Pennsylvania State University||        vhonavar@ist.psu.edu
|-
|Panos Ipeirotis||              New York University|| panos@stern.nyu.edu
|-
|Joe Langsam||   University of Maryland|| jlangsam@rhsmith.umd.edu
|-
|Shawn Mankad||              University of Maryland||              smankad@rhsmith.umd.edu     
|-
|Felix Naumann || University of Potsdam|| felix.naumann@hpi.uni-potsdam.de
|-
|Frank Olken||              National Science Foundation||          folken@nsf.gov
|-
|Kevin Sheppard|| Office of Financial Research|| kevin.sheppard@ofr.treasury.gov
|-
|Ian Soboroff || NIST || ian.soboroff@nist.gov
|-
|Roger Stein || CSRA and MIT || steinr@mit.edu
|-
|Kunpeng Zhang  ||            University of Maryland||              kzhang@rhsmith.umd.edu 
|}
 
== 2016 Financial Entity Identification and Information Integration (FEIII) Challenges ==
 
 
The Financial Entity Identification and Information Integration (FEIII) Challenges are
open data challenges that focus on the financial domain. Sign up to participate,
download the data, follow the rules, submit your solution, and come talk about your
work and future challenges at a workshop.  Our first challenge is aligning identifiers
across databases.
  Visit us at the NIST FEIII Challenges home page [https://ir.nist.gov/dsfin/]
 
== DSMM 2014  ==
 
=== Schedule ===
 
'''8:30 a.m. to 10 a.m. - WELCOME and KEYNOTE and Opening Session on Financial Analytics'''
Papers in [[Session1]]
 
'''10:30 a.m. to 12 noon - Financial Data Integration Tools and Methods and POSTER SLAM'''
Papers in [[Session 2]]

'''12 noon to 1:30 p.m. - LUNCH in the Summit Room''' [[list-of-posters]]
Enjoy the view and the posters!

'''1:30 p.m. to 3 p.m. - Financial Networks and Games and Regulatory Data'''
Papers in [[Session 3]]
 
'''3:30 p.m. to 5 p.m. - DSfin Financial Entity Resolution At Scale CHALLENGE and WRAP-UP'''
[[Session 4]]
 
'''Accepted Papers'''
[[list-of-papers]]
 
'''Accepted Posters'''
[[list-of-posters]]


===Overview===


'''Focus of the Workshop'''
DSMM 2020 will explore the challenges of macro-modeling with financial and socio-economic datasets. The workshop will also showcase the '''Financial Entity Identification and Information Integration (FEIII) Challenge''' and will involve a challenge task over small business data.
The increasing availability of Open Data from a variety of sources including the Web, social media and the government, in conjunction with the growth of Big Data infrastructures and analytics tools, provides the ability to model complex ecosystems enabling cyber-human decision making. While data-driven models have emerged for a range of challenges from climate modeling to systems biology to personalized medicine, there has been relatively little activity in macro-modeling using multiple heterogeneous financial and economic datasets.
[https://ir.nist.gov/feiii/]
 
The real promise of Open Data and Big Data lies in the dramatically increased value gained from integrating data from multiple sources, as illustrated by the following example: The systemic risks associated with the subprime lending market and the crash of the housing market in 2007 could have been modeled through a comprehensive integration and analysis of available public datasets. For example, the datasets relevant to the home mortgage supply chain include the following: (a) regulatory documents made available by MBS issuers, publicly traded financial institutions and mutual funds; (b) subscription-based third party datasets on underlying mortgages; (c) individual home transaction data such as sales, foreclosure and tax records; (d) local economic data such as employment and income levels; (e) financial news articles. Integrating these datasets may have provided financial analysts, regulators and academic researchers with comprehensive models to enable risk assessment.
 
Economists have been the leaders in creating longitudinal panel datasets and have had a successful history of using national datasets from the Census Bureau, the Department of Labor, etc., and global datasets from the UN, World Bank, etc. Here, too, there has been much less activity in modeling that integrated multiple heterogeneous datasets. While integrating datasets may pose technical, policy and privacy challenges, the potential benefits are immense.  For example, social media data often contains features that could enhance macroeconomic statistics derived from traditional survey-driven datasets.  Enriching longitudinal panel datasets with social media could explore hypotheses with a different focus or level of granularity; for example, one could study the decision making of individuals whose social media profiles would reflect their beliefs, intent, interests, sentiments, opinions, and state of mind. 
 
This workshop will explore the challenges of data science for macro-modeling with financial and/or economic datasets. Two workshops, in 2010 and 2012, brought together a diverse community of academic researchers, regulators and practitioners who articulated the range of multi-disciplinary research challenges for macro-prudential modeling of financial systemic risk. The National Bureau of Economic Research Summer Institute in 2012 offered a workshop on novel data-centric techniques that attracted both economists and computer scientists. The workshop will target attendees of these prior meetings and will build upon the solid foundation established at these prior events.
 
'''Targeted Audience''': We expect a mix of paper submissions and attendees with an interest in information integration, data mining, knowledge representation, stream data processing, etc. A small number of domain specialists from finance and economics are also expected to attend.
 
'''Important Dates'''
Submission deadline:    '''EXTENDED!!!''' Monday March 31, 2014. '''EXTENDED!!!'''
Notification to authors: Friday May 2, 2014.
Camera-ready due:        Friday May 23, 2014.
Registration deadline:
Workshop:                Friday June 27, 2014.
 
'''Submission Format'''
Authors are invited to submit original, unpublished research papers that are not being considered for publication in any other forum.
We will accept the following types of papers:
* Regular papers that are a maximum of 6 pages will have a presentation slot.
* Extended abstracts of up to 2 pages will have a poster presentation and a short presentation slot if time permits.
 
Manuscripts should be submitted electronically as PDF files and be formatted using the SIGMOD camera-ready [http://www.acm.org/sigs/publications/proceedings-templates templates].
Authors are allowed to include extra material beyond the six pages as a clearly marked appendix, which reviewers are not obliged to read.
 
'''Submission Site'''
https://cmt.research.microsoft.com/DSMM2014/
 
=== Organization ===
 
'''Program Chairs'''
{|
|| ||
|-
|Rajasekar Krishnamurthy    ||  IBM Research||                        rajase@us.ibm.com
|-
|Louiqa Raschid||            University of Maryland||              louiqa@umiacs.umd.edu
|-
|Shiv Vaithyanathan||        IBM Research||                        vaithyan@us.ibm.com
|-


|}
Past Proceedings are available here:
[https://dl.acm.org/doi/proceedings/10.1145/3401832 2020]
[https://dl.acm.org/citation.cfm?id=3336499 2019]
[https://dl.acm.org/citation.cfm?id=3220547 2018] [https://dl.acm.org/citation.cfm?id=3077240 2017]
[https://dl.acm.org/citation.cfm?id=2951894 2016] [https://dl.acm.org/citation.cfm?id=2630729 2014].
The engines of commerce and industry continuously generate rich heterogeneous data that reflect financial and economic activity. Unfortunately, this complex data is often not captured or curated in machine understandable form, or readily integrated across resources and data streams, presenting an obstacle for research, policy and industry use. The Business Open Knowledge Network (BOKN) is an effort to harness and exploit this data.  BOKN is envisioned as a shared resource of curated knowledge, with tools to support large-scale data analysis, and interfaces to allow access to additional repositories. BOKN will create unprecedented opportunities for financial and socio-economic research, will inform data-driven fiscal and economic policy, and will empower innovators and entrepreneurs.


'''Steering Committee'''
The DSMM workshop will explore technical challenges relevant to BOKN, which include combining state-of-the-art computational approaches for extracting, representing, linking, and analyzing data with complex and nuanced knowledge about the business domain. Domain-specific tools can leverage a wealth of unstructured data on the Web, as well as semi-structured data and time series datasets provided for regulatory or legal purposes, and reference datasets with standard identifiers and metadata that enable cross-resource federation. BOKN will include a hybrid knowledge graph that supports traditional symbolic knowledge representation enhanced by high-dimensional vector space embeddings capturing temporal evolution and semantic relationships that support machine learning applications.
{|
|| ||
|-
|Lise Getoor||   University of California Santa Cruz|| getoor@soe.ucsc.edu
|-
|Laura Haas||   IBM Research|| lmhaas@us.ibm.com
|-
|H.V. Jagadish||   University of Michigan|| jag@umich.edu
|-
|}


'''Program Committee'''
We expect attendees with an interest in information integration, data mining, knowledge representation, network and visual analytics, stream data processing, etc. to participate.
{|
|-
|Richard Anderson||   Lindenwood University|| rganderson.stl@gmail.com
|-
|Michael Cafarella||   University of Michigan|| michjc@umich.edu
|-
|Sanjiv Das||   Santa Clara University|| srdas@scu.edu
|-
|Amol Deshpande||            University of Maryland|| amol@cs.umd.edu
|-
|Mark Flood||   Office of Financial Research||         mark.flood@treasury.gov
|-
|Juliana Freire||   New York University||         juliana.freire@nyu.edu
|-
|Gerard Hoberg||   University of Maryland|| ghoberg@rhsmith.umd.edu
|-
|Vasant Honavar||            Pennsylvania State University||        vhonavar@ist.psu.edu
|-
|Joe Langsam||   University of Maryland|| jlangsam@rhsmith.umd.edu
|-
|Shawn Mankad||              University of Maryland||              smankad@rhsmith.umd.edu     
|-
|Frank Olken||              National Science Foundation||          folken@nsf.gov
|-
|Felix Naumann||   Hasso Plattner Institute, Germany||    felix.naumann@hpi.uni-potsdam.de
|-
|Christopher Ré||   Stanford University||                 chrismre@cs.stanford.edu
|-
|| ||
|-
|'''Webmaster'''
|-
|Peratham Wiriyathammabhum||University of Maryland||peratham@cs.umd.edu
|}

Latest revision as of 02:07, 17 June 2020
