Adapt wiki — user contributions by Scsong (https://wiki.umiacs.umd.edu/adapt/)

Papers (revision of 2010-07-19)
<hr />
<div>==Papers==<br />
===2010===<br />
* S. Song and J. JaJa. Effective Strategies for Temporally Anchored Information Retrieval. UMIACS-TR-2010-05, 2010, University of Maryland Institute for Advanced Computer Studies. [http://www.lib.umd.edu/drum/bitstream/1903/10108/1/UMIACS-TR-2010-05.pdf pdf]<br />
* S. Song and J. JaJa. Techniques to Audit and Certify the Long Term Integrity of Digital Archives, International Journal of Digital Libraries, 2010. [http://www.umiacs.umd.edu/~joseph/IJDL_published.pdf pdf]<br />
===2009===<br />
* J. JaJa and S. Song, Robust Tools and Services for Long-Term Preservation of Digital Information, Library Trends, Special Issue: The Library of Congress National Digital Information Infrastructure and Preservation Program, 57:3, 580-594, 2009. [http://www.umiacs.umd.edu/~joseph/library-trends.pdf pdf]<br />
* Song, S. and JaJa, J. Search and Access Strategies for Web Archives. in Archiving 2009. 2009: IS&T. [[media:Archiving2009_submitted.pdf|pdf]]<br />
* Smorul, M. and JaJa, J. A Case Study in Distributed Collection Monitoring and Auditing Using the Audit Control Environment (ACE). in Archiving 2009. 2009: IS&T.<br />
* Smorul, M., Song, S., and JaJa, J. An Implementation of the Audit Control Environment (ACE) to Support the Long Term Integrity of Digital Archives. in DigCCurr 2009. 2009: University of North Carolina. [[media:DigCCurr2009_060909.pdf|pdf]]<br />
* JaJa, J., Smorul, M., and Song, S. Tools and Services for Long-Term Preservation of Digital Archives. in Indo-US Workshop on International Trends in Digital Preservation. 2009. Pune, India [[media:Indo-US-Workshop-jaja.pdf|pdf]]<br />
<br />
===2008===<br />
* Song, S. and JaJa, J. Fast Browsing of Archived Web Contents. in 8th International Web Archiving Workshop. 2008. Aarhus, Denmark. [[media:iwaw08_Song_JaJa_Final.pdf|pdf]]<br />
* Song, S. and JaJa, J., Archiving Temporal Web Information: Organization of Web Contents for Fast Access and Compact Storage. UMIACS-TR-2008-08, 2008, University of Maryland Institute for Advanced Computer Studies. [[media:temporal-web-archiving-final-umiacs-tr-2008-08.pdf|pdf]]<br />
<br />
===2007===<br />
* Smorul, M., McGann, M., and JaJa, J. PAWN: A Policy-Driven Environment for Implementing Producer-Archive Interactions in Support of Long Term Digital Preservation. in Archiving 2007. 2007: IS&T. [[media:Pawn-archiving07-uploaded.pdf|pdf]]<br />
* Song, S. and JaJa, J. ACE: A Novel Software Platform to Ensure the Integrity of Long Term Archives. in Archiving 2007. 2007: IS&T. [[media:rad71E67.pdf|pdf]]<br />
* Smorul, M., McGann, M., and JaJa, J. The Use of the Producer-Archive Workflow Network (PAWN) in Support of Customized Archival Practice. in DigCCurr 2007. 2007. Chapel Hill, North Carolina. [[media:Digccurr07-paper-final.pdf|pdf]]<br />
* Song, S. and JaJa, J., Web Archiving: Organizing Web Objects into Web Containers to Optimize Access. UMIACS-TR-2007-42, 2007, University of Maryland Institute for Advanced Computer Studies. [[media:OrganizingWebObjects_UMIACS-TR-2007_42.pdf|pdf]]<br />
<br />
===2006===<br />
* Geremew, M., Song, S., and JaJa, J. Using Scalable and Secure Web Technologies to Design a Global Format Registry Prototype: Architecture, Implementation, and Testing. in Proceedings of Archiving 2006. 2006. Ottawa, Canada: IS&T. [[media:archiving06_final.pdf|pdf]]<br />
<br />
===2005===<br />
* JaJa, J., Smorul, M., McCall, F., and Wang, Y. Scalable, Reliable Marshalling and Organization of Distributed Large Scale Data Onto Enterprise Storage Environments. in The 22nd IEEE / 13th NASA Goddard Conference on Mass Storage Systems and Technologies. 2005. Monterey, California: IEEE Computer Society. [[media:msst-final5.pdf|pdf]]<br />
<br />
===2004===<br />
* JaJa, J., McCall, F., Smorul, M., Moore, Y., and Chadduck, R. Digital Archiving and Long Term Preservation: An Early Experience with Grid and Digital Library Technologies. [[media:thic-04.doc|doc]]<br />
* Smorul, M., JaJa, J., Wang, Y., and McCall, F., PAWN: Producer-Archive Workflow Network in Support of Digital Preservation. UMIACS-TR-2004-49, 2004. [[media:UMIACS-TR-2004-49.pdf|pdf]]<br />
<br />
===2003===<br />
* Smorul, M., JaJa, J., McCall, F., Brown, S.F., Moore, R., Marciano, R., Chen, S.-Y., Lopez, R., and Chadduck, R., Recovery of a Digital Image Collection Through the SDSC/UMD/NARA Prototype Persistent Archive. UMIACS-TR-2003-105, 2003. [[media:UMIACS-TR-2003-105.pdf|pdf]]<br />
<br />
==Presentations==<br />
===2010===<br />
* Smorul, M. ACE DPIF presentation at NIST, Mar 10 [[media:ACE-NIST10.ppt|ppt]]<br />
===2009===<br />
* Smorul, M. ACE Presentation for GeoMapp yearly meeting August 09 [[media:Geomapp-ace.ppt|ppt]]<br />
* Smorul, M. and JaJa, J. Tools and Services for the Long Term Preservation and Access of Digital Archives, Sun Pasig, GW University [[media:Gw-pasig-09.ppt|ppt]]<br />
* Smorul, M. ACE Experiences, DigCCurr 2009, Chapel Hill, NC [[media:Smorul-digccurr09.ppt|ppt]]<br />
* Song, S. and JaJa, J. Search and Access Strategies for Web Archives. in Archiving 2009. 2009: IS&T. [[media:webarc-slides-archiving-09-v2.ppt|ppt]]<br />
* JaJa, J., Smorul, M., and Song, S. Tools and Services for Long-Term Preservation of Digital Archives. in Indo-US Workshop on International Trends in Digital Preservation. 2009. Pune, India [[media:JaJa-Indo-US-Workshop-March2009.ppt|ppt]]<br />
<br />
===2008===<br />
* Sun Pasig Baltimore, ADAPT Tools overview [[media:SUN-PASIG-Nov08.ppt|ppt]]<br />
* ERA Research Project: Ingestion and Preservation Tools and Services (Partnerships in Innovation 2008) [[Media:JAJA-NARA-UMD-Conference-08.ppt|ppt]]<br />
* Fast Browsing of Archived Web Contents: International Web Archiving Workshop 2008 at Aarhus, Denmark (August 18-19, 2008) [[Media:IWAW2008.pdf|pdf]]<br />
* ACE: NDIIPP Partners Meeting at Arlington, VA (July 8-10, 2008) [[media:session4_chronopolis.ppt|ppt]]<br />
* NARA Presentation on recent research (6/08)<br />
** Overview [[media:NARA-June08-v2.ppt|ppt]]<br />
** PAWN [[media:pawn-0.7-nara-jun11.ppt|ppt]]<br />
** ACE [[media:ACEAuditingControlEnvironment.ppt|ppt]]<br />
** Replication Monitor [[media:ReplicationMonitoring.ppt|ppt]]<br />
* Technical Overview of Processes [[media: NewPawnProcess.ppt|ppt]]<br />
* Midwest Archives Conference (4/08) [[media: 2008MAC-Smorul.ppt|ppt]]<br />
<br />
===2007===<br />
* PAWN, NAGARA Kansas City Meeting (7/07) [[media: PAWN-nagara-2-smorul.ppt|ppt]]<br />
* ACE Presentation: NDIIPP and NARA review (7/07) [[media: ACE-July26.ppt|ppt]]<br />
* July project review, PAWN update (7/07) [[media: PAWN-updates-07-24.ppt|ppt]]<br />
* PAWN Presentation at Archiving 2007 (5/07) [[media: PAWN-Archiving07-v1.ppt|ppt]]<br />
* ACE Presentation at Archiving 2007 (5/07) [[media: ACE-Archiving2007.ppt|ppt]]<br />
* Long Term Sustainment of Digital Information for Science and Engineering, NIST Interoperability Week(04/07) [[media: ADAPT-NIST-04-07.ppt|ppt]]<br />
* DigCCurr 2007 (4/07) [[media: digccurr07-v1.ppt|ppt]]<br />
* NDIIPP Focus Breakout (1/07) [[media: focus-ndiipp-07.ppt|ppt]]<br />
* NDIIPP PAWN Breakout (1/07) [[media: PAWN-ndiipp-jan07.ppt|ppt]]<br />
* NDIIPP Pecha Kucha Session (1/07) [[media: ndiipp-summary-jan-07.ppt|ppt]]<br />
<br />
===2006===<br />
* Presentation at Archives II (2006-07-28) [[media: pawn-nara-20060728.ppt|ppt]]<br />
* Library of Congress (7/06) [[media: loc-july-06.ppt|ppt]]<br />
* July project review, PAWN update (7/06) [[media: PAWNprogress-Jul-06-2.ppt|ppt]]<br />
* Using Scalable and Secure Web Technologies to Design Global Format Registry (Archiving 2006 - 5/06) [[media: Conference_final.ppt|ppt]]<br />
* Robust Technologies for Automated Ingestion and Long-Term Preservation of Digital Information (5/06) [[media: dgo2006-jaja.ppt|ppt]]<br />
<br />
===2005===<br />
* Grid brick overview (11/05) [[media: GRIDBrickStatus-jun2-05.ppt|ppt]]<br />
* FOCUS overview (10/05) [[media: NARASlides_new.ppt|ppt]]<br />
* Pawn Release III status (10/05) [[media: PAWNTestIII-c.ppt|ppt]]<br />
* June project review, PAWN update (6/05) [[media: Review-Jun2-05.ppt|ppt]]<br />
* March NGDA data supplier meeting, ADAPT/PAWN/GRASP/GLCF (3/05) [[media: NGDA-adapt-glcf-05Mar.ppt|ppt]]<br />
* ESIP Federation Spring 05<br />
** Introduction (Mike Smorul / Gary Jackson) [[media: 0.Introduction.ppt|ppt]]<br />
** SRB Overview (Mike Smorul) [[media: 1.srb-overview.ppt|ppt]]<br />
** Globus Overview (Gary Jackson) [[media: 2.globus.ppt|ppt]]<br />
** GRASP (Mike Smorul) [[media: 3.grasp.ppt|ppt]]<br />
** LPE (Gary Jackson) [[media: 4.LPE.ppt|ppt]]<br />
** PAWN for Scientific Collections (Mike Smorul) [[media: 5.pawn-mss.ppt|ppt]]<br />
** Globus vs SRB (Mike Smorul) [[media: globusvssrb.ppt|ppt]]<br />
* PAWN, LPE, GRASP overviews for JAXA/GLCF meeting (2/05) [[media: PAWN_LPE_grasp-05feb18.ppt|ppt]]<br />
<br />
===2004===<br />
* Dec 04 Presentation to OAIS members [[media: oais-pawn.ppt|ppt]]<br />
* PAWN: A Novel Ingestion Workflow Technology for Digital Preservation ([http://www.erpanet.org/events/2004/budapest/index.php ERPANET Workshop on Workflow]) [[media: erpanet.ppt|ppt]]<br />
<br />
==Posters==<br />
<br />
===2009===<br />
* Archiving2009 Web Archiving poster 5/09 [[media:webarc-search-poster-archiving-09-v3.ppt|ppt]]<br />
* Archiving2009 ACE poster 5/09<br />
<br />
===2008===<br />
<br />
===2007===<br />
* NDIIPP PAWN poster 1/07 [[media: globusvssrb.ppt|ppt]]<br />
* NDIIPP ACE poster 1/07 [[media: globusvssrb.ppt|ppt]]<br />
<br />
===2006===<br />
* NDIIPP Focus poster [[media: FOCUS_Poster-3.ppt|ppt]]<br />
* NDIIPP Adapt poster [[media: ndiipp-poster-2.ppt|ppt]]<br />
<br />
===2005===<br />
* PAWN and ADAPT poster 9/05 [[media: adapt-pawn4-final.ppt|ppt]]<br />
* Student jobs brochure 9/05 [[media: adapt-student-brochure.ppt|ppt]]<br />
<br />
===2004===<br />
* PAWN Poster for Nov 04 NARA Conference 11/04 [[media: pawn-poster.ppt|ppt]]</div>

Webarc:Lemur Indexer (modified) (revision of 2009-12-04)
<hr />
<div>== What It Does ==<br />
Given an input source, the Lemur/Indri Indexer (modified) either builds a new index or incrementally adds new documents to an existing index. The features below are the ones we added on top of the original Lemur/Indri toolkit, which can be found at http://www.lemurproject.org/.<br />
* New statistics parameters are included in the index. They are:<br />
** fresh document count: the number of wiki revisions in the monthly snapshot that are new to the index, i.e., not carried over from the previous snapshot.<br />
** term count in fresh documents: the number of terms appearing in the monthly snapshot, not including those carried over from the previous snapshot.<br />
** fresh document count for each term: the number of wiki revisions that contain a given term in the monthly snapshot, not including those carried over from the previous snapshot.<br />
** term count in fresh documents for each term: the number of the occurrences of a given term in the monthly snapshot, not including those carried over from the previous snapshot.<br />
<br />
Note that the new index also retains the original statistics, which count documents and terms over the entire collection (monthly snapshot + carryover revisions).<br />
<br />
* Berkeley DB support as a new input source: a new class indri.parse.BDBTaggedDocumentIterator is added, along with minor changes to a number of related classes.<br />
<br />
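The real bookkeeping happens inside the modified C++ indexer; the following Python sketch (with a hypothetical revision-id → term-list layout) just illustrates what the four "fresh" statistics above count:

```python
from collections import Counter

def fresh_stats(snapshot, previous):
    """Compute the four 'fresh' statistics for a monthly snapshot.

    snapshot, previous: dict mapping revision id -> list of terms.
    A revision is 'fresh' if it appears in this snapshot but was not
    carried over from the previous one.
    """
    fresh_ids = set(snapshot) - set(previous)

    fresh_doc_count = len(fresh_ids)      # fresh document count
    term_count = 0                        # term count in fresh documents
    doc_freq = Counter()                  # fresh document count per term
    term_freq = Counter()                 # term count in fresh docs per term
    for rid in fresh_ids:
        terms = snapshot[rid]
        term_count += len(terms)
        term_freq.update(terms)           # every occurrence
        doc_freq.update(set(terms))       # at most once per revision
    return fresh_doc_count, term_count, doc_freq, term_freq
```

The collection-wide statistics kept by the original toolkit would simply iterate over all of `snapshot` rather than over `fresh_ids`.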
<br />
== How To Build ==<br />
Build it the same way as the original Lemur/Indri toolkit, i.e., from the toolkit directory:<br />
<pre><br />
make; make install<br />
</pre><br />
<br />
== How To Run ==<br />
Run it the same way as the original IndriBuildIndex (http://www.lemurproject.org/tutorials/interm_indexing-2.php).<br />
<br />
<pre><br />
./IndriBuildIndex <Indri indexer parameter file><br />
</pre><br />
<br />
== Input File ==<br />
Mostly the same format as defined for the original Indri indexer parameter file (http://www.lemurproject.org/lemur/indexing.php#IndriBuildIndex), with additional support for Berkeley DB as an input source.<br />
<br />
Example parameter file for indexing carryovers via Berkeley DB.<br />
<pre><br />
<parameters><br />
<index>/fs/webarc3/data/wikipedia/lemur_index/monthly/month-003</index><br />
<indexType>indri</indexType><br />
<corpus><br />
<path>/fs/webarc3/data/wikipedia/bdb-monthly/month-003-co</path><br />
<class>trectext_from_bdb</class><br />
</corpus><br />
</parameters><br />
</pre><br />
<br />
Example parameter file for indexing a monthly snapshot.<br />
<pre><br />
<parameters><br />
<index>/fs/webarc3/data/wikipedia/lemur_index/monthly/month-003</index><br />
<indexType>indri</indexType><br />
<corpus><br />
<path>/fs/webarc3/data/wikipedia/preprocessed-monthly/trec-month-003.xml</path><br />
<class>trectext</class><br />
</corpus><br />
</parameters><br />
</pre><br />
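Since each monthly run needs its own parameter file, generating them can be scripted. A minimal sketch, assuming the path layout and month numbering of the two examples above (`make_param_file` and its defaults are hypothetical helpers, not part of the toolkit):

```python
def make_param_file(month, carryover=False,
                    base="/fs/webarc3/data/wikipedia"):
    """Render an IndriBuildIndex parameter file for one monthly run."""
    tag = "month-%03d" % month
    if carryover:
        # carryover revisions are read from Berkeley DB
        path = "%s/bdb-monthly/%s-co" % (base, tag)
        cls = "trectext_from_bdb"
    else:
        # monthly snapshots are preprocessed TREC text
        path = "%s/preprocessed-monthly/trec-%s.xml" % (base, tag)
        cls = "trectext"
    return """<parameters>
  <index>%s/lemur_index/monthly/%s</index>
  <indexType>indri</indexType>
  <corpus>
    <path>%s</path>
    <class>%s</class>
  </corpus>
</parameters>""" % (base, tag, path, cls)
```

Writing the returned string to a file and passing it to ./IndriBuildIndex reproduces the runs shown above.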
<br />
== Output Files ==<br />
An Indri index directory at the location specified in the input parameter file (<index>...</index>).<br />
<br />
== Notes ==<br />
* Because the new index format contains extra statistics, the new index is not compatible with the original Indri index format and thus won't be readable by the original Lemur toolkit. <br />
<br />
== Source Code ==<br />
svn co http://narasvn.umiacs.umd.edu/repository/src/webarc/lemur-4.10</div>

Webarc:Temporal Scoring Result Graphs (revision of 2009-11-25)
<hr />
<div>== Okapi BM-25 ==<br />
=== r = 10 ===<br />
<center>[[Image:okapi_recall_r10.png|640px]]</center><br />
<center>[[Image:okapi_kt_r10.png|640px]]</center><br />
<br />
=== r = 20 ===<br />
<center>[[Image:okapi_recall_r20.png|640px]]</center><br />
<center>[[Image:okapi_kt_r20.png|640px]]</center><br />
<br />
=== r = 100 ===<br />
<center>[[Image:okapi_recall_r100.png| 640px]]</center><br />
<center>[[Image:okapi_kt_r100.png| 640px]]</center><br />
<br />
== KL-Divergence with Dirichlet's Smoothing ==<br />
=== r = 10 ===<br />
<center>[[Image:kl_recall_r10.png| 640px]]</center><br />
<center>[[Image:kl_kt_r10.png| 640px]]</center><br />
<br />
=== r = 20 ===<br />
<center>[[Image:kl_recall_r20.png| 640px]]</center><br />
<center>[[Image:kl_kt_r20.png| 640px]]</center><br />
<br />
=== r = 100 ===<br />
<center>[[Image:kl_recall_r100.png| 640px]]</center><br />
<center>[[Image:kl_kt_r100.png| 640px]]</center></div>

Webarc:Temporally Anchored Scoring Experiments (revision of 2009-11-25)
<hr />
<div>==Goals==<br />
There are two main goals that we want to achieve through the following experiments.<br />
# For time-constrained queries, we examine how temporally-anchored scoring can affect the search results in real-world time series data.<br />
# For time-constrained queries with a time window indexing scheme, we examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
<br />
==Input Dataset==<br />
We preprocess the entire revision history of [http://www.archive.org/details/enwiki-20080425 the English Wikipedia from 2001 to 2007]. After preprocessing, we obtain 84 monthly snapshots, starting in January 2001 and ending in December 2007. Each monthly snapshot contains the latest revision, as of the end of the month, of every article existing at that time. For example, the Wikipedia article 'Economy of the United States' created on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=649100 August 21 2002] is included in the six monthly snapshots of August 2002, September 2002, ..., January 2003, since no newer revision was made until [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=749002 February 7 2003], whereas the same article as revised on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=166885 August 16 2002] is not included in any of the snapshots.<br />
<br />
[[Webarc:Input Dataset Statistics |Statistics of monthly snapshots]]<br />
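The snapshot rule — the latest revision at each month's end is carried forward until superseded — can be sketched as follows (a hypothetical helper, with 1-based month indices counting from January 2001):

```python
import datetime

def month_index(d, origin_year=2001):
    """1-based month number: January 2001 -> 1, ..., December 2007 -> 84."""
    return (d.year - origin_year) * 12 + d.month

def snapshots_for(revised, superseded=None, last_month=84):
    """Month indices whose end-of-month snapshot contains this revision.

    revised:    date the revision was made
    superseded: date a newer revision appeared (None if never)
    A revision belongs to a snapshot iff it is the latest one at month end.
    """
    first = month_index(revised)
    last = last_month if superseded is None else month_index(superseded) - 1
    return list(range(first, last + 1))
```

For the 'Economy of the United States' example, a revision made August 21 2002 and superseded February 7 2003 lands in six snapshots (August 2002 through January 2003), while one superseded within its own month lands in none.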
<br />
==Queries==<br />
Based on the AOL query log made briefly available in 2005, we build our temporal query load by extracting multi-term query phrases for which the user selected an English Wikipedia article from the search results. Note that different temporal contexts give different relative weights to each query term, yielding different result rankings. In other words, for a query phrase with terms t1 and t2, t1 may be given more weight than t2 for one query time span qts1, while the opposite may hold for a different query time span qts2. This means that two document versions belonging to both qts1 and qts2 can be ranked differently depending on the specified query time span. For single-term queries, by contrast, any two document versions retain their relative ranking regardless of the specified query time span, as long as both belong to it. In our experiments, we therefore highlight the impact of different query time spans on the search results by focusing only on multi-term queries. The more general case, in which single-term queries are also included, can be inferred from these results (it has been reported that about 80% of queries contain multiple terms). <br />
<br />
Once we extract qualified query phrases, we further filter out less frequently appearing phrases. In particular, we take the query phrases that appear 10 times or more in the log. As a result, we gather a total of 223 such query phrases.<br />
<br />
Using the 223 query phrases, we now construct a temporal query load by combining query time spans with each of the query phrases. Specifically, we use 8 query time span lengths: 1, 2, 4, 8, 16, 32, 64, and 83 months. For each length, we generate all possible query time spans beginning with the second month. For example, for the query time span length of two months, we make 82 query time spans: (February 1st 2001 ~ March 31st 2001), (March 1st 2001 ~ April 30th 2001), ..., (November 1st 2007 ~ December 31st 2007). A total of 462 (= 83 + 82 + 80 + 76 + 68 + 52 + 20 + 1) different query time spans are used. In the end, we generate 103,026 (= 223 x 462) queries in each experiment session.<br />
<br />
[[Webarc:Temporal Query Load |The complete list of query terms and query time spans]]<br />
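Enumerating the query time spans is mechanical; the sketch below reproduces the counts given above (84 − L spans per length L, starting in February 2001 as in the two-month example):

```python
SPAN_LENGTHS = [1, 2, 4, 8, 16, 32, 64, 83]
TOTAL_MONTHS = 84          # January 2001 .. December 2007
NUM_PHRASES = 223

def query_time_spans():
    """Enumerate (start_month, end_month) pairs, 1-based month indices.

    For each length L there are TOTAL_MONTHS - L spans, the first
    starting in month 2 (February 2001).
    """
    spans = []
    for length in SPAN_LENGTHS:
        for start in range(2, TOTAL_MONTHS - length + 2):
            spans.append((start, start + length - 1))
    return spans
```

Crossing the 462 spans with the 223 query phrases yields the 103,026 queries of one experiment session.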
<br />
==Scoring Methods==<br />
We use two scoring methods: (1) a tf-idf based method (Okapi BM-25, in particular) and (2) a language-model based method (KL divergence with Dirichlet smoothing, in particular). For each query, we take the 100 top-ranked documents from each scoring method.<br />
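For reference, here is a minimal sketch of one common Okapi BM25 variant (the experiments use the modified Lemur implementation; the k1 and b values below are just the usual defaults, not necessarily those used in the runs):

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, N, df,
               k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a bag-of-words query.

    doc_tf: term -> frequency in the document
    N:      number of documents in the collection
    df:     term -> document frequency in the collection
    """
    score = 0.0
    for t in query_terms:
        tf = doc_tf.get(t, 0)
        if tf == 0 or t not in df:
            continue
        # inverse document frequency (floored at 0 by the +1 inside the log)
        idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
        # length-normalized term frequency
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * norm
    return score
```

In the temporally anchored setting, N and df are the statistics of the query time span rather than of the whole collection — which is exactly what the time-window approximation below supplies.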
<br />
==Experiments==<br />
The experiments consist of eight sessions, each using one of the following time window sizes: 1, 2, 4, 8, 16, 32, 64, and 83 months. In each session, the temporal query load described above is fed into our search servers.<br />
<br />
[[Webarc:Time Windows Used in Experiments|Time Windows Used in Experiments]]<br />
<br />
When scoring relevance, approximated query / document statistics are used. For a query time span (tp<sub>1</sub> ~ tp<sub>2</sub>), we find a series of time windows tw<sub>i</sub> ~ tw<sub>j</sub> that completely covers (tp<sub>1</sub> ~ tp<sub>2</sub>). The statistics for the query time span are approximated by combining the statistics of tw<sub>i</sub> ~ tw<sub>j</sub>.<br />
<br />
[[Webarc:Combining Statistics in Multiple Time Windows|How We Combine Statistics in Multiple Time Windows]]<br />
<br />
For example, if tw<sub>1</sub> ~ tw<sub>3</sub> completely overlaps a given query time span as in the figure below, we combine statistics of tw<sub>1</sub>, tw<sub>2</sub> and tw<sub>3</sub> and use them when scoring. <br />
<br />
<center>[[Image:stat_approximation.png|480px]]</center><br />
<br />
In the case where the time window size is a single month, a series of time windows always overlaps perfectly with any query time span, and therefore we use the exact statistics found within the query time span.<br />
<br />
<center>[[Image:tw1.png|480px]]</center><br />
<br />
Once we run all the eight experiment sessions, we obtain 64 result sets as shown in the table below. In the table, each row represents an experiment session with a given time window size, and each column represents a query time span size. In the entries of the table, the result set qts.''i''_tw.''j'' refers to the result set for the query time span size ''i'' and the time window size ''j''. The size of each result set varies, depending on the query time span size. For example, qts.''1''_tw.''x'' has search results for 18,509 (= 223 query phrases x 83 query time spans) queries, while qts.''83''_tw.''x'' has search results for only 223 (= 223 x 1) queries.<br />
<br />
{| class="wikitable" style="margin: 1em auto 1em auto" border="1"<br />
|-<br />
| tw\qts || 1 || 2 || 4 || 8 || 16 || 32 || 64 || 83<br />
|-<br />
| 1 || qts.1_tw.1 || qts.2_tw.1 || qts.4_tw.1 || qts.8_tw.1 || qts.16_tw.1 || qts.32_tw.1 || qts.64_tw.1 || qts.83_tw.1<br />
|-<br />
| 2 || qts.1_tw.2 || qts.2_tw.2 || qts.4_tw.2 || qts.8_tw.2 || qts.16_tw.2 || qts.32_tw.2 || qts.64_tw.2 || qts.83_tw.2<br />
|-<br />
| 4 || qts.1_tw.4 || qts.2_tw.4 || qts.4_tw.4 || qts.8_tw.4 || qts.16_tw.4 || qts.32_tw.4 || qts.64_tw.4 || qts.83_tw.4<br />
|-<br />
| 8 || qts.1_tw.8 || qts.2_tw.8 || qts.4_tw.8 || qts.8_tw.8 || qts.16_tw.8 || qts.32_tw.8 || qts.64_tw.8 || qts.83_tw.8<br />
|-<br />
| 16 || qts.1_tw.16 || qts.2_tw.16 || qts.4_tw.16 || qts.8_tw.16 || qts.16_tw.16 || qts.32_tw.16 || qts.64_tw.16 || qts.83_tw.16<br />
|-<br />
| 32 || qts.1_tw.32 || qts.2_tw.32 || qts.4_tw.32 || qts.8_tw.32 || qts.16_tw.32 || qts.32_tw.32 || qts.64_tw.32 || qts.83_tw.32<br />
|-<br />
| 64 || qts.1_tw.64 || qts.2_tw.64 || qts.4_tw.64 || qts.8_tw.64 || qts.16_tw.64 || qts.32_tw.64 || qts.64_tw.64 || qts.83_tw.64<br />
|-<br />
| 83 || qts.1_tw.83 || qts.2_tw.83 || qts.4_tw.83 || qts.8_tw.83 || qts.16_tw.83 || qts.32_tw.83 || qts.64_tw.83 || qts.83_tw.83<br />
|}<br />
<br />
We then run all eight experiment sessions a second time with the other scoring method.<br />
<br />
In total, we generate 1,648,416 queries (= 103,026 queries per session x 8 sessions x 2 scoring methods) and obtain about 160 million search results (each query returns up to 100 search results).<br />
<br />
==Analysis==<br />
===Impact of Temporally-anchored Scoring===<br />
To reiterate, the first goal of the experiments is to examine how temporally-anchored scoring can yield different search results. <br />
In the traditional search scheme, a temporal query is carried out by first scoring documents with the document/term statistics of the entire collection, and then filtering in only those documents that belong to the specified query time span. By construction, our experiment session with time window size 83 emulates the traditional scheme exactly, producing the same result sets that the traditional scheme would. Thus, in the first analysis, we compare the following two cases:<br />
<br />
# Scoring is performed based on the statistics of the entire history (i.e. the time window size is 83).<br />
# Scoring is performed based on the actual query time span specified.<br />
<br />
In particular, the search result set qts.''x''_tw.''1'' is compared against qts.''x''_tw.''83''. We compute precision and Kendall's tau at r = 10, 20, and 100.<br />
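These comparison metrics can be sketched as follows. The Kendall's tau here is a simple variant computed over the documents common to both top-r lists; the experiments' exact handling of non-overlapping results may differ:<br />

```python
from itertools import combinations

def precision_at_r(results, baseline, r):
    """Fraction of the top-r results that also appear in the baseline top-r."""
    return len(set(results[:r]) & set(baseline[:r])) / r

def kendall_tau(results, baseline, r):
    """Kendall's tau over documents common to both top-r lists:
    (concordant - discordant pairs) / total pairs."""
    common = set(results[:r]) & set(baseline[:r])
    rank_a = {d: i for i, d in enumerate(results[:r]) if d in common}
    rank_b = {d: i for i, d in enumerate(baseline[:r]) if d in common}
    pairs = list(combinations(common, 2))
    if not pairs:
        return 1.0
    score = sum(
        1 if (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y]) > 0 else -1
        for x, y in pairs)
    return score / len(pairs)

a = ["d1", "d2", "d3", "d4"]   # e.g. top-4 of qts.x_tw.1
b = ["d2", "d1", "d3", "d5"]   # e.g. top-4 of qts.x_tw.83
p = precision_at_r(a, b, 4)    # 0.75: d1, d2, d3 are shared
```

A low tau against the tw.1 baseline indicates that full-history statistics reorder results substantially relative to temporally anchored scoring.<br />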
<br />
We also identify one illustrative temporal query that underscores the importance of temporally-anchored scoring.<br />
<br />
===Impact of Time-Window Based Approximation===<br />
The second goal is to examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
When the time window size is a single month, any query time span will perfectly match one or more consecutive time windows. Thus, the statistics that we use for any query time span are exact. On the other hand, when the time window size is two or more months, many query time spans will cover only parts of some time windows, and we approximate the statistics for a query time span by combining the statistics of all overlapping time windows (see [[Webarc:Combining Statistics in Multiple Time Windows|how we combine statistics of multiple time windows]]).<br />
Therefore, we can examine the impact of time-window-based approximation by comparing the search result set qts.''x''_tw.''y'' (y != 1) with qts.''x''_tw.''1'' for all ''x'' and ''y''. For example, comparing qts.''1''_tw.''2'' with qts.''1''_tw.''1'' shows how the search results differ when we use 2-month time windows and the query time span is a single month.<br />
<br />
In particular, we compute the following metrics for all <math>r \in \{10, 20, 100\}</math>, <math>x \in \{1, 2, 4, 8, 16, 32, 64, 83\}</math>, and <math>y \in \{2, 4, 8, 16, 32, 64, 83\}</math>, where <math>r</math> is the number of search results, <math>x</math> is the query time span length, and <math>y</math> is the time window length.<br />
<br />
* Mean Recall: <font size="4"><math>mP_r(x, y)</math></font><br />
* Mean Kendall's <math>\tau</math>: <font size="4"><math>mKT_r(x, y)</math></font><br />
<br />
We plot the mean recalls and mean Kendall's <math>\tau</math> values in graphs. The graphs for r=10 with Okapi BM25 as the scoring method are shown below.<br />
<br />
<center>[[Image:okapi_recall_r10.png|480px]]</center><br />
<center>[[Image:okapi_kt_r10.png|480px]]</center><br />
<br />
[[Webarc:Temporal Scoring Result Graphs |Complete Results]]<br />
<br />
==Further Information==<br />
* [[Webarc:Input Dataset Statistics |[1] Input Dataset Statistics]]<br />
* [[Webarc:Temporal Query Load |[2] Temporal Query Load]]<br />
* [[Webarc:Time Windows Used in Experiments| [3] Time Windows]]<br />
* [[Webarc:Combining Statistics in Multiple Time Windows| [4] How We Combine Statistics]]<br />
* [[Webarc:Tools Developed |[5] Tools Developed]]</div>Scsonghttps://wiki.umiacs.umd.edu/adapt/index.php?title=Webarc:Main&diff=2619Webarc:Main2009-11-24T20:23:16Z<p>Scsong: </p>
<hr />
<div>==Overview==<br />
<center>[[Image:web.jpg]]</center><br />
<br />
An unprecedented amount of information encompassing almost every facet of human activity across the world is currently available on the web and is growing at an extremely fast pace. In many cases, the web is the only medium where such information is recorded. However, the web is an ephemeral medium whose contents are constantly changing and new information is rapidly replacing old information, resulting in the disappearance of a large number of web pages every day and in a permanent loss of part of our cultural and scientific heritage on a regular basis. A number of efforts, currently underway, are trying to develop methodologies and tools for capturing and archiving some of the web’s contents that are deemed critical. However there are major technical, social, and political challenges that are confronting these efforts. Major technical challenges include automatic tools to identify, find, and collect web contents to be archived, automatic extraction of metadata and context for such contents including linking structures that are inherent to the web, the organization and indexing of the data and the metadata, and the development of preservation and access mechanisms for current and future users, all at unprecedented scale and complexity.<br />
<br />
Leaving aside dynamic and deep contents, web contents involve a wide variety of objects such as html pages, documents, multimedia files, scripts, etc., as well as, linking structures involving these objects. While the size of most web pages is small, the total number of web pages on a single web site can range from one to several millions. For example, as of Oct 30, 2006, Wikipedia.org alone claims to have about 1.4 million articles, each making up a distinct web page. A critical piece of web archiving is to capture the linking structures and organize the archived pages in such a way that future generations of users will be able to access and navigate through the archived web information in the same way as in the original linked structure. Note that by that time, the archived web contents may have migrated through several generations of hardware and software upgrades, including migration through different types of media, different file systems, and different formats.<br />
<br />
While many challenges for archiving web contents exist, our work focuses on ''scalable'' solutions supporting ''compact storage'' of and ''fast access'' to large-scale web archives. Our efforts toward this goal center on answering the following two questions:<br />
<br />
# How do we store contents compactly while still being able to retrieve them quickly?<br />
# How do we enable effective information discovery and access to the archived contents?<br />
<br />
To address the first question, we devise methodologies to package tightly related objects together [[Webarc:Packaging|[1]]] and to index their locations temporally [[Webarc:PISA|[2]]]. Using the fast location index [[Webarc:PISA|[2]]], we also design a quick duplicate detection scheme to keep the archive storage compact. As a first step toward answering the second question, we devise a temporal text-search index [[Webarc:Temporal Text-Search Index|[3]]] using an idea similar to the one behind the fast location index.<br />
<br />
==Our Approaches==<br />
* [[Webarc:Packaging|[1] Web Container Packaging]]<br />
* [[Webarc:PISA|[2] PISA:Persistent Indexing Structure for Archives]]<br />
* [[Webarc:Search and Access Strategies |[3] Search and Access Strategies]]<br />
<!-- * [[Webarc:Temporal Scoring |[4] Temporal Scoring]] --></div>Scsonghttps://wiki.umiacs.umd.edu/adapt/index.php?title=Webarc:Combining_Statistics_in_Multiple_Time_Windows&diff=2618Webarc:Combining Statistics in Multiple Time Windows2009-11-20T22:35:15Z<p>Scsong: </p>
<hr />
<div>We maintain an index for each time window. In each index, we store four term-dependent statistical parameters and four term-independent (index-wide) statistical parameters, as follows:<br />
<br />
* Term-dependent<br />
** <font size="4"><math>df(t, tw_k)</math></font>: the number of all documents containing term t within time window k.<br />
** <font size="4"><math>df(t, tw_k)'</math></font>: the number of '''fresh''' documents containing term t within time window k.<br />
** <font size="4"><math>tf(t, tw_k)</math></font>: the number of occurrences of term t in all documents within time window k.<br />
** <font size="4"><math>tf(t, tw_k)'</math></font>: the number of occurrences of term t in '''fresh''' documents within time window k.<br />
<br />
* Term-independent (index-wide)<br />
** <font size="4"><math>df(tw_k)</math></font>: the number of all documents within time window k.<br />
** <font size="4"><math>df(tw_k)'</math></font>: the number of '''fresh''' documents within time window k.<br />
** <font size="4"><math>tf(tw_k)</math></font>: the number of all terms in all documents within time window k.<br />
** <font size="4"><math>tf(tw_k)'</math></font>: the number of all terms in '''fresh''' documents within time window k.<br />
<br />
By '''fresh''' documents, we mean the documents that were newly updated or first appeared in the given time window. Thus, the fresh documents do not appear in any previous time window.<br />
<br />
We combine statistics for time windows ''i''~''j'' as follows:<br />
<br />
*<math>df(t, tw_{i\sim j}) = df(t, tw_i) + \sum_{k=i+1}^j df(t, tw_k)'</math><br />
*<math>tf(t, tw_{i\sim j}) = tf(t, tw_i) + \sum_{k=i+1}^j tf(t, tw_k)'</math><br />
*<math>df(tw_{i\sim j}) = df(tw_i) + \sum_{k=i+1}^j df(tw_k)'</math><br />
*<math>tf(tw_{i\sim j}) = tf(tw_i) + \sum_{k=i+1}^j tf(tw_k)'</math></div>Scsonghttps://wiki.umiacs.umd.edu/adapt/index.php?title=Webarc:Temporally_Anchored_Scoring_Experiments&diff=2614Webarc:Temporally Anchored Scoring Experiments2009-11-20T22:32:51Z<p>Scsong: </p>
<hr />
<div>==Goals==<br />
There are two main goals that we want to achieve through the following experiments.<br />
# For time-constrained queries, we examine how temporally-anchored scoring can affect the search results in real-world time series data.<br />
# For time-constrained queries with a time window indexing scheme, we examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
<br />
==Input Dataset==<br />
We preprocess the entire revision history of [http://www.archive.org/details/enwiki-20080425 the English Wikipedia from 2001 to 2007]. After preprocessing, we obtain 84 monthly snapshots, starting in January 2001 and ending in December 2007. Each monthly snapshot contains the latest revision, as of the end of that month, of every article existing at that time. For example, the Wikipedia article 'Economy of the United States' created on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=649100 August 21 2002] is included in the six monthly snapshots of August 2002, September 2002, ..., January 2003, since no newer revision was made until [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=749002 February 7 2003], whereas the revision of the same article made on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=166885 August 16 2002] is not included in any snapshot.<br />
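The snapshot rule can be sketched as follows, using month indices in place of timestamps. `monthly_snapshots` is a hypothetical helper, not the authors' preprocessing pipeline:<br />

```python
def monthly_snapshots(revisions, months):
    """revisions: (article, revision_month) pairs; months: month indices in
    increasing order. Returns {month: {article: revision_month}}, where each
    snapshot holds the latest revision as of the end of that month."""
    snapshots = {}
    latest = {}                       # article -> month of its latest revision
    pending = sorted(revisions, key=lambda r: r[1])
    i = 0
    for m in months:
        while i < len(pending) and pending[i][1] <= m:
            article, rev_month = pending[i]
            latest[article] = rev_month
            i += 1
        snapshots[m] = dict(latest)
    return snapshots

# Month 19 stands in for August 2002 and month 25 for February 2003
# (hypothetical indices, counting from January 2001 as month 0).
revs = [("Economy of the United States", 19),
        ("Economy of the United States", 25)]
snaps = monthly_snapshots(revs, range(28))
# The month-19 revision appears in exactly six snapshots (months 19..24),
# matching the 'Economy of the United States' example above.
```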
<br />
[[Webarc:Input Dataset Statistics |Statistics of monthly snapshots]]<br />
<br />
==Queries==<br />
Based on the AOL query log made briefly available in 2005, we build our temporal query load by extracting multi-term query phrases for which the user selected an English Wikipedia article among the search results. Note that different temporal contexts give different relative weights to each of the query terms, yielding different result rankings. In other words, for a query phrase with terms t1 and t2, t1 may be given more weight than t2 in one query time span qts1, while the opposite may hold in a different query time span qts2. This means that two document versions that belong to both qts1 and qts2 can be ranked differently depending on the specified query time span. In the case of single-term queries, however, any two document versions will retain their relative ranking regardless of the specified query time span, as long as both belong to it. In our experiments, we highlight the impact of different query time spans on the search results by focusing only on multi-term queries. The more general case, in which single-term queries are also included, can be inferred from these results (it has been reported that about 80% of queries are multi-term).<br />
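The weight-shift effect described above can be illustrated with a toy tf-idf computation (all numbers invented): the same two-term query ranks two documents differently under two time spans, because each span has its own document-frequency statistics:<br />

```python
import math

def idf(df_t, n_docs):
    """Plain inverse document frequency for illustration."""
    return math.log(n_docs / df_t)

def score(doc_tf, dfs, n_docs):
    """Simple tf-idf score under one time span's statistics."""
    return sum(tf * idf(dfs[t], n_docs) for t, tf in doc_tf.items())

# Document A emphasizes t1; document B emphasizes t2.
doc_a = {"t1": 4, "t2": 1}
doc_b = {"t1": 1, "t2": 4}
n = 1000

dfs_span1 = {"t1": 10, "t2": 500}   # in span qts1, t1 is rare -> A wins
dfs_span2 = {"t1": 500, "t2": 10}   # in span qts2, t2 is rare -> B wins

a1, b1 = score(doc_a, dfs_span1, n), score(doc_b, dfs_span1, n)
a2, b2 = score(doc_a, dfs_span2, n), score(doc_b, dfs_span2, n)
# a1 > b1 but a2 < b2: the ranking flips with the query time span.
```

With a single-term query this flip cannot happen, since one shared idf factor scales both documents' scores equally.<br />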
<br />
Once we extract qualified query phrases, we further filter out less frequently appearing phrases. In particular, we take the query phrases that appear 10 times or more in the log. As a result, we gather a total of 223 such query phrases.<br />
<br />
Using the 223 query phrases, we now construct a temporal query load by combining query time spans with each of the query phrases. Specifically, we use 8 query time span lengths: 1, 2, 4, 8, 16, 32, 64, and 83 months. For each query time span length, we make all possible query time spans, starting from the first month. For example, for the query time span length of two months, we make 82 query time spans: (February 1st 2001 ~ March 31st 2001), (March 1st 2001 ~ April 30th 2001), ..., (November 1st 2007 ~ December 31st 2007). A total of 462 (= 83 + 82 + 80 + 76 + 68 + 52 + 20 + 1) different query time spans are used. In the end, we generate 103,026 (= 223 x 462) queries in each experiment session.<br />
<br />
[[Webarc:Temporal Query Load |The complete list of query terms and query time spans]]<br />
<br />
==Scoring Methods==<br />
We use two scoring methods: (1) a tf-idf-based method (Okapi BM25, in particular) and (2) a language-model-based method (KL divergence combined with Dirichlet smoothing, in particular). For each query, we take the top 100 documents under each scoring method.<br />
<br />
==Experiments==<br />
The experiments consist of eight sessions, each using one of the following time window sizes: 1, 2, 4, 8, 16, 32, 64, and 83 months. In each session, the temporal query load described above is fed into our search servers.<br />
<br />
[[Webarc:Time Windows Used in Experiments|Time Windows Used in Experiments]]<br />
<br />
When scoring relevance, we use approximated query/document statistics. For a query time span (tp<sub>1</sub> ~ tp<sub>2</sub>), we find the series of time windows tw<sub>i</sub> ~ tw<sub>j</sub> that completely covers (tp<sub>1</sub> ~ tp<sub>2</sub>). The statistics for the query time span are approximated by combining the statistics of tw<sub>i</sub> ~ tw<sub>j</sub>.<br />
<br />
[[Webarc:Combining Statistics in Multiple Time Windows|How We Combine Statistics in Multiple Time Windows]]<br />
<br />
For example, if tw<sub>1</sub> ~ tw<sub>3</sub> completely overlaps a given query time span as in the figure below, we combine statistics of tw<sub>1</sub>, tw<sub>2</sub> and tw<sub>3</sub> and use them when scoring. <br />
<br />
<center>[[Image:stat_approximation.png|480px]]</center><br />
<br />
In the case where the time window size is a single month, a series of time windows always overlaps perfectly with any query time span, and therefore we use the exact statistics found within the query time span.<br />
<br />
<center>[[Image:tw1.png|480px]]</center><br />
<br />
Once we run all the eight experiment sessions, we obtain 64 result sets as shown in the table below. In the table, each row represents an experiment session with a given time window size, and each column represents a query time span size. In the entries of the table, the result set qts.''i''_tw.''j'' refers to the result set for the query time span size ''i'' and the time window size ''j''. The size of each result set varies, depending on the query time span size. For example, qts.''1''_tw.''x'' has search results for 18,509 (= 223 query phrases x 83 query time spans) queries, while qts.''83''_tw.''x'' has search results for only 223 (= 223 x 1) queries.<br />
<br />
{| class="wikitable" style="margin: 1em auto 1em auto" border="1"<br />
|-<br />
| tw\qts || 1 || 2 || 4 || 8 || 16 || 32 || 64 || 83<br />
|-<br />
| 1 || qts.1_tw.1 || qts.2_tw.1 || qts.4_tw.1 || qts.8_tw.1 || qts.16_tw.1 || qts.32_tw.1 || qts.64_tw.1 || qts.83_tw.1<br />
|-<br />
| 2 || qts.1_tw.2 || qts.2_tw.2 || qts.4_tw.2 || qts.8_tw.2 || qts.16_tw.2 || qts.32_tw.2 || qts.64_tw.2 || qts.83_tw.2<br />
|-<br />
| 4 || qts.1_tw.4 || qts.2_tw.4 || qts.4_tw.4 || qts.8_tw.4 || qts.16_tw.4 || qts.32_tw.4 || qts.64_tw.4 || qts.83_tw.4<br />
|-<br />
| 8 || qts.1_tw.8 || qts.2_tw.8 || qts.4_tw.8 || qts.8_tw.8 || qts.16_tw.8 || qts.32_tw.8 || qts.64_tw.8 || qts.83_tw.8<br />
|-<br />
| 16 || qts.1_tw.16 || qts.2_tw.16 || qts.4_tw.16 || qts.8_tw.16 || qts.16_tw.16 || qts.32_tw.16 || qts.64_tw.16 || qts.83_tw.16<br />
|-<br />
| 32 || qts.1_tw.32 || qts.2_tw.32 || qts.4_tw.32 || qts.8_tw.32 || qts.16_tw.32 || qts.32_tw.32 || qts.64_tw.32 || qts.83_tw.32<br />
|-<br />
| 64 || qts.1_tw.64 || qts.2_tw.64 || qts.4_tw.64 || qts.8_tw.64 || qts.16_tw.64 || qts.32_tw.64 || qts.64_tw.64 || qts.83_tw.64<br />
|-<br />
| 83 || qts.1_tw.83 || qts.2_tw.83 || qts.4_tw.83 || qts.8_tw.83 || qts.16_tw.83 || qts.32_tw.83 || qts.64_tw.83 || qts.83_tw.83<br />
|}<br />
<br />
We then run all eight experiment sessions a second time with the other scoring method.<br />
<br />
In total, we generate 1,648,416 queries (= 103,026 queries per session x 8 sessions x 2 scoring methods) and obtain about 160 million search results (each query returns up to 100 search results).<br />
<br />
==Analysis==<br />
===Impact of Temporally-anchored Scoring===<br />
To reiterate, the first goal of the experiments is to examine how temporally-anchored scoring can yield different search results. <br />
In the traditional search scheme, a temporal query is carried out by first scoring documents with the document/term statistics of the entire collection, and then filtering in only those documents that belong to the specified query time span. By construction, our experiment session with time window size 83 emulates the traditional scheme exactly, producing the same result sets that the traditional scheme would. Thus, in the first analysis, we compare the following two cases:<br />
<br />
# Scoring is performed based on the statistics of the entire history (i.e. the time window size is 83).<br />
# Scoring is performed based on the actual query time span specified.<br />
<br />
In particular, the search result set qts.''x''_tw.''1'' is compared against qts.''x''_tw.''83''. We compute precision and Kendall's tau at r = 10, 20, and 100.<br />
<br />
We also identify one illustrative temporal query that underscores the importance of temporally-anchored scoring.<br />
<br />
===Impact of Time-Window Based Approximation===<br />
The second goal is to examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
When the time window size is a single month, any query time span will perfectly match one or more consecutive time windows. Thus, the statistics that we use for any query time span are exact. On the other hand, when the time window size is two or more months, many query time spans will cover only parts of some time windows, and we approximate the statistics for a query time span by combining the statistics of all overlapping time windows (see [[Webarc:Combining Statistics in Multiple Time Windows|how we combine statistics of multiple time windows]]).<br />
Therefore, we can examine the impact of time-window-based approximation by comparing the search result set qts.''x''_tw.''y'' (y != 1) with qts.''x''_tw.''1'' for all ''x'' and ''y''. For example, comparing qts.''1''_tw.''2'' with qts.''1''_tw.''1'' shows how the search results differ when we use 2-month time windows and the query time span is a single month.<br />
<br />
In particular, we compute the following metrics for all <math>r \in \{10, 20, 100\}</math>, <math>x \in \{1, 2, 4, 8, 16, 32, 64, 83\}</math>, and <math>y \in \{2, 4, 8, 16, 32, 64, 83\}</math>, where <math>r</math> is the number of search results, <math>x</math> is the query time span length, and <math>y</math> is the time window length.<br />
<br />
* Mean Precision: <font size="4"><math>mP_r(x, y)</math></font><br />
* Mean Kendall's <math>\tau</math>: <font size="4"><math>mKT_r(x, y)</math></font><br />
<br />
We plot the mean precisions and mean Kendall's <math>\tau</math> values in graphs. An example of such a plot, for <font size="4"><math>mP_{20}(x, y)</math></font>, is shown below. In the figure, TW2 represents the case where the time window size is two months, TW4 four months, TW8 eight months, and so on.<br />
<br />
<center>[[Image:mP_20.png|480px]]</center><br />
<br />
<br />
==Further Information==<br />
* [[Webarc:Input Dataset Statistics |[1] Input Dataset Statistics]]<br />
* [[Webarc:Temporal Query Load |[2] Temporal Query Load]]<br />
* [[Webarc:Time Windows Used in Experiments| [3] Time Windows]]<br />
* [[Webarc:Combining Statistics in Multiple Time Windows| [4] How We Combine Statistics]]<br />
* [[Webarc:Tools Developed |[5] Tools Developed]]</div>Scsonghttps://wiki.umiacs.umd.edu/adapt/index.php?title=Webarc:Temporally_Anchored_Scoring_Experiments&diff=2613Webarc:Temporally Anchored Scoring Experiments2009-11-20T22:21:28Z<p>Scsong: </p>
<hr />
<div>==Goals==<br />
There are two main goals that we want to achieve through the following experiments.<br />
# For time-constrained queries, we examine how temporally-anchored scoring can affect the search results in real-world time series data.<br />
# For time-constrained queries with a time window indexing scheme, we examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
<br />
==Input Dataset==<br />
We preprocess the entire revision history of [http://www.archive.org/details/enwiki-20080425 the English Wikipedia from 2001 to 2007]. After preprocessing, we obtain 84 monthly snapshots starting from January 2001 ending in December 2007. Included in each monthly snapshot are the latest revision of existing articles at the end of the month. For example, the Wikipedia article 'Economy of the United States' created on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=649100 August 21 2002] is included in the six monthly snapshots of August 2002, September 2002, ..., January 2003 since there is no newer revision made until [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=749002 February 7 2003], whereas the same article revised on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=166885 August 16 2002] is not included in any of the snapshots.<br />
<br />
[[Webarc:Input Dataset Statistics |Statistics of monthly snapshots]]<br />
<br />
==Queries==<br />
Based on the AOL query log made briefly available in 2005, we build our temporal query load by extracting multi-term query phrases where the user selected an English Wikipedia article among the search results. Note that the different temporal contexts give different relative weights to each of the query terms, yielding different result rankings. In other words, for a query phrase with term t1 and t2, t1 may be given more weight than t2 at a certain query time span qts1, but it may be opposite at a different query time span qts2. This means that two document versions that belong to both qts1 and qts2 can have different rankings depending on the specified query time span. However, in the case of single-term queries, any two document versions will retain their ranking regardless of the specified query time span, as long as both belong to the query time span. In our experiments, we highlight the impact on the search results from different query time spans by focusing only on multi-term queries. The more general case where single-term queries are also included will be able to be inferred from the results from query log analysis studies that report about 80% of web queries are multi-termed.<br />
<br />
Once we extract the qualifying query phrases, we further filter out infrequent ones, keeping only the query phrases that appear 10 or more times in the log. This leaves a total of 223 query phrases.<br />
<br />
Using the 223 query phrases, we construct a temporal query load by combining each query phrase with a set of query time spans. Specifically, we use 8 query time span lengths: 1, 2, 4, 8, 16, 32, 64, and 83 months. For each length, we form all possible query time spans, starting from the first month. For example, for the query time span length of two months, we form 82 query time spans: (February 1 2001 ~ March 31 2001), (March 1 2001 ~ April 30 2001), ..., (November 1 2007 ~ December 31 2007). This yields a total of 462 (= 83 + 82 + 80 + 76 + 68 + 52 + 20 + 1) different query time spans. In the end, we generate 103,026 (= 223 x 462) queries in each experiment session.<br />
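The span enumeration above is easy to reproduce (a sketch; month 1 denotes February 2001 and month 83 December 2007):<br />

```python
# Enumerate every query time span over the 83 monthly snapshots.
TOTAL_MONTHS = 83
SPAN_LENGTHS = [1, 2, 4, 8, 16, 32, 64, 83]
N_PHRASES = 223

spans = [(start, start + length - 1)
         for length in SPAN_LENGTHS
         for start in range(1, TOTAL_MONTHS - length + 2)]

n_spans = len(spans)             # 462 distinct query time spans
n_queries = N_PHRASES * n_spans  # 103,026 queries per session
```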
<br />
[[Webarc:Temporal Query Load |The complete list of query terms and query time spans]]<br />
<br />
==Scoring Methods==<br />
We use two scoring methods: (1) a tf-idf-based method (specifically, Okapi BM25) and (2) a language-model-based method (specifically, KL divergence with Dirichlet smoothing). For each query, we take the top 100 documents under each scoring method.<br />
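As an illustration of the first method, a minimal BM25 scorer might look like the following (a sketch only, with conventional parameter defaults k1 = 1.2 and b = 0.75; it is not the implementation used in the experiments, where the collection statistics come from the covering time windows):<br />

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, collection, k1=1.2, b=0.75):
    """Okapi BM25 score of one document (a token list) for a query.
    `collection` supplies the document-frequency and length statistics."""
    N = len(collection)
    avgdl = sum(len(d) for d in collection) / N
    tf = Counter(doc_terms)
    score = 0.0
    for t in set(query_terms):
        df = sum(1 for d in collection if t in d)
        if df == 0 or tf[t] == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        score += idf * tf[t] * (k1 + 1) / (
            tf[t] + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score
```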
<br />
==Experiments==<br />
The experiments consist of eight sessions, each using one of the following time window sizes: 1, 2, 4, 8, 16, 32, 64, and 83 months. In each session, the temporal query load described above is fed into our search servers.<br />
<br />
[[Webarc:Time Windows Used in Experiments|Time Windows Used in Experiments]]<br />
<br />
When scoring relevance, we use approximate query/document statistics. For a query time span (tp<sub>1</sub> ~ tp<sub>2</sub>), we find the series of time windows tw<sub>i</sub> ~ tw<sub>j</sub> that completely covers (tp<sub>1</sub> ~ tp<sub>2</sub>). The statistics for the query time span are then approximated by combining the statistics of tw<sub>i</sub> ~ tw<sub>j</sub>.<br />
<br />
[[Webarc:Combining Statistics in Multiple Time Windows|How We Combine Statistics in Multiple Time Windows]]<br />
<br />
For example, if tw<sub>1</sub> ~ tw<sub>3</sub> completely cover a given query time span, as in the figure below, we combine the statistics of tw<sub>1</sub>, tw<sub>2</sub> and tw<sub>3</sub> and use them for scoring.<br />
<br />
<center>[[Image:stat_approximation.png|480px]]</center><br />
<br />
In the case where the time window size is a single month, the covering time windows always align perfectly with any query time span, so we use the exact statistics for the query time span.<br />
<br />
<center>[[Image:tw1.png|480px]]</center><br />
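The covering step can be sketched as follows (our own simplification: per-window statistics are reduced to document counts and document frequencies and combined by summing; the page linked above describes the actual combination used):<br />

```python
def covering_windows(qts_start, qts_end, window_size):
    """Indices (0-based) of the fixed time windows of `window_size`
    months that together cover the query time span
    [qts_start, qts_end] (1-based months)."""
    first = (qts_start - 1) // window_size
    last = (qts_end - 1) // window_size
    return list(range(first, last + 1))

def combined_stats(window_ids, window_stats):
    """Approximate the span statistics by summing the per-window
    document counts and document frequencies."""
    n_docs = sum(window_stats[w]["n_docs"] for w in window_ids)
    df = {}
    for w in window_ids:
        for term, f in window_stats[w]["df"].items():
            df[term] = df.get(term, 0) + f
    return {"n_docs": n_docs, "df": df}
```

With one-month windows (window_size = 1), the covering windows coincide with the span itself, which is why the statistics are exact in that case.<br />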
<br />
After running all eight experiment sessions, we obtain 64 result sets, as shown in the table below. Each row represents an experiment session with a given time window size, and each column a query time span size. The entry qts.''i''_tw.''j'' denotes the result set for query time span size ''i'' and time window size ''j''. The size of each result set depends on the query time span size. For example, qts.''1''_tw.''x'' contains search results for 18,509 (= 223 query phrases x 83 query time spans) queries, while qts.''83''_tw.''x'' contains results for only 223 (= 223 x 1) queries.<br />
<br />
{| class="wikitable" style="margin: 1em auto 1em auto" border="1"<br />
|-<br />
| tw\qts || 1 || 2 || 4 || 8 || 16 || 32 || 64 || 83<br />
|-<br />
| 1 || qts.1_tw.1 || qts.2_tw.1 || qts.4_tw.1 || qts.8_tw.1 || qts.16_tw.1 || qts.32_tw.1 || qts.64_tw.1 || qts.83_tw.1<br />
|-<br />
| 2 || qts.1_tw.2 || qts.2_tw.2 || qts.4_tw.2 || qts.8_tw.2 || qts.16_tw.2 || qts.32_tw.2 || qts.64_tw.2 || qts.83_tw.2<br />
|-<br />
| 4 || qts.1_tw.4 || qts.2_tw.4 || qts.4_tw.4 || qts.8_tw.4 || qts.16_tw.4 || qts.32_tw.4 || qts.64_tw.4 || qts.83_tw.4<br />
|-<br />
| 8 || qts.1_tw.8 || qts.2_tw.8 || qts.4_tw.8 || qts.8_tw.8 || qts.16_tw.8 || qts.32_tw.8 || qts.64_tw.8 || qts.83_tw.8<br />
|-<br />
| 16 || qts.1_tw.16 || qts.2_tw.16 || qts.4_tw.16 || qts.8_tw.16 || qts.16_tw.16 || qts.32_tw.16 || qts.64_tw.16 || qts.83_tw.16<br />
|-<br />
| 32 || qts.1_tw.32 || qts.2_tw.32 || qts.4_tw.32 || qts.8_tw.32 || qts.16_tw.32 || qts.32_tw.32 || qts.64_tw.32 || qts.83_tw.32<br />
|-<br />
| 64 || qts.1_tw.64 || qts.2_tw.64 || qts.4_tw.64 || qts.8_tw.64 || qts.16_tw.64 || qts.32_tw.64 || qts.64_tw.64 || qts.83_tw.64<br />
|-<br />
| 83 || qts.1_tw.83 || qts.2_tw.83 || qts.4_tw.83 || qts.8_tw.83 || qts.16_tw.83 || qts.32_tw.83 || qts.64_tw.83 || qts.83_tw.83<br />
|}<br />
<br />
We then repeat all eight experiment sessions with the second scoring method.<br />
<br />
In total, we generate 1,648,416 queries (= 103,026 queries per session x 8 sessions x 2 scoring methods) and obtain about 160 million search results (each query returns up to 100 search results).<br />
<br />
==Analysis==<br />
===Impact of Temporally-anchored Scoring===<br />
To reiterate, the first goal of the experiments is to examine how temporally-anchored scoring can yield different search results. <br />
In the traditional search scheme, a temporal query is processed by first scoring documents using the document/term statistics of the entire collection, and then filtering in only those documents that belong to the specified query time span. By construction, our experiment session with time window size 83 emulates this traditional scheme exactly, producing the same result sets the traditional scheme would. Thus, in the first analysis, we compare the following two cases:<br />
<br />
# Scoring is performed based on the statistics of the entire history (i.e. the time window size is 83).<br />
# Scoring is performed based on the actual query time span specified.<br />
<br />
In particular, we compare the result set qts.''x''_tw.''1'' against qts.''x''_tw.''83''. We compute precision and Kendall's tau at r = 10, 20 and 100.<br />
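The two comparison metrics can be sketched as follows (a simplified version; in particular, restricting Kendall's tau to the documents common to both top-r lists is our own choice for handling partially overlapping rankings):<br />

```python
from itertools import combinations

def precision_at_r(results_a, results_b, r):
    """Fraction of the top-r documents of list A that also appear in
    the top-r of list B."""
    return len(set(results_a[:r]) & set(results_b[:r])) / r

def kendall_tau(results_a, results_b, r):
    """Kendall's tau over the documents common to both top-r lists:
    +1 when the shared documents keep the same relative order,
    -1 when the order is fully reversed."""
    top_b = results_b[:r]
    common = [d for d in results_a[:r] if d in top_b]
    if len(common) < 2:
        return 1.0
    pos_b = {d: top_b.index(d) for d in common}
    concordant = sum(1 for d1, d2 in combinations(common, 2)
                     if pos_b[d1] < pos_b[d2])
    total = len(common) * (len(common) - 1) // 2
    return (2 * concordant - total) / total
```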
<br />
We also identify one illustrative temporal query that underscores the importance of temporally-anchored scoring.<br />
<br />
===Impact of Time-Window Based Approximation===<br />
The second goal is to examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
When the time window size is a single month, any query time span perfectly matches one or more consecutive time windows, so the statistics we use for any query time span are exact. On the other hand, when the time window size is two or more months, many query time spans end up covering time windows only partially, and we approximate the statistics for such a query time span with the combined statistics of all overlapping time windows (see [[Webarc:Combining Statistics in Multiple Time Windows|How We Combine Statistics in Multiple Time Windows]]).<br />
We can therefore examine the impact of time-window-based approximation by comparing the result set qts.''x''_tw.''y'' (y != 1) with qts.''x''_tw.''1'' for all ''x'' and ''y''. For example, comparing qts.''1''_tw.''2'' with qts.''1''_tw.''1'' shows how the search results differ when we use 2-month time windows for single-month query time spans.<br />
<br />
In particular, we compute the following metrics for all <math>r \in \{10, 20, 100\}</math>, <math>x \in \{1, 2, 4, 8, 16, 32, 64, 83\}</math>, and <math>y \in \{2, 4, 8, 16, 32, 64, 83\}</math>, where <math>r</math> is the number of search results, <math>x</math> is the query time span length, and <math>y</math> is the time window length.<br />
<br />
* Mean Precision: <font size="4"><math>mP_r(x, y)</math></font><br />
* Mean Kendall's <math>\tau</math>: <font size="4"><math>mKT_r(x, y)</math></font><br />
<br />
We plot the mean precisions and Kendall's <math>\tau</math> values in graphs. An example plot of <font size="4"><math>mP_{20}(x, y)</math></font> is shown below. In the figure, TW2 denotes the case where the time window size is two months, TW4 four months, TW8 eight months, and so on.<br />
<br />
<center>[[Image:mP_20.png|480px]]</center><br />
<br />
<br />
==Further Information==<br />
* [[Webarc:Input Dataset Statistics |[1] Input Dataset Statistics]]<br />
* [[Webarc:Temporal Query Load |[2] Temporal Query Load]]<br />
* [[Webarc:Time Windows Used in Experiments| [3] Time Windows]]<br />
* [[Webarc:Combining Statistics in Multiple Time Windows| [4] How We Combine Statistics]]<br />
* [[Webarc:Tools Developed |[5] Tools Developed]]</div>
Webarc:Input Dataset Statistics (Scsong, 2009-11-20)
<hr />
<div>From the English Wikipedia XML dump created on April 25th, 2008, we extracted 83 monthly snapshots. All experiments were performed on these monthly snapshots.<br />
{| class="wikitable" style="margin: 1em auto 1em auto" border="1"<br />
|-<br />
| Original Data || English Wikipedia XML Dump created on April 25th, 2008<br />
|-<br />
| Extracted Data || 83 monthly snapshots between February 2001 ~ December 2007<br />
|-<br />
| Included in Each Snapshot || Most recent revision of each article as of the end of the month.<br />
|-<br />
| Number of Articles || 2,077,745<br />
|-<br />
| Number of Revisions || 16,618,497 <br />
|-<br />
| Average Number of Revisions per Article || 8<br />
|-<br />
| Average Lifespan per Revision || 27.4 months<br />
|-<br />
| The Most Frequently-Updated Article || "Cannabis", "Chess", and "Sport" - 73 revisions each <br />
|- <br />
| The Least Frequently-Updated Article || "Buzz!! The Movie" and 317,126 others - 1 revision each<br />
|}<br />
<center>[[Image:wiki_update_frequency.png|640px]]</center><br />
The following table shows the number of articles in each monthly snapshot. For each month, we separately count the articles newly updated within that month and the articles last updated in a previous month.<br />
<br />
In the table, February 2001 is abbreviated as month-001, March 2001 as month-002, ..., December 2007 as month-083.<br />
{| class="wikitable" style="text-align:right; margin: 1em auto 1em auto " border="1"<br />
|-align="center"<br />
!width="100"|Snapshot || Number of Carried-Over Articles || Number of Newly-Updated Articles || Total Number of Articles<br />
|-<br />
| month-001 || 0 || 23 || 23<br />
|-<br />
| month-002 || 21 || 160 || 181<br />
|-<br />
| month-003 || 179 || 212 || 391<br />
|-<br />
| month-004 || 385 || 670 || 1,055<br />
|-<br />
| month-005 || 1,046 || 186 || 1,232<br />
|-<br />
| month-006 || 1,207 || 349 || 1,556<br />
|-<br />
| month-007 || 1,534 || 695 || 2,229<br />
|-<br />
| month-008 || 2,163 || 1,190 || 3,353<br />
|-<br />
| month-009 || 3,266 || 1,917 || 5,183<br />
|-<br />
| month-010 || 4,972 || 1,939 || 6,911<br />
|-<br />
| month-011 || 6,364 || 2,491 || 8,855<br />
|-<br />
| month-012 || 8,129 || 2,277 || 10,406<br />
|-<br />
| month-013 || 8,626 || 3,408 || 12,034<br />
|-<br />
| month-014 || 10,145 || 3,669 || 13,814<br />
|-<br />
| month-015 || 12,149 || 2,890 || 15,039<br />
|-<br />
| month-016 || 13,197 || 3,797 || 16,994<br />
|-<br />
| month-017 || 13,967 || 5,192 || 19,159<br />
|-<br />
| month-018 || 15,825 || 6,291 || 22,116<br />
|-<br />
| month-019 || 17,560 || 10,200 || 27,760<br />
|-<br />
| month-020 || 21,135 || 16,112 || 37,247<br />
|-<br />
| month-021 || 29,871 || 28,525 || 58,396<br />
|-<br />
| month-022 || 52,325 || 10,311 || 62,636<br />
|-<br />
| month-023 || 51,434 || 27,957 || 79,391<br />
|-<br />
| month-024 || 71,448 || 13,466 || 84,914<br />
|-<br />
| month-025 || 76,908 || 12,617 || 89,525<br />
|-<br />
| month-026 || 81,189 || 13,327 || 94,516<br />
|-<br />
| month-027 || 85,235 || 14,343 || 99,578<br />
|-<br />
| month-028 || 87,799 || 17,513 || 105,312<br />
|-<br />
| month-029 || 87,679 || 23,705 || 111,384<br />
|-<br />
| month-030 || 92,631 || 26,438 || 119,069<br />
|-<br />
| month-031 || 102,220 || 23,983 || 126,203<br />
|-<br />
| month-032 || 109,987 || 24,068 || 134,055<br />
|-<br />
| month-033 || 116,297 || 24,339 || 140,636<br />
|-<br />
| month-034 || 119,494 || 29,287 || 148,781<br />
|-<br />
| month-035 || 125,761 || 33,126 || 158,887<br />
|-<br />
| month-036 || 133,969 || 34,961 || 168,930<br />
|-<br />
| month-037 || 137,032 || 46,191 || 183,223<br />
|-<br />
| month-038 || 139,282 || 63,119 || 202,401<br />
|-<br />
| month-039 || 153,382 || 66,120 || 219,502<br />
|-<br />
| month-040 || 164,852 || 70,693 || 235,545<br />
|-<br />
| month-041 || 150,747 || 101,445 || 252,192<br />
|-<br />
| month-042 || 186,426 || 84,940 || 271,366<br />
|-<br />
| month-043 || 201,801 || 88,208 || 290,009<br />
|-<br />
| month-044 || 213,105 || 95,792 || 308,897<br />
|-<br />
| month-045 || 230,877 || 99,329 || 330,206<br />
|-<br />
| month-046 || 208,679 || 146,893 || 355,572<br />
|-<br />
| month-047 || 239,545 || 140,840 || 380,385<br />
|-<br />
| month-048 || 289,942 || 111,292 || 401,234<br />
|-<br />
| month-049 || 310,721 || 109,583 || 420,304<br />
|-<br />
| month-050 || 309,097 || 136,095 || 445,192<br />
|-<br />
| month-051 || 311,723 || 162,419 || 474,142<br />
|-<br />
| month-052 || 332,785 || 171,091 || 503,876<br />
|-<br />
| month-053 || 346,613 || 189,980 || 536,593<br />
|-<br />
| month-054 || 362,863 || 215,235 || 578,098<br />
|-<br />
| month-055 || 387,670 || 233,548 || 621,218<br />
|-<br />
| month-056 || 430,610 || 227,345 || 657,955<br />
|-<br />
| month-057 || 438,169 || 263,848 || 702,017<br />
|-<br />
| month-058 || 477,553 || 266,604 || 744,157<br />
|-<br />
| month-059 || 462,758 || 326,800 || 789,558<br />
|-<br />
| month-060 || 484,083 || 356,435 || 840,518<br />
|-<br />
| month-061 || 536,424 || 349,439 || 885,863<br />
|-<br />
| month-062 || 535,708 || 400,751 || 936,459<br />
|-<br />
| month-063 || 574,904 || 409,781 || 984,685<br />
|-<br />
| month-064 || 605,089 || 430,124 || 1,035,213<br />
|-<br />
| month-065 || 646,905 || 439,811 || 1,086,716<br />
|-<br />
| month-066 || 674,061 || 472,054 || 1,146,115<br />
|-<br />
| month-067 || 699,421 || 510,847 || 1,210,268<br />
|-<br />
| month-068 || 776,674 || 485,047 || 1,261,721<br />
|-<br />
| month-069 || 806,557 || 506,063 || 1,312,620<br />
|-<br />
| month-070 || 833,026 || 530,075 || 1,363,101<br />
|-<br />
| month-071 || 882,059 || 531,460 || 1,413,519<br />
|-<br />
| month-072 || 895,075 || 571,286 || 1,466,361<br />
|-<br />
| month-073 || 956,064 || 561,065 || 1,517,129<br />
|-<br />
| month-074 || 982,413 || 587,120 || 1,569,533<br />
|-<br />
| month-075 || 1,024,155 || 597,422 || 1,621,577<br />
|-<br />
| month-076 || 1,068,145 || 604,626 || 1,672,771<br />
|-<br />
| month-077 || 1,133,363 || 605,840 || 1,739,203<br />
|-<br />
| month-078 || 1,188,351 || 634,385 || 1,822,736<br />
|-<br />
| month-079 || 1,233,461 || 654,324 || 1,887,785<br />
|-<br />
| month-080 || 1,306,141 || 629,149 || 1,935,290<br />
|-<br />
| month-081 || 1,329,029 || 654,937 || 1,983,966<br />
|-<br />
| month-082 || 1,405,930 || 621,105 || 2,027,035<br />
|-<br />
| month-083 || 1,441,438 || 636,307 || 2,077,745<br />
|-<br />
| total || 30,070,825 || 16,618,497 || 46,689,322<br />
|}</div>
<hr />
<div>==Goals==<br />
There are two main goals that we want to achieve through the following experiments.<br />
# For time-constrained queries, we examine how temporally-anchored scoring can affect the search results in real-world time series data.<br />
# For time-constrained queries with a time window indexing scheme, we examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
<br />
==Input Dataset==<br />
We preprocess the entire revision history of [http://www.archive.org/details/enwiki-20080425 the English Wikipedia from 2001 to 2007]. After preprocessing, we obtain 84 monthly snapshots starting from January 2001 ending in December 2007. Included in each monthly snapshot are the latest revision of existing articles at the end of the month. For example, the Wikipedia article 'Economy of the United States' created on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=649100 August 21 2002] is included in the six monthly snapshots of August 2002, September 2002, ..., January 2003 since there is no newer revision made until [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=749002 February 7 2003], whereas the same article revised on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=166885 August 16 2002] is not included in any of the snapshots.<br />
<br />
[[Webarc:Input Dataset Statistics |Statistics of monthly snapshots]]<br />
<br />
==Queries==<br />
Based on the AOL query log made briefly available in 2005, we build our temporal query load by extracting multi-term query phrases where the user selected an English Wikipedia article among the search results. Note that the different temporal contexts give different relative weights to each of the query terms, yielding different result rankings. In other words, for a query phrase with term t1 and t2, t1 may be given more weight than t2 at a certain query time span qts1, but it may be opposite at a different query time span qts2. This means that two document versions that belong to both qts1 and qts2 can have different rankings depending on the specified query time span. However, in the case of single-term queries, any two document versions will retain their ranking regardless of the specified query time span, as long as both belong to the query time span. In our experiments, we highlight the impact on the search results from different query time spans by focusing only on multi-term queries. The more general case where single-term queries are also included will be able to be inferred from the results from query log analysis studies that report about 80% of web queries are multi-termed.<br />
<br />
Once we extract qualified query phrases, we further filter out less frequently appearing phrases. In particular, we take the query phrases that appear 10 times or more in the log. As a result, we gather a total of 223 such query phrases.<br />
<br />
Using the 223 query phrases, we now construct a temporal query load by combining query time spans with each of the query phrases. Specifically, we use 8 query time span lengths: 1, 2, 4, 8, 16, 32, 64, and 83 months. For each query time span length, we make all possible query time spans, starting from the first month. For example, for the query time span length of two months, we make 82 query time spans: (February 1st 2001 ~ March 31st 2001), (March 1st 2001 ~ April 30th 2001), ..., (November 1st 2007 ~ December 31st 2007). A total of 462 (= 83 + 82 + 80 + 76 + 68 + 52 + 20 + 1) different query time spans are resulted. In the end, we generate 103,026 (= 223 x 462) queries in each experiment session.<br />
<br />
[[Webarc:Temporal Query Load |The complete list of query terms and query time spans]]<br />
<br />
==Scoring Methods==<br />
We use two different scoring methods; 1. a tf-idf based method (OKAPI BM-25, in particular) and 2. a language-model based method (the KL divergence combined with the Dirichlet's smoothing, in particular). For each query, we take 100 top documents from each scoring method.<br />
<br />
==Experiments==<br />
The experiments consist of eight experiment sessions, each with one of the following time window sizes: 1, 2, 4, 8, 16, 32, 64, and 83 month. For each experiment session, the temporal query load described above is fed into our search servers.<br />
<br />
[[Webarc:Time Windows Used in Experiments|Time Windows Used in Experiments]]<br />
<br />
When scoring relevance, approximated query / document statistics are used. For a query time span (tp<sub>1</sub> ~ tp<sub>2</sub>), we find a series of time windows tw<sub>i</sub> ~ tw<sub>j</sub> that completely cover (tp<sub>1</sub> ~ tp<sub>2</sub>). The statistics for the query time span is approximated by combining statistics of tw<sub>i</sub> ~ tw<sub>j</sub>.<br />
<br />
[[Webarc:Combining Statistics in Multiple Time Windows|How We Combine Statistics in Multiple Time Windows]]<br />
<br />
For example, if tw<sub>1</sub> ~ tw<sub>3</sub> completely cover a given query time span as in the figure below, we combine the statistics of tw<sub>1</sub>, tw<sub>2</sub> and tw<sub>3</sub> and use them when scoring. <br />
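The covering-and-combining step can be sketched as follows. The simple additive combination rule below (document counts and per-term document frequencies summed across windows) is an assumption for illustration; the page linked above documents the actual combination rules:<br />

```python
def covering_windows(span_start, span_end, window_size, num_months=83):
    """Indices of the consecutive fixed-size time windows that fully cover
    the query time span [span_start, span_end] (months numbered from 1)."""
    first = (span_start - 1) // window_size
    last = (span_end - 1) // window_size
    return list(range(first, last + 1))

def combined_stats(window_stats, windows):
    """Approximate span statistics by summing per-window statistics
    (assumed additive scheme: document counts and per-term document
    frequencies add across the covering windows)."""
    total = {"num_docs": 0, "df": {}}
    for w in windows:
        total["num_docs"] += window_stats[w]["num_docs"]
        for term, df in window_stats[w]["df"].items():
            total["df"][term] = total["df"].get(term, 0) + df
    return total
```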
<br />
<center>[[Image:stat_approximation.png|480px]]</center><br />
<br />
When the time window size is a single month, a series of consecutive time windows always aligns exactly with any query time span, so we use the exact statistics for the query time span.<br />
<br />
<center>[[Image:tw1.png|480px]]</center><br />
<br />
Once we run all eight experiment sessions, we obtain 64 result sets, as shown in the table below. Each row represents an experiment session with a given time window size, and each column represents a query time span size. The entry qts.''i''_tw.''j'' refers to the result set for query time span size ''i'' and time window size ''j''. The size of each result set depends on the query time span size. For example, qts.''1''_tw.''x'' contains search results for 18,509 (= 223 query phrases x 83 query time spans) queries, while qts.''83''_tw.''x'' contains search results for only 223 (= 223 x 1) queries.<br />
<br />
{| class="wikitable" style="margin: 1em auto 1em auto" border="1"<br />
|-<br />
| tw\qts || 1 || 2 || 4 || 8 || 16 || 32 || 64 || 83<br />
|-<br />
| 1 || qts.1_tw.1 || qts.2_tw.1 || qts.4_tw.1 || qts.8_tw.1 || qts.16_tw.1 || qts.32_tw.1 || qts.64_tw.1 || qts.83_tw.1<br />
|-<br />
| 2 || qts.1_tw.2 || qts.2_tw.2 || qts.4_tw.2 || qts.8_tw.2 || qts.16_tw.2 || qts.32_tw.2 || qts.64_tw.2 || qts.83_tw.2<br />
|-<br />
| 4 || qts.1_tw.4 || qts.2_tw.4 || qts.4_tw.4 || qts.8_tw.4 || qts.16_tw.4 || qts.32_tw.4 || qts.64_tw.4 || qts.83_tw.4<br />
|-<br />
| 8 || qts.1_tw.8 || qts.2_tw.8 || qts.4_tw.8 || qts.8_tw.8 || qts.16_tw.8 || qts.32_tw.8 || qts.64_tw.8 || qts.83_tw.8<br />
|-<br />
| 16 || qts.1_tw.16 || qts.2_tw.16 || qts.4_tw.16 || qts.8_tw.16 || qts.16_tw.16 || qts.32_tw.16 || qts.64_tw.16 || qts.83_tw.16<br />
|-<br />
| 32 || qts.1_tw.32 || qts.2_tw.32 || qts.4_tw.32 || qts.8_tw.32 || qts.16_tw.32 || qts.32_tw.32 || qts.64_tw.32 || qts.83_tw.32<br />
|-<br />
| 64 || qts.1_tw.64 || qts.2_tw.64 || qts.4_tw.64 || qts.8_tw.64 || qts.16_tw.64 || qts.32_tw.64 || qts.64_tw.64 || qts.83_tw.64<br />
|-<br />
| 83 || qts.1_tw.83 || qts.2_tw.83 || qts.4_tw.83 || qts.8_tw.83 || qts.16_tw.83 || qts.32_tw.83 || qts.64_tw.83 || qts.83_tw.83<br />
|}<br />
<br />
We then run the eight experiment sessions a second time with the other scoring method.<br />
<br />
In total, we generate 1,648,416 queries (= 103,026 queries per session x 8 sessions x 2 scoring methods) and obtain about 160 million search results (each query returns up to 100 search results).<br />
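The bookkeeping above can be checked with a few lines (a hypothetical verification mirroring the arithmetic in the text):<br />

```python
# Verify the query and result-set counts quoted above.
SPAN_LENGTHS = [1, 2, 4, 8, 16, 32, 64, 83]
NUM_PHRASES = 223

# Number of query time spans of each length over the 83-month period.
spans_per_length = {L: 83 - L + 1 for L in SPAN_LENGTHS}

queries_per_session = NUM_PHRASES * sum(spans_per_length.values())
total_queries = queries_per_session * 8 * 2  # 8 sessions x 2 scoring methods

print(spans_per_length[1] * NUM_PHRASES)  # 18509 queries in each qts.1_tw.x
print(queries_per_session)                # 103026
print(total_queries)                      # 1648416
print(total_queries * 100)                # up to 164,841,600 search results
```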
<br />
==Analysis==<br />
===Impact of Temporally-anchored Scoring===<br />
To reiterate, the first goal of the experiments is to examine how temporally-anchored scoring can yield different search results. <br />
In the traditional search scheme, a temporal query is processed by first scoring documents using document/term statistics from the entire collection, and then filtering to keep only the documents that fall within the specified query time span. By construction, our experiment session with time window size 83 exactly emulates this traditional scheme, producing the same result sets the traditional scheme would. Thus, in the first analysis, we compare the following two cases:<br />
<br />
# Scoring is performed based on the statistics of the entire history (i.e. the time window size is 83).<br />
# Scoring is performed based on the actual query time span specified.<br />
<br />
In particular, the result set qts.''x''_tw.''1'' is compared against qts.''x''_tw.''83''. We compute precision and Kendall's tau at r = 10, 20, and 100.<br />
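One way to compute these comparison metrics is sketched below. Treating the top-r of the reference result set as the relevant set for precision, and computing Kendall's tau only over documents common to both top-r lists, are assumptions of this sketch rather than necessarily the exact definitions used in the analysis:<br />

```python
from itertools import combinations

def precision_at_r(results, reference, r):
    """Fraction of the top-r results that also appear in the reference top-r."""
    return len(set(results[:r]) & set(reference[:r])) / r

def kendall_tau_at_r(results, reference, r):
    """Kendall's tau over the documents common to both top-r lists."""
    ref_rank = {d: i for i, d in enumerate(reference[:r])}
    # Keep common documents in the order they appear in `results`.
    common = [d for d in results[:r] if d in ref_rank]
    if len(common) < 2:
        return 1.0
    concordant = discordant = 0
    for a, b in combinations(common, 2):
        # (a, b) is ordered a-before-b in `results`; check the reference order.
        if ref_rank[a] < ref_rank[b]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)
```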
<br />
We also identify one illustrative temporal query that underscores the importance of temporally-anchored scoring.<br />
<br />
===Impact of Time-Window Based Approximation===<br />
The second goal is to examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
When the time window size is a single month, any query time span aligns exactly with one or more consecutive time windows, so the statistics we use for any query time span are exact. When the time window size is two or more months, however, many query time spans only partially cover some time windows, and we approximate the statistics for such a span by combining the statistics of all overlapping time windows (see [[Webarc:Combining Statistics in Multiple Time Windows|How We Combine Statistics in Multiple Time Windows]]).<br />
We can therefore examine the impact of time-window-based approximation by comparing the result set qts.''x''_tw.''y'' (y != 1) with qts.''x''_tw.''1'' for all ''x'' and ''y''. For example, comparing qts.''1''_tw.''2'' with qts.''1''_tw.''1'' shows how the search results differ when 2-month time windows are used for a single-month query time span.<br />
<br />
In particular, we compute the following metrics for all <math>r \in \{10, 20, 100\}</math>, <math>x \in \{1, 2, 4, 8, 16, 32, 64, 83\}</math>, and <math>y \in \{2, 4, 8, 16, 32, 64, 83\}</math>, where <math>r</math> is the number of search results, <math>x</math> is the query time span length, and <math>y</math> is the time window length.<br />
<br />
* Mean Precision: <font size="4"><math>mP_r(x, y)</math></font><br />
* Mean Kendall's <math>\tau</math>: <font size="4"><math>mKT_r(x, y)</math></font><br />
<br />
We plot the mean precisions and Kendall's <math>\tau</math> values in graphs. An example plot for <font size="4"><math>mP_{20}(x, y)</math></font> is shown below. In the figure, TW1 represents a time window size of one month, TW2 two months, TW4 four months, and so on.<br />
<br />
<center>[[Image:mP_20.png|480px]]</center><br />
<br />
<br />
==Further Information==<br />
* [[Webarc:Input Dataset Statistics |[1] Input Dataset Statistics]]<br />
* [[Webarc:Temporal Query Load |[2] Temporal Query Load]]<br />
* [[Webarc:Time Windows Used in Experiments| [3] Time Windows]]<br />
* [[Webarc:Combining Statistics in Multiple Time Windows| [4] How We Combine Statistics]]<br />
* [[Webarc:Tools Developed |[5] Tools Developed]]</div>
<hr />
<div>==Goals==<br />
There are two main goals that we want to achieve through the following experiments.<br />
# For time-constrained queries, we examine how temporally-anchored scoring can affect the search results in real-world time series data.<br />
# For time-constrained queries with a time window indexing scheme, we examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
<br />
==Input Dataset==<br />
We preprocess the entire revision history of [http://www.archive.org/details/enwiki-20080425 the English Wikipedia from 2001 to 2007]. After preprocessing, we obtain 84 monthly snapshots starting from January 2001 ending in December 2007. Included in each monthly snapshot are the latest revision of existing articles at the end of the month. For example, the Wikipedia article 'Economy of the United States' created on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=649100 August 21 2002] is included in the six monthly snapshots of August 2002, September 2002, ..., January 2003 since there is no newer revision made until [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=749002 February 7 2003], whereas the same article revised on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=166885 August 16 2002] is not included in any of the snapshots.<br />
<br />
[[Webarc:Input Dataset Statistics |Statistics of monthly snapshots]]<br />
<br />
==Queries==<br />
Based on the AOL query log made briefly available in 2005, we build our temporal query load by extracting multi-term query phrases where the user selected an English Wikipedia article among the search results. Note that the different temporal contexts give different relative weights to each of the query terms, yielding different result rankings. In other words, for a query phrase with term t1 and t2, t1 may be given more weight than t2 at a certain query time span qts1, but it may be opposite at a different query time span qts2. This means that two document versions that belong to both qts1 and qts2 can have different rankings depending on the specified query time span. However, in the case of single-term queries, any two document versions will retain their ranking regardless of the specified query time span, as long as both belong to the query time span. In our experiments, we highlight the impact on the search results from different query time spans by focusing only on multi-term queries. The more general case where single-term queries are also included will be able to be inferred from the results from query log analysis studies that report about 80% of web queries are multi-termed.<br />
<br />
Once we extract the qualified query phrases, we further filter out infrequent ones, keeping only the query phrases that appear 10 times or more in the log. This leaves a total of 223 query phrases.<br />
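A sketch of this extraction and filtering step is shown below. The record layout (query, clicked URL) and the Wikipedia-click test are assumptions; the real AOL log has additional fields.

```python
from collections import Counter

def extract_query_phrases(log_records, min_count=10):
    """Keep multi-term query phrases whose clicked result is an English
    Wikipedia article, then keep only phrases seen min_count+ times.
    log_records: iterable of (query, clicked_url) pairs."""
    counts = Counter()
    for query, url in log_records:
        if url and "en.wikipedia.org" in url and len(query.split()) >= 2:
            counts[query] += 1
    return sorted(q for q, c in counts.items() if c >= min_count)

records = ([("economy of the united states",
             "http://en.wikipedia.org/wiki/Economy_of_the_United_States")] * 10
           + [("wikipedia", "http://en.wikipedia.org/wiki/Main_Page")] * 20  # single-term: excluded
           + [("rare query phrase", "http://en.wikipedia.org/wiki/Rare")] * 3  # < 10 occurrences: excluded
           + [("cheap flight deals", "http://travel.example.com")] * 50)  # non-Wikipedia click: excluded
phrases = extract_query_phrases(records)
```
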
<br />
Using the 223 query phrases, we now construct a temporal query load by combining query time spans with each of the query phrases. Specifically, we use 8 query time span lengths: 1, 2, 4, 8, 16, 32, 64, and 83 months. For each length, we form all possible month-aligned query time spans, starting from February 2001. For example, for the query time span length of two months, we make 82 query time spans: (February 1st 2001 ~ March 31st 2001), (March 1st 2001 ~ April 30th 2001), ..., (November 1st 2007 ~ December 31st 2007). This yields a total of 462 (= 83 + 82 + 80 + 76 + 68 + 52 + 20 + 1) different query time spans. In the end, we generate 103,026 (= 223 x 462) queries in each experiment session.<br />
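The span enumeration is easy to reproduce. The sketch below uses 1-based month indices (month 1 = January 2001, month 84 = December 2007) and, matching the counts above, starts spans at February 2001; the function name is ours.

```python
def query_time_spans(lengths=(1, 2, 4, 8, 16, 32, 64, 83), n_months=84):
    """All month-aligned query time spans, as (first_month, last_month)
    pairs of 1-based month indices, starting from month 2 (February 2001).
    Each length L contributes 84 - L spans."""
    spans = []
    for L in lengths:
        for start in range(2, n_months - L + 2):
            spans.append((start, start + L - 1))
    return spans

spans = query_time_spans()
```
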
<br />
[[Webarc:Temporal Query Load |The complete list of query terms and query time spans]]<br />
<br />
==Scoring Methods==<br />
We use two scoring methods: (1) a tf-idf-based method (specifically, Okapi BM25) and (2) a language-model-based method (specifically, KL divergence with Dirichlet smoothing). For each query, we take the top 100 documents from each scoring method.<br />
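As a concrete reference point, a minimal Okapi BM25 scorer looks like the sketch below. The k1/b defaults and the IDF variant are common choices, not necessarily the parameters used in these experiments.

```python
import math

def bm25(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    """Okapi BM25: sum over query terms of IDF times a saturated,
    length-normalized term frequency.  doc_tf: term -> frequency in this
    document; df: term -> document frequency in the collection."""
    score = 0.0
    for t in query_terms:
        tf = doc_tf.get(t, 0)
        if tf == 0 or t not in df:
            continue
        idf = math.log(1.0 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return score
```

In the temporally-anchored setting, `df`, `n_docs`, and `avg_doc_len` would be taken from (or approximated for) the query time span rather than from the whole collection.
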
<br />
==Experiments==<br />
The experiments consist of eight experiment sessions, each with one of the following time window sizes: 1, 2, 4, 8, 16, 32, 64, and 83 months. In each experiment session, the temporal query load described above is fed into our search servers.<br />
<br />
[[Webarc:Time Windows Used in Experiments|Time Windows Used in Experiments]]<br />
<br />
When scoring relevance, approximate query/document statistics are used. For a query time span (tp<sub>1</sub> ~ tp<sub>2</sub>), we find the series of time windows tw<sub>i</sub> ~ tw<sub>j</sub> that completely covers (tp<sub>1</sub> ~ tp<sub>2</sub>). The statistics for the query time span are approximated by combining the statistics of tw<sub>i</sub> ~ tw<sub>j</sub>.<br />
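Assuming the windows tile the timeline in fixed-size blocks starting at month 1 (the actual layout is described on the linked pages), the lookup of tw<sub>i</sub> ~ tw<sub>j</sub> reduces to an index computation:

```python
def covering_windows(span, window_len):
    """0-based indices of the consecutive time windows tw_i .. tw_j whose
    union covers the query time span.  Window k covers months
    k*window_len + 1 .. (k+1)*window_len (months are 1-based)."""
    first_month, last_month = span
    i = (first_month - 1) // window_len
    j = (last_month - 1) // window_len
    return list(range(i, j + 1))
```

With `window_len = 1` every span is covered exactly; with larger windows the covering windows may extend beyond the span, which is exactly where the approximation enters.
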
<br />
[[Webarc:Combining Statistics in Multiple Time Windows|How We Combine Statistics in Multiple Time Windows]]<br />
<br />
For example, if tw<sub>1</sub> ~ tw<sub>3</sub> completely covers a given query time span, as in the figure below, we combine the statistics of tw<sub>1</sub>, tw<sub>2</sub> and tw<sub>3</sub> and use them when scoring. <br />
<br />
<center>[[Image:stat_approximation.png|480px]]</center><br />
<br />
When the time window size is a single month, a series of time windows always matches any query time span exactly, and therefore we use the exact statistics for the query time span.<br />
<br />
<center>[[Image:tw1.png|480px]]</center><br />
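One plausible way to combine per-window statistics additively is sketched below; the actual combination rule is the one described on the linked wiki page, and the statistic names here are assumptions.

```python
def combine_window_stats(window_stats):
    """Combine per-window collection statistics additively: document counts
    and total lengths are summed, as are per-term document frequencies.
    (A document version alive in several windows is counted once per
    window, which is part of the approximation.)"""
    combined = {"n_docs": 0, "total_len": 0, "df": {}}
    for w in window_stats:
        combined["n_docs"] += w["n_docs"]
        combined["total_len"] += w["total_len"]
        for term, d in w["df"].items():
            combined["df"][term] = combined["df"].get(term, 0) + d
    return combined

stats = combine_window_stats([
    {"n_docs": 100, "total_len": 50000, "df": {"economy": 10}},
    {"n_docs": 120, "total_len": 61000, "df": {"economy": 12, "states": 4}},
])
```
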
<br />
After running all eight experiment sessions, we obtain 64 result sets, as shown in the table below. Each row represents an experiment session with a given time window size, and each column represents a query time span size. The entry qts.''i''_tw.''j'' refers to the result set for query time span size ''i'' and time window size ''j''. The size of each result set depends on the query time span size. For example, qts.''1''_tw.''x'' contains search results for 18,509 (= 223 query phrases x 83 query time spans) queries, whereas qts.''83''_tw.''x'' contains search results for only 223 (= 223 x 1) queries.<br />
<br />
{| class="wikitable" style="margin: 1em auto 1em auto" border="1"<br />
|-<br />
| tw\qts || 1 || 2 || 4 || 8 || 16 || 32 || 64 || 83<br />
|-<br />
| 1 || qts.1_tw.1 || qts.2_tw.1 || qts.4_tw.1 || qts.8_tw.1 || qts.16_tw.1 || qts.32_tw.1 || qts.64_tw.1 || qts.83_tw.1<br />
|-<br />
| 2 || qts.1_tw.2 || qts.2_tw.2 || qts.4_tw.2 || qts.8_tw.2 || qts.16_tw.2 || qts.32_tw.2 || qts.64_tw.2 || qts.83_tw.2<br />
|-<br />
| 4 || qts.1_tw.4 || qts.2_tw.4 || qts.4_tw.4 || qts.8_tw.4 || qts.16_tw.4 || qts.32_tw.4 || qts.64_tw.4 || qts.83_tw.4<br />
|-<br />
| 8 || qts.1_tw.8 || qts.2_tw.8 || qts.4_tw.8 || qts.8_tw.8 || qts.16_tw.8 || qts.32_tw.8 || qts.64_tw.8 || qts.83_tw.8<br />
|-<br />
| 16 || qts.1_tw.16 || qts.2_tw.16 || qts.4_tw.16 || qts.8_tw.16 || qts.16_tw.16 || qts.32_tw.16 || qts.64_tw.16 || qts.83_tw.16<br />
|-<br />
| 32 || qts.1_tw.32 || qts.2_tw.32 || qts.4_tw.32 || qts.8_tw.32 || qts.16_tw.32 || qts.32_tw.32 || qts.64_tw.32 || qts.83_tw.32<br />
|-<br />
| 64 || qts.1_tw.64 || qts.2_tw.64 || qts.4_tw.64 || qts.8_tw.64 || qts.16_tw.64 || qts.32_tw.64 || qts.64_tw.64 || qts.83_tw.64<br />
|-<br />
| 83 || qts.1_tw.83 || qts.2_tw.83 || qts.4_tw.83 || qts.8_tw.83 || qts.16_tw.83 || qts.32_tw.83 || qts.64_tw.83 || qts.83_tw.83<br />
|}<br />
<br />
We then repeat the eight experiment sessions with the second scoring method.<br />
<br />
In total, we generate 1,648,416 queries (= 103,026 queries per session x 8 sessions x 2 scoring methods) and obtain about 160 million search results (each query returns up to 100 search results).<br />
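These totals can be checked directly:

```python
queries_per_session = 223 * 462               # query phrases x query time spans
total_queries = queries_per_session * 8 * 2   # x sessions x scoring methods
max_results = total_queries * 100             # each query returns up to 100 results
```
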
<br />
==Analysis==<br />
===Impact of Temporally-anchored Scoring===<br />
To reiterate, the first goal of the experiments is to examine how temporally-anchored scoring can yield different search results. <br />
In the traditional search scheme, a temporal query is carried out by first scoring documents using the document/term statistics of the entire collection, and then filtering in only those documents that belong to the specified query time span. Our experiment session with time window size 83 emulates this traditional scheme exactly, producing the same result sets as the traditional scheme would. Thus, in the first analysis, we compare the following two cases:<br />
<br />
# Scoring is performed based on the statistics of the entire history (i.e. the time window size is 83).<br />
# Scoring is performed based on the actual query time span specified.<br />
<br />
In particular, the result set qts.''x''_tw.''1'' is compared against qts.''x''_tw.''83''. We compute precision and Kendall's tau at r = 10, 20, and 100.<br />
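The two comparison metrics can be sketched as below; here precision is defined as top-r overlap between the two result sets, and Kendall's tau is computed over the documents common to both top-r lists. Both definitions are assumptions about the exact formulations used.

```python
from itertools import combinations

def precision_at_r(run, reference, r):
    """Fraction of the top-r results of `run` that also appear in the
    top-r results of `reference` (pure set overlap)."""
    return len(set(run[:r]) & set(reference[:r])) / r

def kendalls_tau_at_r(run, reference, r):
    """Kendall's tau over documents common to both top-r lists:
    (concordant - discordant pairs) / total pairs, comparing the
    relative order of each pair in the two lists."""
    ref_pos = {d: i for i, d in enumerate(reference[:r])}
    common = [d for d in run[:r] if d in ref_pos]
    pairs = list(combinations(common, 2))  # each pair ordered as in `run`
    if not pairs:
        return 1.0
    concordant = sum(1 for a, b in pairs if ref_pos[a] < ref_pos[b])
    return (2 * concordant - len(pairs)) / len(pairs)
```
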
<br />
We also identify one illustrative temporal query that underscores the importance of temporally-anchored scoring.<br />
<br />
===Impact of Time-Window Based Approximation===<br />
The second goal is to examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
When the time window size is a single month, any query time span matches one or more consecutive time windows exactly, so the statistics we use for any query time span are exact. On the other hand, when the time window size is two or more months, many query time spans end up covering only parts of some time windows, and we approximate the statistics for such a query time span by combining the statistics of all overlapping time windows (see [[Webarc:Combining Statistics in Multiple Time Windows|How We Combine Statistics in Multiple Time Windows]]).<br />
Therefore, we can examine the impact of time-window-based approximation by comparing the result set qts.''x''_tw.''y'' (y != 1) with qts.''x''_tw.''1'' for all ''x'' and ''y''. For example, comparing qts.''1''_tw.''2'' with qts.''1''_tw.''1'' shows how the search results differ when we use 2-month time windows for single-month query time spans.<br />
<br />
In particular, we compute the following metrics for all <math>r \in \{10, 20, 100\}</math>, <math>x \in \{1, 2, 4, 8, 16, 32, 64, 83\}</math>, and <math>y \in \{2, 4, 8, 16, 32, 64, 83\}</math>, where <math>r</math> is the number of search results, <math>x</math> is the query time span length, and <math>y</math> is the time window length.<br />
<br />
* Mean Precision: <font size="4"><math>mP_r(x, y)</math></font><br />
* Mean Kendall's <math>\tau</math>: <font size="4"><math>mKT_r(x, y)</math></font><br />
<br />
We plot the mean precisions and mean Kendall's <math>\tau</math> values in graphs. An example of such a plot, for <math>mP_{20}</math>, is shown below. In the figure, TW1 represents the case where the time window size is a single month, TW2 two months, TW4 four months, and so on.<br />
<br />
<center>[[Image:mP_20.png|480px]]</center><br />
<br />
<br />
==Further Information==<br />
* [[Webarc:Input Dataset Statistics |[1] Input Dataset Statistics]]<br />
* [[Webarc:Temporal Query Load |[2] Temporal Query Load]]<br />
* [[Webarc:Tools Developed |[3] Tools Developed]]</div>
<hr />
<div>==Goals==<br />
There are two main goals that we want to achieve through the following experiments.<br />
# For time-constrained queries, we examine how temporally-anchored scoring can affect the search results in real-world time series data.<br />
# For time-constrained queries with a time window indexing scheme, we examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
<br />
==Input Dataset==<br />
We preprocess the entire revision history of [http://www.archive.org/details/enwiki-20080425 the English Wikipedia from 2001 to 2007]. After preprocessing, we obtain 84 monthly snapshots starting from January 2001 ending in December 2007. Included in each monthly snapshot are the latest revision of existing articles at the end of the month. For example, the Wikipedia article 'Economy of the United States' created on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=649100 August 21 2002] is included in the six monthly snapshots of August 2002, September 2002, ..., January 2003 since there is no newer revision made until [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=749002 February 7 2003], whereas the same article revised on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=166885 August 16 2002] is not included in any of the snapshots.<br />
<br />
[[Webarc:Input Dataset Statistics |Statistics of monthly snapshots]]<br />
<br />
==Queries==<br />
Based on the AOL query log made briefly available in 2005, we build our temporal query load by extracting multi-term query phrases for which the user selected an English Wikipedia article among the search results. Note that different temporal contexts give different relative weights to each of the query terms, yielding different result rankings. In other words, for a query phrase with terms t1 and t2, t1 may be given more weight than t2 in a certain query time span qts1, while the opposite may hold in a different query time span qts2. This means that two document versions belonging to both qts1 and qts2 can be ranked differently depending on the specified query time span. In the case of single-term queries, however, any two document versions retain their relative ranking regardless of the specified query time span, as long as both belong to it. In our experiments, we therefore highlight the impact of different query time spans on the search results by focusing only on multi-term queries. The more general case, in which single-term queries are also included, can be inferred from our results together with query log analysis studies reporting that about 80% of web queries contain multiple terms.<br />
<br />
Once we extract qualified query phrases, we further filter out less frequently appearing phrases. In particular, we take the query phrases that appear 10 times or more in the log. As a result, we gather a total of 223 such query phrases.<br />
<br />
Using the 223 query phrases, we now construct a temporal query load by combining query time spans with each of the query phrases. Specifically, we use 8 query time span lengths: 1, 2, 4, 8, 16, 32, 64, and 83 months. For each query time span length, we generate all possible query time spans within the period covered, sliding one month at a time. For example, for the query time span length of two months, we obtain 82 query time spans: (February 1st 2001 ~ March 31st 2001), (March 1st 2001 ~ April 30th 2001), ..., (November 1st 2007 ~ December 31st 2007). This yields a total of 462 (= 83 + 82 + 80 + 76 + 68 + 52 + 20 + 1) different query time spans. In the end, we generate 103,026 (= 223 x 462) queries in each experiment session.<br />
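The span and query counts above can be reproduced with a short sketch (hypothetical names; it assumes the spans slide one month at a time over the 83 months from February 2001 through December 2007, which is what the counts imply):<br />

```python
# Enumerate all query time spans for the 8 span lengths.
N_MONTHS = 83
LENGTHS = [1, 2, 4, 8, 16, 32, 64, 83]
N_PHRASES = 223

# Each span is a half-open [start, start + length) interval of month offsets.
spans = [(start, start + length)
         for length in LENGTHS
         for start in range(N_MONTHS - length + 1)]

counts = {length: N_MONTHS - length + 1 for length in LENGTHS}
print(counts)                  # {1: 83, 2: 82, 4: 80, 8: 76, 16: 68, 32: 52, 64: 20, 83: 1}
print(len(spans))              # 462 distinct query time spans
print(N_PHRASES * len(spans))  # 103026 queries per experiment session
```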
<br />
[[Webarc:Temporal Query Load |The complete list of query terms and query time spans]]<br />
<br />
==Scoring Methods==<br />
We use two scoring methods: (1) a tf-idf-based method (specifically, Okapi BM25) and (2) a language-model-based method (specifically, KL divergence with Dirichlet smoothing). For each query, we take the top 100 documents under each scoring method.<br />
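As a rough illustration of how collection statistics enter the ranking, here is a minimal sketch of Okapi BM25 scoring (the parameter values k1 and b and the idf form are common defaults, assumed rather than taken from our configuration):<br />

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, n_docs, doc_freq,
               k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a multi-term query.
    doc_tf: term -> frequency in the document; doc_freq: term -> number of
    documents (within the scoring window) containing the term."""
    score = 0.0
    for t in query_terms:
        tf = doc_tf.get(t, 0)
        df = doc_freq.get(t, 0)
        if tf == 0 or df == 0:
            continue
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * norm
    return score

# The same document scores differently when n_docs / doc_freq come from
# different time windows (made-up statistics for illustration).
doc = {"economy": 3, "states": 5}
s_a = bm25_score(["economy", "states"], doc, 100, 120, n_docs=1000,
                 doc_freq={"economy": 50, "states": 400})
s_b = bm25_score(["economy", "states"], doc, 100, 120, n_docs=5000,
                 doc_freq={"economy": 900, "states": 1000})
```

Because idf depends on n_docs and doc_freq, scoring the same document against statistics drawn from different query time spans generally yields different scores, which is exactly what temporally-anchored scoring exploits.<br />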
<br />
==Experiments==<br />
The experiments consist of eight sessions, each with one of the following time window sizes: 1, 2, 4, 8, 16, 32, 64, and 83 months. For each experiment session, the temporal query load described above is fed into our search servers.<br />
<br />
[[Webarc:Time Windows Used in Experiments|Time Windows Used in Experiments]]<br />
<br />
When scoring relevance, approximated query / document statistics are used. For a query time span (tp<sub>1</sub> ~ tp<sub>2</sub>), we find a series of time windows tw<sub>i</sub> ~ tw<sub>j</sub> that completely covers (tp<sub>1</sub> ~ tp<sub>2</sub>). The statistics for the query time span are approximated by combining the statistics of tw<sub>i</sub> ~ tw<sub>j</sub>.<br />
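A sketch of this step (hypothetical names; the combination shown here is plain summation, one plausible scheme, whereas the scheme we actually use is described on the linked page):<br />

```python
def covering_windows(qts_start, qts_end, window_size):
    """Indices of the consecutive fixed-size time windows that completely
    cover the query time span [qts_start, qts_end), in month offsets."""
    i = qts_start // window_size
    j = (qts_end - 1) // window_size
    return list(range(i, j + 1))

def combine_stats(windows, stats):
    """Combine per-window collection statistics by summation. Documents that
    live in several windows are counted once per window, which is the source
    of the approximation error."""
    combined = {"n_docs": 0, "total_len": 0, "doc_freq": {}}
    for w in windows:
        combined["n_docs"] += stats[w]["n_docs"]
        combined["total_len"] += stats[w]["total_len"]
        for term, df in stats[w]["doc_freq"].items():
            combined["doc_freq"][term] = combined["doc_freq"].get(term, 0) + df
    return combined

# With 4-month windows, the span covering months 3..8 needs windows 0, 1, 2.
print(covering_windows(3, 9, 4))  # [0, 1, 2]
```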
<br />
[[Webarc:Combining Statistics in Multiple Time Windows|How We Combine Statistics in Multiple Time Windows]]<br />
<br />
For example, if tw<sub>1</sub> ~ tw<sub>3</sub> completely overlaps a given query time span as in the figure below, we combine statistics of tw<sub>1</sub>, tw<sub>2</sub> and tw<sub>3</sub> and use them when scoring. <br />
<br />
<center>[[Image:stat_approximation.png|480px]]</center><br />
<br />
In the case where the time window size is a single month, a series of time windows always aligns exactly with any query time span, and therefore we use the exact statistics for the query time span.<br />
<br />
<center>[[Image:tw1.png|480px]]</center><br />
<br />
Once we run all eight experiment sessions, we obtain 64 result sets, as shown in the table below. Each row represents an experiment session with a given time window size, and each column represents a query time span size. The entry qts.''i''_tw.''j'' denotes the result set for query time span size ''i'' and time window size ''j''. The size of each result set depends on the query time span size. For example, qts.''1''_tw.''x'' contains search results for 18,509 (= 223 query phrases x 83 query time spans) queries, while qts.''83''_tw.''x'' contains search results for only 223 (= 223 x 1) queries.<br />
<br />
{| class="wikitable" style="margin: 1em auto 1em auto" border="1"<br />
|-<br />
| tw\qts || 1 || 2 || 4 || 8 || 16 || 32 || 64 || 83<br />
|-<br />
| 1 || qts.1_tw.1 || qts.2_tw.1 || qts.4_tw.1 || qts.8_tw.1 || qts.16_tw.1 || qts.32_tw.1 || qts.64_tw.1 || qts.83_tw.1<br />
|-<br />
| 2 || qts.1_tw.2 || qts.2_tw.2 || qts.4_tw.2 || qts.8_tw.2 || qts.16_tw.2 || qts.32_tw.2 || qts.64_tw.2 || qts.83_tw.2<br />
|-<br />
| 4 || qts.1_tw.4 || qts.2_tw.4 || qts.4_tw.4 || qts.8_tw.4 || qts.16_tw.4 || qts.32_tw.4 || qts.64_tw.4 || qts.83_tw.4<br />
|-<br />
| 8 || qts.1_tw.8 || qts.2_tw.8 || qts.4_tw.8 || qts.8_tw.8 || qts.16_tw.8 || qts.32_tw.8 || qts.64_tw.8 || qts.83_tw.8<br />
|-<br />
| 16 || qts.1_tw.16 || qts.2_tw.16 || qts.4_tw.16 || qts.8_tw.16 || qts.16_tw.16 || qts.32_tw.16 || qts.64_tw.16 || qts.83_tw.16<br />
|-<br />
| 32 || qts.1_tw.32 || qts.2_tw.32 || qts.4_tw.32 || qts.8_tw.32 || qts.16_tw.32 || qts.32_tw.32 || qts.64_tw.32 || qts.83_tw.32<br />
|-<br />
| 64 || qts.1_tw.64 || qts.2_tw.64 || qts.4_tw.64 || qts.8_tw.64 || qts.16_tw.64 || qts.32_tw.64 || qts.64_tw.64 || qts.83_tw.64<br />
|-<br />
| 83 || qts.1_tw.83 || qts.2_tw.83 || qts.4_tw.83 || qts.8_tw.83 || qts.16_tw.83 || qts.32_tw.83 || qts.64_tw.83 || qts.83_tw.83<br />
|}<br />
<br />
We then run all eight experiment sessions a second time with the other scoring method.<br />
<br />
In total, we generate 1,648,416 queries (= 103,026 queries per session x 8 sessions x 2 scoring methods) and obtain about 160 million search results (each query returns up to 100 search results).<br />
<br />
==Analysis==<br />
===Impact of Temporally-anchored Scoring===<br />
To reiterate, the first goal of the experiments is to examine how temporally-anchored scoring can yield different search results. <br />
In the traditional search scheme, a temporal query is carried out by first scoring documents with the document/term statistics of the entire collection and then filtering in only those documents that belong to the specified query time span. By construction, our experiment session for time window size 83 exactly emulates this traditional scheme, producing the same result sets that the traditional scheme would. Thus, in the first analysis, we compare the following two cases:<br />
<br />
# Scoring is performed based on the statistics of the entire history (i.e. the time window size is 83).<br />
# Scoring is performed based on the actual query time span specified.<br />
<br />
In particular, the result set qts.''x''_tw.''1'' is compared against qts.''x''_tw.''83''. We compute precision and Kendall's tau at r = 10, 20, and 100.<br />
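The comparison metrics can be sketched as follows (a hypothetical implementation: precision here is the overlap of the two top-''r'' lists, and Kendall's tau is computed over the documents common to both lists):<br />

```python
from itertools import combinations

def precision_at_r(results_a, results_b, r):
    """Fraction of the top-r results of list A that also appear in the
    top-r results of list B."""
    return len(set(results_a[:r]) & set(results_b[:r])) / r

def kendall_tau(results_a, results_b, r):
    """Kendall's tau over the documents common to both top-r lists:
    +1 when the shared documents appear in the same order, -1 when reversed."""
    pos_b = {d: i for i, d in enumerate(results_b[:r])}
    common = [d for d in results_a[:r] if d in pos_b]
    if len(common) < 2:
        return 1.0
    concordant = discordant = 0
    for x, y in combinations(common, 2):   # x precedes y in list A
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

a = ["d1", "d2", "d3", "d4"]
b = ["d2", "d1", "d3", "d5"]
print(precision_at_r(a, b, 4))  # 0.75: three of A's top four also appear in B
print(kendall_tau(a, b, 4))     # 1/3: one of three shared pairs is swapped
```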
<br />
We also identify a representative temporal query that illustrates the importance of temporally-anchored scoring.<br />
<br />
===Impact of Time-Window Based Approximation===<br />
The second goal is to examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
When the time window size is a single month, any query time span perfectly matches one or more consecutive time windows, so the statistics we use for any query time span are exact. On the other hand, when the time window size is two or more months, many query time spans cover only parts of some time windows, and we approximate the statistics for a query time span by combining the statistics of all overlapping time windows (see [[Webarc:Combining Statistics in Multiple Time Windows|how we combine statistics of multiple time windows]]).<br />
Therefore, we can examine the impact of time-window-based approximation by comparing the result set qts.''x''_tw.''y'' (''y'' != 1) with qts.''x''_tw.''1'' for all ''x'' and ''y''. For example, comparing qts.''1''_tw.''2'' with qts.''1''_tw.''1'' shows how the search results differ when we use 2-month time windows for single-month query time spans.<br />
<br />
In particular, we compute the following metrics for all <math>r \in \{10, 20, 100\}</math>, <math>x \in \{1, 2, 4, 8, 16, 32, 64, 83\}</math>, and <math>y \in \{2, 4, 8, 16, 32, 64, 83\}</math>, where <math>r</math> is the number of search results, <math>x</math> is the query time span length, and <math>y</math> is the time window length.<br />
<br />
* Mean precision: <font size="4"><math>mP(r, x, y)</math></font><br />
* Precision variance: <font size="4"><math>vP(r, x, y)</math></font><br />
* Mean Kendall's <math>\tau</math>: <font size="4"><math>mKT(r, x, y)</math></font><br />
* Kendall's <math>\tau</math> variance: <font size="4"><math>vKT(r, x, y)</math></font><br />
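These aggregate metrics are plain means and variances over the per-query values of precision and Kendall's tau; for instance (a sketch with made-up numbers, assuming population variance):<br />

```python
def mean_and_variance(values):
    """Population mean and variance of per-query metric values, as used for
    mP/vP and mKT/vKT above."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var

# Hypothetical per-query precisions for one (r, x, y) combination.
precisions = [0.9, 0.8, 1.0, 0.7]
mP, vP = mean_and_variance(precisions)
print(round(mP, 3), round(vP, 4))  # 0.85 0.0125
```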
<br />
<br />
==Further Information==<br />
* [[Webarc:Input Dataset Statistics |[1] Input Dataset Statistics]]<br />
* [[Webarc:Temporal Query Load |[2] Temporal Query Load]]<br />
* [[Webarc:Tools Developed |[2] Tools Developed]]</div>Scsonghttps://wiki.umiacs.umd.edu/adapt/index.php?title=Webarc:Temporally_Anchored_Scoring_Experiments&diff=2597Webarc:Temporally Anchored Scoring Experiments2009-11-19T00:35:45Z<p>Scsong: </p>
<hr />
<div>==Goals==<br />
There are two main goals that we want to achieve through the following experiments.<br />
# For time-constrained queries, we examine how temporally-anchored scoring can affect the search results in real-world time series data.<br />
# For time-constrained queries with a time window indexing scheme, we examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
<br />
==Input Dataset==<br />
We preprocess the entire revision history of [http://www.archive.org/details/enwiki-20080425 the English Wikipedia from 2001 to 2007]. After preprocessing, we obtain 84 monthly snapshots starting from January 2001 ending in December 2007. Included in each monthly snapshot are the latest revision of existing articles at the end of the month. For example, the Wikipedia article 'Economy of the United States' created on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=649100 August 21 2002] is included in the six monthly snapshots of August 2002, September 2002, ..., January 2003 since there is no newer revision made until [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=749002 February 7 2003], whereas the same article revised on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=166885 August 16 2002] is not included in any of the snapshots.<br />
<br />
[[Webarc:Input Dataset Statistics |Statistics of monthly snapshots]]<br />
<br />
==Queries==<br />
Based on the AOL query log made briefly available in 2005, we build our temporal query load by extracting multi-term query phrases where the user selected an English Wikipedia article among the search results. Note that the different temporal contexts give different relative weights to each of the query terms, yielding different result rankings. In other words, for a query phrase with term t1 and t2, t1 may be given more weight than t2 at a certain query time span qts1, but it may be opposite at a different query time span qts2. This means that two document versions that belong to both qts1 and qts2 can have different rankings depending on the specified query time span. However, in the case of single-term queries, any two document versions will retain their ranking regardless of the specified query time span, as long as both belong to the query time span. In our experiments, we highlight the impact on the search results from different query time spans by focusing only on multi-term queries. The more general case where single-term queries are also included will be able to be inferred from the results from query log analysis studies that report about 80% of web queries are multi-termed.<br />
<br />
Once we extract qualified query phrases, we further filter out less frequently appearing phrases. In particular, we take the query phrases that appear 10 times or more in the log. As a result, we gather a total of 223 such query phrases.<br />
<br />
Using the 223 query phrases, we now construct a temporal query load by combining query time spans with each of the query phrases. Specifically, we use 8 query time span lengths: 1, 2, 4, 8, 16, 32, 64, and 83 months. For each query time span length, we make all possible query time spans, starting from the first month. For example, for the query time span length of two months, we make 82 query time spans: (February 1st 2001 ~ March 31st 2001), (March 1st 2001 ~ April 30th 2001), ..., (November 1st 2007 ~ December 31st 2007). A total of 462 (= 83 + 82 + 80 + 76 + 68 + 52 + 20 + 1) different query time spans are resulted. In the end, we generate 103,026 (= 223 x 462) queries in each experiment session.<br />
<br />
[[Webarc:Temporal Query Load |The complete list of query terms and query time spans]]<br />
<br />
==Scoring Methods==<br />
We use two different scoring methods; 1. a tf-idf based method (OKAPI BM-25, in particular) and 2. a language-model based method (the KL divergence combined with the Dirichlet's smoothing, in particular). For each query, we take 100 top documents from each scoring method.<br />
<br />
==Experiments==<br />
The experiments consist of eight experiment sessions, each with one of the following time window sizes: 1, 2, 4, 8, 16, 32, 64, and 83 month. For each experiment session, the temporal query load described above is fed into our search servers.<br />
<br />
[[Webarc:Time Windows Used in Experiments|Time Windows Used in Experiments]]<br />
<br />
When scoring relevance, approximated query / document statistics are used. For a query time span (tp<sub>1</sub> ~ tp<sub>2</sub>), we find a series of time windows tw<sub>i</sub> ~ tw<sub>j</sub> that completely cover (tp<sub>1</sub> ~ tp<sub>2</sub>). The statistics for the query time span is approximated by combining statistics of tw<sub>i</sub> ~ tw<sub>j</sub>.<br />
<br />
[[Webarc:Combining Statistics in Multiple Time Windows|How We Combine Statistics in Multiple Time Windows]]<br />
<br />
For example, if tw<sub>1</sub> ~ tw<sub>3</sub> completely overlaps a given query time span as in the figure below, we combine statistics of tw<sub>1</sub>, tw<sub>2</sub> and tw<sub>3</sub> and use them when scoring. <br />
<br />
<center>[[Image:stat_approximation.png|480px]]</center><br />
<br />
In the case where the time window size is a single month, a series of time windows always perfectly overlap with any query time span, and therefore we use the exact statistics found within the query time span.<br />
<br />
<center>[[Image:tw1.png|480px]]</center><br />
<br />
Once we run all the eight experiment sessions, we obtain 64 result sets as shown in the table below. In the table, each row represents an experiment session with a given time window size, and each column represents a query time span size. In the entries of the table, the result set qts.''i''_tw.''j'' refers to the result set for the query time span size ''i'' and the time window size ''j''. The size of each result set varies, depending on the query time span size. For example, qts.''1''_tw.''x'' has search results for 18,509 (= 223 query phrases x 83 query time spans) queries, while qts.''83''_tw.''x'' has search results for only 223 (= 223 x 1) queries.<br />
<br />
{| class="wikitable" style="margin: 1em auto 1em auto" border="1"<br />
|-<br />
| tw\qts || 1 || 2 || 4 || 8 || 16 || 32 || 64 || 83<br />
|-<br />
| 1 || qts.1_tw.1 || qts.2_tw.1 || qts.4_tw.1 || qts.8_tw.1 || qts.16_tw.1 || qts.32_tw.1 || qts.64_tw.1 || qts.83_tw.1<br />
|-<br />
| 2 || qts.1_tw.2 || qts.2_tw.2 || qts.4_tw.2 || qts.8_tw.2 || qts.16_tw.2 || qts.32_tw.2 || qts.64_tw.2 || qts.83_tw.2<br />
|-<br />
| 4 || qts.1_tw.4 || qts.2_tw.4 || qts.4_tw.4 || qts.8_tw.4 || qts.16_tw.4 || qts.32_tw.4 || qts.64_tw.4 || qts.83_tw.4<br />
|-<br />
| 8 || qts.1_tw.8 || qts.2_tw.8 || qts.4_tw.8 || qts.8_tw.8 || qts.16_tw.8 || qts.32_tw.8 || qts.64_tw.8 || qts.83_tw.8<br />
|-<br />
| 16 || qts.1_tw.16 || qts.2_tw.16 || qts.4_tw.16 || qts.8_tw.16 || qts.16_tw.16 || qts.32_tw.16 || qts.64_tw.16 || qts.83_tw.16<br />
|-<br />
| 32 || qts.1_tw.32 || qts.2_tw.32 || qts.4_tw.32 || qts.8_tw.32 || qts.16_tw.32 || qts.32_tw.32 || qts.64_tw.32 || qts.83_tw.32<br />
|-<br />
| 64 || qts.1_tw.64 || qts.2_tw.64 || qts.4_tw.64 || qts.8_tw.64 || qts.16_tw.64 || qts.32_tw.64 || qts.64_tw.64 || qts.83_tw.64<br />
|-<br />
| 83 || qts.1_tw.83 || qts.2_tw.83 || qts.4_tw.83 || qts.8_tw.83 || qts.16_tw.83 || qts.32_tw.83 || qts.64_tw.83 || qts.83_tw.83<br />
|}<br />
<br />
We run the eight-session experiments one more time with a different scoring method.<br />
<br />
In total, we generate 1,648,416 queries (= 103,026 queries per session x 8 sessions x 2 scoring methods) and obtain about 160 million search results (each query returns up to 100 search results).<br />
<br />
==Analysis==<br />
===Impact of Temporally-anchored Scoring===<br />
To reiterate, the first goal of the experiments is to examine how temporally-anchored scoring can yield different search results. <br />
In the traditional search scheme, a temporal query is carried out by first scoring documents with the document/term statistics in the entire collection, followed by filtering in only those that belong to the specified query time span. Apparently, our experiment session for time window size 83 perfectly emulates the traditional scheme, resulting in the same results sets as the traditional scheme would have. Thus, in the first analysis, we compare the following two cases:<br />
<br />
# Scoring is performed based on the statistics of the entire history (i.e. the time window size is 83).<br />
# Scoring is performed based on the actual query time span specified.<br />
<br />
In particular, the search results set qts.''x''_tw.''1'' is compared against qts.''x''_tw.''83''. We compute precisions at r=10, 20 and 100 and Kendall's tau at r=10, 20 and 100.<br />
<br />
We also find out one good temporal query example that can emphasize the importance of the temporally-anchored scoring.<br />
<br />
===Impact of Time-Window Based Approximation===<br />
The second goal is to examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
When the time window size is a single month, any query time span will perfectly match one or more of consecutive time windows. Thus, the statistics that we use for any query time span will be exact ones. On the other hand, when the time window size of two or more months, many query time spans will end up covering parts of time windows, and we approximate the statistics for a query time span by using the combined statistics of all overlapping time windows. See [xx] for how we combine statistics of multiple time windows.<br />
Therefore, we can examine the impact of time-window based approximation by comparing the search results set qts.''x''_tw.''y'' (y!=1) with qts.''x''_tw.''1''. for all ''x''s and ''y''s. For example, comparing qts.''1''_tw.''2'' with qts.''1''_tw.''1'' will show us how the search results are different when we use 2-month time windows in the case that the query time span is a single month.<br />
<br />
In particular, we compute the following metrics for all <math>r \in \{10, 20, 100\}</math>, <math>x \in \{1, 2, 4, 8, 16, 32, 64, 83\}</math>, and <math>y \in \{2, 4, 8, 16, 32, 64, 83\}</math>, where <math>r</math> is the number of search results, <math>x</math> is the query time span length, and <math>y</math> is the time window length.<br />
<br />
* Mean precision: <math>mP(r, x, y)</math><br />
* Precision variance: <math>vP(r, x, y)</math><br />
* Mean Kendall's <math>\tau</math>: <math>mKT(r, x, y)</math><br />
* Kendall's <math>\tau</math> variance: <math>vKT(r, x, y)</math><br />
<br />
<br />
==Further Information==<br />
* [[Webarc:Input Dataset Statistics |[1] Input Dataset Statistics]]<br />
* [[Webarc:Temporal Query Load |[2] Temporal Query Load]]<br />
* [[Webarc:Tools Developed |[2] Tools Developed]]</div>Scsonghttps://wiki.umiacs.umd.edu/adapt/index.php?title=Webarc:Temporally_Anchored_Scoring_Experiments&diff=2596Webarc:Temporally Anchored Scoring Experiments2009-11-19T00:33:52Z<p>Scsong: </p>
<hr />
<div>==Goals==<br />
There are two main goals that we want to achieve through the following experiments.<br />
# For time-constrained queries, we examine how temporally-anchored scoring can affect the search results in real-world time series data.<br />
# For time-constrained queries with a time window indexing scheme, we examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
<br />
==Input Dataset==<br />
We preprocess the entire revision history of [http://www.archive.org/details/enwiki-20080425 the English Wikipedia from 2001 to 2007]. After preprocessing, we obtain 84 monthly snapshots starting from January 2001 ending in December 2007. Included in each monthly snapshot are the latest revision of existing articles at the end of the month. For example, the Wikipedia article 'Economy of the United States' created on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=649100 August 21 2002] is included in the six monthly snapshots of August 2002, September 2002, ..., January 2003 since there is no newer revision made until [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=749002 February 7 2003], whereas the same article revised on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=166885 August 16 2002] is not included in any of the snapshots.<br />
<br />
[[Webarc:Input Dataset Statistics |Statistics of monthly snapshots]]<br />
<br />
==Queries==<br />
Based on the AOL query log made briefly available in 2005, we build our temporal query load by extracting multi-term query phrases where the user selected an English Wikipedia article among the search results. Note that the different temporal contexts give different relative weights to each of the query terms, yielding different result rankings. In other words, for a query phrase with term t1 and t2, t1 may be given more weight than t2 at a certain query time span qts1, but it may be opposite at a different query time span qts2. This means that two document versions that belong to both qts1 and qts2 can have different rankings depending on the specified query time span. However, in the case of single-term queries, any two document versions will retain their ranking regardless of the specified query time span, as long as both belong to the query time span. In our experiments, we highlight the impact on the search results from different query time spans by focusing only on multi-term queries. The more general case where single-term queries are also included will be able to be inferred from the results from query log analysis studies that report about 80% of web queries are multi-termed.<br />
<br />
Once we extract qualified query phrases, we further filter out less frequently appearing phrases. In particular, we take the query phrases that appear 10 times or more in the log. As a result, we gather a total of 223 such query phrases.<br />
<br />
Using the 223 query phrases, we now construct a temporal query load by combining query time spans with each of the query phrases. Specifically, we use 8 query time span lengths: 1, 2, 4, 8, 16, 32, 64, and 83 months. For each query time span length, we make all possible query time spans, starting from the first month. For example, for the query time span length of two months, we make 82 query time spans: (February 1st 2001 ~ March 31st 2001), (March 1st 2001 ~ April 30th 2001), ..., (November 1st 2007 ~ December 31st 2007). A total of 462 (= 83 + 82 + 80 + 76 + 68 + 52 + 20 + 1) different query time spans are resulted. In the end, we generate 103,026 (= 223 x 462) queries in each experiment session.<br />
<br />
[[Webarc:Temporal Query Load |The complete list of query terms and query time spans]]<br />
<br />
==Scoring Methods==<br />
We use two different scoring methods; 1. a tf-idf based method (OKAPI BM-25, in particular) and 2. a language-model based method (the KL divergence combined with the Dirichlet's smoothing, in particular). For each query, we take 100 top documents from each scoring method.<br />
<br />
==Experiments==<br />
The experiments consist of eight experiment sessions, each with one of the following time window sizes: 1, 2, 4, 8, 16, 32, 64, and 83 month. For each experiment session, the temporal query load described above is fed into our search servers.<br />
<br />
[[Webarc:Time Windows Used in Experiments|Time Windows Used in Experiments]]<br />
<br />
When scoring relevance, approximated query / document statistics are used. For a query time span (tp<sub>1</sub> ~ tp<sub>2</sub>), we find a series of time windows tw<sub>i</sub> ~ tw<sub>j</sub> that completely cover (tp<sub>1</sub> ~ tp<sub>2</sub>). The statistics for the query time span is approximated by combining statistics of tw<sub>i</sub> ~ tw<sub>j</sub>.<br />
<br />
[[Webarc:Combining Statistics in Multiple Time Windows|How We Combine Statistics in Multiple Time Windows]]<br />
<br />
For example, if tw<sub>1</sub> ~ tw<sub>3</sub> completely overlaps a given query time span as in the figure below, we combine statistics of tw<sub>1</sub>, tw<sub>2</sub> and tw<sub>3</sub> and use them when scoring. <br />
<br />
<center>[[Image:stat_approximation.png|480px]]</center><br />
<br />
In the case where the time window size is a single month, a series of time windows always perfectly overlap with any query time span, and therefore we use the exact statistics found within the query time span.<br />
<br />
<center>[[Image:tw1.png|480px]]</center><br />
<br />
Once we run all the eight experiment sessions, we obtain 64 result sets as shown in the table below. In the table, each row represents an experiment session with a given time window size, and each column represents a query time span size. In the entries of the table, the result set qts.''i''_tw.''j'' refers to the result set for the query time span size ''i'' and the time window size ''j''. The size of each result set varies, depending on the query time span size. For example, qts.''1''_tw.''x'' has search results for 18,509 (= 223 query phrases x 83 query time spans) queries, while qts.''83''_tw.''x'' has search results for only 223 (= 223 x 1) queries.<br />
<br />
{| class="wikitable" style="margin: 1em auto 1em auto" border="1"<br />
|-<br />
| tw\qts || 1 || 2 || 4 || 8 || 16 || 32 || 64 || 83<br />
|-<br />
| 1 || qts.1_tw.1 || qts.2_tw.1 || qts.4_tw.1 || qts.8_tw.1 || qts.16_tw.1 || qts.32_tw.1 || qts.64_tw.1 || qts.83_tw.1<br />
|-<br />
| 2 || qts.1_tw.2 || qts.2_tw.2 || qts.4_tw.2 || qts.8_tw.2 || qts.16_tw.2 || qts.32_tw.2 || qts.64_tw.2 || qts.83_tw.2<br />
|-<br />
| 4 || qts.1_tw.4 || qts.2_tw.4 || qts.4_tw.4 || qts.8_tw.4 || qts.16_tw.4 || qts.32_tw.4 || qts.64_tw.4 || qts.83_tw.4<br />
|-<br />
| 8 || qts.1_tw.8 || qts.2_tw.8 || qts.4_tw.8 || qts.8_tw.8 || qts.16_tw.8 || qts.32_tw.8 || qts.64_tw.8 || qts.83_tw.8<br />
|-<br />
| 16 || qts.1_tw.16 || qts.2_tw.16 || qts.4_tw.16 || qts.8_tw.16 || qts.16_tw.16 || qts.32_tw.16 || qts.64_tw.16 || qts.83_tw.16<br />
|-<br />
| 32 || qts.1_tw.32 || qts.2_tw.32 || qts.4_tw.32 || qts.8_tw.32 || qts.16_tw.32 || qts.32_tw.32 || qts.64_tw.32 || qts.83_tw.32<br />
|-<br />
| 64 || qts.1_tw.64 || qts.2_tw.64 || qts.4_tw.64 || qts.8_tw.64 || qts.16_tw.64 || qts.32_tw.64 || qts.64_tw.64 || qts.83_tw.64<br />
|-<br />
| 83 || qts.1_tw.83 || qts.2_tw.83 || qts.4_tw.83 || qts.8_tw.83 || qts.16_tw.83 || qts.32_tw.83 || qts.64_tw.83 || qts.83_tw.83<br />
|}<br />
<br />
We run the eight-session experiments one more time with a different scoring method.<br />
<br />
In total, we generate 1,648,416 queries (= 103,026 queries per session x 8 sessions x 2 scoring methods) and obtain about 160 million search results (each query returns up to 100 search results).<br />
<br />
==Analysis==<br />
===Impact of Temporally-anchored Scoring===<br />
To reiterate, the first goal of the experiments is to examine how temporally-anchored scoring can yield different search results. <br />
In the traditional search scheme, a temporal query is carried out by first scoring documents with the document/term statistics in the entire collection, followed by filtering in only those that belong to the specified query time span. Apparently, our experiment session for time window size 83 perfectly emulates the traditional scheme, resulting in the same results sets as the traditional scheme would have. Thus, in the first analysis, we compare the following two cases:<br />
<br />
# Scoring is performed based on the statistics of the entire history (i.e. the time window size is 83).<br />
# Scoring is performed based on the actual query time span specified.<br />
<br />
In particular, the search results set qts.''x''_tw.''1'' is compared against qts.''x''_tw.''83''. We compute precisions at r=10, 20 and 100 and Kendall's tau at r=10, 20 and 100.<br />
<br />
We also find out one good temporal query example that can emphasize the importance of the temporally-anchored scoring.<br />
<br />
===Impact of Time-Window Based Approximation===<br />
The second goal is to examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
When the time window size is a single month, any query time span will perfectly match one or more of consecutive time windows. Thus, the statistics that we use for any query time span will be exact ones. On the other hand, when the time window size of two or more months, many query time spans will end up covering parts of time windows, and we approximate the statistics for a query time span by using the combined statistics of all overlapping time windows. See [xx] for how we combine statistics of multiple time windows.<br />
Therefore, we can examine the impact of time-window based approximation by comparing the search results set qts.''x''_tw.''y'' (y!=1) with qts.''x''_tw.''1''. for all ''x''s and ''y''s. For example, comparing qts.''1''_tw.''2'' with qts.''1''_tw.''1'' will show us how the search results are different when we use 2-month time windows in the case that the query time span is a single month.<br />
<br />
In particular, we compute the following metrics for all <math>r \in \{10, 20, 100\}</math>, <math>x \in \{1, 2, 4, 8, 16, 32, 64, 83\}</math>, and <math>y \in \{2, 4, 8, 16, 32, 64, 83\}</math>, where ''r'' is the number of search results, ''x'' is the query time span length, and ''y'' is the time window length.<br />
<br />
* Mean precision: '''mP(''r'', ''x'', 'y'')''' <br />
* Precision variance: '''vP(''r'', ''x'', ''y'')'''<br />
* Mean Kendall's <math>\tau</math>: '''mKT(''r'', ''x'', 'y'')''' <br />
* Kendall's <math>\tau</math> variance: '''vKT(''r'', ''x'', 'y'')'''<br />
<br />
<br />
==Further Information==<br />
* [[Webarc:Input Dataset Statistics |[1] Input Dataset Statistics]]<br />
* [[Webarc:Temporal Query Load |[2] Temporal Query Load]]<br />
* [[Webarc:Tools Developed |[3] Tools Developed]]</div>Scsonghttps://wiki.umiacs.umd.edu/adapt/index.php?title=Webarc:Temporally_Anchored_Scoring_Experiments&diff=2595Webarc:Temporally Anchored Scoring Experiments2009-11-19T00:31:03Z<p>Scsong: </p>
<hr />
<div>==Goals==<br />
There are two main goals that we want to achieve through the following experiments.<br />
# For time-constrained queries, we examine how temporally-anchored scoring can affect the search results in real-world time series data.<br />
# For time-constrained queries with a time window indexing scheme, we examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
<br />
==Input Dataset==<br />
We preprocess the entire revision history of [http://www.archive.org/details/enwiki-20080425 the English Wikipedia from 2001 to 2007]. After preprocessing, we obtain 84 monthly snapshots, starting in January 2001 and ending in December 2007. Each monthly snapshot includes the latest revision, as of the end of that month, of every article existing at that time. For example, the revision of the Wikipedia article 'Economy of the United States' created on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=649100 August 21 2002] is included in the six monthly snapshots of August 2002, September 2002, ..., January 2003, since no newer revision was made until [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=749002 February 7 2003], whereas the revision of the same article from [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=166885 August 16 2002] is not included in any of the snapshots.<br />
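The snapshot construction described above can be sketched as follows. This is a minimal illustration that assumes revisions are already parsed into (year, month, revision id) tuples; the function name and data shapes are hypothetical, not the actual preprocessing code, which parses the Wikipedia XML dump.

```python
def monthly_snapshots(revisions, start=(2001, 1), end=(2007, 12)):
    """Carry each article's latest end-of-month revision forward into
    every monthly snapshot it remains current for.

    `revisions` maps an article title to a chronologically sorted list
    of (year, month, rev_id) tuples (hypothetical shape).
    """
    # Enumerate the 84 months from January 2001 to December 2007.
    months = []
    y, m = start
    while (y, m) <= end:
        months.append((y, m))
        y, m = (y, m + 1) if m < 12 else (y + 1, 1)

    snapshots = {ym: {} for ym in months}
    for title, revs in revisions.items():
        for ym in months:
            latest = None
            for (ry, rm, rid) in revs:
                if (ry, rm) <= ym:
                    latest = rid  # later revisions in the same month win
            if latest is not None:
                snapshots[ym][title] = latest
    return snapshots
```

With the 'Economy of the United States' example above, the August 21 2002 revision appears in the snapshots for August 2002 through January 2003, while the superseded August 16 2002 revision appears in none.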
<br />
[[Webarc:Input Dataset Statistics |Statistics of monthly snapshots]]<br />
<br />
==Queries==<br />
Based on the AOL query log made briefly available in 2006, we build our temporal query load by extracting multi-term query phrases for which the user selected an English Wikipedia article among the search results. Note that different temporal contexts give different relative weights to each of the query terms, yielding different result rankings. In other words, for a query phrase with terms t1 and t2, t1 may be given more weight than t2 in a certain query time span qts1, while the opposite may hold in a different query time span qts2. This means that two document versions that belong to both qts1 and qts2 can be ranked differently depending on the specified query time span. In the case of single-term queries, by contrast, any two document versions retain their relative ranking regardless of the specified query time span, as long as both belong to it. In our experiments, we highlight the impact of different query time spans on the search results by focusing only on multi-term queries. The more general case, in which single-term queries are also included, can be inferred from our results together with query log analysis studies reporting that about 80% of web queries contain multiple terms.<br />
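The rank-flip effect described above can be illustrated with a toy example: under hypothetical document frequencies (all numbers below are invented for illustration), a standard idf weight favors t1 in one time span and t2 in another.

```python
import math

def idf(df, n_docs):
    # Plain inverse document frequency; one standard formulation.
    return math.log(n_docs / df)

# Hypothetical per-time-span document frequencies for terms t1 and t2.
n = 1000
df_t1_qts1, df_t2_qts1 = 10, 200   # qts1: t1 is rare, t2 is common
df_t1_qts2, df_t2_qts2 = 300, 20   # qts2: the opposite

assert idf(df_t1_qts1, n) > idf(df_t2_qts1, n)  # t1 outweighs t2 in qts1
assert idf(df_t1_qts2, n) < idf(df_t2_qts2, n)  # t2 outweighs t1 in qts2
```

A document version matching mostly t1 can therefore outrank one matching mostly t2 in qts1, and vice versa in qts2.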
<br />
Once we extract qualified query phrases, we further filter out less frequently appearing phrases. In particular, we take the query phrases that appear 10 times or more in the log. As a result, we gather a total of 223 such query phrases.<br />
<br />
Using the 223 query phrases, we now construct a temporal query load by combining query time spans with each of the query phrases. Specifically, we use 8 query time span lengths: 1, 2, 4, 8, 16, 32, 64, and 83 months. For each query time span length, we generate all possible query time spans, starting from the first month. For example, for the query time span length of two months, we generate 82 query time spans: (February 1st 2001 ~ March 31st 2001), (March 1st 2001 ~ April 30th 2001), ..., (November 1st 2007 ~ December 31st 2007). This results in a total of 462 (= 83 + 82 + 80 + 76 + 68 + 52 + 20 + 1) different query time spans. In the end, we generate 103,026 (= 223 x 462) queries in each experiment session.<br />
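The enumeration of query time spans can be sketched as follows. The counts reported above (83 spans of length 1, 82 of length 2, ...) imply an 83-month sliding range, which this sketch assumes.

```python
def query_time_spans(n_months=83, lengths=(1, 2, 4, 8, 16, 32, 64, 83)):
    """All query time spans as (start_month, length) pairs, 0-indexed,
    sliding each span length across an assumed 83-month range."""
    return [(start, length)
            for length in lengths
            for start in range(n_months - length + 1)]

spans = query_time_spans()
# 83 + 82 + 80 + 76 + 68 + 52 + 20 + 1 = 462 spans;
# crossed with 223 query phrases, 103,026 queries per session.
```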
<br />
[[Webarc:Temporal Query Load |The complete list of query terms and query time spans]]<br />
<br />
==Scoring Methods==<br />
We use two different scoring methods: (1) a tf-idf based method (Okapi BM25, in particular) and (2) a language-model based method (in particular, KL divergence combined with Dirichlet smoothing). For each query, we take the top 100 documents from each scoring method.<br />
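As a rough sketch, the per-term contributions of the two scoring methods might look as follows. These are textbook formulations (a common BM25 idf variant, and the Dirichlet-smoothed query likelihood, which is rank-equivalent to the KL divergence method for this setting); the exact variants and parameter values used in the experiments are not specified here and the defaults below are assumptions.

```python
import math

def bm25_term(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """Okapi BM25 contribution of one query term (one common variant)."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))

def dirichlet_term(tf, doc_len, p_coll, mu=2000.0):
    """Log query likelihood of one term under Dirichlet smoothing;
    p_coll is the term's collection language-model probability."""
    return math.log((tf + mu * p_coll) / (doc_len + mu))
```

In the temporally-anchored setting, df, n_docs, avg_len, and p_coll are taken from (or approximated for) the query time span rather than the whole collection.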
<br />
==Experiments==<br />
The experiments consist of eight experiment sessions, each with one of the following time window sizes: 1, 2, 4, 8, 16, 32, 64, and 83 months. For each experiment session, the temporal query load described above is fed into our search servers.<br />
<br />
[[Webarc:Time Windows Used in Experiments|Time Windows Used in Experiments]]<br />
<br />
When scoring relevance, approximated query/document statistics are used. For a query time span (tp<sub>1</sub> ~ tp<sub>2</sub>), we find the series of time windows tw<sub>i</sub> ~ tw<sub>j</sub> that completely covers (tp<sub>1</sub> ~ tp<sub>2</sub>). The statistics for the query time span are approximated by combining the statistics of tw<sub>i</sub> ~ tw<sub>j</sub>.<br />
<br />
[[Webarc:Combining Statistics in Multiple Time Windows|How We Combine Statistics in Multiple Time Windows]]<br />
<br />
For example, if tw<sub>1</sub> ~ tw<sub>3</sub> completely cover a given query time span, as in the figure below, we combine the statistics of tw<sub>1</sub>, tw<sub>2</sub> and tw<sub>3</sub> and use them when scoring. <br />
<br />
<center>[[Image:stat_approximation.png|480px]]</center><br />
<br />
In the case where the time window size is a single month, a series of time windows always perfectly overlaps any query time span, and therefore we use the exact statistics found within the query time span.<br />
<br />
<center>[[Image:tw1.png|480px]]</center><br />
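The covering-and-combining step above can be sketched as follows. Combining by summation is a simplifying assumption made here for illustration; the actual combination rule is described on the linked page, and note that naively summing document counts over-counts documents that persist across windows.

```python
def covering_windows(qts_start, qts_end, window_size):
    """Indices of the minimal run of fixed-size time windows that
    completely cover the query time span [qts_start, qts_end]
    (months, 0-indexed)."""
    return list(range(qts_start // window_size, qts_end // window_size + 1))

def combined_stats(windows, stats):
    """Combine per-window statistics by summation (an assumption for
    illustration). `stats[w]` holds hypothetical fields: n_docs,
    total_len, and per-term document frequencies `df`."""
    out = {"n_docs": 0, "total_len": 0, "df": {}}
    for w in windows:
        out["n_docs"] += stats[w]["n_docs"]
        out["total_len"] += stats[w]["total_len"]
        for term, df in stats[w]["df"].items():
            out["df"][term] = out["df"].get(term, 0) + df
    return out
```

With window size 1, the covering run coincides with the query time span itself, matching the exact-statistics case above.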
<br />
Once we run all eight experiment sessions, we obtain 64 result sets, as shown in the table below. Each row represents an experiment session with a given time window size, and each column represents a query time span size. In the table entries, the result set qts.''i''_tw.''j'' refers to the result set for query time span size ''i'' and time window size ''j''. The size of each result set varies with the query time span size. For example, qts.''1''_tw.''x'' has search results for 18,509 (= 223 query phrases x 83 query time spans) queries, while qts.''83''_tw.''x'' has search results for only 223 (= 223 x 1) queries.<br />
<br />
{| class="wikitable" style="margin: 1em auto 1em auto" border="1"<br />
|-<br />
| tw\qts || 1 || 2 || 4 || 8 || 16 || 32 || 64 || 83<br />
|-<br />
| 1 || qts.1_tw.1 || qts.2_tw.1 || qts.4_tw.1 || qts.8_tw.1 || qts.16_tw.1 || qts.32_tw.1 || qts.64_tw.1 || qts.83_tw.1<br />
|-<br />
| 2 || qts.1_tw.2 || qts.2_tw.2 || qts.4_tw.2 || qts.8_tw.2 || qts.16_tw.2 || qts.32_tw.2 || qts.64_tw.2 || qts.83_tw.2<br />
|-<br />
| 4 || qts.1_tw.4 || qts.2_tw.4 || qts.4_tw.4 || qts.8_tw.4 || qts.16_tw.4 || qts.32_tw.4 || qts.64_tw.4 || qts.83_tw.4<br />
|-<br />
| 8 || qts.1_tw.8 || qts.2_tw.8 || qts.4_tw.8 || qts.8_tw.8 || qts.16_tw.8 || qts.32_tw.8 || qts.64_tw.8 || qts.83_tw.8<br />
|-<br />
| 16 || qts.1_tw.16 || qts.2_tw.16 || qts.4_tw.16 || qts.8_tw.16 || qts.16_tw.16 || qts.32_tw.16 || qts.64_tw.16 || qts.83_tw.16<br />
|-<br />
| 32 || qts.1_tw.32 || qts.2_tw.32 || qts.4_tw.32 || qts.8_tw.32 || qts.16_tw.32 || qts.32_tw.32 || qts.64_tw.32 || qts.83_tw.32<br />
|-<br />
| 64 || qts.1_tw.64 || qts.2_tw.64 || qts.4_tw.64 || qts.8_tw.64 || qts.16_tw.64 || qts.32_tw.64 || qts.64_tw.64 || qts.83_tw.64<br />
|-<br />
| 83 || qts.1_tw.83 || qts.2_tw.83 || qts.4_tw.83 || qts.8_tw.83 || qts.16_tw.83 || qts.32_tw.83 || qts.64_tw.83 || qts.83_tw.83<br />
|}<br />
<br />
We run all eight experiment sessions a second time with the other scoring method.<br />
<br />
In total, we generate 1,648,416 queries (= 103,026 queries per session x 8 sessions x 2 scoring methods) and obtain about 160 million search results (each query returns up to 100 search results).<br />
<br />
==Analysis==<br />
===Impact of Temporally-anchored Scoring===<br />
To reiterate, the first goal of the experiments is to examine how temporally-anchored scoring can yield different search results. <br />
In the traditional search scheme, a temporal query is carried out by first scoring documents using the document/term statistics of the entire collection, and then filtering in only those documents that belong to the specified query time span. Note that our experiment session with time window size 83 exactly emulates this traditional scheme, producing the same result sets as the traditional scheme would. Thus, in the first analysis, we compare the following two cases:<br />
<br />
# Scoring is performed based on the statistics of the entire history (i.e. the time window size is 83).<br />
# Scoring is performed based on the actual query time span specified.<br />
<br />
In particular, the search result set qts.''x''_tw.''1'' is compared against qts.''x''_tw.''83''. We compute precision and Kendall's tau at r = 10, 20, and 100.<br />
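The two comparison metrics can be sketched as follows, treating one result list as the reference ranking. The Kendall's tau sketch assumes both lists rank the same set of documents; how partially overlapping top-100 lists are handled is not specified in this section.

```python
def precision_at_r(results, reference, r):
    """Fraction of the top-r results that also appear in the top-r of
    the reference result list."""
    return len(set(results[:r]) & set(reference[:r])) / r

def kendall_tau(a, b):
    """Kendall's tau between two rankings of the same item set
    (no ties): (concordant - discordant) / total pairs."""
    pos = {doc: i for i, doc in enumerate(b)}
    n = len(a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # a ranks a[i] above a[j]; does b agree?
            if pos[a[i]] < pos[a[j]]:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Identical rankings give tau = 1, a fully reversed ranking gives tau = -1.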
<br />
We also identify one illustrative temporal query that emphasizes the importance of temporally-anchored scoring.<br />
<br />
===Impact of Time-Window Based Approximation===<br />
The second goal is to examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
When the time window size is a single month, any query time span perfectly matches a run of consecutive time windows, so the statistics we use for any query time span are exact. On the other hand, when the time window size is two or more months, many query time spans cover only parts of some time windows, and we approximate the statistics for such a query time span by combining the statistics of all overlapping time windows. See [xx] for how we combine statistics of multiple time windows.<br />
Therefore, we can examine the impact of time-window based approximation by comparing the search result set qts.''x''_tw.''y'' (y != 1) with qts.''x''_tw.''1'' for all ''x'' and ''y''. For example, comparing qts.''1''_tw.''2'' with qts.''1''_tw.''1'' shows how the search results differ when we use 2-month time windows and the query time span is a single month.<br />
<br />
In particular, we compute the following metrics for all <math>r \in \{10, 20, 100\}</math>, <math>x \in \{1, 2, 4, 8, 16, 32, 64, 83\}</math>, and <math>y \in \{2, 4, 8, 16, 32, 64, 83\}</math>, where ''r'' is the number of search results, ''x'' is the query time span length, and ''y'' is the time window length.<br />
<br />
* Mean precision: '''mP(''r'', ''x'', ''y'')''' <br />
* Precision variance: '''vP(''r'', ''x'', ''y'')'''<br />
* Mean Kendall's <math>\tau</math>: '''mKT(''r'', ''x'', ''y'')''' <br />
* Kendall's <math>\tau</math> variance: '''vKT(''r'', ''x'', ''y'')'''<br />
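Aggregating per-query scores into the metrics above is then straightforward. Whether population or sample variance is used is not stated; population variance is assumed in this sketch.

```python
from statistics import mean, pvariance

def aggregate(scores):
    """Mean and (population) variance of the per-query scores for one
    (r, x, y) combination: mP/vP over precision values, mKT/vKT over
    Kendall's tau values."""
    return mean(scores), pvariance(scores)
```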
<br />
<br />
==Further Information==<br />
* [[Webarc:Input Dataset Statistics |[1] Input Dataset Statistics]]<br />
* [[Webarc:Temporal Query Load |[2] Temporal Query Load]]<br />
* [[Webarc:Tools Developed |[2] Tools Developed]]</div>Scsonghttps://wiki.umiacs.umd.edu/adapt/index.php?title=Webarc:Temporally_Anchored_Scoring_Experiments&diff=2594Webarc:Temporally Anchored Scoring Experiments2009-11-19T00:30:08Z<p>Scsong: </p>
<hr />
<div>==Goals==<br />
There are two main goals that we want to achieve through the following experiments.<br />
# For time-constrained queries, we examine how temporally-anchored scoring can affect the search results in real-world time series data.<br />
# For time-constrained queries with a time window indexing scheme, we examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
<br />
==Input Dataset==<br />
We preprocess the entire revision history of [http://www.archive.org/details/enwiki-20080425 the English Wikipedia from 2001 to 2007]. After preprocessing, we obtain 84 monthly snapshots starting from January 2001 ending in December 2007. Included in each monthly snapshot are the latest revision of existing articles at the end of the month. For example, the Wikipedia article 'Economy of the United States' created on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=649100 August 21 2002] is included in the six monthly snapshots of August 2002, September 2002, ..., January 2003 since there is no newer revision made until [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=749002 February 7 2003], whereas the same article revised on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=166885 August 16 2002] is not included in any of the snapshots.<br />
<br />
[[Webarc:Input Dataset Statistics |Statistics of monthly snapshots]]<br />
<br />
==Queries==<br />
Based on the AOL query log made briefly available in 2005, we build our temporal query load by extracting multi-term query phrases where the user selected an English Wikipedia article among the search results. Note that the different temporal contexts give different relative weights to each of the query terms, yielding different result rankings. In other words, for a query phrase with term t1 and t2, t1 may be given more weight than t2 at a certain query time span qts1, but it may be opposite at a different query time span qts2. This means that two document versions that belong to both qts1 and qts2 can have different rankings depending on the specified query time span. However, in the case of single-term queries, any two document versions will retain their ranking regardless of the specified query time span, as long as both belong to the query time span. In our experiments, we highlight the impact on the search results from different query time spans by focusing only on multi-term queries. The more general case where single-term queries are also included will be able to be inferred from the results from query log analysis studies that report about 80% of web queries are multi-termed.<br />
<br />
Once we extract qualified query phrases, we further filter out less frequently appearing phrases. In particular, we take the query phrases that appear 10 times or more in the log. As a result, we gather a total of 223 such query phrases.<br />
<br />
Using the 223 query phrases, we now construct a temporal query load by combining query time spans with each of the query phrases. Specifically, we use 8 query time span lengths: 1, 2, 4, 8, 16, 32, 64, and 83 months. For each query time span length, we make all possible query time spans, starting from the first month. For example, for the query time span length of two months, we make 82 query time spans: (February 1st 2001 ~ March 31st 2001), (March 1st 2001 ~ April 30th 2001), ..., (November 1st 2007 ~ December 31st 2007). A total of 462 (= 83 + 82 + 80 + 76 + 68 + 52 + 20 + 1) different query time spans are resulted. In the end, we generate 103,026 (= 223 x 462) queries in each experiment session.<br />
<br />
[[Webarc:Temporal Query Load |The complete list of query terms and query time spans]]<br />
<br />
==Scoring Methods==<br />
We use two different scoring methods; 1. a tf-idf based method (OKAPI BM-25, in particular) and 2. a language-model based method (the KL divergence combined with the Dirichlet's smoothing, in particular). For each query, we take 100 top documents from each scoring method.<br />
<br />
==Experiments==<br />
The experiments consist of eight experiment sessions, each with one of the following time window sizes: 1, 2, 4, 8, 16, 32, 64, and 83 month. For each experiment session, the temporal query load described above is fed into our search servers.<br />
<br />
[[Webarc:Time Windows Used in Experiments|Time Windows Used in Experiments]]<br />
<br />
When scoring relevance, approximated query / document statistics are used. For a query time span (tp<sub>1</sub> ~ tp<sub>2</sub>), we find a series of time windows tw<sub>i</sub> ~ tw<sub>j</sub> that completely cover (tp<sub>1</sub> ~ tp<sub>2</sub>). The statistics for the query time span is approximated by combining statistics of tw<sub>i</sub> ~ tw<sub>j</sub>.<br />
<br />
[[Webarc:Combining Statistics in Multiple Time Windows|How We Combine Statistics in Multiple Time Windows]]<br />
<br />
For example, if tw<sub>1</sub> ~ tw<sub>3</sub> completely overlaps a given query time span as in the figure below, we combine statistics of tw<sub>1</sub>, tw<sub>2</sub> and tw<sub>3</sub> and use them when scoring. <br />
<br />
<center>[[Image:stat_approximation.png|480px]]</center><br />
<br />
In the case where the time window size is a single month, a series of time windows always perfectly overlap with any query time span, and therefore we use the exact statistics found within the query time span.<br />
<br />
<center>[[Image:tw1.png|480px]]</center><br />
<br />
Once we run all the eight experiment sessions, we obtain 64 result sets as shown in the table below. In the table, each row represents an experiment session with a given time window size, and each column represents a query time span size. In the entries of the table, the result set qts.''i''_tw.''j'' refers to the result set for the query time span size ''i'' and the time window size ''j''. The size of each result set varies, depending on the query time span size. For example, qts.''1''_tw.''x'' has search results for 18,509 (= 223 query phrases x 83 query time spans) queries, while qts.''83''_tw.''x'' has search results for only 223 (= 223 x 1) queries.<br />
<br />
{| class="wikitable" style="margin: 1em auto 1em auto" border="1"<br />
|-<br />
| tw\qts || 1 || 2 || 4 || 8 || 16 || 32 || 64 || 83<br />
|-<br />
| 1 || qts.1_tw.1 || qts.2_tw.1 || qts.4_tw.1 || qts.8_tw.1 || qts.16_tw.1 || qts.32_tw.1 || qts.64_tw.1 || qts.83_tw.1<br />
|-<br />
| 2 || qts.1_tw.2 || qts.2_tw.2 || qts.4_tw.2 || qts.8_tw.2 || qts.16_tw.2 || qts.32_tw.2 || qts.64_tw.2 || qts.83_tw.2<br />
|-<br />
| 4 || qts.1_tw.4 || qts.2_tw.4 || qts.4_tw.4 || qts.8_tw.4 || qts.16_tw.4 || qts.32_tw.4 || qts.64_tw.4 || qts.83_tw.4<br />
|-<br />
| 8 || qts.1_tw.8 || qts.2_tw.8 || qts.4_tw.8 || qts.8_tw.8 || qts.16_tw.8 || qts.32_tw.8 || qts.64_tw.8 || qts.83_tw.8<br />
|-<br />
| 16 || qts.1_tw.16 || qts.2_tw.16 || qts.4_tw.16 || qts.8_tw.16 || qts.16_tw.16 || qts.32_tw.16 || qts.64_tw.16 || qts.83_tw.16<br />
|-<br />
| 32 || qts.1_tw.32 || qts.2_tw.32 || qts.4_tw.32 || qts.8_tw.32 || qts.16_tw.32 || qts.32_tw.32 || qts.64_tw.32 || qts.83_tw.32<br />
|-<br />
| 64 || qts.1_tw.64 || qts.2_tw.64 || qts.4_tw.64 || qts.8_tw.64 || qts.16_tw.64 || qts.32_tw.64 || qts.64_tw.64 || qts.83_tw.64<br />
|-<br />
| 83 || qts.1_tw.83 || qts.2_tw.83 || qts.4_tw.83 || qts.8_tw.83 || qts.16_tw.83 || qts.32_tw.83 || qts.64_tw.83 || qts.83_tw.83<br />
|}<br />
<br />
We run the eight-session experiments one more time with a different scoring method.<br />
<br />
In total, we generate 1,648,416 queries (= 103,026 queries per session x 8 sessions x 2 scoring methods) and obtain about 160 million search results (each query returns up to 100 search results).<br />
<br />
==Analysis==<br />
===Impact of Temporally-anchored Scoring===<br />
To reiterate, the first goal of the experiments is to examine how temporally-anchored scoring can yield different search results. <br />
In the traditional search scheme, a temporal query is carried out by first scoring documents with the document/term statistics in the entire collection, followed by filtering in only those that belong to the specified query time span. Apparently, our experiment session for time window size 83 perfectly emulates the traditional scheme, resulting in the same results sets as the traditional scheme would have. Thus, in the first analysis, we compare the following two cases:<br />
<br />
# Scoring is performed based on the statistics of the entire history (i.e. the time window size is 83).<br />
# Scoring is performed based on the actual query time span specified.<br />
<br />
In particular, the search results set qts.''x''_tw.''1'' is compared against qts.''x''_tw.''83''. We compute precisions at r=10, 20 and 100 and Kendall's tau at r=10, 20 and 100.<br />
<br />
We also find out one good temporal query example that can emphasize the importance of the temporally-anchored scoring.<br />
<br />
===Impact of Time-Window Based Approximation===<br />
The second goal is to examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
When the time window size is a single month, any query time span will perfectly match one or more of consecutive time windows. Thus, the statistics that we use for any query time span will be exact ones. On the other hand, when the time window size of two or more months, many query time spans will end up covering parts of time windows, and we approximate the statistics for a query time span by using the combined statistics of all overlapping time windows. See [xx] for how we combine statistics of multiple time windows.<br />
Therefore, we can examine the impact of time-window based approximation by comparing the search results set qts.''x''_tw.''y'' (y!=1) with qts.''x''_tw.''1''. for all ''x''s and ''y''s. For example, comparing qts.''1''_tw.''2'' with qts.''1''_tw.''1'' will show us how the search results are different when we use 2-month time windows in the case that the query time span is a single month.<br />
<br />
In particular, we compute the following metrics for all <math>r \subset \big(10, 20, 100\big)</math>, <math>x \subset \big(1, 2, 4, 8, 16, 32, 64, 83\big)</math>, and <math>y \subset \big(2, 4, 8, 16, 32, 64, 83\big)</math>, where ''r'' is the number of search results, ''x'' is the query time span length, and ''y'' is the time window length.<br />
<br />
* Mean precision: '''mP(''r'', ''x'', 'y'')''' <br />
* Precision variance: '''vP(''r'', ''x'', ''y'')'''<br />
* Mean Kendall's <math>\tau</math>: '''mKT(''r'', ''x'', 'y'')''' <br />
* Kendall's <math>\tau</math> variance: '''vKT(''r'', ''x'', 'y'')'''<br />
<br />
<br />
==Further Information==<br />
* [[Webarc:Input Dataset Statistics |[1] Input Dataset Statistics]]<br />
* [[Webarc:Temporal Query Load |[2] Temporal Query Load]]<br />
* [[Webarc:Tools Developed |[2] Tools Developed]]</div>Scsonghttps://wiki.umiacs.umd.edu/adapt/index.php?title=Webarc:Temporally_Anchored_Scoring_Experiments&diff=2593Webarc:Temporally Anchored Scoring Experiments2009-11-19T00:28:39Z<p>Scsong: </p>
<hr />
<div>==Goals==<br />
There are two main goals that we want to achieve through the following experiments.<br />
# For time-constrained queries, we examine how temporally-anchored scoring can affect the search results in real-world time series data.<br />
# For time-constrained queries with a time window indexing scheme, we examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
<br />
==Input Dataset==<br />
We preprocess the entire revision history of [http://www.archive.org/details/enwiki-20080425 the English Wikipedia from 2001 to 2007]. After preprocessing, we obtain 84 monthly snapshots starting from January 2001 ending in December 2007. Included in each monthly snapshot are the latest revision of existing articles at the end of the month. For example, the Wikipedia article 'Economy of the United States' created on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=649100 August 21 2002] is included in the six monthly snapshots of August 2002, September 2002, ..., January 2003 since there is no newer revision made until [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=749002 February 7 2003], whereas the same article revised on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=166885 August 16 2002] is not included in any of the snapshots.<br />
<br />
[[Webarc:Input Dataset Statistics |Statistics of monthly snapshots]]<br />
<br />
==Queries==<br />
Based on the AOL query log made briefly available in 2005, we build our temporal query load by extracting multi-term query phrases where the user selected an English Wikipedia article among the search results. Note that the different temporal contexts give different relative weights to each of the query terms, yielding different result rankings. In other words, for a query phrase with term t1 and t2, t1 may be given more weight than t2 at a certain query time span qts1, but it may be opposite at a different query time span qts2. This means that two document versions that belong to both qts1 and qts2 can have different rankings depending on the specified query time span. However, in the case of single-term queries, any two document versions will retain their ranking regardless of the specified query time span, as long as both belong to the query time span. In our experiments, we highlight the impact on the search results from different query time spans by focusing only on multi-term queries. The more general case where single-term queries are also included will be able to be inferred from the results from query log analysis studies that report about 80% of web queries are multi-termed.<br />
<br />
Once we extract qualified query phrases, we further filter out less frequently appearing phrases. In particular, we take the query phrases that appear 10 times or more in the log. As a result, we gather a total of 223 such query phrases.<br />
<br />
Using the 223 query phrases, we now construct a temporal query load by combining query time spans with each of the query phrases. Specifically, we use 8 query time span lengths: 1, 2, 4, 8, 16, 32, 64, and 83 months. For each query time span length, we make all possible query time spans, starting from the first month. For example, for the query time span length of two months, we make 82 query time spans: (February 1st 2001 ~ March 31st 2001), (March 1st 2001 ~ April 30th 2001), ..., (November 1st 2007 ~ December 31st 2007). A total of 462 (= 83 + 82 + 80 + 76 + 68 + 52 + 20 + 1) different query time spans are resulted. In the end, we generate 103,026 (= 223 x 462) queries in each experiment session.<br />
<br />
[[Webarc:Temporal Query Load |The complete list of query terms and query time spans]]<br />
<br />
==Scoring Methods==<br />
We use two different scoring methods; 1. a tf-idf based method (OKAPI BM-25, in particular) and 2. a language-model based method (the KL divergence combined with the Dirichlet's smoothing, in particular). For each query, we take 100 top documents from each scoring method.<br />
<br />
==Experiments==<br />
The experiments consist of eight experiment sessions, each with one of the following time window sizes: 1, 2, 4, 8, 16, 32, 64, and 83 month. For each experiment session, the temporal query load described above is fed into our search servers.<br />
<br />
[[Webarc:Time Windows Used in Experiments|Time Windows Used in Experiments]]<br />
<br />
When scoring relevance, approximated query / document statistics are used. For a query time span (tp<sub>1</sub> ~ tp<sub>2</sub>), we find a series of time windows tw<sub>i</sub> ~ tw<sub>j</sub> that completely cover (tp<sub>1</sub> ~ tp<sub>2</sub>). The statistics for the query time span is approximated by combining statistics of tw<sub>i</sub> ~ tw<sub>j</sub>.<br />
<br />
[[Webarc:Combining Statistics in Multiple Time Windows|How We Combine Statistics in Multiple Time Windows]]<br />
<br />
For example, if tw<sub>1</sub> ~ tw<sub>3</sub> completely overlaps a given query time span as in the figure below, we combine statistics of tw<sub>1</sub>, tw<sub>2</sub> and tw<sub>3</sub> and use them when scoring. <br />
<br />
<center>[[Image:stat_approximation.png|480px]]</center><br />
<br />
In the case where the time window size is a single month, a series of time windows always perfectly overlap with any query time span, and therefore we use the exact statistics found within the query time span.<br />
<br />
<center>[[Image:tw1.png|480px]]</center><br />
<br />
Once we run all the eight experiment sessions, we obtain 64 result sets as shown in the table below. In the table, each row represents an experiment session with a given time window size, and each column represents a query time span size. In the entries of the table, the result set qts.''i''_tw.''j'' refers to the result set for the query time span size ''i'' and the time window size ''j''. The size of each result set varies, depending on the query time span size. For example, qts.''1''_tw.''x'' has search results for 18,509 (= 223 query phrases x 83 query time spans) queries, while qts.''83''_tw.''x'' has search results for only 223 (= 223 x 1) queries.<br />
<br />
{| class="wikitable" style="margin: 1em auto 1em auto" border="1"<br />
|-<br />
| tw\qts || 1 || 2 || 4 || 8 || 16 || 32 || 64 || 83<br />
|-<br />
| 1 || qts.1_tw.1 || qts.2_tw.1 || qts.4_tw.1 || qts.8_tw.1 || qts.16_tw.1 || qts.32_tw.1 || qts.64_tw.1 || qts.83_tw.1<br />
|-<br />
| 2 || qts.1_tw.2 || qts.2_tw.2 || qts.4_tw.2 || qts.8_tw.2 || qts.16_tw.2 || qts.32_tw.2 || qts.64_tw.2 || qts.83_tw.2<br />
|-<br />
| 4 || qts.1_tw.4 || qts.2_tw.4 || qts.4_tw.4 || qts.8_tw.4 || qts.16_tw.4 || qts.32_tw.4 || qts.64_tw.4 || qts.83_tw.4<br />
|-<br />
| 8 || qts.1_tw.8 || qts.2_tw.8 || qts.4_tw.8 || qts.8_tw.8 || qts.16_tw.8 || qts.32_tw.8 || qts.64_tw.8 || qts.83_tw.8<br />
|-<br />
| 16 || qts.1_tw.16 || qts.2_tw.16 || qts.4_tw.16 || qts.8_tw.16 || qts.16_tw.16 || qts.32_tw.16 || qts.64_tw.16 || qts.83_tw.16<br />
|-<br />
| 32 || qts.1_tw.32 || qts.2_tw.32 || qts.4_tw.32 || qts.8_tw.32 || qts.16_tw.32 || qts.32_tw.32 || qts.64_tw.32 || qts.83_tw.32<br />
|-<br />
| 64 || qts.1_tw.64 || qts.2_tw.64 || qts.4_tw.64 || qts.8_tw.64 || qts.16_tw.64 || qts.32_tw.64 || qts.64_tw.64 || qts.83_tw.64<br />
|-<br />
| 83 || qts.1_tw.83 || qts.2_tw.83 || qts.4_tw.83 || qts.8_tw.83 || qts.16_tw.83 || qts.32_tw.83 || qts.64_tw.83 || qts.83_tw.83<br />
|}<br />
<br />
We run the eight-session experiments one more time with a different scoring method.<br />
<br />
In total, we generate 1,648,416 queries (= 103,026 queries per session x 8 sessions x 2 scoring methods) and obtain about 160 million search results (each query returns up to 100 search results).<br />
<br />
==Analysis==<br />
===Impact of Temporally-anchored Scoring===<br />
To reiterate, the first goal of the experiments is to examine how temporally-anchored scoring can yield different search results. <br />
In the traditional search scheme, a temporal query is carried out by first scoring documents with the document/term statistics in the entire collection, followed by filtering in only those that belong to the specified query time span. Apparently, our experiment session for time window size 83 perfectly emulates the traditional scheme, resulting in the same results sets as the traditional scheme would have. Thus, in the first analysis, we compare the following two cases:<br />
<br />
# Scoring is performed based on the statistics of the entire history (i.e. the time window size is 83).<br />
# Scoring is performed based on the actual query time span specified.<br />
<br />
In particular, the search results set qts.''x''_tw.''1'' is compared against qts.''x''_tw.''83''. We compute precisions at r=10, 20 and 100 and Kendall's tau at r=10, 20 and 100.<br />
<br />
We also find out one good temporal query example that can emphasize the importance of the temporally-anchored scoring.<br />
<br />
===Impact of Time-Window Based Approximation===<br />
The second goal is to examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
When the time window size is a single month, any query time span exactly matches one or more consecutive time windows, so the statistics we use for any query time span are exact. On the other hand, when the time window size is two or more months, many query time spans cover only parts of some time windows, and we approximate the statistics for such a query time span by combining the statistics of all overlapping time windows. See [xx] for how we combine statistics of multiple time windows.<br />
Therefore, we can examine the impact of time-window based approximation by comparing the search result set qts.''x''_tw.''y'' (y != 1) with qts.''x''_tw.''1'' for all ''x'' and ''y''. For example, comparing qts.''1''_tw.''2'' with qts.''1''_tw.''1'' shows how the search results differ when we use 2-month time windows for single-month query time spans.<br />
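The approximation step can be sketched as follows. This is an illustrative sketch (all names are our own) that assumes, for simplicity, that per-window statistics are combined by plain summation; the actual combination rule is the one referenced in [xx].<br />

```python
def covering_windows(qts_start, qts_end, window_size):
    """Indices of the consecutive fixed-size time windows that
    together cover the query time span [qts_start, qts_end],
    where both endpoints are month offsets from the first month."""
    first = qts_start // window_size
    last = qts_end // window_size
    return list(range(first, last + 1))

def approximate_stats(windows, per_window_stats):
    """Combine per-window statistics by summation (an assumed,
    illustrative scheme). Each entry holds a document count, a
    total term count, and per-term document frequencies."""
    combined = {"num_docs": 0, "num_terms": 0, "doc_freq": {}}
    for w in windows:
        s = per_window_stats[w]
        combined["num_docs"] += s["num_docs"]
        combined["num_terms"] += s["num_terms"]
        for term, df in s["doc_freq"].items():
            combined["doc_freq"][term] = combined["doc_freq"].get(term, 0) + df
    return combined
```

For instance, with 2-month windows, the query time span covering months 3 through 7 is covered by windows 1, 2, and 3 (months 2–7), so the combined statistics include one extra month outside the query time span — precisely the approximation error the experiment measures.<br />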
<br />
In particular, we compute the following metrics for all <math>r \in \{10, 20, 100\}</math>, <math>x \in \{1, 2, 4, 8, 16, 32, 64, 83\}</math>, and <math>y \in \{2, 4, 8, 16, 32, 64, 83\}</math>, where ''r'' is the number of search results, ''x'' is the query time span length, and ''y'' is the time window length.<br />
<br />
* Mean precision: '''mP(''r'', ''x'', ''y'')'''<br />
* Precision variance: '''vP(''r'', ''x'', ''y'')'''<br />
* Mean Kendall's <math>\tau</math>: '''mKT(''r'', ''x'', ''y'')'''<br />
* Kendall's <math>\tau</math> variance: '''vKT(''r'', ''x'', ''y'')'''<br />
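As a sketch, the four summary metrics could be aggregated from per-query comparison scores as follows (an illustrative layout with names of our own; it assumes the per-query precision and tau values have already been computed):<br />

```python
from statistics import mean, pvariance

def metric_grid(per_query_scores):
    """Aggregate per-query scores into the four summary metrics
    mP, vP, mKT, vKT for each (r, x, y) combination.
    per_query_scores maps (r, x, y) -> list of (precision, tau)
    pairs, one pair per individual query."""
    summary = {}
    for key, pairs in per_query_scores.items():
        precisions = [p for p, _ in pairs]
        taus = [t for _, t in pairs]
        summary[key] = {
            "mP": mean(precisions), "vP": pvariance(precisions),
            "mKT": mean(taus), "vKT": pvariance(taus),
        }
    return summary
```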
<br />
<br />
==Further Information==<br />
* [[Webarc:Input Dataset Statistics |[1] Input Dataset Statistics]]<br />
* [[Webarc:Temporal Query Load |[2] Temporal Query Load]]<br />
* [[Webarc:Tools Developed |[3] Tools Developed]]</div>
<hr />
<div>==Goals==<br />
There are two main goals that we want to achieve through the following experiments.<br />
# For time-constrained queries, we examine how temporally-anchored scoring can affect the search results in real-world time series data.<br />
# For time-constrained queries with a time window indexing scheme, we examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
<br />
==Input Dataset==<br />
We preprocess the entire revision history of [http://www.archive.org/details/enwiki-20080425 the English Wikipedia from 2001 to 2007]. After preprocessing, we obtain 84 monthly snapshots starting from January 2001 ending in December 2007. Included in each monthly snapshot are the latest revision of existing articles at the end of the month. For example, the Wikipedia article 'Economy of the United States' created on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=649100 August 21 2002] is included in the six monthly snapshots of August 2002, September 2002, ..., January 2003 since there is no newer revision made until [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=749002 February 7 2003], whereas the same article revised on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=166885 August 16 2002] is not included in any of the snapshots.<br />
<br />
[[Webarc:Input Dataset Statistics |Statistics of monthly snapshots]]<br />
<br />
==Queries==<br />
Based on the AOL query log made briefly available in 2005, we build our temporal query load by extracting multi-term query phrases where the user selected an English Wikipedia article among the search results. Note that the different temporal contexts give different relative weights to each of the query terms, yielding different result rankings. In other words, for a query phrase with term t1 and t2, t1 may be given more weight than t2 at a certain query time span qts1, but it may be opposite at a different query time span qts2. This means that two document versions that belong to both qts1 and qts2 can have different rankings depending on the specified query time span. However, in the case of single-term queries, any two document versions will retain their ranking regardless of the specified query time span, as long as both belong to the query time span. In our experiments, we highlight the impact on the search results from different query time spans by focusing only on multi-term queries. The more general case where single-term queries are also included will be able to be inferred from the results from query log analysis studies that report about 80% of web queries are multi-termed.<br />
<br />
Once we extract qualified query phrases, we further filter out less frequently appearing phrases. In particular, we take the query phrases that appear 10 times or more in the log. As a result, we gather a total of 223 such query phrases.<br />
<br />
Using the 223 query phrases, we now construct a temporal query load by combining query time spans with each of the query phrases. Specifically, we use 8 query time span lengths: 1, 2, 4, 8, 16, 32, 64, and 83 months. For each query time span length, we make all possible query time spans, starting from the first month. For example, for the query time span length of two months, we make 82 query time spans: (February 1st 2001 ~ March 31st 2001), (March 1st 2001 ~ April 30th 2001), ..., (November 1st 2007 ~ December 31st 2007). A total of 462 (= 83 + 82 + 80 + 76 + 68 + 52 + 20 + 1) different query time spans are resulted. In the end, we generate 103,026 (= 223 x 462) queries in each experiment session.<br />
<br />
[[Webarc:Temporal Query Load |The complete list of query terms and query time spans]]<br />
<br />
==Scoring Methods==<br />
We use two different scoring methods; 1. a tf-idf based method (OKAPI BM-25, in particular) and 2. a language-model based method (the KL divergence combined with the Dirichlet's smoothing, in particular). For each query, we take 100 top documents from each scoring method.<br />
<br />
==Experiments==<br />
The experiments consist of eight experiment sessions, each with one of the following time window sizes: 1, 2, 4, 8, 16, 32, 64, and 83 month. For each experiment session, the temporal query load described above is fed into our search servers.<br />
<br />
[[Webarc:Time Windows Used in Experiments|Time Windows Used in Experiments]]<br />
<br />
When scoring relevance, approximated query / document statistics are used. For a query time span (tp<sub>1</sub> ~ tp<sub>2</sub>), we find a series of time windows tw<sub>i</sub> ~ tw<sub>j</sub> that completely cover (tp<sub>1</sub> ~ tp<sub>2</sub>). The statistics for the query time span is approximated by combining statistics of tw<sub>i</sub> ~ tw<sub>j</sub>.<br />
<br />
[[Webarc:Combining Statistics in Multiple Time Windows|How We Combine Statistics in Multiple Time Windows]]<br />
<br />
For example, if tw<sub>1</sub> ~ tw<sub>3</sub> completely overlaps a given query time span as in the figure below, we combine statistics of tw<sub>1</sub>, tw<sub>2</sub> and tw<sub>3</sub> and use them when scoring. <br />
<br />
<center>[[Image:stat_approximation.png|480px]]</center><br />
<br />
In the case where the time window size is a single month, a series of time windows always perfectly overlap with any query time span, and therefore we use the exact statistics found within the query time span.<br />
<br />
<center>[[Image:tw1.png|480px]]</center><br />
<br />
Once we run all the eight experiment sessions, we obtain 64 result sets as shown in the table below. In the table, each row represents an experiment session with a given time window size, and each column represents a query time span size. In the entries of the table, the result set qts.''i''_tw.''j'' refers to the result set for the query time span size ''i'' and the time window size ''j''. The size of each result set varies, depending on the query time span size. For example, qts.''1''_tw.''x'' has search results for 18,509 (= 223 query phrases x 83 query time spans) queries, while qts.''83''_tw.''x'' has search results for only 223 (= 223 x 1) queries.<br />
<br />
{| class="wikitable" style="margin: 1em auto 1em auto" border="1"<br />
|-<br />
| tw\qts || 1 || 2 || 4 || 8 || 16 || 32 || 64 || 83<br />
|-<br />
| 1 || qts.1_tw.1 || qts.2_tw.1 || qts.4_tw.1 || qts.8_tw.1 || qts.16_tw.1 || qts.32_tw.1 || qts.64_tw.1 || qts.83_tw.1<br />
|-<br />
| 2 || qts.1_tw.2 || qts.2_tw.2 || qts.4_tw.2 || qts.8_tw.2 || qts.16_tw.2 || qts.32_tw.2 || qts.64_tw.2 || qts.83_tw.2<br />
|-<br />
| 4 || qts.1_tw.4 || qts.2_tw.4 || qts.4_tw.4 || qts.8_tw.4 || qts.16_tw.4 || qts.32_tw.4 || qts.64_tw.4 || qts.83_tw.4<br />
|-<br />
| 8 || qts.1_tw.8 || qts.2_tw.8 || qts.4_tw.8 || qts.8_tw.8 || qts.16_tw.8 || qts.32_tw.8 || qts.64_tw.8 || qts.83_tw.8<br />
|-<br />
| 16 || qts.1_tw.16 || qts.2_tw.16 || qts.4_tw.16 || qts.8_tw.16 || qts.16_tw.16 || qts.32_tw.16 || qts.64_tw.16 || qts.83_tw.16<br />
|-<br />
| 32 || qts.1_tw.32 || qts.2_tw.32 || qts.4_tw.32 || qts.8_tw.32 || qts.16_tw.32 || qts.32_tw.32 || qts.64_tw.32 || qts.83_tw.32<br />
|-<br />
| 64 || qts.1_tw.64 || qts.2_tw.64 || qts.4_tw.64 || qts.8_tw.64 || qts.16_tw.64 || qts.32_tw.64 || qts.64_tw.64 || qts.83_tw.64<br />
|-<br />
| 83 || qts.1_tw.83 || qts.2_tw.83 || qts.4_tw.83 || qts.8_tw.83 || qts.16_tw.83 || qts.32_tw.83 || qts.64_tw.83 || qts.83_tw.83<br />
|}<br />
<br />
We run the eight-session experiments one more time with a different scoring method.<br />
<br />
In total, we generate 1,648,416 queries (= 103,026 queries per session x 8 sessions x 2 scoring methods) and obtain about 160 million search results (each query returns up to 100 search results).<br />
<br />
==Analysis==<br />
===Impact of Temporally-anchored Scoring===<br />
To reiterate, the first goal of the experiments is to examine how temporally-anchored scoring can yield different search results. <br />
In the traditional search scheme, a temporal query is carried out by first scoring documents with the document/term statistics in the entire collection, followed by filtering in only those that belong to the specified query time span. Apparently, our experiment session for time window size 83 perfectly emulates the traditional scheme, resulting in the same results sets as the traditional scheme would have. Thus, in the first analysis, we compare the following two cases:<br />
<br />
# Scoring is performed based on the statistics of the entire history (i.e. the time window size is 83).<br />
# Scoring is performed based on the actual query time span specified.<br />
<br />
In particular, the search results set qts.''x''_tw.''1'' is compared against qts.''x''_tw.''83''. We compute precisions at r=10, 20 and 100 and Kendall's tau at r=10, 20 and 100.<br />
<br />
We also find out one good temporal query example that can emphasize the importance of the temporally-anchored scoring.<br />
<br />
===Impact of Time-Window Based Approximation===<br />
The second goal is to examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
When the time window size is a single month, any query time span will perfectly match one or more of consecutive time windows. Thus, the statistics that we use for any query time span will be exact ones. On the other hand, when the time window size of two or more months, many query time spans will end up covering parts of time windows, and we approximate the statistics for a query time span by using the combined statistics of all overlapping time windows. See [xx] for how we combine statistics of multiple time windows.<br />
Therefore, we can examine the impact of time-window based approximation by comparing the search results set qts.''x''_tw.''y'' (y!=1) with qts.''x''_tw.''1''. for all ''x''s and ''y''s. For example, comparing qts.''1''_tw.''2'' with qts.''1''_tw.''1'' will show us how the search results are different when we use 2-month time windows in the case that the query time span is a single month.<br />
<br />
In particular, we compute the following metrics for all <math> r \subset {10, 20, 100}, <math>x \subset {1, 2, 4, 8, 16, 32, 64, 83}</math>, and <math> y \subset {2, 4, 8, 16, 32, 64, 83}</math>, where ''r'' is the number of search results, ''x'' is the query time span length, and ''y'' is the time window length.<br />
<br />
* Mean precision: '''mP(''r'', ''x'', 'y'')''' <br />
* Precision variance: '''vP(''r'', ''x'', ''y'')'''<br />
* Mean Kendall's <math>\tau</math>: '''mKT(''r'', ''x'', 'y'')''' <br />
* Kendall's <math>\tau</math> variance: '''vKT(''r'', ''x'', 'y'')'''<br />
<br />
<br />
==Further Information==<br />
* [[Webarc:Input Dataset Statistics |[1] Input Dataset Statistics]]<br />
* [[Webarc:Temporal Query Load |[2] Temporal Query Load]]<br />
* [[Webarc:Tools Developed |[2] Tools Developed]]</div>Scsonghttps://wiki.umiacs.umd.edu/adapt/index.php?title=Webarc:Temporally_Anchored_Scoring_Experiments&diff=2591Webarc:Temporally Anchored Scoring Experiments2009-11-19T00:20:36Z<p>Scsong: </p>
<hr />
<div>==Goals==<br />
There are two main goals that we want to achieve through the following experiments.<br />
# For time-constrained queries, we examine how temporally-anchored scoring can affect the search results in real-world time series data.<br />
# For time-constrained queries with a time window indexing scheme, we examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
<br />
==Input Dataset==<br />
We preprocess the entire revision history of [http://www.archive.org/details/enwiki-20080425 the English Wikipedia from 2001 to 2007]. After preprocessing, we obtain 84 monthly snapshots starting from January 2001 ending in December 2007. Included in each monthly snapshot are the latest revision of existing articles at the end of the month. For example, the Wikipedia article 'Economy of the United States' created on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=649100 August 21 2002] is included in the six monthly snapshots of August 2002, September 2002, ..., January 2003 since there is no newer revision made until [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=749002 February 7 2003], whereas the same article revised on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=166885 August 16 2002] is not included in any of the snapshots.<br />
<br />
[[Webarc:Input Dataset Statistics |Statistics of monthly snapshots]]<br />
<br />
==Queries==<br />
Based on the AOL query log made briefly available in 2005, we build our temporal query load by extracting multi-term query phrases where the user selected an English Wikipedia article among the search results. Note that the different temporal contexts give different relative weights to each of the query terms, yielding different result rankings. In other words, for a query phrase with term t1 and t2, t1 may be given more weight than t2 at a certain query time span qts1, but it may be opposite at a different query time span qts2. This means that two document versions that belong to both qts1 and qts2 can have different rankings depending on the specified query time span. However, in the case of single-term queries, any two document versions will retain their ranking regardless of the specified query time span, as long as both belong to the query time span. In our experiments, we highlight the impact on the search results from different query time spans by focusing only on multi-term queries. The more general case where single-term queries are also included will be able to be inferred from the results from query log analysis studies that report about 80% of web queries are multi-termed.<br />
<br />
Once we extract qualified query phrases, we further filter out less frequently appearing phrases. In particular, we take the query phrases that appear 10 times or more in the log. As a result, we gather a total of 223 such query phrases.<br />
<br />
Using the 223 query phrases, we now construct a temporal query load by combining query time spans with each of the query phrases. Specifically, we use 8 query time span lengths: 1, 2, 4, 8, 16, 32, 64, and 83 months. For each query time span length, we make all possible query time spans, starting from the first month. For example, for the query time span length of two months, we make 82 query time spans: (February 1st 2001 ~ March 31st 2001), (March 1st 2001 ~ April 30th 2001), ..., (November 1st 2007 ~ December 31st 2007). A total of 462 (= 83 + 82 + 80 + 76 + 68 + 52 + 20 + 1) different query time spans are resulted. In the end, we generate 103,026 (= 223 x 462) queries in each experiment session.<br />
<br />
[[Webarc:Temporal Query Load |The complete list of query terms and query time spans]]<br />
<br />
==Scoring Methods==<br />
We use two different scoring methods; 1. a tf-idf based method (OKAPI BM-25, in particular) and 2. a language-model based method (the KL divergence combined with the Dirichlet's smoothing, in particular). For each query, we take 100 top documents from each scoring method.<br />
<br />
==Experiments==<br />
The experiments consist of eight experiment sessions, each with one of the following time window sizes: 1, 2, 4, 8, 16, 32, 64, and 83 month. For each experiment session, the temporal query load described above is fed into our search servers.<br />
<br />
[[Webarc:Time Windows Used in Experiments|Time Windows Used in Experiments]]<br />
<br />
When scoring relevance, approximated query / document statistics are used. For a query time span (tp<sub>1</sub> ~ tp<sub>2</sub>), we find a series of time windows tw<sub>i</sub> ~ tw<sub>j</sub> that completely cover (tp<sub>1</sub> ~ tp<sub>2</sub>). The statistics for the query time span is approximated by combining statistics of tw<sub>i</sub> ~ tw<sub>j</sub>.<br />
<br />
[[Webarc:Combining Statistics in Multiple Time Windows|How We Combine Statistics in Multiple Time Windows]]<br />
<br />
For example, if tw<sub>1</sub> ~ tw<sub>3</sub> completely overlaps a given query time span as in the figure below, we combine statistics of tw<sub>1</sub>, tw<sub>2</sub> and tw<sub>3</sub> and use them when scoring. <br />
<br />
<center>[[Image:stat_approximation.png|480px]]</center><br />
<br />
In the case where the time window size is a single month, a series of time windows always perfectly overlap with any query time span, and therefore we use the exact statistics found within the query time span.<br />
<br />
<center>[[Image:tw1.png|480px]]</center><br />
<br />
Once we run all the eight experiment sessions, we obtain 64 result sets as shown in the table below. In the table, each row represents an experiment session with a given time window size, and each column represents a query time span size. In the entries of the table, the result set qts.''i''_tw.''j'' refers to the result set for the query time span size ''i'' and the time window size ''j''. The size of each result set varies, depending on the query time span size. For example, qts.''1''_tw.''x'' has search results for 18,509 (= 223 query phrases x 83 query time spans) queries, while qts.''83''_tw.''x'' has search results for only 223 (= 223 x 1) queries.<br />
<br />
{| class="wikitable" style="margin: 1em auto 1em auto" border="1"<br />
|-<br />
| tw\qts || 1 || 2 || 4 || 8 || 16 || 32 || 64 || 83<br />
|-<br />
| 1 || qts.1_tw.1 || qts.2_tw.1 || qts.4_tw.1 || qts.8_tw.1 || qts.16_tw.1 || qts.32_tw.1 || qts.64_tw.1 || qts.83_tw.1<br />
|-<br />
| 2 || qts.1_tw.2 || qts.2_tw.2 || qts.4_tw.2 || qts.8_tw.2 || qts.16_tw.2 || qts.32_tw.2 || qts.64_tw.2 || qts.83_tw.2<br />
|-<br />
| 4 || qts.1_tw.4 || qts.2_tw.4 || qts.4_tw.4 || qts.8_tw.4 || qts.16_tw.4 || qts.32_tw.4 || qts.64_tw.4 || qts.83_tw.4<br />
|-<br />
| 8 || qts.1_tw.8 || qts.2_tw.8 || qts.4_tw.8 || qts.8_tw.8 || qts.16_tw.8 || qts.32_tw.8 || qts.64_tw.8 || qts.83_tw.8<br />
|-<br />
| 16 || qts.1_tw.16 || qts.2_tw.16 || qts.4_tw.16 || qts.8_tw.16 || qts.16_tw.16 || qts.32_tw.16 || qts.64_tw.16 || qts.83_tw.16<br />
|-<br />
| 32 || qts.1_tw.32 || qts.2_tw.32 || qts.4_tw.32 || qts.8_tw.32 || qts.16_tw.32 || qts.32_tw.32 || qts.64_tw.32 || qts.83_tw.32<br />
|-<br />
| 64 || qts.1_tw.64 || qts.2_tw.64 || qts.4_tw.64 || qts.8_tw.64 || qts.16_tw.64 || qts.32_tw.64 || qts.64_tw.64 || qts.83_tw.64<br />
|-<br />
| 83 || qts.1_tw.83 || qts.2_tw.83 || qts.4_tw.83 || qts.8_tw.83 || qts.16_tw.83 || qts.32_tw.83 || qts.64_tw.83 || qts.83_tw.83<br />
|}<br />
<br />
We run the eight-session experiments one more time with a different scoring method.<br />
<br />
In total, we generate 1,648,416 queries (= 103,026 queries per session x 8 sessions x 2 scoring methods) and obtain about 160 million search results (each query returns up to 100 search results).<br />
<br />
==Analysis==<br />
===Impact of Temporally-anchored Scoring===<br />
To reiterate, the first goal of the experiments is to examine how temporally-anchored scoring can yield different search results. <br />
In the traditional search scheme, a temporal query is carried out by first scoring documents with the document/term statistics in the entire collection, followed by filtering in only those that belong to the specified query time span. Apparently, our experiment session for time window size 83 perfectly emulates the traditional scheme, resulting in the same results sets as the traditional scheme would have. Thus, in the first analysis, we compare the following two cases:<br />
<br />
# Scoring is performed based on the statistics of the entire history (i.e. the time window size is 83).<br />
# Scoring is performed based on the actual query time span specified.<br />
<br />
In particular, the search results set qts.''x''_tw.''1'' is compared against qts.''x''_tw.''83''. We compute precisions at r=10, 20 and 100 and Kendall's tau at r=10, 20 and 100.<br />
<br />
We also find out one good temporal query example that can emphasize the importance of the temporally-anchored scoring.<br />
<br />
===Impact of Time-Window Based Approximation===<br />
The second goal is to examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
When the time window size is a single month, any query time span will perfectly match one or more of consecutive time windows. Thus, the statistics that we use for any query time span will be exact ones. On the other hand, when the time window size of two or more months, many query time spans will end up covering parts of time windows, and we approximate the statistics for a query time span by using the combined statistics of all overlapping time windows. See [xx] for how we combine statistics of multiple time windows.<br />
Therefore, we can examine the impact of time-window based approximation by comparing the search results set qts.''x''_tw.''y'' (y!=1) with qts.''x''_tw.''1''. for all ''x''s and ''y''s. For example, comparing qts.''1''_tw.''2'' with qts.''1''_tw.''1'' will show us how the search results are different when we use 2-month time windows in the case that the query time span is a single month.<br />
<br />
In particular, we compute the following metrics for all <math> r \subset {10, 20, 100}, <math>x \subset {1, 2, 4, 8, 16, 32, 64, 83}</math>, and <math> y \subset {2, 4, 8, 16, 32, 64, 83}</math>, where ''r'' is the number of search results, ''x'' is the query time span length, and ''y'' is the time window length.<br />
<br />
* Mean precision: '''mP(''r'', ''x'', 'y'')''' <br />
* Precision variance: '''vP(''r'', ''x'', ''y'')'''<br />
* Mean Kendall's \tau: '''mKT(''r'', ''x'', 'y'')''' <br />
* Kendall's \tau variance: '''vKT(''r'', ''x'', 'y'')'''<br />
<br />
<br />
==Further Information==<br />
* [[Webarc:Input Dataset Statistics |[1] Input Dataset Statistics]]<br />
* [[Webarc:Temporal Query Load |[2] Temporal Query Load]]<br />
* [[Webarc:Tools Developed |[2] Tools Developed]]</div>Scsonghttps://wiki.umiacs.umd.edu/adapt/index.php?title=Webarc:Temporally_Anchored_Scoring_Experiments&diff=2590Webarc:Temporally Anchored Scoring Experiments2009-11-18T23:27:21Z<p>Scsong: </p>
<hr />
<div>==Goals==<br />
There are two main goals that we want to achieve through the following experiments.<br />
# For time-constrained queries, we examine how temporally-anchored scoring can affect the search results in real-world time series data.<br />
# For time-constrained queries with a time window indexing scheme, we examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
<br />
==Input Dataset==<br />
We preprocess the entire revision history of [http://www.archive.org/details/enwiki-20080425 the English Wikipedia from 2001 to 2007]. After preprocessing, we obtain 84 monthly snapshots starting from January 2001 ending in December 2007. Included in each monthly snapshot are the latest revision of existing articles at the end of the month. For example, the Wikipedia article 'Economy of the United States' created on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=649100 August 21 2002] is included in the six monthly snapshots of August 2002, September 2002, ..., January 2003 since there is no newer revision made until [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=749002 February 7 2003], whereas the same article revised on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=166885 August 16 2002] is not included in any of the snapshots.<br />
<br />
[[Webarc:Input Dataset Statistics |Statistics of monthly snapshots]]<br />
<br />
==Queries==<br />
Based on the AOL query log made briefly available in 2005, we build our temporal query load by extracting multi-term query phrases where the user selected an English Wikipedia article among the search results. Note that the different temporal contexts give different relative weights to each of the query terms, yielding different result rankings. In other words, for a query phrase with term t1 and t2, t1 may be given more weight than t2 at a certain query time span qts1, but it may be opposite at a different query time span qts2. This means that two document versions that belong to both qts1 and qts2 can have different rankings depending on the specified query time span. However, in the case of single-term queries, any two document versions will retain their ranking regardless of the specified query time span, as long as both belong to the query time span. In our experiments, we highlight the impact on the search results from different query time spans by focusing only on multi-term queries. The more general case where single-term queries are also included will be able to be inferred from the results from query log analysis studies that report about 80% of web queries are multi-termed.<br />
<br />
Once we extract qualified query phrases, we further filter out less frequently appearing phrases. In particular, we take the query phrases that appear 10 times or more in the log. As a result, we gather a total of 223 such query phrases.<br />
<br />
Using the 223 query phrases, we now construct a temporal query load by combining query time spans with each of the query phrases. Specifically, we use 8 query time span lengths: 1, 2, 4, 8, 16, 32, 64, and 83 months. For each query time span length, we make all possible query time spans, starting from the first month. For example, for the query time span length of two months, we make 82 query time spans: (February 1st 2001 ~ March 31st 2001), (March 1st 2001 ~ April 30th 2001), ..., (November 1st 2007 ~ December 31st 2007). A total of 462 (= 83 + 82 + 80 + 76 + 68 + 52 + 20 + 1) different query time spans are resulted. In the end, we generate 103,026 (= 223 x 462) queries in each experiment session.<br />
<br />
[[Webarc:Temporal Query Load |The complete list of query terms and query time spans]]<br />
<br />
==Scoring Methods==<br />
We use two different scoring methods; 1. a tf-idf based method (OKAPI BM-25, in particular) and 2. a language-model based method (the KL divergence combined with the Dirichlet's smoothing, in particular). For each query, we take 100 top documents from each scoring method.<br />
<br />
==Experiments==<br />
The experiments consist of eight experiment sessions, each with one of the following time window sizes: 1, 2, 4, 8, 16, 32, 64, and 83 month. For each experiment session, the temporal query load described above is fed into our search servers.<br />
<br />
[[Webarc:Time Windows Used in Experiments|Time Windows Used in Experiments]]<br />
<br />
When scoring relevance, approximated query / document statistics are used. For a query time span (tp<sub>1</sub> ~ tp<sub>2</sub>), we find a series of time windows tw<sub>i</sub> ~ tw<sub>j</sub> that completely cover (tp<sub>1</sub> ~ tp<sub>2</sub>). The statistics for the query time span is approximated by combining statistics of tw<sub>i</sub> ~ tw<sub>j</sub>.<br />
<br />
[[Webarc:Combining Statistics in Multiple Time Windows|How We Combine Statistics in Multiple Time Windows]]<br />
<br />
For example, if tw<sub>1</sub> ~ tw<sub>3</sub> completely overlaps a given query time span as in the figure below, we combine statistics of tw<sub>1</sub>, tw<sub>2</sub> and tw<sub>3</sub> and use them when scoring. <br />
<br />
<center>[[Image:stat_approximation.png|480px]]</center><br />
<br />
In the case where the time window size is a single month, a series of time windows always perfectly overlap with any query time span, and therefore we use the exact statistics found within the query time span.<br />
<br />
<center>[[Image:tw1.png|480px]]</center><br />
<br />
Once we run all the eight experiment sessions, we obtain 64 result sets as shown in the table below. In the table, each row represents an experiment session with a given time window size, and each column represents a query time span size. In the entries of the table, the result set qts.''i''_tw.''j'' refers to the result set for the query time span size ''i'' and the time window size ''j''. The size of each result set varies, depending on the query time span size. For example, qts.''1''_tw.''x'' has search results for 18,509 (= 223 query phrases x 83 query time spans) queries, while qts.''83''_tw.''x'' has search results for only 223 (= 223 x 1) queries.<br />
<br />
{| class="wikitable" style="margin: 1em auto 1em auto" border="1"<br />
|-<br />
| tw\qts || 1 || 2 || 4 || 8 || 16 || 32 || 64 || 83<br />
|-<br />
| 1 || qts.1_tw.1 || qts.2_tw.1 || qts.4_tw.1 || qts.8_tw.1 || qts.16_tw.1 || qts.32_tw.1 || qts.64_tw.1 || qts.83_tw.1<br />
|-<br />
| 2 || qts.1_tw.2 || qts.2_tw.2 || qts.4_tw.2 || qts.8_tw.2 || qts.16_tw.2 || qts.32_tw.2 || qts.64_tw.2 || qts.83_tw.2<br />
|-<br />
| 4 || qts.1_tw.4 || qts.2_tw.4 || qts.4_tw.4 || qts.8_tw.4 || qts.16_tw.4 || qts.32_tw.4 || qts.64_tw.4 || qts.83_tw.4<br />
|-<br />
| 8 || qts.1_tw.8 || qts.2_tw.8 || qts.4_tw.8 || qts.8_tw.8 || qts.16_tw.8 || qts.32_tw.8 || qts.64_tw.8 || qts.83_tw.8<br />
|-<br />
| 16 || qts.1_tw.16 || qts.2_tw.16 || qts.4_tw.16 || qts.8_tw.16 || qts.16_tw.16 || qts.32_tw.16 || qts.64_tw.16 || qts.83_tw.16<br />
|-<br />
| 32 || qts.1_tw.32 || qts.2_tw.32 || qts.4_tw.32 || qts.8_tw.32 || qts.16_tw.32 || qts.32_tw.32 || qts.64_tw.32 || qts.83_tw.32<br />
|-<br />
| 64 || qts.1_tw.64 || qts.2_tw.64 || qts.4_tw.64 || qts.8_tw.64 || qts.16_tw.64 || qts.32_tw.64 || qts.64_tw.64 || qts.83_tw.64<br />
|-<br />
| 83 || qts.1_tw.83 || qts.2_tw.83 || qts.4_tw.83 || qts.8_tw.83 || qts.16_tw.83 || qts.32_tw.83 || qts.64_tw.83 || qts.83_tw.83<br />
|}<br />
<br />
We run the eight-session experiments one more time with a different scoring method.<br />
<br />
In total, we generate 1,648,416 queries (= 103,026 queries per session x 8 sessions x 2 scoring methods) and obtain about 160 million search results (each query returns up to 100 search results).<br />
<br />
==Analysis==<br />
===Impact of Temporally-anchored Scoring===<br />
To reiterate, the first goal of the experiments is to examine how temporally-anchored scoring can yield different search results. <br />
In the traditional search scheme, a temporal query is carried out by first scoring documents with the document/term statistics in the entire collection, followed by filtering in only those that belong to the specified query time span. Apparently, our experiment session for time window size 83 perfectly emulates the traditional scheme, resulting in the same results sets as the traditional scheme would have. Thus, in the first analysis, we compare the following two cases:<br />
<br />
# Scoring is performed based on the statistics of the entire history (i.e. the time window size is 83).<br />
# Scoring is performed based on the actual query time span specified.<br />
<br />
In particular, each search result set qts.''x''_tw.''1'' is compared against qts.''x''_tw.''83''. We compute precision and Kendall's tau at rank cutoffs r = 10, 20, and 100.<br />
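These comparisons can be sketched as follows (illustrative Python, not the project's actual evaluation code; ranked lists are assumed to be lists of document IDs). Here precision at rank r measures the overlap between the two top-r lists, and Kendall's tau is computed over the documents the two rankings share, ignoring ties:<br />

```python
from itertools import combinations

def precision_at_r(run_a, run_b, r):
    """Fraction of run_a's top-r documents that also appear in run_b's top-r."""
    top_a, top_b = set(run_a[:r]), set(run_b[:r])
    return len(top_a & top_b) / r

def kendall_tau(run_a, run_b, r):
    """Kendall's tau over documents ranked in the top r of both runs."""
    top_b = set(run_b[:r])
    shared = [d for d in run_a[:r] if d in top_b]  # in run_a order
    pos_b = {d: i for i, d in enumerate(run_b[:r])}
    n = len(shared)
    if n < 2:
        return 1.0
    concordant = discordant = 0
    for d1, d2 in combinations(shared, 2):
        if pos_b[d1] < pos_b[d2]:   # pair ordered the same way in both runs
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Example: compare a (hypothetical) tw.1 ranking against a tw.83 ranking.
tw1  = ["d3", "d1", "d4", "d2", "d9"]
tw83 = ["d1", "d3", "d2", "d4", "d7"]
print(precision_at_r(tw1, tw83, 5))  # 4 of 5 documents overlap -> 0.8
print(kendall_tau(tw1, tw83, 5))
```

This is one common variant of the two measures; tie handling and normalization differ across implementations.<br />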
<br />
We also identify one representative temporal query that highlights the importance of temporally-anchored scoring.<br />
<br />
===Impact of Time-Window Based Approximation===<br />
The second goal is to examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
When the time window size is a single month, any query time span exactly matches one or more consecutive time windows, so the statistics we use for any query time span are exact. On the other hand, when the time window size is two or more months, many query time spans end up covering only parts of time windows, and we approximate the statistics for such a query time span by combining the statistics of all overlapping time windows (see [[Webarc:Combining Statistics in Multiple Time Windows|How We Combine Statistics in Multiple Time Windows]]).<br />
Therefore, we can examine the impact of time-window based approximation by comparing each search result set qts.''x''_tw.''y'' (y != 1) with qts.''x''_tw.''1'' for all ''x'' and ''y''. For example, comparing qts.''1''_tw.''2'' with qts.''1''_tw.''1'' shows how the search results differ when 2-month time windows are used for single-month query time spans. In particular, we compute precision and Kendall's tau at rank cutoffs r = 10, 20, and 100 for all possible ''x'' and ''y'' combinations.<br />
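The window coverage underlying this approximation can be sketched as follows (illustrative Python; the fixed window layout, zero-based month numbering, and statistic names are assumptions, not the project's actual index code). For a window size y, we find the consecutive windows covering the query time span and sum their per-window statistics:<br />

```python
def covering_windows(span_start, span_end, window_size):
    """Indices of the fixed, consecutive windows of `window_size` months
    that together cover the query time span [span_start, span_end]
    (months numbered from 0)."""
    first = span_start // window_size
    last = span_end // window_size
    return list(range(first, last + 1))

def combined_stats(windows, stats):
    """Approximate span statistics by summing per-window counts,
    e.g. document counts and term frequencies."""
    total = {"num_docs": 0, "term_freq": 0}
    for w in windows:
        total["num_docs"] += stats[w]["num_docs"]
        total["term_freq"] += stats[w]["term_freq"]
    return total

# A query span over months 1..3 with 2-month windows overlaps windows 0 and 1,
# so the approximation also counts month 0, which lies outside the span.
print(covering_windows(1, 3, 2))  # -> [0, 1]
```

With 1-month windows, every span is covered exactly, which is why tw.1 serves as the exact baseline in the comparison above.<br />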
<br />
<br />
<br />
==Further Information==<br />
* [[Webarc:Input Dataset Statistics |[1] Input Dataset Statistics]]<br />
* [[Webarc:Temporal Query Load |[2] Temporal Query Load]]<br />
* [[Webarc:Tools Developed |[3] Tools Developed]]</div>
<hr />
<div>==Goals==<br />
There are two main goals that we want to achieve through the following experiments.<br />
# For time-constrained queries, we examine how temporally-anchored scoring can affect the search results in real-world time series data.<br />
# For time-constrained queries with a time window indexing scheme, we examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
<br />
==Input Dataset==<br />
We preprocess the entire revision history of [http://www.archive.org/details/enwiki-20080425 the English Wikipedia from 2001 to 2007]. After preprocessing, we obtain 84 monthly snapshots starting from January 2001 ending in December 2007. Included in each monthly snapshot are the latest revision of existing articles at the end of the month. For example, the Wikipedia article 'Economy of the United States' created on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=649100 August 21 2002] is included in the six monthly snapshots of August 2002, September 2002, ..., January 2003 since there is no newer revision made until [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=749002 February 7 2003], whereas the same article revised on [http://en.wikipedia.org/w/index.php?title=Economy_of_the_United_States&oldid=166885 August 16 2002] is not included in any of the snapshots.<br />
<br />
[[Webarc:Input Dataset Statistics |Statistics of monthly snapshots]]<br />
<br />
==Queries==<br />
Based on the AOL query log made briefly available in 2005, we build our temporal query load by extracting multi-term query phrases where the user selected an English Wikipedia article among the search results. Note that the different temporal contexts give different relative weights to each of the query terms, yielding different result rankings. In other words, for a query phrase with term t1 and t2, t1 may be given more weight than t2 at a certain query time span qts1, but it may be opposite at a different query time span qts2. This means that two document versions that belong to both qts1 and qts2 can have different rankings depending on the specified query time span. However, in the case of single-term queries, any two document versions will retain their ranking regardless of the specified query time span, as long as both belong to the query time span. In our experiments, we highlight the impact on the search results from different query time spans by focusing only on multi-term queries. The more general case where single-term queries are also included will be able to be inferred from the results from query log analysis studies that report about 80% of web queries are multi-termed.<br />
<br />
Once we extract qualified query phrases, we further filter out less frequently appearing phrases. In particular, we take the query phrases that appear 10 times or more in the log. As a result, we gather a total of 223 such query phrases.<br />
<br />
Using the 223 query phrases, we now construct a temporal query load by combining query time spans with each of the query phrases. Specifically, we use 8 query time span lengths: 1, 2, 4, 8, 16, 32, 64, and 83 months. For each query time span length, we make all possible query time spans, starting from the first month. For example, for the query time span length of two months, we make 82 query time spans: (February 1st 2001 ~ March 31st 2001), (March 1st 2001 ~ April 30th 2001), ..., (November 1st 2007 ~ December 31st 2007). A total of 462 (= 83 + 82 + 80 + 76 + 68 + 52 + 20 + 1) different query time spans are resulted. In the end, we generate 103,026 (= 223 x 462) queries in each experiment session.<br />
<br />
[[Webarc:Temporal Query Load |The complete list of query terms and query time spans]]<br />
<br />
==Scoring Methods==<br />
We use two different scoring methods; 1. a tf-idf based method (OKAPI BM-25, in particular) and 2. a language-model based method (the KL divergence combined with the Dirichlet's smoothing, in particular). For each query, we take 100 top documents from each scoring method.<br />
<br />
==Experiments==<br />
The experiments consist of eight experiment sessions, each with one of the following time window sizes: 1, 2, 4, 8, 16, 32, 64, and 83 month. For each experiment session, the temporal query load described above is fed into our search servers. When scoring relevance, approximated query / document statistics are used. For example, if a query time span overlaps a part of two time windows, for scoring we use the combined document and term statistics found in the entire two time windows, rather than only in the query time span. In the case where the time window size is a single month, the query time span always entirely overlaps with time windows, and therefore we use the exact statistics found within the query time span.<br />
<br />
[[Webarc:Time Windows Used in Experiments|Time Windows Used in Experiments]]<br />
<br />
[[Webarc:Approximating Statistics|How We Approximate Statistics]]<br />
<br />
[[Webarc:Combining Statistics in Multiple Time Windows|How We Combine Statistics in Multiple Time Windows]]<br />
<br />
Once we run all the eight experiment sessions, we obtain 64 result sets as shown in the table below. In the table, each row represents an experiment session with a given time window size, and each column represents a query time span size. In the entries of the table, the result set qts.''i''_tw.''j'' refers to the result set for the query time span size ''i'' and the time window size ''j''. The size of each result set varies, depending on the query time span size. For example, qts.''1''_tw.''x'' has search results for 18,509 (= 223 query phrases x 83 query time spans) queries, while qts.''83''_tw.''x'' has search results for only 223 (= 223 x 1) queries.<br />
<br />
{| class="wikitable" style="margin: 1em auto 1em auto" border="1"<br />
|-<br />
| tw\qts || 1 || 2 || 4 || 8 || 16 || 32 || 64 || 83<br />
|-<br />
| 1 || qts.1_tw.1 || qts.2_tw.1 || qts.4_tw.1 || qts.8_tw.1 || qts.16_tw.1 || qts.32_tw.1 || qts.64_tw.1 || qts.83_tw.1<br />
|-<br />
| 2 || qts.1_tw.2 || qts.2_tw.2 || qts.4_tw.2 || qts.8_tw.2 || qts.16_tw.2 || qts.32_tw.2 || qts.64_tw.2 || qts.83_tw.2<br />
|-<br />
| 4 || qts.1_tw.4 || qts.2_tw.4 || qts.4_tw.4 || qts.8_tw.4 || qts.16_tw.4 || qts.32_tw.4 || qts.64_tw.4 || qts.83_tw.4<br />
|-<br />
| 8 || qts.1_tw.8 || qts.2_tw.8 || qts.4_tw.8 || qts.8_tw.8 || qts.16_tw.8 || qts.32_tw.8 || qts.64_tw.8 || qts.83_tw.8<br />
|-<br />
| 16 || qts.1_tw.16 || qts.2_tw.16 || qts.4_tw.16 || qts.8_tw.16 || qts.16_tw.16 || qts.32_tw.16 || qts.64_tw.16 || qts.83_tw.16<br />
|-<br />
| 32 || qts.1_tw.32 || qts.2_tw.32 || qts.4_tw.32 || qts.8_tw.32 || qts.16_tw.32 || qts.32_tw.32 || qts.64_tw.32 || qts.83_tw.32<br />
|-<br />
| 64 || qts.1_tw.64 || qts.2_tw.64 || qts.4_tw.64 || qts.8_tw.64 || qts.16_tw.64 || qts.32_tw.64 || qts.64_tw.64 || qts.83_tw.64<br />
|-<br />
| 83 || qts.1_tw.83 || qts.2_tw.83 || qts.4_tw.83 || qts.8_tw.83 || qts.16_tw.83 || qts.32_tw.83 || qts.64_tw.83 || qts.83_tw.83<br />
|}<br />
<br />
We run the eight-session experiments one more time with a different scoring method.<br />
<br />
In total, we generate 1,648,416 queries (= 103,026 queries per session x 8 sessions x 2 scoring methods) and obtain about 160 million search results (each query returns up to 100 search results).<br />
<br />
==Analysis==<br />
===Impact of Temporally-anchored Scoring===<br />
To reiterate, the first goal of the experiments is to examine how temporally-anchored scoring can yield different search results. <br />
In the traditional search scheme, a temporal query is carried out by first scoring documents with the document/term statistics in the entire collection, followed by filtering in only those that belong to the specified query time span. Apparently, our experiment session for time window size 83 perfectly emulates the traditional scheme, resulting in the same results sets as the traditional scheme would have. Thus, in the first analysis, we compare the following two cases:<br />
<br />
# Scoring is performed based on the statistics of the entire history (i.e. the time window size is 83).<br />
# Scoring is performed based on the actual query time span specified.<br />
<br />
In particular, each search results set qts.''x''_tw.''1'' is compared against qts.''x''_tw.''83''. We compute precision and Kendall's tau at rank cutoffs r = 10, 20, and 100.<br />
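The two comparison measures can be sketched as follows. This is an illustrative sketch rather than the experiment's actual code, and it assumes each result set is simply an ordered list of document IDs:

```python
def precision_at(anchored, baseline, r):
    """Fraction of the top-r anchored results that also appear in the
    top-r baseline results (set overlap at rank cutoff r)."""
    top_a, top_b = set(anchored[:r]), set(baseline[:r])
    return len(top_a & top_b) / r

def kendall_tau_at(anchored, baseline, r):
    """Kendall's tau over the documents common to both top-r lists:
    (concordant pairs - discordant pairs) / total pairs."""
    baseline_rank = {d: i for i, d in enumerate(baseline[:r])}
    # Common documents, in the order the anchored list ranks them.
    common = [d for d in anchored[:r] if d in baseline_rank]
    n = len(common)
    if n < 2:
        return 1.0  # design choice: too little overlap to compare ordering
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # The pair (i, j) is concordant iff the baseline ranks the
            # two documents in the same relative order as the anchored list.
            if baseline_rank[common[i]] < baseline_rank[common[j]]:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

With this definition, identical top-r lists give precision 1.0 and tau 1.0, while a shared document set ranked in reverse order gives tau -1.0.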
<br />
We also identify one illustrative temporal query that highlights the importance of temporally-anchored scoring.<br />
<br />
===Impact of Time-Window Based Approximation===<br />
The second goal is to examine how approximating scoring parameters can affect the search results, depending on the time window size.<br />
When the time window size is a single month, any query time span aligns exactly with one or more consecutive time windows, so the statistics we use for any query time span are exact. On the other hand, when the time window size is two or more months, many query time spans end up covering only parts of some time windows, and we approximate the statistics for such a query time span by combining the statistics of all overlapping time windows. See [xx] for how we combine statistics of multiple time windows.<br />
Therefore, we can examine the impact of time-window based approximation by comparing the search results set qts.''x''_tw.''y'' (y != 1) against qts.''x''_tw.''1'' for all ''x'' and ''y''. For example, comparing qts.''1''_tw.''2'' with qts.''1''_tw.''1'' shows how the search results differ when we use 2-month time windows for a query time span of a single month. In particular, we compute precision and Kendall's tau at rank cutoffs r = 10, 20, and 100 for every possible ''x'' and ''y'' combination.<br />
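The window arithmetic behind this can be sketched as follows. This is a hypothetical illustration (not the system's actual code), assuming months are numbered 1..83 and window ''w'' of size ''y'' covers months (w-1)*y+1 through min(w*y, 83):

```python
def covering_windows(start, end, y, total_months=83):
    """Return (list of window indices overlapping the query time span
    [start, end], exact?) for fixed time windows of y months each.
    The covering is exact iff the span boundaries coincide with window
    boundaries; with y = 1 every span is exact."""
    first = (start - 1) // y + 1
    last = (end - 1) // y + 1
    exact = (start - 1) % y == 0 and (end % y == 0 or end == total_months)
    return list(range(first, last + 1)), exact
```

For instance, a query span of months 2-3 with 2-month windows overlaps windows 1 (months 1-2) and 2 (months 3-4) only partially, so its combined statistics are an approximation, whereas the span months 3-4 matches window 2 exactly.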
<br />
<br />
<br />
==Further Information==<br />
* [[Webarc:Input Dataset Statistics |[1] Input Dataset Statistics]]<br />
* [[Webarc:Temporal Query Load |[2] Temporal Query Load]]<br />
* [[Webarc:Tools Developed |[3] Tools Developed]]</div>
==Combining Statistics in Multiple Time Windows==
''Scsong, 2009-11-18''<br />
<hr />
<div>We maintain an index for each time window. In each index, we store four term-dependent statistics parameters and four term-independent (index-wide) statistics parameters as follows:<br />
<br />
* Term-dependent<br />
** <font size="4"><math>df(t, tw_k)</math></font>: the number of all documents containing term t within time window k.<br />
** <font size="4"><math>df(t, tw_k')</math></font>: the number of fresh documents containing term t within time window k.<br />
** <font size="4"><math>tf(t, tw_k)</math></font>: the number of occurrences of term t in all documents within time window k.<br />
** <font size="4"><math>tf(t, tw_k')</math></font>: the number of occurrences of term t in fresh documents within time window k.<br />
<br />
* Term-independent (index-wide)<br />
** <font size="4"><math>df(tw_k)</math></font>: the number of all documents within time window k.<br />
** <font size="4"><math>df(tw_k')</math></font>: the number of fresh documents within time window k.<br />
** <font size="4"><math>tf(tw_k)</math></font>: the number of all terms in all documents within time window k.<br />
** <font size="4"><math>tf(tw_k')</math></font>: the number of all terms in fresh documents within time window k.<br />
<br />
By ''fresh'' documents, we mean documents that were newly updated or that first appeared in the given time window. Thus, a fresh document does not appear in any previous time window.<br />
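The distinction between the all-document and fresh-document counts can be illustrated with a toy sketch (hypothetical data layout, not the indexer's actual code), assuming we know the set of time windows in which each document appears:

```python
def window_doc_counts(doc_windows, k):
    """Return (df_all, df_fresh) for time window k.
    doc_windows maps each document ID to the set of windows in which it
    appears or was updated; a document is 'fresh' in window k iff k is
    the earliest window in that set."""
    all_docs = [d for d, ws in doc_windows.items() if k in ws]
    fresh_docs = [d for d in all_docs if min(doc_windows[d]) == k]
    return len(all_docs), len(fresh_docs)
```

Because a document is fresh in exactly one window, summing fresh counts over disjoint windows never double-counts a document.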
<br />
We combine statistics for time windows ''i''~''j'' as follows:<br />
<br />
*<math>df(t, tw_{i\sim j}) = df(t, tw_i) + \sum_{k=i+1}^j df(t, tw_k')</math><br />
*<math>tf(t, tw_{i\sim j}) = tf(t, tw_i) + \sum_{k=i+1}^j tf(t, tw_k')</math><br />
*<math>df(tw_{i\sim j}) = df(tw_i) + \sum_{k=i+1}^j df(tw_k')</math><br />
*<math>tf(tw_{i\sim j}) = tf(tw_i) + \sum_{k=i+1}^j tf(tw_k')</math></div>
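The combination rule above can be sketched in a few lines. This is a minimal illustration under an assumed data layout (per-window dicts holding both the full count and the fresh count), not the system's actual storage format:

```python
def combine(stats, i, j, key):
    """Combined statistic for time windows i..j: the full statistic of
    window i plus the fresh-document statistics of windows i+1..j.
    stats maps a window number to a dict with e.g. "df" (all documents)
    and "df_fresh" (fresh documents only)."""
    return stats[i][key] + sum(stats[k][key + "_fresh"]
                               for k in range(i + 1, j + 1))
```

The same function covers all four formulas, since df and tf (term-dependent or index-wide) combine identically: totals from the first window, fresh-only increments from every later window.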