Webarc:Temporally Anchored Scoring Experiments
From Adapt
Revision as of 20:09, 18 November 2009
Goals
There are two main goals that we want to achieve through the following experiments.
- For time-constrained queries, we examine how temporally-anchored scoring can affect the search results in real-world time series data.
- For time-constrained queries with a time window indexing scheme, we examine how approximating scoring parameters can affect the search results, depending on the time window size.
Input Dataset
We preprocess the entire revision history of the English Wikipedia from 2001 to 2007. After preprocessing, we obtain 84 monthly snapshots, starting in January 2001 and ending in December 2007. Each monthly snapshot contains the latest revision of every article that exists at the end of that month. For example, the revision of the Wikipedia article 'Economy of the United States' made on August 21 2002 is included in the six monthly snapshots of August 2002, September 2002, ..., January 2003, since no newer revision was made until February 7 2003, whereas the revision of the same article made on August 16 2002 is not included in any snapshot, because it was superseded before the end of that month.
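The snapshot rule can be illustrated with a minimal sketch. It assumes that only revision dates matter and that the article has just the three revisions mentioned in the example above; the helper names `month_end_exclusive` and `snapshot_revision` are hypothetical and are not part of the tools used in the experiments.

```python
from datetime import date

def month_end_exclusive(year, month):
    """First day of the following month, used as an exclusive cutoff."""
    return date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)

def snapshot_revision(revision_dates, year, month):
    """Latest revision made on or before the last day of the given month."""
    cutoff = month_end_exclusive(year, month)
    eligible = [d for d in revision_dates if d < cutoff]
    return max(eligible) if eligible else None

# Revision dates taken from the example in the text (assumed to be the only ones).
revisions = [date(2002, 8, 16), date(2002, 8, 21), date(2003, 2, 7)]
months = [(2002, 8), (2002, 9), (2002, 10), (2002, 11),
          (2002, 12), (2003, 1), (2003, 2)]

# Which revision ends up in each monthly snapshot.
chosen = {m: snapshot_revision(revisions, *m) for m in months}
for (year, month), revision in chosen.items():
    print(f"{year}-{month:02d}: {revision}")
```

Under these assumptions the August 21 2002 revision is selected for the six snapshots from August 2002 through January 2003, and the August 16 2002 revision is never selected.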
Statistics of monthly snapshots
Queries
Based on the AOL query log made briefly available in 2006, we build our temporal query load by extracting multi-term query phrases for which the user selected an English Wikipedia article among the search results. Note that different temporal contexts give different relative weights to each of the query terms, yielding different result rankings. In other words, for a query phrase with terms t1 and t2, t1 may be given more weight than t2 for a certain query time span qts1, while the opposite may hold for a different query time span qts2. This means that two document versions that belong to both qts1 and qts2 can be ranked differently depending on the specified query time span. In the case of single-term queries, however, any two document versions retain their relative ranking regardless of the specified query time span, as long as both belong to that time span. In our experiments, we highlight the impact of different query time spans on the search results by focusing only on multi-term queries. The more general case, in which single-term queries are also included, can be inferred from query log analysis studies, which report that about 80% of web queries are multi-term.
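The ranking argument above can be made concrete with a small sketch. It assumes a plain tf-idf score, which is only a stand-in because the scoring function is not specified here, and all counts below are hypothetical numbers chosen for illustration.

```python
import math

def idf(num_docs, doc_freq):
    """Inverse document frequency computed within one query time span."""
    return math.log(num_docs / doc_freq)

def score(term_freqs, span_stats, query_terms):
    """Sum of tf * idf over the query terms (assumed scoring function)."""
    n = span_stats["num_docs"]
    return sum(term_freqs[t] * idf(n, span_stats["df"][t]) for t in query_terms)

# Hypothetical term frequencies of two document versions.
doc_a = {"t1": 3, "t2": 1}
doc_b = {"t1": 1, "t2": 3}

# Hypothetical collection statistics for two query time spans.
qts1 = {"num_docs": 100, "df": {"t1": 10, "t2": 50}}  # t1 is rarer, so it weighs more
qts2 = {"num_docs": 100, "df": {"t1": 50, "t2": 10}}  # t2 is rarer, so it weighs more

for name, stats in (("qts1", qts1), ("qts2", qts2)):
    multi = (score(doc_a, stats, ["t1", "t2"]), score(doc_b, stats, ["t1", "t2"]))
    single = (score(doc_a, stats, ["t1"]), score(doc_b, stats, ["t1"]))
    print(name, "multi-term:", multi, "single-term:", single)
```

With these numbers the two-term query ranks `doc_a` first in `qts1` and `doc_b` first in `qts2`, while the single-term query keeps the same order in both spans, since a single positive weight only rescales the scores.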
Once we extract qualified query phrases, we further filter out less frequently appearing phrases. In particular, we take the query phrases that appear 10 times or more in the log. As a result, we gather a total of 223 such query phrases.
Using the 223 query phrases, we now construct a temporal query load by combining query time spans with each of the query phrases. Specifically, we use 8 query time span lengths: 1, 2, 4, 8, 16, 32, 64, and 83 months. For each query time span length, we generate all possible query time spans, starting from the first month. For example, for a query time span length of two months, we generate 82 query time spans: (February 1st 2001 ~ March 31st 2001), (March 1st 2001 ~ April 30th 2001), ..., (November 1st 2007 ~ December 31st 2007). This yields a total of 462 (= 83 + 82 + 80 + 76 + 68 + 52 + 20 + 1) different query time spans. In the end, we generate 103,026 (= 223 x 462) queries for each experiment session.
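The counts above can be reproduced with a short sketch. It assumes months are indexed from 0 (February 2001) to 82 (December 2007), as implied by the two-month example; the function names are hypothetical.

```python
SPAN_LENGTHS = [1, 2, 4, 8, 16, 32, 64, 83]
NUM_MONTHS = 83      # February 2001 .. December 2007
NUM_PHRASES = 223

def month_label(index):
    """Map month index 0 (February 2001) .. 82 (December 2007) to 'YYYY-MM'."""
    absolute = 2001 * 12 + 1 + index  # months since year 0; February has offset 1
    return f"{absolute // 12:04d}-{absolute % 12 + 1:02d}"

def query_time_spans(length, num_months=NUM_MONTHS):
    """All (first_month, last_month) index pairs of the given length, inclusive."""
    return [(start, start + length - 1) for start in range(num_months - length + 1)]

spans_per_length = {n: len(query_time_spans(n)) for n in SPAN_LENGTHS}
total_spans = sum(spans_per_length.values())
total_queries = NUM_PHRASES * total_spans

print(spans_per_length)
print(total_spans, total_queries)
first, last = query_time_spans(2)[0], query_time_spans(2)[-1]
print(month_label(first[0]), month_label(first[1]),
      month_label(last[0]), month_label(last[1]))
```

A span of length n can start at any of the 83 - n + 1 months that leave room for it, which is where the terms 83, 82, 80, ..., 1 come from.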
The complete list of query terms and query time spans
Experiments
The experiments consist of eight experiment sessions, each with one of the following time window sizes: 1, 2, 4, 8, 16, 32, 64, and 83 months. For each experiment session, the temporal query load described above is fed into our search servers. When scoring relevance, approximated query / document statistics are used. For example, if a query time span partially overlaps two time windows, we score with the document and term statistics found in the whole of both time windows, rather than only those found within the query time span. When the time window size is a single month, every query time span is covered exactly by whole time windows, and we therefore use the exact statistics found within the query time span.
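The approximation can be sketched as follows. The sketch assumes that time windows are consecutive, have a fixed size, and are aligned to the first month (index 0), with a possibly shorter last window; the actual window layout of the index is not described here, and the function names are hypothetical.

```python
NUM_MONTHS = 83  # month indices 0 .. 82

def covering_windows(span_start, span_end, window_size):
    """Indices of the time windows that overlap the inclusive month span."""
    first = span_start // window_size
    last = span_end // window_size
    return list(range(first, last + 1))

def approximated_range(span_start, span_end, window_size, num_months=NUM_MONTHS):
    """Inclusive month range whose statistics are used for scoring."""
    windows = covering_windows(span_start, span_end, window_size)
    lo = windows[0] * window_size
    hi = min((windows[-1] + 1) * window_size, num_months) - 1
    return lo, hi

# A 4-month span (months 6..9) that straddles two 8-month windows.
print(covering_windows(6, 9, 8))    # windows 0 and 1
print(approximated_range(6, 9, 8))  # statistics come from months 0..15
print(approximated_range(6, 9, 1))  # window size 1: exactly months 6..9
```

With a window size of one month the approximated range always equals the query time span, which matches the exact-statistics case described above.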
Once we have run all eight experiment sessions, we obtain 64 result sets, as shown in the table below. In the table entries, qts.i_tw.j refers to the result set for the query time span size i and the time window size j. The size of each result set varies with the query time span size. For example, qts.1_tw.x contains search results for 18,509 (= 223 query phrases x 83 query time spans) queries, while qts.83_tw.x contains search results for only 223 (= 223 x 1) queries.
tw \ qts | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 83
1 | qts.1_tw.1 | qts.2_tw.1 | qts.4_tw.1 | qts.8_tw.1 | qts.16_tw.1 | qts.32_tw.1 | qts.64_tw.1 | qts.83_tw.1
2 | qts.1_tw.2 | qts.2_tw.2 | qts.4_tw.2 | qts.8_tw.2 | qts.16_tw.2 | qts.32_tw.2 | qts.64_tw.2 | qts.83_tw.2
4 | qts.1_tw.4 | qts.2_tw.4 | qts.4_tw.4 | qts.8_tw.4 | qts.16_tw.4 | qts.32_tw.4 | qts.64_tw.4 | qts.83_tw.4
8 | qts.1_tw.8 | qts.2_tw.8 | qts.4_tw.8 | qts.8_tw.8 | qts.16_tw.8 | qts.32_tw.8 | qts.64_tw.8 | qts.83_tw.8
16 | qts.1_tw.16 | qts.2_tw.16 | qts.4_tw.16 | qts.8_tw.16 | qts.16_tw.16 | qts.32_tw.16 | qts.64_tw.16 | qts.83_tw.16
32 | qts.1_tw.32 | qts.2_tw.32 | qts.4_tw.32 | qts.8_tw.32 | qts.16_tw.32 | qts.32_tw.32 | qts.64_tw.32 | qts.83_tw.32
64 | qts.1_tw.64 | qts.2_tw.64 | qts.4_tw.64 | qts.8_tw.64 | qts.16_tw.64 | qts.32_tw.64 | qts.64_tw.64 | qts.83_tw.64
83 | qts.1_tw.83 | qts.2_tw.83 | qts.4_tw.83 | qts.8_tw.83 | qts.16_tw.83 | qts.32_tw.83 | qts.64_tw.83 | qts.83_tw.83
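The grid of result sets and their sizes can be derived mechanically, as in the sketch below. It assumes 223 query phrases and 83 months, as stated earlier; the helper names are hypothetical and only mirror the qts.i_tw.j naming.

```python
SIZES = [1, 2, 4, 8, 16, 32, 64, 83]
NUM_PHRASES = 223
NUM_MONTHS = 83

def result_set_name(qts, tw):
    """Name of the result set for query time span size qts and time window size tw."""
    return f"qts.{qts}_tw.{tw}"

def queries_in_result_set(qts):
    """Queries per result set: phrases times the number of spans of length qts."""
    return NUM_PHRASES * (NUM_MONTHS - qts + 1)

# One entry per (time window size, query time span size) pair.
result_sets = {result_set_name(q, t): queries_in_result_set(q)
               for t in SIZES for q in SIZES}

print(len(result_sets))
print(result_sets["qts.1_tw.8"], result_sets["qts.83_tw.8"])
# Queries issued in one experiment session (one row of the table).
print(sum(queries_in_result_set(q) for q in SIZES))
```

The size of a result set depends only on the query time span size, so every row of the table sums to the same number of queries per session.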
In the following table, each row represents an experiment session with a given time window size.
For example, for an experiment session with the time window size 8,
18,509