Webarc:Trec Eval (modified)

Revision as of 01:32, 10 November 2009 by Scsong (talk | contribs)

What It Does

We added support for Kendall's Tau (http://en.wikipedia.org/wiki/Kendall_tau_rank_correlation_coefficient) to the existing Trec_Eval tool (http://trec.nist.gov/trec_eval/).
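
As an illustration of the measure, here is a minimal Python sketch of Kendall's Tau between a reference ranking and a retrieved ranking. It is not the code of the modified tool. The function names (kendall_tau, mean_tau_at), the truncation of the retrieved list at a cutoff, and the averaging over queries are all assumptions; they happen to be consistent with the sample files and sample output shown further down this page. Each list holds the reference rank (the fourth column of the example trec_rel_file) of the documents in the order they appear, by descending score, in the example trec_top_file.

```python
from itertools import combinations


def kendall_tau(ref_ranks):
    """Kendall's Tau of a retrieved list against a reference ranking.

    ref_ranks[i] is the reference rank of the document retrieved at
    position i. Ties are not expected (every rank is distinct).
    """
    n = len(ref_ranks)
    if n < 2:
        raise ValueError("need at least two documents")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        if ref_ranks[i] < ref_ranks[j]:
            concordant += 1   # pair is in the same order in both rankings
        elif ref_ranks[i] > ref_ranks[j]:
            discordant += 1   # pair is in opposite order
    return (concordant - discordant) / (n * (n - 1) / 2)


def mean_tau_at(cutoff, queries):
    """Average Tau over queries, using only the top `cutoff` retrieved docs.

    Assumption: this mirrors how the Kendall's_tau<N> rows are averaged.
    """
    return sum(kendall_tau(q[:cutoff]) for q in queries) / len(queries)


# Reference ranks of the retrieved documents, taken from the example files.
QUERY_3 = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 13, 12, 11]
QUERY_4 = [1, 2, 3, 4, 5, 6]

for cutoff in (5, 10, 20):
    print(cutoff, round(mean_tau_at(cutoff, [QUERY_3, QUERY_4]), 4))
```

For query 3 the first ten documents are in exactly reversed order (Tau of -1 at cutoffs 5 and 10), while query 4 is identical in both files (Tau of 1), so the averages are 0 at cutoffs 5 and 10. With all 13 documents, query 3 has 30 concordant and 48 discordant pairs, giving -18/78, and the average over both queries rounds to 0.3846.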

How To Build

In the Trec_Eval directory, run:

make


How To Run

The following is copied from the README file that came with trec_eval.8.1:

trec_eval [-h] [-q] [-a] [-o] [-c] [-l<num>] [-N<num>] [-M<num>] [-Ua<num>] [-Ub<num>] [-Uc<num>] [-Ud<num>] [-J] [-T] trec_rel_file trec_top_file

Calculate and print various evaluation measures, evaluating the results in trec_top_file against the relevance judgements in trec_rel_file.

There are a fair number of options, of which only the lower case options are normally ever used.

-h: Print full help message and exit 
-q: In addition to summary evaluation, give evaluation for each query 
-a: Print all evaluation measures calculated, instead of just the 
    main official measures for TREC. 
-o: Print everything out in old, nonrelational format (default is relational) 
-c: Average over the complete set of queries in the relevance judgements  
     instead of the queries in the intersection of relevance judgements 
     and results.  Missing queries will contribute a value of 0 to all 
     evaluation measures (which may or may not be reasonable for a  
     particular evaluation measure, but is reasonable for standard TREC 
     measures.) 
-l<num>: Num indicates the minimum relevance judgement value needed for 
     a document to be called relevant. (All measures used by TREC eval are 
     based on binary relevance).  Used if trec_rel_file contains relevance 
     judged on a multi-relevance scale.  Default is 1. 
-N<num>: Number of docs in collection 
-M<num>: Max number of docs per topic to use in evaluation (discard rest). 
-Ua<num>: Value to use for 'a' coefficient of utility computation. 
                       relevant  nonrelevant 
     retrieved            a          b 
     nonretrieved         c          d 
-Ub<num>: Value to use for 'b' coefficient of utility computation. 
-Uc<num>: Value to use for 'c' coefficient of utility computation. 
-Ud<num>: Value to use for 'd' coefficient of utility computation. 
-J: Calculate all values only over the judged (either relevant or  
    nonrelevant) documents.  All unjudged documents are removed from the 
    retrieved set before any calculations (possibly leaving an empty set). 
    DO NOT USE, unless you really know what you're doing - very easy to get 
    reasonable looking, but invalid, numbers.  
-T: Treat similarity as time that document retrieved.  Compute 
     several time-based measures after ranking docs by time retrieved 
     (first doc (lowest sim) retrieved ranked highest).  
     Only done if -a selected. 
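
To make the -Ua/-Ub/-Uc/-Ud coefficients concrete, here is a small Python sketch. It assumes the utility of a query is the weighted sum of the four cells of the table above; the function name and argument names are made up for illustration, and the default coefficients are read off the utility_1.0_-1.0_0.0_0.0 label in the sample output on this page.

```python
def utility(rel_ret, nonrel_ret, rel_notret, nonrel_notret,
            a=1.0, b=-1.0, c=0.0, d=0.0):
    """Weighted sum of the retrieved/relevant contingency table.

    a: relevant and retrieved        b: nonrelevant and retrieved
    c: relevant and not retrieved    d: nonrelevant and not retrieved
    Assumption: this matches how the tool combines the coefficients.
    """
    return a * rel_ret + b * nonrel_ret + c * rel_notret + d * nonrel_notret


# In the example files, query 3 retrieves 13 relevant documents and
# query 4 retrieves 6; nothing nonrelevant is retrieved in either.
per_query = [utility(13, 0, 0, 0), utility(6, 0, 0, 0)]
mean_utility = sum(per_query) / len(per_query)
print(mean_utility)
```

Averaged over the two queries this gives 9.5, which agrees with the utility row in the sample output.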


Input File

Example trec_rel_file:

3 Q0 Politics_of_the_Falkland_Islands04-29-2001 1 11.12743 qts.1_tw.1
3 Q0 Economy_of_the_Falkland_Islands04-29-2001 2 11.12743 qts.1_tw.1
3 Q0 History_of_the_Falkland_Islands04-29-2001 3 11.12743 qts.1_tw.1
3 Q0 Foreign_relations_of_Cyprus04-26-2001 4 8.11532 qts.1_tw.1
3 Q0 Economy_of_Finland04-29-2001 5 7.12460 qts.1_tw.1
3 Q0 Economy_of_the_Faroe_Islands04-29-2001 6 6.77037 qts.1_tw.1
3 Q0 Economy_of_Canada03-26-2001 7 5.95059 qts.1_tw.1
3 Q0 Economy_of_Germany03-10-2001 8 5.74032 qts.1_tw.1
3 Q0 Paul_Robeson04-20-2001 9 4.97028 qts.1_tw.1
3 Q0 Politics_of_Dominica04-26-2001 10 4.09356 qts.1_tw.1
3 Q0 Economy_of_the_Republic_of_the_Congo04-26-2001 11 3.78614 qts.1_tw.1
3 Q0 Economy_of_Ethiopia04-29-2001 12 3.74733 qts.1_tw.1
3 Q0 Politics_of_Cyprus04-26-2001 13 2.08520 qts.1_tw.1
4 Q0 Politics_of_the_Falkland_Islands04-29-2001 1 12.70288 qts.1_tw.1
4 Q0 Christ's_College,_Cambridge05-20-2001 2 12.24870 qts.1_tw.1
4 Q0 Politics_of_Gibraltar05-17-2001 3 12.16519 qts.1_tw.1
4 Q0 History_of_Hong_Kong05-04-2001 4 11.67782 qts.1_tw.1
4 Q0 Vittorio_De_Sica05-13-2001 5 11.17646 qts.1_tw.1
4 Q0 Aphex_Twin05-03-2001 6 11.02514 qts.1_tw.1

Example trec_top_file:

3 Q0 Politics_of_Dominica04-26-2001 1 11.09356 qts.1_tw.1
3 Q0 Paul_Robeson04-20-2001 2 10.97028 qts.1_tw.1
3 Q0 Economy_of_Germany03-10-2001 3 9.74032 qts.1_tw.1
3 Q0 Economy_of_Canada03-26-2001 4 8.95059 qts.1_tw.1
3 Q0 Economy_of_the_Faroe_Islands04-29-2001 5 7.77037 qts.1_tw.1
3 Q0 Economy_of_Finland04-29-2001 6 6.12460 qts.1_tw.1
3 Q0 Foreign_relations_of_Cyprus04-26-2001 7 5.11532 qts.1_tw.1
3 Q0 History_of_the_Falkland_Islands04-29-2001 8 4.37087 qts.1_tw.1
3 Q0 Economy_of_the_Falkland_Islands04-29-2001 9 3.01286 qts.1_tw.1
3 Q0 Politics_of_the_Falkland_Islands04-29-2001 10 2.12743 qts.1_tw.1
3 Q0 Politics_of_Cyprus04-26-2001 13 2.08520 qts.1_tw.1
3 Q0 Economy_of_Ethiopia04-29-2001 12 1.74733 qts.1_tw.1
3 Q0 Economy_of_the_Republic_of_the_Congo04-26-2001 11 0.78614 qts.1_tw.1
4 Q0 Politics_of_the_Falkland_Islands04-29-2001 1 12.70288 qts.1_tw.1
4 Q0 Christ's_College,_Cambridge05-20-2001 2 12.24870 qts.1_tw.1
4 Q0 Politics_of_Gibraltar05-17-2001 3 12.16519 qts.1_tw.1
4 Q0 History_of_Hong_Kong05-04-2001 4 11.67782 qts.1_tw.1
4 Q0 Vittorio_De_Sica05-13-2001 5 11.17646 qts.1_tw.1
4 Q0 Aphex_Twin05-03-2001 6 11.02514 qts.1_tw.1
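
Each line in the example files above has six whitespace-separated columns. The Python sketch below reads one such line into a record. The field names (query_id, iteration, doc_id, rank, score, run_id) follow the usual TREC results-file convention and are an assumption here, since this page does not label the columns.

```python
from typing import NamedTuple


class RankedDoc(NamedTuple):
    query_id: str
    iteration: str   # "Q0" in every example line
    doc_id: str
    rank: int
    score: float
    run_id: str


def parse_line(line):
    """Split one six-column line into a RankedDoc."""
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 columns, got {len(parts)}: {line!r}")
    query_id, iteration, doc_id, rank, score, run_id = parts
    return RankedDoc(query_id, iteration, doc_id, int(rank), float(score), run_id)


doc = parse_line("4 Q0 Christ's_College,_Cambridge05-20-2001 2 12.24870 qts.1_tw.1")
print(doc.doc_id, doc.rank, doc.score)
```

Document names may contain apostrophes and commas but no whitespace, so a plain split() is enough for these files.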

Output Files

Example output for the example input files above (note the Kendall's Tau rows in the middle):

num_q          	all	2
num_ret        	all	19
num_rel        	all	19
num_rel_ret    	all	19
map            	all	1.0000
gm_ap          	all	1.0000
R-prec         	all	1.0000
bpref          	all	1.0000
recip_rank     	all	1.0000
num_nonrel_judged_ret	all	0
exact_prec     	all	1.0000
exact_recall   	all	1.0000
11-pt_avg      	all	1.0000
3-pt_avg       	all	1.0000
avg_doc_prec   	all	1.0000
exact_relative_prec	all	1.0000
avg_relative_prec	all	1.0000
exact_unranked_avg_prec	all	1.0000
exact_relative_unranked_avg_prec	all	1.0000
map_at_R       	all	0.8782
int_map        	all	1.0000
exact_int_R_rcl_prec	all	1.0000
int_map_at_R   	all	0.8782
bpref_allnonrel	all	1.0000
bpref_retnonrel	all	1.0000
bpref_topnonrel	all	1.0000
bpref_top5Rnonrel	all	   nan
bpref_top10Rnonrel	all	   nan
bpref_top10pRnonrel	all	1.0000
bpref_top25pRnonrel	all	1.0000
bpref_top50pRnonrel	all	1.0000
bpref_top25p2Rnonrel	all	1.0000
bpref_retall   	all	1.0000
bpref_5        	all	1.0000
bpref_10       	all	1.0000
bpref_num_all  	all	0.0000
bpref_num_ret  	all	0.0000
bpref_num_correct	all	0
bpref_num_possible	all	0
old_bpref      	all	1.0000
old_bpref_top10pRnonrel	all	1.0000
infAP          	all	1.0000
gm_bpref       	all	1.0000
ircl_prn.0.00  	all	1.0000
ircl_prn.0.10  	all	1.0000
ircl_prn.0.20  	all	1.0000
ircl_prn.0.30  	all	1.0000
ircl_prn.0.40  	all	1.0000
ircl_prn.0.50  	all	1.0000
ircl_prn.0.60  	all	1.0000
ircl_prn.0.70  	all	1.0000
ircl_prn.0.80  	all	1.0000
ircl_prn.0.90  	all	1.0000
ircl_prn.1.00  	all	1.0000
P5             	all	1.0000
P10            	all	0.8000
P20            	all	0.4750
P30            	all	0.3167
P50            	all	0.1900
P100           	all	0.0950
recall5        	all	0.6090
recall10       	all	0.8846
recall20       	all	1.0000
recall30       	all	1.0000
recall50       	all	1.0000
recall100      	all	1.0000
0.20R-prec     	all	1.0000
0.40R-prec     	all	1.0000
0.60R-prec     	all	1.0000
0.80R-prec     	all	1.0000
1.00R-prec     	all	1.0000
1.20R-prec     	all	0.7812
1.40R-prec     	all	0.6754
1.60R-prec     	all	0.6095
1.80R-prec     	all	0.5436
2.00R-prec     	all	0.5000
relative_prec5 	all	1.0000
relative_prec10	all	1.0000
relative_prec20	all	1.0000
relative_prec30	all	1.0000
relative_prec50	all	1.0000
relative_prec100	all	1.0000
unranked_avg_prec5	all	0.6090
unranked_avg_prec10	all	0.6846
unranked_avg_prec20	all	0.4750
unranked_avg_prec30	all	0.3167
unranked_avg_prec50	all	0.1900
unranked_avg_prec100	all	0.0950
relative_unranked_avg_prec5	all	1.0000
relative_unranked_avg_prec10	all	0.6800
relative_unranked_avg_prec20	all	0.2562
relative_unranked_avg_prec30	all	0.1139
relative_unranked_avg_prec50	all	0.0410
relative_unranked_avg_prec100	all	0.0102
Kendall's_tau5 	all	0.0000
Kendall's_tau10	all	0.0000
Kendall's_tau20	all	0.3846
Kendall's_tau30	all	0.3846
Kendall's_tau50	all	0.3846
Kendall's_tau100	all	0.3846
utility_1.0_-1.0_0.0_0.0	all	9.5000
rcl_at_142_nonrel	all	1.0000
fallout_recall_0	all	0.0000
fallout_recall_14	all	1.0000
fallout_recall_28	all	1.0000
fallout_recall_42	all	1.0000
fallout_recall_56	all	1.0000
fallout_recall_71	all	1.0000
fallout_recall_85	all	1.0000
fallout_recall_99	all	1.0000
fallout_recall_113	all	1.0000
fallout_recall_127	all	1.0000
fallout_recall_142	all	1.0000
int_0.20R-prec 	all	1.0000
int_0.40R-prec 	all	1.0000
int_0.60R-prec 	all	1.0000
int_0.80R-prec 	all	1.0000
int_1.00R-prec 	all	1.0000
int_1.20R-prec 	all	0.7812
int_1.40R-prec 	all	0.6754
int_1.60R-prec 	all	0.6095
int_1.80R-prec 	all	0.5436
int_2.00R-prec 	all	0.5000
micro_prec     	all	1.0000
micro_recall   	all	1.0000
micro_bpref    	all	   nan
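
The output above has three whitespace-separated columns per line: measure name, query ("all" for the summary), and value. The following Python sketch, with hypothetical helper names, picks out the Kendall's Tau rows from such output. The SAMPLE string is a few lines copied from the output above.

```python
def parse_output(text):
    """Map (measure, query) to a float value for each three-column line."""
    results = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, query, value = parts
        results[(name, query)] = float(value)  # float("nan") is accepted
    return results


def kendall_rows(results):
    """Keep only the summary rows whose name starts with Kendall's_tau."""
    return {name: value for (name, query), value in results.items()
            if query == "all" and name.startswith("Kendall's_tau")}


SAMPLE = (
    "num_q          \tall\t2\n"
    "Kendall's_tau5 \tall\t0.0000\n"
    "Kendall's_tau20\tall\t0.3846\n"
    "micro_bpref    \tall\t   nan\n"
)

print(kendall_rows(parse_output(SAMPLE)))
```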

Notes

Use the -a option to generate Kendall's Tau values.
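
A minimal Python sketch of calling the tool with -a from a script follows. The binary location ./trec_eval and the file names rel.txt and top.txt are placeholders, not paths defined by this page; only the -a and -q flags and the argument order come from the usage text above.

```python
import subprocess


def build_command(rel_file, top_file, per_query=False, binary="./trec_eval"):
    """Build the argument list; -a is always passed so Tau is printed."""
    cmd = [binary, "-a"]
    if per_query:
        cmd.append("-q")  # also report each query separately
    cmd += [rel_file, top_file]
    return cmd


def run_trec_eval(rel_file, top_file, per_query=False):
    """Run the tool and return its standard output as text."""
    completed = subprocess.run(
        build_command(rel_file, top_file, per_query),
        capture_output=True, text=True, check=True,
    )
    return completed.stdout


print(" ".join(build_command("rel.txt", "top.txt")))
```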

Source Code

svn co http://narasvn.umiacs.umd.edu/repository/src/webarc/trec_eval.8.1