Measures
Measure objects specify the measure to calculate, along with any
parameters they have. (They do not define the implementation; that's the job of a
Provider.)
This page provides a list of the Measures that are available in this package.
Accuracy
Reports the probability that a relevant document is ranked before a non-relevant one. This metric is intended for diagnosis (checking that train/test/validation accuracy match). As such, it only considers relevant documents that are among the returned ones.
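The description above can be sketched in plain Python. This is an illustration only, not the package's implementation: `pairwise_accuracy` is a hypothetical name, treating unjudged returned documents as non-relevant is an assumption, and returning 0.0 when there are no pairs to compare is also an assumption.

```python
def pairwise_accuracy(ranking, qrels, rel=1):
    """Fraction of (relevant, non-relevant) pairs of returned documents
    in which the relevant document is ranked first."""
    correct = 0      # correctly ordered pairs found so far
    rel_seen = 0     # relevant documents seen so far
    nonrel_seen = 0  # non-relevant documents seen so far
    for doc_id in ranking:
        if qrels.get(doc_id, 0) >= rel:
            rel_seen += 1
        else:
            # every relevant document already seen is ranked above this one
            correct += rel_seen
            nonrel_seen += 1
    total = rel_seen * nonrel_seen
    return correct / total if total else 0.0
```

For example, with the ranking d1, d2, d3, d4 where only d1 and d3 are relevant, three of the four (relevant, non-relevant) pairs are ordered correctly.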
Parameters:
cutoff(int) - ranking cutoff threshold
rel(int) - minimum relevance score to be considered relevant (inclusive)
Supported by:
accuracy:
Accuracy(rel=ANY)@ANY
alpha_DCG
A version of DCG that accounts for multiple possible query intents.
Citation
Clarke et al. Novelty and diversity in information retrieval evaluation. SIGIR 2008. [link]
@inproceedings{DBLP:conf/sigir/ClarkeKCVABM08,
author = {Charles L. A. Clarke and
Maheedhar Kolla and
Gordon V. Cormack and
Olga Vechtomova and
Azin Ashkan and
Stefan B{\"{u}}ttcher and
Ian MacKinnon},
editor = {Sung{-}Hyon Myaeng and
Douglas W. Oard and
Fabrizio Sebastiani and
Tat{-}Seng Chua and
Mun{-}Kew Leong},
title = {Novelty and diversity in information retrieval evaluation},
booktitle = {Proceedings of the 31st Annual International {ACM} {SIGIR} Conference
on Research and Development in Information Retrieval, {SIGIR} 2008,
Singapore, July 20-24, 2008},
pages = {659--666},
publisher = {{ACM}},
year = {2008},
url = {https://doi.org/10.1145/1390334.1390446},
doi = {10.1145/1390334.1390446},
timestamp = {Sun, 25 Oct 2020 23:03:58 +0100},
biburl = {https://dblp.org/rec/conf/sigir/ClarkeKCVABM08.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
cutoff(int) - ranking cutoff threshold
rel(int) - minimum relevance score to be considered relevant (inclusive)
alpha(float) - Redundancy intolerance
judged_only(bool) - calculate measure using only judged documents (i.e., discard unjudged documents)
Supported by:
pyndeval:
alpha_DCG(alpha=ANY,rel=ANY,judged_only=ANY)@ANY
alpha_nDCG
A version of nDCG that accounts for multiple possible query intents.
Citation
Clarke et al. Novelty and diversity in information retrieval evaluation. SIGIR 2008. [link]
@inproceedings{DBLP:conf/sigir/ClarkeKCVABM08,
author = {Charles L. A. Clarke and
Maheedhar Kolla and
Gordon V. Cormack and
Olga Vechtomova and
Azin Ashkan and
Stefan B{\"{u}}ttcher and
Ian MacKinnon},
editor = {Sung{-}Hyon Myaeng and
Douglas W. Oard and
Fabrizio Sebastiani and
Tat{-}Seng Chua and
Mun{-}Kew Leong},
title = {Novelty and diversity in information retrieval evaluation},
booktitle = {Proceedings of the 31st Annual International {ACM} {SIGIR} Conference
on Research and Development in Information Retrieval, {SIGIR} 2008,
Singapore, July 20-24, 2008},
pages = {659--666},
publisher = {{ACM}},
year = {2008},
url = {https://doi.org/10.1145/1390334.1390446},
doi = {10.1145/1390334.1390446},
timestamp = {Sun, 25 Oct 2020 23:03:58 +0100},
biburl = {https://dblp.org/rec/conf/sigir/ClarkeKCVABM08.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
cutoff(int) - ranking cutoff threshold
rel(int) - minimum relevance score to be considered relevant (inclusive)
alpha(float) - Redundancy intolerance
judged_only(bool) - calculate measure using only judged documents (i.e., discard unjudged documents)
Supported by:
pyndeval:
alpha_nDCG(alpha=ANY,rel=ANY,judged_only=ANY)@ANY
AP
The [Mean] Average Precision ([M]AP). The average precision of a single query is the mean of the precision scores at each relevant item returned in a search results list.
AP is typically used for adhoc ranking tasks where getting as many relevant items as possible is important. It is commonly referred to as MAP when the mean of AP is taken over the query set.
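A minimal plain-Python sketch of the definition above, for illustration only. The function names are hypothetical and are not part of this package. It assumes binary relevance at threshold `rel`, treats unjudged documents as non-relevant, and divides by the number of known relevant documents (the trec_eval convention), so relevant documents that were never returned contribute zero.

```python
def average_precision(ranking, qrels, rel=1):
    """AP for one query: precision at each relevant hit, averaged over
    all known relevant documents."""
    num_relevant = sum(1 for score in qrels.values() if score >= rel)
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if qrels.get(doc_id, 0) >= rel:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / num_relevant


def mean_average_precision(runs, qrels_by_query, rel=1):
    """MAP: the mean of AP over every query in `runs` (query id -> ranking)."""
    scores = [average_precision(ranking, qrels_by_query.get(qid, {}), rel)
              for qid, ranking in runs.items()]
    return sum(scores) / len(scores) if scores else 0.0
```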
Citation
Harman. Evaluation Issues in Information Retrieval. Inf. Process. Manag. 1992. [link]
@article{DBLP:journals/ipm/Harman92,
author = {Donna Harman},
title = {Evaluation Issues in Information Retrieval},
journal = {Inf. Process. Manag.},
volume = {28},
number = {4},
pages = {439--440},
year = {1992},
url = {https://doi.org/10.1016/0306-4573(92)90001-G},
doi = {10.1016/0306-4573(92)90001-G},
timestamp = {Fri, 21 Feb 2020 13:11:30 +0100},
biburl = {https://dblp.org/rec/journals/ipm/Harman92.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
cutoff(int) - ranking cutoff threshold
rel(int) - minimum relevance score to be considered relevant (inclusive)
judged_only(bool) - ignore returned documents that do not have relevance judgments
Supported by:
cwl_eval:
AP(rel=ANY,judged_only=False)@NOT_PROVIDED
pytrec_eval:
AP(rel=ANY,judged_only=ANY)@ANY
trectools:
AP(rel=1,judged_only=False)@ANY
ranx:
AP(rel=ANY,judged_only=False)@ANY
AP_IA
Intent-aware (Mean) Average Precision
Parameters:
rel(int) - minimum relevance score to be considered relevant (inclusive)
judged_only(bool) - calculate measure using only judged documents (i.e., discard unjudged documents)
Supported by:
pyndeval:
AP_IA(rel=ANY,judged_only=ANY)
BPM
The Bejeweled Player Model (BPM).
Citation
Zhang et al. Evaluating Web Search with a Bejeweled Player Model. SIGIR 2017. [link]
@inproceedings{DBLP:conf/sigir/ZhangLLZXM17,
author = {Fan Zhang and
Yiqun Liu and
Xin Li and
Min Zhang and
Yinghui Xu and
Shaoping Ma},
editor = {Noriko Kando and
Tetsuya Sakai and
Hideo Joho and
Hang Li and
Arjen P. de Vries and
Ryen W. White},
title = {Evaluating Web Search with a Bejeweled Player Model},
booktitle = {Proceedings of the 40th International {ACM} {SIGIR} Conference on
Research and Development in Information Retrieval, Shinjuku, Tokyo,
Japan, August 7-11, 2017},
pages = {425--434},
publisher = {{ACM}},
year = {2017},
url = {https://doi.org/10.1145/3077136.3080841},
doi = {10.1145/3077136.3080841},
timestamp = {Tue, 15 Nov 2022 13:06:00 +0100},
biburl = {https://dblp.org/rec/conf/sigir/ZhangLLZXM17.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
cutoff(int) - ranking cutoff threshold
T(float) - total desired gain (normalized)
min_rel(int) - minimum relevance score
max_rel(int) - maximum relevance score
Supported by:
cwl_eval:
BPM(T=ANY,min_rel=ANY,max_rel=REQUIRED)@ANY
Bpref
Binary Preference (Bpref). This measure examines the relative ranks of judged relevant and non-relevant documents. Non-judged documents are not considered.
Citation
Buckley and Voorhees. Retrieval evaluation with incomplete information. SIGIR 2004. [link]
@inproceedings{DBLP:conf/sigir/BuckleyV04,
author = {Chris Buckley and
Ellen M. Voorhees},
editor = {Mark Sanderson and
Kalervo J{\"{a}}rvelin and
James Allan and
Peter Bruza},
title = {Retrieval evaluation with incomplete information},
booktitle = {{SIGIR} 2004: Proceedings of the 27th Annual International {ACM} {SIGIR}
Conference on Research and Development in Information Retrieval, Sheffield,
UK, July 25-29, 2004},
pages = {25--32},
publisher = {{ACM}},
year = {2004},
url = {https://doi.org/10.1145/1008992.1009000},
doi = {10.1145/1008992.1009000},
timestamp = {Thu, 14 Oct 2021 10:27:19 +0200},
biburl = {https://dblp.org/rec/conf/sigir/BuckleyV04.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
rel(int) - minimum relevance score to be considered relevant (inclusive)
Supported by:
pytrec_eval:
Bpref(rel=ANY)
trectools:
Bpref(rel=1)
Compat
Compatibility measure described in:
Citation
Clarke et al. Assessing Top-k Preferences. ACM Trans. Inf. Syst. 2021. [link]
@article{DBLP:journals/tois/ClarkeVS21,
author = {Charles L. A. Clarke and
Alexandra Vtyurina and
Mark D. Smucker},
title = {Assessing Top-k Preferences},
journal = {{ACM} Trans. Inf. Syst.},
volume = {39},
number = {3},
pages = {33:1--33:21},
year = {2021},
url = {https://doi.org/10.1145/3451161},
doi = {10.1145/3451161},
timestamp = {Sat, 09 Apr 2022 12:20:33 +0200},
biburl = {https://dblp.org/rec/journals/tois/ClarkeVS21.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
p(float) - persistence
normalize(bool) - apply normalization for finite ideal rankings
Supported by:
compat:
Compat(p=ANY,normalize=ANY)
ERR
The Expected Reciprocal Rank (ERR) is a precision-focused measure. In essence, it is an extension of reciprocal rank that encapsulates both graded relevance and a more realistic cascade-based user model of how users browse a ranking.
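The cascade model can be sketched in plain Python. This is an illustrative sketch, not a provider's implementation, and `expected_reciprocal_rank` is a hypothetical name. It assumes the grade-to-probability mapping (2^g - 1) / 2^max_grade from Chapelle et al. (2009): the user scans down the ranking and stops at each document with a probability that grows with its grade.

```python
def expected_reciprocal_rank(grades, max_grade, cutoff=None):
    """ERR over a list of relevance grades given in ranked order."""
    if cutoff is not None:
        grades = grades[:cutoff]
    p_continue = 1.0  # probability the user reaches the current rank
    err = 0.0
    for rank, grade in enumerate(grades, start=1):
        # probability that this document satisfies the user
        p_satisfied = (2 ** grade - 1) / (2 ** max_grade)
        err += p_continue * p_satisfied / rank
        p_continue *= 1.0 - p_satisfied
    return err
```

With binary grades (`max_grade=1`), a single relevant document at rank 2 gives 0.5 / 2 = 0.25, which shows how ERR discounts by both rank and the chance that the user has already stopped.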
Parameters:
cutoff(int) - ranking cutoff threshold
Supported by:
gdeval:
ERR@REQUIRED
ERR_IA
Intent-Aware Expected Reciprocal Rank with collection-independent normalisation.
Citation
Chapelle et al. Expected reciprocal rank for graded relevance. CIKM 2009. [link]
@inproceedings{DBLP:conf/cikm/ChapelleMZG09,
author = {Olivier Chapelle and
Donald Metlzer and
Ya Zhang and
Pierre Grinspan},
editor = {David Wai{-}Lok Cheung and
Il{-}Yeol Song and
Wesley W. Chu and
Xiaohua Hu and
Jimmy Lin},
title = {Expected reciprocal rank for graded relevance},
booktitle = {Proceedings of the 18th {ACM} Conference on Information and Knowledge
Management, {CIKM} 2009, Hong Kong, China, November 2-6, 2009},
pages = {621--630},
publisher = {{ACM}},
year = {2009},
url = {https://doi.org/10.1145/1645953.1646033},
doi = {10.1145/1645953.1646033},
timestamp = {Mon, 11 Mar 2024 13:45:28 +0100},
biburl = {https://dblp.org/rec/conf/cikm/ChapelleMZG09.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
cutoff(int) - ranking cutoff threshold
rel(int) - minimum relevance score to be considered relevant (inclusive)
judged_only(bool) - calculate measure using only judged documents (i.e., discard unjudged documents)
Supported by:
pyndeval:
ERR_IA(rel=ANY,judged_only=ANY)@ANY
infAP
Inferred AP. An AP implementation that accounts for pooled-but-unjudged documents by assuming that they are relevant at the same proportion as other judged documents. Essentially, it skips documents that were pooled but not judged, and assumes that all other unjudged documents are non-relevant.
Pooled-but-unjudged documents are indicated by a score of -1, by convention. Note that not all qrels use this convention.
Parameters:
rel(int) - minimum relevance score to be considered relevant (inclusive)
Supported by:
pytrec_eval:
infAP(rel=ANY)
INSQ
INSQ
Citation
Moffat et al. Models and metrics: IR evaluation as a user process. ADCS 2012. [link]
@inproceedings{DBLP:conf/adcs/MoffatST12,
author = {Alistair Moffat and
Falk Scholer and
Paul Thomas},
editor = {Andrew Trotman and
Sally Jo Cunningham and
Laurianne Sitbon},
title = {Models and metrics: {IR} evaluation as a user process},
booktitle = {The Seventeenth Australasian Document Computing Symposium, {ADCS}
'12, Dunedin, New Zealand, December 5-6, 2012},
pages = {47--54},
publisher = {{ACM}},
year = {2012},
url = {https://doi.org/10.1145/2407085.2407092},
doi = {10.1145/2407085.2407092},
timestamp = {Mon, 26 Jun 2023 20:48:56 +0200},
biburl = {https://dblp.org/rec/conf/adcs/MoffatST12.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
T(float) - total desired gain (normalized)
min_rel(int) - minimum relevance score
max_rel(int) - maximum relevance score
Supported by:
cwl_eval:
INSQ(T=ANY,min_rel=ANY,max_rel=REQUIRED)
INST
INST, a variant of INSQ
Citation
Bailey et al. User Variability and IR System Evaluation. SIGIR 2015. [link]
@inproceedings{DBLP:conf/sigir/BaileyMST15,
author = {Peter Bailey and
Alistair Moffat and
Falk Scholer and
Paul Thomas},
editor = {Ricardo Baeza{-}Yates and
Mounia Lalmas and
Alistair Moffat and
Berthier A. Ribeiro{-}Neto},
title = {User Variability and {IR} System Evaluation},
booktitle = {Proceedings of the 38th International {ACM} {SIGIR} Conference on
Research and Development in Information Retrieval, Santiago, Chile,
August 9-13, 2015},
pages = {625--634},
publisher = {{ACM}},
year = {2015},
url = {https://doi.org/10.1145/2766462.2767728},
doi = {10.1145/2766462.2767728},
timestamp = {Mon, 26 Jun 2023 20:45:16 +0200},
biburl = {https://dblp.org/rec/conf/sigir/BaileyMST15.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
T(float) - total desired gain (normalized)
min_rel(int) - minimum relevance score
max_rel(int) - maximum relevance score
Supported by:
cwl_eval:
INST(T=ANY,min_rel=ANY,max_rel=REQUIRED)
IPrec
Interpolated Precision at a given recall cutoff. Used for building precision-recall graphs. Unlike most measures, where @ indicates an absolute cutoff threshold, here @ sets the recall cutoff.
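As a rough illustration, interpolated precision at a recall level can be computed as the highest precision observed at any rank where recall has reached that level. The sketch below is plain Python with a hypothetical function name; it is not the pytrec_eval implementation, and it treats unjudged documents as non-relevant.

```python
def interpolated_precision(ranking, qrels, recall, rel=1):
    """Maximum precision over all ranks whose recall is >= `recall`."""
    num_relevant = sum(1 for score in qrels.values() if score >= rel)
    if num_relevant == 0:
        return 0.0
    best = 0.0
    hits = 0
    for rank, doc_id in enumerate(ranking, start=1):
        if qrels.get(doc_id, 0) >= rel:
            hits += 1
        if hits / num_relevant >= recall:
            best = max(best, hits / rank)
    return best
```

Evaluating this at recall levels such as 0.0, 0.1, ..., 1.0 yields the points of a precision-recall graph.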
Parameters:
recall(float) - recall threshold
rel(int) - minimum relevance score to be considered relevant (inclusive)
judged_only(bool) - ignore returned documents that do not have relevance judgments
Supported by:
pytrec_eval:
IPrec(judged_only=ANY)@ANY
Judged
Percentage of results in the top k (cutoff) results that have relevance judgments. Equivalent to P@k with a rel lower than any judgment.
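The definition above amounts to a few lines of plain Python. This is a sketch with a hypothetical function name, not the package's code; dividing by `cutoff` rather than by the number of results actually returned is an assumption that mirrors the stated equivalence with P@k.

```python
def judged_at_k(ranking, qrels, cutoff):
    """Fraction of the top `cutoff` results that have any relevance judgment."""
    judged = sum(1 for doc_id in ranking[:cutoff] if doc_id in qrels)
    return judged / cutoff
```

A low value suggests that other measures computed on the same run rest on many unjudged documents.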
Parameters:
cutoff(int) - ranking cutoff threshold
Supported by:
judged:
Judged@ANY
nDCG
The normalized Discounted Cumulative Gain (nDCG). Uses graded labels, rewarding systems that put the highest-graded documents at the top of the ranking. It is normalized with respect to the ideal DCG, i.e., the DCG of the documents ranked in descending order of graded label.
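A minimal sketch of this computation in plain Python, for illustration only. The function names are hypothetical. It assumes the linear-gain formulation with a log2 rank discount, non-negative relevance labels, and a gain of 0 for unjudged documents; other DCG formulations differ in how gains are computed.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of a list of gains in ranked order."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranking, qrels, cutoff=None):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranking]
    ideal = sorted(qrels.values(), reverse=True)
    if cutoff is not None:
        gains, ideal = gains[:cutoff], ideal[:cutoff]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```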
Citation
Järvelin and Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 2002. [link]
@article{DBLP:journals/tois/JarvelinK02,
author = {Kalervo J{\"{a}}rvelin and
Jaana Kek{\"{a}}l{\"{a}}inen},
title = {Cumulated gain-based evaluation of {IR} techniques},
journal = {{ACM} Trans. Inf. Syst.},
volume = {20},
number = {4},
pages = {422--446},
year = {2002},
url = {http://doi.acm.org/10.1145/582415.582418},
doi = {10.1145/582415.582418},
timestamp = {Fri, 09 Jun 2017 11:03:19 +0200},
biburl = {https://dblp.org/rec/journals/tois/JarvelinK02.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
cutoff(int) - ranking cutoff threshold
dcg(str) - DCG formulation
gains(dict) - custom gain mapping (int-to-int)
judged_only(bool) - ignore returned documents that do not have relevance judgments
Supported by:
pytrec_eval:
nDCG(dcg='log2',gains=ANY,judged_only=ANY)@ANY
gdeval:
nDCG(dcg='exp-log2',gains=NOT_PROVIDED,judged_only=False)@REQUIRED
trectools:
nDCG(dcg=ANY,gains=NOT_PROVIDED,judged_only=False)@ANY
ranx:
nDCG(dcg=('log2', 'exp-log2'),gains=NOT_PROVIDED,judged_only=False)@ANY
NERR10
The Not (but Nearly) Expected Reciprocal Rank (NERR) measure, in the version given by Equation (10) of the following paper.
Citation
Azzopardi et al. ERR is not C/W/L: Exploring the Relationship Between Expected Reciprocal Rank and Other Metrics. ICTIR 2021. [link]
@inproceedings{DBLP:conf/ictir/AzzopardiMM21,
author = {Leif Azzopardi and
Joel Mackenzie and
Alistair Moffat},
editor = {Faegheh Hasibi and
Yi Fang and
Akiko Aizawa},
title = {{ERR} is not {C/W/L:} Exploring the Relationship Between Expected
Reciprocal Rank and Other Metrics},
booktitle = {{ICTIR} '21: The 2021 {ACM} {SIGIR} International Conference on the
Theory of Information Retrieval, Virtual Event, Canada, July 11, 2021},
pages = {231--237},
publisher = {{ACM}},
year = {2021},
url = {https://doi.org/10.1145/3471158.3472239},
doi = {10.1145/3471158.3472239},
timestamp = {Fri, 10 Sep 2021 14:39:10 +0200},
biburl = {https://dblp.org/rec/conf/ictir/AzzopardiMM21.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
p(float) - persistence
min_rel(int) - minimum relevance score
max_rel(int) - maximum relevance score
Supported by:
cwl_eval:
NERR10(p=ANY,min_rel=ANY,max_rel=REQUIRED)
NERR11
The Not (but Nearly) Expected Reciprocal Rank (NERR) measure, in the version given by Equation (12) of the following paper.
Citation
Azzopardi et al. ERR is not C/W/L: Exploring the Relationship Between Expected Reciprocal Rank and Other Metrics. ICTIR 2021. [link]
@inproceedings{DBLP:conf/ictir/AzzopardiMM21,
author = {Leif Azzopardi and
Joel Mackenzie and
Alistair Moffat},
editor = {Faegheh Hasibi and
Yi Fang and
Akiko Aizawa},
title = {{ERR} is not {C/W/L:} Exploring the Relationship Between Expected
Reciprocal Rank and Other Metrics},
booktitle = {{ICTIR} '21: The 2021 {ACM} {SIGIR} International Conference on the
Theory of Information Retrieval, Virtual Event, Canada, July 11, 2021},
pages = {231--237},
publisher = {{ACM}},
year = {2021},
url = {https://doi.org/10.1145/3471158.3472239},
doi = {10.1145/3471158.3472239},
timestamp = {Fri, 10 Sep 2021 14:39:10 +0200},
biburl = {https://dblp.org/rec/conf/ictir/AzzopardiMM21.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
T(float) - total desired gain (normalized)
min_rel(int) - minimum relevance score
max_rel(int) - maximum relevance score
Supported by:
cwl_eval:
NERR11(T=ANY,min_rel=ANY,max_rel=REQUIRED)
NERR8
The Not (but Nearly) Expected Reciprocal Rank (NERR) measure, in the version given by Equation (8) of the following paper.
Citation
Azzopardi et al. ERR is not C/W/L: Exploring the Relationship Between Expected Reciprocal Rank and Other Metrics. ICTIR 2021. [link]
@inproceedings{DBLP:conf/ictir/AzzopardiMM21,
author = {Leif Azzopardi and
Joel Mackenzie and
Alistair Moffat},
editor = {Faegheh Hasibi and
Yi Fang and
Akiko Aizawa},
title = {{ERR} is not {C/W/L:} Exploring the Relationship Between Expected
Reciprocal Rank and Other Metrics},
booktitle = {{ICTIR} '21: The 2021 {ACM} {SIGIR} International Conference on the
Theory of Information Retrieval, Virtual Event, Canada, July 11, 2021},
pages = {231--237},
publisher = {{ACM}},
year = {2021},
url = {https://doi.org/10.1145/3471158.3472239},
doi = {10.1145/3471158.3472239},
timestamp = {Fri, 10 Sep 2021 14:39:10 +0200},
biburl = {https://dblp.org/rec/conf/ictir/AzzopardiMM21.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
cutoff(int) - ranking cutoff threshold
min_rel(int) - minimum relevance score
max_rel(int) - maximum relevance score
Supported by:
cwl_eval:
NERR8(min_rel=ANY,max_rel=REQUIRED)@REQUIRED
NERR9
The Not (but Nearly) Expected Reciprocal Rank (NERR) measure, in the version given by Equation (9) of the following paper.
Citation
Azzopardi et al. ERR is not C/W/L: Exploring the Relationship Between Expected Reciprocal Rank and Other Metrics. ICTIR 2021. [link]
@inproceedings{DBLP:conf/ictir/AzzopardiMM21,
author = {Leif Azzopardi and
Joel Mackenzie and
Alistair Moffat},
editor = {Faegheh Hasibi and
Yi Fang and
Akiko Aizawa},
title = {{ERR} is not {C/W/L:} Exploring the Relationship Between Expected
Reciprocal Rank and Other Metrics},
booktitle = {{ICTIR} '21: The 2021 {ACM} {SIGIR} International Conference on the
Theory of Information Retrieval, Virtual Event, Canada, July 11, 2021},
pages = {231--237},
publisher = {{ACM}},
year = {2021},
url = {https://doi.org/10.1145/3471158.3472239},
doi = {10.1145/3471158.3472239},
timestamp = {Fri, 10 Sep 2021 14:39:10 +0200},
biburl = {https://dblp.org/rec/conf/ictir/AzzopardiMM21.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
cutoff(int) - ranking cutoff threshold
min_rel(int) - minimum relevance score
max_rel(int) - maximum relevance score
Supported by:
cwl_eval:
NERR9(min_rel=ANY,max_rel=REQUIRED)@REQUIRED
nERR_IA
Intent-Aware Expected Reciprocal Rank with collection-dependent normalisation.
Citation
Chapelle et al. Expected reciprocal rank for graded relevance. CIKM 2009. [link]
@inproceedings{DBLP:conf/cikm/ChapelleMZG09,
author = {Olivier Chapelle and
Donald Metlzer and
Ya Zhang and
Pierre Grinspan},
editor = {David Wai{-}Lok Cheung and
Il{-}Yeol Song and
Wesley W. Chu and
Xiaohua Hu and
Jimmy Lin},
title = {Expected reciprocal rank for graded relevance},
booktitle = {Proceedings of the 18th {ACM} Conference on Information and Knowledge
Management, {CIKM} 2009, Hong Kong, China, November 2-6, 2009},
pages = {621--630},
publisher = {{ACM}},
year = {2009},
url = {https://doi.org/10.1145/1645953.1646033},
doi = {10.1145/1645953.1646033},
timestamp = {Mon, 11 Mar 2024 13:45:28 +0100},
biburl = {https://dblp.org/rec/conf/cikm/ChapelleMZG09.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
cutoff(int) - ranking cutoff threshold
rel(int) - minimum relevance score to be considered relevant (inclusive)
judged_only(bool) - calculate measure using only judged documents (i.e., discard unjudged documents)
Supported by:
pyndeval:
nERR_IA(rel=ANY,judged_only=ANY)@ANY
nNRBP
Novelty- and Rank-Biased Precision with collection-dependent normalisation.
Citation
Clarke et al. An Effectiveness Measure for Ambiguous and Underspecified Queries. ICTIR 2009. [link]
@inproceedings{DBLP:conf/ictir/ClarkeKV09,
author = {Charles L. A. Clarke and
Maheedhar Kolla and
Olga Vechtomova},
editor = {Leif Azzopardi and
Gabriella Kazai and
Stephen E. Robertson and
Stefan M. R{\"{u}}ger and
Milad Shokouhi and
Dawei Song and
Emine Yilmaz},
title = {An Effectiveness Measure for Ambiguous and Underspecified Queries},
booktitle = {Advances in Information Retrieval Theory, Second International Conference
on the Theory of Information Retrieval, {ICTIR} 2009, Cambridge, UK,
September 10-12, 2009, Proceedings},
series = {Lecture Notes in Computer Science},
volume = {5766},
pages = {188--199},
publisher = {Springer},
year = {2009},
url = {https://doi.org/10.1007/978-3-642-04417-5\_17},
doi = {10.1007/978-3-642-04417-5\_17},
timestamp = {Sun, 25 Oct 2020 23:12:59 +0100},
biburl = {https://dblp.org/rec/conf/ictir/ClarkeKV09.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
rel(int) - minimum relevance score to be considered relevant (inclusive)
alpha(float) - Redundancy intolerance
beta(float) - Patience
Supported by:
pyndeval:
nNRBP(alpha=ANY,beta=ANY,rel=ANY)
NRBP
Novelty- and Rank-Biased Precision with collection-independent normalisation.
Citation
Clarke et al. An Effectiveness Measure for Ambiguous and Underspecified Queries. ICTIR 2009. [link]
@inproceedings{DBLP:conf/ictir/ClarkeKV09,
author = {Charles L. A. Clarke and
Maheedhar Kolla and
Olga Vechtomova},
editor = {Leif Azzopardi and
Gabriella Kazai and
Stephen E. Robertson and
Stefan M. R{\"{u}}ger and
Milad Shokouhi and
Dawei Song and
Emine Yilmaz},
title = {An Effectiveness Measure for Ambiguous and Underspecified Queries},
booktitle = {Advances in Information Retrieval Theory, Second International Conference
on the Theory of Information Retrieval, {ICTIR} 2009, Cambridge, UK,
September 10-12, 2009, Proceedings},
series = {Lecture Notes in Computer Science},
volume = {5766},
pages = {188--199},
publisher = {Springer},
year = {2009},
url = {https://doi.org/10.1007/978-3-642-04417-5\_17},
doi = {10.1007/978-3-642-04417-5\_17},
timestamp = {Sun, 25 Oct 2020 23:12:59 +0100},
biburl = {https://dblp.org/rec/conf/ictir/ClarkeKV09.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
rel(int) - minimum relevance score to be considered relevant (inclusive)
alpha(float) - Redundancy intolerance
beta(float) - Patience
Supported by:
pyndeval:
NRBP(alpha=ANY,beta=ANY,rel=ANY)
NumQ
The total number of queries.
Supported by:
pytrec_eval:
NumQ
NumRel
The number of relevant documents the query has (independent of what the system retrieved).
Parameters:
rel(int) - minimum relevance score to be counted (inclusive)
Supported by:
pytrec_eval:
NumRel(rel=1)
NumRet
The number of results returned. When rel is provided, counts the number of documents returned with at least that relevance score (inclusive).
Parameters:
rel(int) - minimum relevance score to be counted (inclusive), or all documents returned if NOT_PROVIDED
Supported by:
pytrec_eval:
NumRet(rel=ANY)
ranx:
NumRet(rel=REQUIRED)
P
Basic measure that computes the percentage of documents in the top cutoff results that are labeled as relevant. cutoff is a required parameter, and can be provided as P@cutoff.
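In plain Python, the computation looks roughly like this. The function name is hypothetical and this is a sketch rather than any provider's code; unjudged documents are treated as non-relevant, and the score is divided by `cutoff` even when fewer documents were returned.

```python
def precision_at_k(ranking, qrels, cutoff, rel=1):
    """Fraction of the top `cutoff` results with relevance >= `rel`."""
    hits = sum(1 for doc_id in ranking[:cutoff] if qrels.get(doc_id, 0) >= rel)
    return hits / cutoff
```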
Citation
Rijsbergen. Information Retrieval. 1979.
@book{DBLP:books/bu/Rijsbergen79,
author = {C. J. van Rijsbergen},
title = {Information Retrieval},
publisher = {Butterworth},
year = {1979},
isbn = {0-408-70929-4},
timestamp = {Thu, 03 Jan 2002 11:51:10 +0100},
biburl = {https://dblp.org/rec/books/bu/Rijsbergen79.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
cutoff(int) - ranking cutoff threshold
rel(int) - minimum relevance score to be considered relevant (inclusive)
judged_only(bool) - ignore returned documents that do not have relevance judgments
Supported by:
cwl_eval:
P(rel=ANY,judged_only=False)@ANY
pytrec_eval:
P(rel=ANY,judged_only=ANY)@ANY
trectools:
P(rel=1,judged_only=False)@ANY
ranx:
P(rel=ANY,judged_only=False)@ANY
P_IA
Intent-aware Precision@k.
Parameters:
cutoff(int) - ranking cutoff threshold
rel(int) - minimum relevance score to be considered relevant (inclusive)
judged_only(bool) - calculate measure using only judged documents (i.e., discard unjudged documents)
Supported by:
pyndeval:
P_IA(rel=ANY,judged_only=ANY)@ANY
R
Recall@k (R@k). The fraction of relevant documents for a query that have been retrieved by rank k.
NOTE: Some tasks define Recall@k as whether any relevant documents are found in the top k results. This software follows the TREC convention and refers to that measure as Success@k.
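The distinction drawn in the note can be made concrete with a small plain-Python sketch. Both function names are hypothetical and neither is the package's implementation; the ranking is assumed to contain no duplicate documents.

```python
def recall_at_k(ranking, qrels, cutoff, rel=1):
    """Fraction of all known relevant documents found in the top `cutoff`."""
    relevant = {doc_id for doc_id, score in qrels.items() if score >= rel}
    if not relevant:
        return 0.0
    found = sum(1 for doc_id in ranking[:cutoff] if doc_id in relevant)
    return found / len(relevant)


def success_at_k(ranking, qrels, cutoff, rel=1):
    """1.0 if any relevant document appears in the top `cutoff`, else 0.0."""
    top = ranking[:cutoff]
    return 1.0 if any(qrels.get(doc_id, 0) >= rel for doc_id in top) else 0.0
```

With two relevant documents of which one is retrieved in the top k, recall is 0.5 while success is 1.0.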
Parameters:
cutoff(int) - ranking cutoff threshold
rel(int) - minimum relevance score to be considered relevant (inclusive)
judged_only(bool) - ignore returned documents that do not have relevance judgments
Supported by:
pytrec_eval:
R(judged_only=ANY)@ANY
ranx:
R(judged_only=False)@ANY
RBP
The Rank-Biased Precision (RBP).
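As a brief illustration, the binary-relevance form from Moffat and Zobel is RBP = (1 - p) * sum over ranks i of r_i * p^(i-1), where p is the persistence and r_i is 1 if the document at rank i is relevant. The plain-Python sketch below uses a hypothetical function name and is not a provider's implementation; the default `p=0.8` is an arbitrary choice for the example.

```python
def rank_biased_precision(ranking, qrels, p=0.8, rel=1):
    """Binary RBP: relevant documents weighted by p ** (rank - 1)."""
    weighted = sum(p ** index  # index is rank - 1
                   for index, doc_id in enumerate(ranking)
                   if qrels.get(doc_id, 0) >= rel)
    return (1 - p) * weighted
```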
Citation
Moffat and Zobel. Rank-biased precision for measurement of retrieval effectiveness. ACM Trans. Inf. Syst. 2008. [link]
@article{DBLP:journals/tois/MoffatZ08,
author = {Alistair Moffat and
Justin Zobel},
title = {Rank-biased precision for measurement of retrieval effectiveness},
journal = {{ACM} Trans. Inf. Syst.},
volume = {27},
number = {1},
pages = {2:1--2:27},
year = {2008},
url = {https://doi.org/10.1145/1416950.1416952},
doi = {10.1145/1416950.1416952},
timestamp = {Tue, 06 Nov 2018 12:51:56 +0100},
biburl = {https://dblp.org/rec/journals/tois/MoffatZ08.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
cutoff(int) - ranking cutoff threshold
p(float) - persistence
rel(int) - minimum relevance score to be considered relevant (inclusive), or NOT_PROVIDED to use graded relevance
Supported by:
Rprec
The precision at R, where R is the number of relevant documents for a given query. Has the cute property that it is also the recall at R.
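A short plain-Python sketch of this definition, with a hypothetical function name and unjudged documents treated as non-relevant. It is illustrative only and not taken from any provider.

```python
def r_precision(ranking, qrels, rel=1):
    """Precision at rank R, where R is the number of relevant documents."""
    relevant = {doc_id for doc_id, score in qrels.items() if score >= rel}
    r = len(relevant)
    if r == 0:
        return 0.0
    hits = sum(1 for doc_id in ranking[:r] if doc_id in relevant)
    # hits / r is both precision at R and recall at R
    return hits / r
```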
Citation
Buckley and Voorhees. Retrieval System Evaluation. 2005. [link]
Parameters:
rel(int) - minimum relevance score to be considered relevant (inclusive)
judged_only(bool) - ignore returned documents that do not have relevance judgments
Supported by:
pytrec_eval:
Rprec(rel=ANY,judged_only=ANY)
trectools:
Rprec(rel=1,judged_only=False)
ranx:
Rprec(rel=ANY,judged_only=False)
RR
The [Mean] Reciprocal Rank ([M]RR) is a precision-focused measure that scores based on the reciprocal of the rank of the highest-ranked relevant document. An optional cutoff can be provided to limit the depth explored. rel (default 1) controls which relevance level is considered relevant.
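The computation for a single query can be sketched in plain Python as follows. The function name is hypothetical and this is not the package's implementation; unjudged documents are treated as non-relevant.

```python
def reciprocal_rank(ranking, qrels, rel=1, cutoff=None):
    """1 / rank of the first relevant document, or 0.0 if none is found."""
    top = ranking if cutoff is None else ranking[:cutoff]
    for rank, doc_id in enumerate(top, start=1):
        if qrels.get(doc_id, 0) >= rel:
            return 1.0 / rank
    return 0.0
```

Averaging this value over all queries gives MRR.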
Citation
Kantor and Voorhees. The TREC-5 Confusion Track: Comparing Retrieval Methods for Scanned Text. Inf. Retr. 2000. [link]
@article{DBLP:journals/ir/KantorV00,
author = {Paul B. Kantor and
Ellen M. Voorhees},
title = {The {TREC-5} Confusion Track: Comparing Retrieval Methods for Scanned
Text},
journal = {Inf. Retr.},
volume = {2},
number = {2/3},
pages = {165--176},
year = {2000},
url = {https://doi.org/10.1023/A:1009902609570},
doi = {10.1023/A:1009902609570},
timestamp = {Thu, 14 Oct 2021 09:13:06 +0200},
biburl = {https://dblp.org/rec/journals/ir/KantorV00.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Parameters:
cutoff(int) - ranking cutoff threshold
rel(int) - minimum relevance score to be considered relevant (inclusive)
judged_only(bool) - ignore returned documents that do not have relevance judgments
Supported by:
cwl_eval:
RR(rel=ANY,judged_only=False)@NOT_PROVIDED
pytrec_eval:
RR(rel=ANY,judged_only=ANY)@NOT_PROVIDED
trectools:
RR(rel=1,judged_only=False)@NOT_PROVIDED
msmarco:
RR(rel=ANY,judged_only=False)@ANY
ranx:
RR(rel=ANY,judged_only=False)@NOT_PROVIDED
SDCG
The Scaled Discounted Cumulative Gain (SDCG), a variant of nDCG that assumes more fully-relevant documents exist but are not labeled.
Parameters:
cutoff(int) - ranking cutoff threshold
dcg(str) - DCG formulation
min_rel(int) - minimum relevance score
max_rel(int) - maximum relevance score
Supported by:
cwl_eval:
SDCG(dcg='log2',min_rel=ANY,max_rel=REQUIRED)@REQUIRED
SetAP
The unranked Set AP (SetAP); i.e., SetP * SetR
Parameters:
rel(int) - minimum relevance score to be considered relevant (inclusive)
judged_only(bool) - ignore returned documents that do not have relevance judgments
Supported by:
pytrec_eval:
SetAP(rel=ANY,judged_only=ANY)
SetF
The Set F measure (SetF); i.e., the harmonic mean of SetP and SetR
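A plain-Python sketch of a weighted harmonic mean of set precision and set recall, for illustration only. The function name is hypothetical, and the standard F-beta form (1 + beta^2) * P * R / (beta^2 * P + R) is assumed here; a provider's exact formulation may differ.

```python
def set_f(retrieved, qrels, beta=1.0, rel=1):
    """F-beta over the unranked set of retrieved documents."""
    relevant = {doc_id for doc_id, score in qrels.items() if score >= rel}
    retrieved = set(retrieved)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)  # set precision
    recall = hits / len(relevant)      # set recall
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```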
Parameters:
rel(int) - minimum relevance score to be considered relevant (inclusive)
beta(float) - relative importance of R to P in the harmonic mean
judged_only(bool) - ignore returned documents that do not have relevance judgments
Supported by:
pytrec_eval:
SetF(rel=ANY,beta=ANY,judged_only=ANY)
SetP
The Set Precision (SetP); i.e., the number of relevant docs divided by the total number retrieved
Parameters:
rel(int) - minimum relevance score to be considered relevant (inclusive)
relative(bool) - calculate the measure using the maximum possible SetP for the provided result size
judged_only(bool) - ignore returned documents that do not have relevance judgments
Supported by:
pytrec_eval:
SetP(rel=ANY,relative=ANY,judged_only=ANY)
ranx:
SetP(rel=ANY,judged_only=False)
SetR
The Set Recall (SetR); i.e., the number of relevant docs divided by the total number of relevant documents
Parameters:
rel(int) - minimum relevance score to be considered relevant (inclusive)
Supported by:
pytrec_eval:
SetR(rel=ANY)
ranx:
SetR(rel=ANY)
StRecall
Subtopic recall (the number of subtopics covered by the top k docs)
Parameters:
cutoff(int) - ranking cutoff threshold
rel(int) - minimum relevance score to be considered relevant (inclusive)
Supported by:
pyndeval:
StRecall(rel=ANY)@ANY
Success
1 if a document with at least rel relevance is found in the first cutoff documents, else 0.
NOTE: Some refer to this measure as Recall@k. This software follows the TREC convention, where Recall@k is defined as the proportion of known relevant documents retrieved in the top k results.
Parameters:
cutoff(int) - ranking cutoff threshold
rel(int) - minimum relevance score to be considered relevant (inclusive)
judged_only(bool) - ignore returned documents that do not have relevance judgments
Supported by:
pytrec_eval:
Success(rel=ANY,judged_only=ANY)@ANY
ranx:
Success(rel=ANY,judged_only=False)@REQUIRED
Aliases
These provide shortcuts to "canonical" measures, and are typically used when multiple
names or casings for the same measure exist. You can use them just like any other measure
and the identifiers are equal (e.g., AP == MAP) but the names will appear in the
canonical form when printed.
BPref → Bpref
MAP → AP
MAP_IA → AP_IA
MRR → RR
NDCG → nDCG
NumRelRet → NumRet(rel=1)
Precision → P
Recall → R
RPrec → Rprec
SetRelP → SetP(relative=True)
α_DCG → alpha_DCG
α_nDCG → alpha_nDCG