Measures
=========================

:class:`~ir_measures.measures.Measure` objects specify the measure to calculate, along with any parameters they have. (They do not define the implementation --- that's the job of a :class:`~ir_measures.providers.Provider`.) This page provides a list of the Measures that are available in this package.

.. _measures.Accuracy:

``Accuracy``
-------------------------

Reports the probability that a relevant document is ranked before a non-relevant one. This measure is intended for diagnostic use (e.g., checking that train/test/validation accuracy match). As such, it only considers relevant documents that appear among the returned results.

**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold
- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)

**Supported by:**

- :ref:`accuracy `: ``Accuracy(rel=ANY)@ANY``

.. _measures.alpha_DCG:

``alpha_DCG``
-------------------------

A version of DCG that accounts for multiple possible query intents.

.. cite.dblp:: conf/sigir/ClarkeKCVABM08

**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold
- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)
- ``alpha`` (float) - redundancy intolerance
- ``judged_only`` (bool) - calculate measure using only judged documents (i.e., discard unjudged documents)

**Supported by:**

- :ref:`pyndeval `: ``alpha_DCG(alpha=ANY,rel=ANY,judged_only=ANY)@ANY``

.. _measures.alpha_nDCG:

``alpha_nDCG``
-------------------------

A version of nDCG that accounts for multiple possible query intents.

.. cite.dblp:: conf/sigir/ClarkeKCVABM08

**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold
- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)
- ``alpha`` (float) - redundancy intolerance
- ``judged_only`` (bool) - calculate measure using only judged documents (i.e., discard unjudged documents)

**Supported by:**

- :ref:`pyndeval `: ``alpha_nDCG(alpha=ANY,rel=ANY,judged_only=ANY)@ANY``
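As a worked illustration of the novelty-aware gain behind ``alpha_DCG`` and ``alpha_nDCG``, the sketch below damps a document's gain for an intent by ``(1 - alpha)`` for every higher-ranked document that already covered that intent. This is a simplified, hypothetical helper (binary per-intent relevance, no normalisation), not the pyndeval implementation:

```python
import math

def alpha_dcg(ranking, intent_qrels, alpha=0.5, cutoff=10):
    """Simplified alpha-DCG sketch: a document's gain for an intent is
    (1 - alpha) ** (number of earlier documents covering that intent)."""
    seen = {}                      # intent -> times covered so far
    score = 0.0
    for rank, doc in enumerate(ranking[:cutoff], start=1):
        gain = 0.0
        for intent in intent_qrels.get(doc, ()):
            gain += (1.0 - alpha) ** seen.get(intent, 0)
            seen[intent] = seen.get(intent, 0) + 1
        score += gain / math.log2(rank + 1)  # standard log2 discount
    return score

# 'd2' repeats intent 'a' (damped to 0.5) but newly covers intent 'b'
score = alpha_dcg(['d1', 'd2'], {'d1': {'a'}, 'd2': {'a', 'b'}})
```

``alpha_nDCG`` additionally divides this value by the alpha-DCG of an ideal reordering, which in practice is approximated greedily.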
.. _measures.AP:

``AP``
-------------------------

The [Mean] Average Precision ([M]AP). The average precision of a single query is the mean of the precision scores at each relevant item returned in a search results list. AP is typically used for ad-hoc ranking tasks, where retrieving as many relevant items as possible is desired. It is commonly referred to as MAP when the mean of AP is taken over the query set.

.. cite.dblp:: journals/ipm/Harman92

**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold
- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)
- ``judged_only`` (bool) - ignore returned documents that do not have relevance judgments

**Supported by:**

- :ref:`cwl_eval `: ``AP(rel=ANY,judged_only=False)@NOT_PROVIDED``
- :ref:`pytrec_eval `: ``AP(rel=ANY,judged_only=ANY)@ANY``
- :ref:`trectools `: ``AP(rel=1,judged_only=False)@ANY``
- :ref:`ranx `: ``AP(rel=ANY,judged_only=False)@ANY``

.. _measures.AP_IA:

``AP_IA``
-------------------------

Intent-aware (Mean) Average Precision.

**Parameters:**

- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)
- ``judged_only`` (bool) - calculate measure using only judged documents (i.e., discard unjudged documents)

**Supported by:**

- :ref:`pyndeval `: ``AP_IA(rel=ANY,judged_only=ANY)``

.. _measures.BPM:

``BPM``
-------------------------

The Bejeweled Player Model (BPM).

.. cite.dblp:: conf/sigir/ZhangLLZXM17

**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold
- ``T`` (float) - total desired gain (normalized)
- ``min_rel`` (int) - minimum relevance score
- ``max_rel`` (int) - maximum relevance score

**Supported by:**

- :ref:`cwl_eval `: ``BPM(T=ANY,min_rel=ANY,max_rel=REQUIRED)@ANY``

.. _measures.Bpref:

``Bpref``
-------------------------

Binary Preference (Bpref). This measure examines the relative ranks of judged relevant and non-relevant documents. Non-judged documents are not considered.
.. cite.dblp:: conf/sigir/BuckleyV04

**Parameters:**

- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)

**Supported by:**

- :ref:`pytrec_eval `: ``Bpref(rel=ANY)``
- :ref:`trectools `: ``Bpref(rel=1)``

.. _measures.Compat:

``Compat``
-------------------------

Compatibility measure described in:

.. cite.dblp:: journals/tois/ClarkeVS21

**Parameters:**

- ``p`` (float) - persistence
- ``normalize`` (bool) - apply normalization for finite ideal rankings

**Supported by:**

- :ref:`compat `: ``Compat(p=ANY,normalize=ANY)``

.. _measures.ERR:

``ERR``
-------------------------

The Expected Reciprocal Rank (ERR) is a precision-focused measure. In essence, it extends reciprocal rank to encapsulate both graded relevance and a more realistic cascade-based user model of how users browse a ranking.

**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold

**Supported by:**

- :ref:`gdeval `: ``ERR@REQUIRED``

.. _measures.ERR_IA:

``ERR_IA``
-------------------------

Intent-Aware Expected Reciprocal Rank with collection-independent normalisation.

.. cite.dblp:: conf/cikm/ChapelleMZG09

**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold
- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)
- ``judged_only`` (bool) - calculate measure using only judged documents (i.e., discard unjudged documents)

**Supported by:**

- :ref:`pyndeval `: ``ERR_IA(rel=ANY,judged_only=ANY)@ANY``

.. _measures.infAP:

``infAP``
-------------------------

Inferred AP. An AP implementation that accounts for pooled-but-unjudged documents by assuming that they are relevant at the same proportion as other judged documents. Essentially, it skips documents that were pooled but not judged, and assumes remaining unjudged documents are non-relevant. Pooled-but-unjudged documents are indicated by a relevance score of -1, by convention. Note that not all qrels use this convention.
**Parameters:**

- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)

**Supported by:**

- :ref:`pytrec_eval `: ``infAP(rel=ANY)``

.. _measures.INSQ:

``INSQ``
-------------------------

INSQ

.. cite.dblp:: conf/adcs/MoffatST12

**Parameters:**

- ``T`` (float) - total desired gain (normalized)
- ``min_rel`` (int) - minimum relevance score
- ``max_rel`` (int) - maximum relevance score

**Supported by:**

- :ref:`cwl_eval `: ``INSQ(T=ANY,min_rel=ANY,max_rel=REQUIRED)``

.. _measures.INST:

``INST``
-------------------------

INST, a variant of INSQ.

.. cite.dblp:: conf/sigir/BaileyMST15

**Parameters:**

- ``T`` (float) - total desired gain (normalized)
- ``min_rel`` (int) - minimum relevance score
- ``max_rel`` (int) - maximum relevance score

**Supported by:**

- :ref:`cwl_eval `: ``INST(T=ANY,min_rel=ANY,max_rel=REQUIRED)``

.. _measures.IPrec:

``IPrec``
-------------------------

Interpolated Precision at a given recall cutoff. Used for building precision-recall graphs. Unlike most measures, where @ indicates an absolute cutoff threshold, here @ sets the recall cutoff.

**Parameters:**

- ``recall`` (float) - recall threshold
- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)
- ``judged_only`` (bool) - ignore returned documents that do not have relevance judgments

**Supported by:**

- :ref:`pytrec_eval `: ``IPrec(judged_only=ANY)@ANY``

.. _measures.Judged:

``Judged``
-------------------------

Percentage of results in the top k (cutoff) results that have relevance judgments. Equivalent to P@k with a rel lower than any judgment.

**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold

**Supported by:**

- :ref:`judged `: ``Judged@ANY``

.. _measures.nDCG:

``nDCG``
-------------------------

The normalized Discounted Cumulative Gain (nDCG). Uses graded labels, rewarding systems that put the highest-graded documents at the top of the ranking. It is normalized with respect to the ideal DCG, i.e., the DCG of the documents ranked in descending order of graded label.
.. cite.dblp:: journals/tois/JarvelinK02

**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold
- ``dcg`` (str) - DCG formulation
- ``gains`` (dict) - custom gain mapping (int-to-int)
- ``judged_only`` (bool) - ignore returned documents that do not have relevance judgments

**Supported by:**

- :ref:`pytrec_eval `: ``nDCG(dcg='log2',gains=ANY,judged_only=ANY)@ANY``
- :ref:`gdeval `: ``nDCG(dcg='exp-log2',gains=NOT_PROVIDED,judged_only=False)@REQUIRED``
- :ref:`trectools `: ``nDCG(dcg=ANY,gains=NOT_PROVIDED,judged_only=False)@ANY``
- :ref:`ranx `: ``nDCG(dcg=('log2', 'exp-log2'),gains=NOT_PROVIDED,judged_only=False)@ANY``

.. _measures.NERR10:

``NERR10``
-------------------------

A version of the Not (but Nearly) Expected Reciprocal Rank (NERR) measure, using the formulation from Equation (10) of the following paper.

.. cite.dblp:: conf/ictir/AzzopardiMM21

**Parameters:**

- ``p`` (float) - persistence
- ``min_rel`` (int) - minimum relevance score
- ``max_rel`` (int) - maximum relevance score

**Supported by:**

- :ref:`cwl_eval `: ``NERR10(p=ANY,min_rel=ANY,max_rel=REQUIRED)``

.. _measures.NERR11:

``NERR11``
-------------------------

A version of the Not (but Nearly) Expected Reciprocal Rank (NERR) measure, using the formulation from Equation (12) of the following paper.

.. cite.dblp:: conf/ictir/AzzopardiMM21

**Parameters:**

- ``T`` (float) - total desired gain (normalized)
- ``min_rel`` (int) - minimum relevance score
- ``max_rel`` (int) - maximum relevance score

**Supported by:**

- :ref:`cwl_eval `: ``NERR11(T=ANY,min_rel=ANY,max_rel=REQUIRED)``

.. _measures.NERR8:

``NERR8``
-------------------------

A version of the Not (but Nearly) Expected Reciprocal Rank (NERR) measure, using the formulation from Equation (8) of the following paper.
.. cite.dblp:: conf/ictir/AzzopardiMM21

**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold
- ``min_rel`` (int) - minimum relevance score
- ``max_rel`` (int) - maximum relevance score

**Supported by:**

- :ref:`cwl_eval `: ``NERR8(min_rel=ANY,max_rel=REQUIRED)@REQUIRED``

.. _measures.NERR9:

``NERR9``
-------------------------

A version of the Not (but Nearly) Expected Reciprocal Rank (NERR) measure, using the formulation from Equation (9) of the following paper.

.. cite.dblp:: conf/ictir/AzzopardiMM21

**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold
- ``min_rel`` (int) - minimum relevance score
- ``max_rel`` (int) - maximum relevance score

**Supported by:**

- :ref:`cwl_eval `: ``NERR9(min_rel=ANY,max_rel=REQUIRED)@REQUIRED``

.. _measures.nERR_IA:

``nERR_IA``
-------------------------

Intent-Aware Expected Reciprocal Rank with collection-dependent normalisation.

.. cite.dblp:: conf/cikm/ChapelleMZG09

**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold
- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)
- ``judged_only`` (bool) - calculate measure using only judged documents (i.e., discard unjudged documents)

**Supported by:**

- :ref:`pyndeval `: ``nERR_IA(rel=ANY,judged_only=ANY)@ANY``

.. _measures.nNRBP:

``nNRBP``
-------------------------

Novelty- and Rank-Biased Precision with collection-dependent normalisation.

.. cite.dblp:: conf/ictir/ClarkeKV09

**Parameters:**

- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)
- ``alpha`` (float) - redundancy intolerance
- ``beta`` (float) - patience

**Supported by:**

- :ref:`pyndeval `: ``nNRBP(alpha=ANY,beta=ANY,rel=ANY)``

.. _measures.NRBP:

``NRBP``
-------------------------

Novelty- and Rank-Biased Precision with collection-independent normalisation.
.. cite.dblp:: conf/ictir/ClarkeKV09

**Parameters:**

- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)
- ``alpha`` (float) - redundancy intolerance
- ``beta`` (float) - patience

**Supported by:**

- :ref:`pyndeval `: ``NRBP(alpha=ANY,beta=ANY,rel=ANY)``

.. _measures.NumQ:

``NumQ``
-------------------------

The total number of queries.

**Supported by:**

- :ref:`pytrec_eval `: ``NumQ``

.. _measures.NumRel:

``NumRel``
-------------------------

The number of relevant documents the query has (independent of what the system retrieved).

**Parameters:**

- ``rel`` (int) - minimum relevance score to be counted (inclusive)

**Supported by:**

- :ref:`pytrec_eval `: ``NumRel(rel=1)``

.. _measures.NumRet:

``NumRet``
-------------------------

The number of results returned. When rel is provided, counts the number of documents returned with at least that relevance score (inclusive).

**Parameters:**

- ``rel`` (int) - minimum relevance score to be counted (inclusive), or all documents returned if NOT_PROVIDED

**Supported by:**

- :ref:`pytrec_eval `: ``NumRet(rel=ANY)``
- :ref:`ranx `: ``NumRet(rel=REQUIRED)``

.. _measures.P:

``P``
-------------------------

Basic measure that computes the percentage of documents in the top cutoff results that are labeled as relevant. cutoff is a required parameter, and can be provided as P@cutoff.

.. cite.dblp:: books/bu/Rijsbergen79

**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold
- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)
- ``judged_only`` (bool) - ignore returned documents that do not have relevance judgments

**Supported by:**

- :ref:`cwl_eval `: ``P(rel=ANY,judged_only=False)@ANY``
- :ref:`pytrec_eval `: ``P(rel=ANY,judged_only=ANY)@ANY``
- :ref:`trectools `: ``P(rel=1,judged_only=False)@ANY``
- :ref:`ranx `: ``P(rel=ANY,judged_only=False)@ANY``

.. _measures.P_IA:

``P_IA``
-------------------------

Intent-aware Precision@k.
**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold
- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)
- ``judged_only`` (bool) - calculate measure using only judged documents (i.e., discard unjudged documents)

**Supported by:**

- :ref:`pyndeval `: ``P_IA(rel=ANY,judged_only=ANY)@ANY``

.. _measures.R:

``R``
-------------------------

Recall@k (R@k). The fraction of relevant documents for a query that have been retrieved by rank k.

NOTE: Some tasks define Recall@k as whether any relevant documents are found in the top k results. This software follows the TREC convention and refers to that measure as Success@k.

**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold
- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)
- ``judged_only`` (bool) - ignore returned documents that do not have relevance judgments

**Supported by:**

- :ref:`pytrec_eval `: ``R(judged_only=ANY)@ANY``
- :ref:`ranx `: ``R(judged_only=False)@ANY``

.. _measures.RBP:

``RBP``
-------------------------

The Rank-Biased Precision (RBP).

.. cite.dblp:: journals/tois/MoffatZ08

**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold
- ``p`` (float) - persistence
- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive), or NOT_PROVIDED to use graded relevance

**Supported by:**

- :ref:`cwl_eval `: ``RBP(rel=REQUIRED,p=ANY)@NOT_PROVIDED``
- :ref:`trectools `: ``RBP(p=ANY,rel=ANY)@ANY``

.. _measures.Rprec:

``Rprec``
-------------------------

The precision at R, where R is the number of relevant documents for a given query. Has the cute property that it is also the recall at R.

.. cite:: retrieval-system-evaluation
   :citation: Buckley and Voorhees. Retrieval System Evaluation. 2005.
   :link: https://www.nist.gov/publications/retrieval-system-evaluation

**Parameters:**

- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)
- ``judged_only`` (bool) - ignore returned documents that do not have relevance judgments

**Supported by:**

- :ref:`pytrec_eval `: ``Rprec(rel=ANY,judged_only=ANY)``
- :ref:`trectools `: ``Rprec(rel=1,judged_only=False)``
- :ref:`ranx `: ``Rprec(rel=ANY,judged_only=False)``

.. _measures.RR:

``RR``
-------------------------

The [Mean] Reciprocal Rank ([M]RR) is a precision-focused measure that scores based on the reciprocal of the rank of the highest-ranked relevant document. An optional cutoff can be provided to limit the depth explored. rel (default 1) controls which relevance level is considered relevant.

.. cite.dblp:: journals/ir/KantorV00

**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold
- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)
- ``judged_only`` (bool) - ignore returned documents that do not have relevance judgments

**Supported by:**

- :ref:`cwl_eval `: ``RR(rel=ANY,judged_only=False)@NOT_PROVIDED``
- :ref:`pytrec_eval `: ``RR(rel=ANY,judged_only=ANY)@NOT_PROVIDED``
- :ref:`trectools `: ``RR(rel=1,judged_only=False)@NOT_PROVIDED``
- :ref:`msmarco `: ``RR(rel=ANY,judged_only=False)@ANY``
- :ref:`ranx `: ``RR(rel=ANY,judged_only=False)@NOT_PROVIDED``

.. _measures.SDCG:

``SDCG``
-------------------------

The Scaled Discounted Cumulative Gain (SDCG), a variant of nDCG that assumes more fully-relevant documents exist but are not labeled.

**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold
- ``dcg`` (str) - DCG formulation
- ``min_rel`` (int) - minimum relevance score
- ``max_rel`` (int) - maximum relevance score

**Supported by:**

- :ref:`cwl_eval `: ``SDCG(dcg='log2',min_rel=ANY,max_rel=REQUIRED)@REQUIRED``
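The scaling idea can be sketched as follows: instead of normalising by the ideal ordering of the *labeled* documents (as nDCG does), divide the DCG by the DCG of a hypothetical ranking in which all of the top ``cutoff`` positions hold maximally-relevant (``max_rel``) documents. This is an illustrative simplification with linear gains and a log2 discount, not the cwl_eval implementation, whose exact gain function is governed by its ``dcg``, ``min_rel``, and ``max_rel`` parameters:

```python
import math

def sdcg(gains, max_rel, cutoff):
    """Sketch of scaled DCG: DCG@cutoff over the DCG of a hypothetical
    ranking whose top `cutoff` documents all have gain `max_rel`."""
    dcg = sum(g / math.log2(rank + 1)
              for rank, g in enumerate(gains[:cutoff], start=1))
    ideal = sum(max_rel / math.log2(rank + 1)
                for rank in range(1, cutoff + 1))
    return dcg / ideal

# graded gains of the top-3 retrieved documents, max label 2
score = sdcg([2, 0, 1], max_rel=2, cutoff=3)
```

Because the denominator assumes every position could have been maximally relevant, SDCG penalises rankings for unfilled positions even when few relevant documents were labeled.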
.. _measures.SetAP:

``SetAP``
-------------------------

The unranked Set AP (SetAP); i.e., SetP * SetR.

**Parameters:**

- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)
- ``judged_only`` (bool) - ignore returned documents that do not have relevance judgments

**Supported by:**

- :ref:`pytrec_eval `: ``SetAP(rel=ANY,judged_only=ANY)``

.. _measures.SetF:

``SetF``
-------------------------

The Set F measure (SetF); i.e., the harmonic mean of SetP and SetR.

**Parameters:**

- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)
- ``beta`` (float) - relative importance of R to P in the harmonic mean
- ``judged_only`` (bool) - ignore returned documents that do not have relevance judgments

**Supported by:**

- :ref:`pytrec_eval `: ``SetF(rel=ANY,beta=ANY,judged_only=ANY)``

.. _measures.SetP:

``SetP``
-------------------------

The Set Precision (SetP); i.e., the number of relevant docs retrieved divided by the total number retrieved.

**Parameters:**

- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)
- ``relative`` (bool) - calculate the measure using the maximum possible SetP for the provided result size
- ``judged_only`` (bool) - ignore returned documents that do not have relevance judgments

**Supported by:**

- :ref:`pytrec_eval `: ``SetP(rel=ANY,relative=ANY,judged_only=ANY)``
- :ref:`ranx `: ``SetP(rel=ANY,judged_only=False)``

.. _measures.SetR:

``SetR``
-------------------------

The Set Recall (SetR); i.e., the number of relevant docs retrieved divided by the total number of relevant documents.

**Parameters:**

- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)

**Supported by:**

- :ref:`pytrec_eval `: ``SetR(rel=ANY)``
- :ref:`ranx `: ``SetR(rel=ANY)``
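Since the set-based measures above ignore ranking entirely, they reduce to simple set arithmetic. A minimal sketch (hypothetical helper, not the pytrec_eval implementation) of SetP, SetR, and their ``beta``-weighted harmonic mean SetF:

```python
def set_f(retrieved, relevant, beta=1.0):
    """Sketch of SetF: the beta-weighted harmonic mean of set precision
    (SetP) and set recall (SetR) over unranked result sets."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)   # relevant docs that were retrieved
    if hits == 0:
        return 0.0
    p = hits / len(retrieved)          # SetP
    r = hits / len(relevant)           # SetR
    b2 = beta ** 2                     # beta > 1 weights recall more heavily
    return (1 + b2) * p * r / (b2 * p + r)

# 2 of 4 retrieved are relevant (SetP=0.5); 2 of 3 relevant found (SetR=2/3)
f1 = set_f(['a', 'b', 'c', 'd'], ['a', 'b', 'e'])
```

With ``beta=1`` this is the familiar F1; SetAP in this sketch would simply be ``p * r``.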
.. _measures.StRecall:

``StRecall``
-------------------------

Subtopic recall (the number of subtopics covered by the top k docs).

**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold
- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)

**Supported by:**

- :ref:`pyndeval `: ``StRecall(rel=ANY)@ANY``

.. _measures.Success:

``Success``
-------------------------

1 if a document with at least rel relevance is found in the first cutoff documents, else 0.

NOTE: Some refer to this measure as Recall@k. This software follows the TREC convention, where Recall@k is defined as the proportion of known relevant documents retrieved in the top k results.

**Parameters:**

- ``cutoff`` (int) - ranking cutoff threshold
- ``rel`` (int) - minimum relevance score to be considered relevant (inclusive)
- ``judged_only`` (bool) - ignore returned documents that do not have relevance judgments

**Supported by:**

- :ref:`pytrec_eval `: ``Success(rel=ANY,judged_only=ANY)@ANY``
- :ref:`ranx `: ``Success(rel=ANY,judged_only=False)@REQUIRED``

Aliases
-------------------------

These provide shortcuts to "canonical" measures, and are typically used when multiple names or casings for the same measure exist. You can use them just like any other measure, and the identifiers are equal (e.g., ``AP == MAP``), but the names will appear in the canonical form when printed.

- ``BPref`` → :ref:`Bpref `
- ``MAP`` → :ref:`AP `
- ``MAP_IA`` → :ref:`AP_IA `
- ``MRR`` → :ref:`RR `
- ``NDCG`` → :ref:`nDCG `
- ``NumRelRet`` → :ref:`NumRet(rel=1) `
- ``Precision`` → :ref:`P `
- ``Recall`` → :ref:`R `
- ``RPrec`` → :ref:`Rprec `
- ``SetRelP`` → :ref:`SetP(relative=True) `
- ``α_DCG`` → :ref:`alpha_DCG `
- ``α_nDCG`` → :ref:`alpha_nDCG `
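The TREC-convention distinction drawn above between Success@k (was *any* relevant document retrieved?) and Recall@k (what *fraction* of relevant documents was retrieved?) is easy to see in a small sketch (hypothetical helpers, not part of this package):

```python
def success_at_k(ranking, relevant, k):
    """1.0 if any relevant document appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranking[:k]) else 0.0

def recall_at_k(ranking, relevant, k):
    """Fraction of all relevant documents retrieved in the top k."""
    return sum(doc in relevant for doc in ranking[:k]) / len(relevant)

# One of the two relevant documents ('d2') appears in the top 2:
# Success@2 = 1.0, while Recall@2 = 1/2.
ranking = ['d1', 'd2', 'd3']
relevant = {'d2', 'd9'}
```

Note that Recall@k is bounded by both the number of relevant documents and k itself, whereas Success@k saturates as soon as a single relevant document is retrieved.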