ir-measures Documentation¶
ir-measures is a Python package that interfaces with several information retrieval (IR) evaluation tools, including pytrec_eval, gdeval, trectools, and others.
This package aims to simplify IR evaluation by providing an easy and flexible evaluation interface and by standardizing measure names (and their parameters).
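For example, a parameterised measure name like P(rel=2)@5 maps to a measure object. A minimal sketch using ir_measures.parse_measure (the measure string here is only an illustration):
>>> import ir_measures
>>> # parse a standardized measure name (with its parameters) into a measure object
>>> measure = ir_measures.parse_measure('P(rel=2)@5')
>>> measure
P(rel=2)@5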
Quick Start¶
You can install ir-measures using pip:
$ pip install ir-measures
Now that it’s installed, you can use it to compute evaluation measures! See the examples below for the Command Line Interface, the Python Interface, and the PyTerrier interface.
$ ir_measures path/to/qrels path/to/run nDCG@10 P@5 'P(rel=2)@5' Judged@10
nDCG@10 0.6251
P@5 0.7486
P(rel=2)@5 0.6000
Judged@10 0.9486
You can alternatively use an ir_datasets dataset ID in place of the qrels path.
$ ir_measures dataset_id path/to/run nDCG@10 P@5 'P(rel=2)@5' Judged@10
nDCG@10 0.6251
P@5 0.7486
P(rel=2)@5 0.6000
Judged@10 0.9486
>>> import ir_measures
>>> from ir_measures import nDCG, P, Judged
>>> qrels = ir_measures.read_trec_qrels('path/to/qrels')
>>> run = ir_measures.read_trec_run('path/to/run')
>>> ir_measures.calc_aggregate([nDCG@10, P@5, P(rel=2)@5, Judged@10], qrels, run)
{
nDCG@10: 0.6251,
P@5: 0.7486,
P(rel=2)@5: 0.6000,
Judged@10: 0.9486
}
You can also use qrels from ir_datasets instead of loading them from a file.
>>> import ir_datasets
>>> qrels = ir_datasets.load('dataset_id').qrels
>>> ...
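Putting these pieces together, a minimal end-to-end sketch might look like the following (here 'dataset_id' and 'path/to/run' are placeholders for a real ir_datasets ID and run file):
>>> import ir_datasets
>>> import ir_measures
>>> from ir_measures import nDCG, P
>>> # qrels come from ir_datasets; the run is still read from a TREC run file
>>> dataset = ir_datasets.load('dataset_id')
>>> run = ir_measures.read_trec_run('path/to/run')
>>> ir_measures.calc_aggregate([nDCG@10, P@5], dataset.qrels, run)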
ir_measures is used by the PyTerrier platform to evaluate ranking pipelines. In the following example, BM25 is evaluated using the standard measures for the TREC Deep Learning benchmark, provided by ir_measures:
>>> import pyterrier as pt
>>> from ir_measures import RR, nDCG, AP
>>> dataset = pt.get_dataset("irds:msmarco-passage/trec-dl-2019/judged")
>>> bm25 = pt.terrier.Retriever.from_dataset('msmarco_passage', 'terrier_stemmed', wmodel="BM25")
>>> pt.Experiment(
...     [bm25],
...     dataset.get_topics(),
...     dataset.get_qrels(),
...     eval_metrics=[RR(rel=2), nDCG@10, nDCG@100, AP(rel=2)],
... )
name RR(rel=2) nDCG@10 nDCG@100 AP(rel=2)
0 TerrierRetr(BM25) 0.641565 0.47954 0.487416 0.286448
Acknowledgements¶
This package was written by Sean MacAvaney and Craig Macdonald at the University of Glasgow, with contributions from Charlie Clarke, Benjamin Piwowarski, and Harry Scells. For a full list of contributors, see the repository on GitHub.
If you use this package, be sure to cite:
Citation
MacAvaney et al. Streamlining Evaluation with ir-measures. ECIR (2) 2022.
@inproceedings{DBLP:conf/ecir/MacAvaneyMO22a,
  author    = {Sean MacAvaney and Craig Macdonald and Iadh Ounis},
  editor    = {Matthias Hagen and Suzan Verberne and Craig Macdonald and Christin Seifert and Krisztian Balog and Kjetil N{\o}rv{\aa}g and Vinay Setty},
  title     = {Streamlining Evaluation with ir-measures},
  booktitle = {Advances in Information Retrieval - 44th European Conference on {IR} Research, {ECIR} 2022, Stavanger, Norway, April 10-14, 2022, Proceedings, Part {II}},
  series    = {Lecture Notes in Computer Science},
  volume    = {13186},
  pages     = {305--310},
  publisher = {Springer},
  year      = {2022},
  url       = {https://doi.org/10.1007/978-3-030-99739-7\_38},
  doi       = {10.1007/978-3-030-99739-7\_38},
  timestamp = {Thu, 07 Apr 2022 18:19:50 +0200},
  biburl    = {https://dblp.org/rec/conf/ecir/MacAvaneyMO22a.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}