*** CAUTION: This page is still under construction ***

Task Overview

NTCIR-12 MathIR is a shared task for retrieving mathematical information in documents. Queries are some combination of keywords and formulae. Participating systems need to return a ranked list of retrieval units (document excerpts) containing a formula and text matching a query. Document excerpts are provided by the task organizers.


Two datasets will be used for NTCIR-12 MathIR. The first is a large collection of about 100,000 scientific articles (the arXiv dataset, provided by the KWARC project: http://kwarc.info/); the second consists of 35,000 Wikipedia articles (the Wikipedia dataset). The datasets will include an estimated 30-60M math formulae (including monomial expressions). Formula encodings will be generated using LaTeXML, a LaTeX-to-XHTML conversion tool, and will include LaTeX, Presentation MathML, and Content MathML encodings (see the Introduction for an example).

Each document in the corpus will be divided into excerpts (‘retrieval units’), each containing roughly one paragraph. Participating systems should score the relevance of each excerpt, e.g. on a scale from 0 (irrelevant) to 1 (highly relevant), and must return a ranked list of the retrieval units matching the query.
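
As an illustration of the scoring and ranking step, the following minimal Python sketch sorts scored retrieval units into a ranked list. The function name rank_units, the unit identifiers, the tie-breaking rule, and the default limit are all assumptions made for this example; they are not part of the task definition.

```python
from typing import Dict, List, Tuple


def rank_units(scores: Dict[str, float], limit: int = 1000) -> List[Tuple[str, float]]:
    """Return (unit_id, score) pairs ordered from most to least relevant.

    Assumes scores lie in [0, 1]; ties are broken by unit identifier so the
    output is deterministic (an assumption, not a task requirement).
    """
    for unit_id, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {unit_id!r} is outside [0, 1]: {score}")
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Hypothetical retrieval unit identifiers and scores.
example_scores = {"unit-a": 0.12, "unit-b": 0.87, "unit-c": 0.87, "unit-d": 0.0}
ranking = rank_units(example_scores)
for position, (unit_id, score) in enumerate(ranking, start=1):
    print(position, unit_id, score)
```

Breaking ties by identifier keeps repeated runs of the same system reproducible, which can be convenient when comparing submissions.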

Queries and Topics

Given a set of queries, systems must return a ranked list of search results for each of the arXiv and Wikipedia datasets. The total number of queries will be around 50. We plan to use the same topics for both the arXiv and Wikipedia datasets.

A topic contains:

  • Topic ID
  • Query (formulae + keywords)
  • Narrative (precise description of the user situation, information need, and relevance criteria; used only for assessment and will not be included in the query set delivered to participants)
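
The topic structure listed above could be modelled roughly as follows. This is only a sketch: the class name Topic, its field names, and the sample values are hypothetical, and the actual query format is defined by the organizers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Topic:
    topic_id: str
    formulae: List[str]               # query formulae (LaTeX strings here, by assumption)
    keywords: List[str]               # query keywords
    narrative: Optional[str] = None   # used for assessment only

    def for_participants(self) -> "Topic":
        """Return a copy without the narrative, mirroring the query set
        that is delivered to participants."""
        return Topic(self.topic_id, list(self.formulae), list(self.keywords), None)


# Made-up example topic; the ID, formula, keywords, and narrative are illustrative.
topic = Topic(
    topic_id="example-1",
    formulae=[r"x^2 + y^2 = z^2"],
    keywords=["Pythagorean", "triple"],
    narrative="Hypothetical narrative describing the information need.",
)
public_topic = topic.for_participants()
print(public_topic)
```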

The topics will be designed so that each one has at least one relevant document in the arXiv or Wikipedia dataset.

Search Results (Runs)

Results should include automatic runs using the queries defined for the task. Each result should include the retrieval unit identifier and the relevance score of the retrieval unit. Participants are strongly encouraged to include formula identifiers (formulaIDs) in the ‘justification’ fields provided for each matched excerpt in their results, as assessors will use these when evaluating hits.

Participants are also encouraged to submit ‘manual’ runs using additional queries manually generated by the system designers, for the purpose of illustrating the behavior of their search engine.

The query and submission format will be announced in the near future.
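
Since the submission format has not been announced, the following Python sketch only illustrates the information a run entry is expected to carry: a retrieval unit identifier, a relevance score, and optional formula identifiers as justification. The class Hit, the function build_run, and every field name are hypothetical and should not be read as the official format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Hit:
    unit_id: str                                           # retrieval unit identifier
    score: float                                           # relevance score
    formula_ids: List[str] = field(default_factory=list)   # matched formulae, if any


def build_run(topic_id: str, hits: List[Hit]) -> List[Dict[str, object]]:
    """Order hits by descending score and attach ranks and justifications."""
    ordered = sorted(hits, key=lambda h: (-h.score, h.unit_id))
    return [
        {
            "topic_id": topic_id,
            "rank": rank,
            "unit_id": h.unit_id,
            "score": h.score,
            "justification": {"formula_ids": list(h.formula_ids)},
        }
        for rank, h in enumerate(ordered, start=1)
    ]


# Hypothetical hits for a hypothetical topic.
run = build_run(
    "example-1",
    [Hit("unit-a", 0.41, ["f7"]), Hit("unit-b", 0.93, ["f2", "f5"]), Hit("unit-c", 0.10)],
)
for entry in run:
    print(entry)
```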

Scoring Systems through Human Hit Relevance Assessments

For each task query, a fixed number of retrieval units will be selected from the union of the top 1,000 retrieval units from each run. The selected retrieval units will then be assessed by human reviewers. We plan to assign two reviewers to every selected retrieval unit (double relevance assessment).

After the assessment, the list of retrieval units and their relevance judgments will be returned to participants. This evaluation is pooling-based; for practical reasons, not all submitted retrieval units will be evaluated.
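
The pooling step described in this section can be sketched in Python as follows. The text does not say how the fixed number of units is chosen from the union, so this example assumes one common heuristic: prefer units with the best (lowest) rank in any run. The function name build_pool, the pool_size default, and the run and unit identifiers are all hypothetical.

```python
from typing import Dict, List


def build_pool(runs: Dict[str, List[str]], depth: int = 1000, pool_size: int = 100) -> List[str]:
    """Select up to pool_size units from the union of each run's top `depth` units.

    Selection rule (an assumption): order candidates by their best rank
    across runs, breaking ties by unit identifier.
    """
    best_rank: Dict[str, int] = {}
    for ranked_units in runs.values():
        for rank, unit_id in enumerate(ranked_units[:depth], start=1):
            if unit_id not in best_rank or rank < best_rank[unit_id]:
                best_rank[unit_id] = rank
    candidates = sorted(best_rank, key=lambda unit_id: (best_rank[unit_id], unit_id))
    return candidates[:pool_size]


# Two hypothetical runs for one query, each listed in rank order.
runs = {
    "run-A": ["u1", "u2", "u3"],
    "run-B": ["u3", "u4", "u1"],
}
pool = build_pool(runs, depth=2, pool_size=3)
print(pool)
```

With depth=2, only the top two units of each run enter the union, so a unit returned at rank 3 by every run would never be assessed. This is why pooled judgments cover only part of what was submitted.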

Baseline Systems

Two baseline systems will be provided by the organizers: (1) a text-based IR system using Terrier and a MathML converter, and (2) a strong math search engine from NTCIR-11.