Datasets

(Updated on Dec 7, 2012)

Math Retrieval Subtask

NTICR-sandbox.tar.gz (7.2G) 100,000 documents converted to XHTML+MathML with the LaTeXML converter
NTCIR-Math-formula-search.xml (98K) Topics for formula search
NTCIR-Math-fulltext-search.xml (66K) Topics for fulltext search
NTCIR-Math-open-mir.xml (52K) Topics for open MIR
topics.pdf (271K) Documentation for search topics

Math Understanding Subtask

Description_Extraction_Ver01.zip (3.8M)

arxiv_msc_10.zip

  • 10 papers from the arXiv.org dataset (the same papers as in the initial dataset, with updated annotations)

arxiv_msc_25.zip

  • 25 additional papers from the arXiv.org dataset

Documentation_Agreement_2.0.pdf

  • Annotator agreement report

Documentation_Evaluation.pdf

  • Formats of the dataset and submissions

eval1.py

  • Evaluation script

Description_Extraction_test.zip (216K) 10 unannotated papers for evaluation