Datasets – ntcir10-math

(Updated on Dec 7, 2012)

Math Retrieval Subtask

NTICR-sandbox.tar.gz (7.2G)	100,000 documents converted to XHTML+MathML with the LaTeXML converter
NTCIR-Math-formula-search.xml (98K)	Topics for formula search
NTCIR-Math-fulltext-search.xml (66K)	Topics for fulltext search
NTCIR-Math-open-mir.xml (52K)	Topics for open MIR
topics.pdf (271K)	Documentation for search topics
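
Because the document archive is 7.2G, it can be convenient to inspect it without unpacking it first. The following is a minimal sketch using Python's standard `tarfile` module; it assumes the documents inside the archive carry an `.xhtml` extension, which this page does not state, and the function name `iter_xhtml_members` is purely illustrative.

```python
import tarfile
from typing import Iterator


def iter_xhtml_members(archive_path: str) -> Iterator[str]:
    """Yield the names of .xhtml members in a gzip-compressed tar archive."""
    # "r|gz" opens the archive as a forward-only stream, so nothing is
    # extracted to disk and members are visited in their stored order.
    with tarfile.open(archive_path, mode="r|gz") as archive:
        for member in archive:
            # Assumption: documents are regular files ending in ".xhtml".
            if member.isfile() and member.name.lower().endswith(".xhtml"):
                yield member.name
```

Calling `iter_xhtml_members("NTICR-sandbox.tar.gz")` on the downloaded file and stopping after the first few names gives a quick view of the archive layout before committing to a full extraction.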

Math Understanding Subtask

Description_Extraction_Ver01.zip (3.8M)

arxiv_msc_10.zip

10 papers from the arXiv.org dataset (the same papers as in the initial dataset, with updated annotations)

arxiv_msc_25.zip

Documentation_Agreement_2.0.pdf

Documentation_Evaluation.pdf

eval1.py

Description_Extraction_test.zip (216K)

10 unannotated papers for evaluation