Data
Initial document set
- 9,982 scientific papers, automatically divided into a total of 501,156 search units.
- Converted into HTML5 and XHTML5 formats by the KWARC project (http://kwarc.info/).
- Each search unit is stored as an independent file in each of the HTML5 and XHTML5 formats.
Full document set
- 105,120 scientific papers, automatically divided into a total of 8,301,578 search units containing about 60 million math formulae.
- Converted into HTML5 and XHTML5 formats by the KWARC project (http://kwarc.info/).
- Each search unit is stored as an independent file in each of the HTML5 and XHTML5 formats. Either format is sufficient for the task; please select one according to your preference.
- Papers come from the following arXiv categories: math, cs, physics:math-ph, stat, physics:hep-th, physics:nlin
- WARNING: Once uncompressed, the HTML5 and XHTML5 directories each require about 173G of disk space.
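The figures above imply a few rough averages, and the disk requirement is worth checking before unpacking. The following is a minimal sketch, assuming Python 3. The function names `dataset_averages` and `has_room_for` are hypothetical, and reading "173G" as 173 × 10^9 bytes is an assumption; neither is part of the task distribution.

```python
import shutil

# Figures quoted in the full document set description.
FULL_PAPERS = 105_120
FULL_UNITS = 8_301_578
FULL_FORMULAE = 60_000_000      # "about 60 M", so derived values are approximate
REQUIRED_BYTES = 173 * 10**9    # assumption: "173G" read as 173 * 10^9 bytes


def dataset_averages():
    """Return rough per-paper and per-unit averages for the full set."""
    return {
        "units_per_paper": FULL_UNITS / FULL_PAPERS,
        "formulae_per_unit": FULL_FORMULAE / FULL_UNITS,
        "bytes_per_unit": REQUIRED_BYTES / FULL_UNITS,
    }


def has_room_for(path, required=REQUIRED_BYTES):
    """True if the filesystem holding `path` has at least `required` free bytes."""
    return shutil.disk_usage(path).free >= required


if __name__ == "__main__":
    for name, value in dataset_averages().items():
        print(f"{name}: {value:.1f}")
    print("enough space in current directory:", has_room_for("."))
```

Because only one of the two formats is needed, the check covers a single directory; double `required` if both HTML5 and XHTML5 are to be unpacked on the same filesystem.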
Topic and submission formats
- Each topic includes (i) a list of keywords and (ii) a list of formulae. Both kinds of information should be considered in a run.
- Topics are distributed in XML format. Submissions can be in either TSV or XML format. Please use XML if the submission contains “justifications”, i.e., formulae that support the returned document.
- A detailed description can be found in “Formats for topics and submissions for NTCIR-11 Math-2 Task” (Updated 2014/06/02).
- Sample topics can be found in “NTCIR11-Math2 Topic examples” (a zip-compressed XML file).
- A submission validation script is now available: Download
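The topic structure and the choice between the two submission formats can be sketched as follows. This is only an in-memory illustration, assuming Python 3: the class names `Topic` and `Hit`, their fields, and the function `submission_format` are hypothetical, and the actual XML and TSV layouts are defined in the format description document, not here.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A query topic: keywords plus formulae (hypothetical in-memory form)."""
    topic_id: str
    keywords: list = field(default_factory=list)
    formulae: list = field(default_factory=list)


@dataclass
class Hit:
    """One returned search unit; `justifications` holds supporting formulae."""
    unit_id: str
    score: float
    justifications: list = field(default_factory=list)


def submission_format(hits):
    """Pick "xml" when any hit carries justifications, otherwise "tsv"."""
    return "xml" if any(h.justifications for h in hits) else "tsv"


if __name__ == "__main__":
    # Made-up identifiers and formula strings, for illustration only.
    topic = Topic("example-topic", keywords=["binomial"], formulae=["(a+b)^n"])
    hits = [Hit("unit-1", 0.9, justifications=["(x+y)^n"]), Hit("unit-2", 0.4)]
    print(topic.topic_id, submission_format(hits))
```

The function encodes only the recommendation stated above; a run without justifications may still be submitted as XML.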