Skip to content

MIR-MU/MIaS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

79 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MIaS – Math-aware full-text search engine

ci

MIaS (Math Indexer and Searcher) is a math-aware full-text search engine. It is based on Apache Lucene; however, its maths processing capabilities are standalone and can be easily integrated into any Lucene/Solr based system, as in EuDML search service.

Usage

Install Docker, place your dataset to a directory named dataset/, and index your dataset using the miratmu/mias Docker image:

docker run -v "$PWD"/dataset:/dataset:ro -v "$PWD"/index:/index:rw \
    --rm miratmu/mias

When running MIaS, you may control the configuration of the Java Virtual Machine by changing the JAVA_OPTS environmental veriable: -e JAVA_OPTS='-Xms128g -Xmx192g'.

Usage without Docker

Setting up mias.properties

Create a file named mias.properties in some location, e.g. /home/MIaS/conf/mias.properties, and set up the following properties:

  • INDEXDIR – Path to the directory, where the index is / will be located.
  • UPDATE – If TRUE, the files that are already indexed and are about to be indexed again, will be updated. If FALSE, the indexer will skip them and only add new files.
  • MAXRESULTS – The maximum number of results that the system retrieves.
  • DOCLIMIT - The limit for the number of the documents that are indexed during one run. -1 means no limit.
  • THREADS - The number of threads that will be used for processing.

The resulting file might have the following content:

INDEXDIR=/home/data/index
UPDATE=false
MAXRESULTS=10000
DOCLIMIT=-1
THREADS=8

In Windows, backslashes in paths need to be escaped, e.g. you would insert C:\\MIaS\\index instead of C:\MIaS\index.

Running MIaS

To run MIaS, locate the JAR file of MIaS and run the following command:

java -jar MIaS.jar [OPTIONS]

To see the available options, run the following command:

java -jar MIaS.jar -help

There must exist a directory named lib containing necessary dependencies located within the same directory as the jar file.

Citing MIaS

Text

SOJKA, Petr and Martin LÍŠKA. The Art of Mathematics Retrieval. In Matthew R. B. Hardy, Frank Wm. Tompa. Proceedings of the 2011 ACM Symposium on Document Engineering. Mountain View, CA, USA: ACM, 2011. p. 57–60. ISBN 978-1-4503-0863-2. doi:10.1145/2034691.2034703.

BibTeX

@inproceedings{doi:10.1145:2034691.2034703,
     author = "Petr Sojka and Martin L\'{i}\v{s}ka",
      title = "{The Art of Mathematics Retrieval}",
  booktitle = "{Proceedings of the ACM Conference on Document Engineering,
                DocEng 2011}",
  publisher = "{Association of Computing Machinery}",
    address = "{Mountain View, CA}",
       year = 2011,
      month = Sep,
       isbn = "978-1-4503-0863-2",
      pages = "57--60",
        url = {http://doi.acm.org/10.1145/2034691.2034703},
        doi = {10.1145/2034691.2034703},
   abstract = {The design and architecture of MIaS (Math Indexer and Searcher), 
               a system for mathematics retrieval is presented, and design
               decisions are discussed. We argue for an approach based on
               Presentation MathML using a similarity of math subformulae. The
               system was implemented as a math-aware search engine based on
               the state-of-the-art system Apache Lucene. Scalability issues
               were checked against more than 400,000 arXiv documents with 158
               million mathematical formulae. Almost three billion MathML
               subformulae were indexed using a Solr-compatible Lucene.},
}

About

Indexes text with math in Lucene/Solr-based full-text search engines

Resources

License

Stars

Watchers

Forks

Packages

No packages published