LaCASt - A LaTeX Translator for Computer Algebra Systems

LaCASt is the first context-aware translator for mathematical LaTeX expressions. LaCASt includes natural language processing to analyze textual contexts, a custom semantic LaTeX parser to analyze math inputs, and CAS interfaces (currently Maple and Mathematica) to compute and verify translated expressions automatically.

Publications

If you want to cite this tool in general, please use the most recent publication, from TPAMI 2023. If you want to refer to the automatic evaluations only, use the second most recent publication, from TACAS 2022.

A. Greiner-Petter, M. Schubotz, C. Breitinger, P. Scharpf, A. Aizawa, B. Gipp (2023) "Do the Math: Making Mathematics in Wikipedia Computable". In TPAMI 2023: 4384-4395
@Article{GreinerPetter23,
  author       = {Andr{\'{e}} Greiner{-}Petter and
                  Moritz Schubotz and
                  Corinna Breitinger and
                  Philipp Scharpf and
                  Akiko Aizawa and
                  Bela Gipp},
  title        = {Do the Math: Making Mathematics in Wikipedia Computable},
  journal      = {{IEEE} Trans. Pattern Anal. Mach. Intell.},
  volume       = {45},
  number       = {4},
  pages        = {4384--4395},
  year         = {2023},
  url          = {https://doi.org/10.1109/TPAMI.2022.3195261},
  doi          = {10.1109/TPAMI.2022.3195261},
  timestamp    = {Mon, 28 Aug 2023 21:37:38 +0200},
  biburl       = {https://dblp.org/rec/journals/pami/GreinerPetterSBSAG23.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
A. Greiner-Petter, H. S. Cohl, A. Youssef, M. Schubotz, A. Trost, R. Dey, A. Aizawa, B. Gipp (2022) "Comparative Verification of the Digital Library of Mathematical Functions and Computer Algebra Systems". In TACAS 2022: 87-105
@InProceedings{Greiner-PetterC22,
  author    = {Andr{\'{e}} Greiner{-}Petter and
               Howard S. Cohl and
               Abdou Youssef and
               Moritz Schubotz and
               Avi Trost and
               Rajen Dey and
               Akiko Aizawa and
               Bela Gipp},
  title     = {Comparative Verification of the Digital Library of Mathematical Functions
               and Computer Algebra Systems},
  booktitle = {Tools and Algorithms for the Construction and Analysis of Systems
               - 28th International Conference, {TACAS} 2022, Held as Part of the
               European Joint Conferences on Theory and Practice of Software, {ETAPS}
               2022, Munich, Germany, April 2-7, 2022, Proceedings, Part {I}},
  series    = {Lecture Notes in Computer Science},
  volume    = {13243},
  pages     = {87--105},
  publisher = {Springer},
  year      = {2022},
  url       = {https://doi.org/10.1007/978-3-030-99524-9\_5},
  doi       = {10.1007/978-3-030-99524-9\_5}
}
A. Greiner-Petter, M. Schubotz, H. S. Cohl, B. Gipp (2019) "Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems". In: Aslib Journal of Information Management. 71(3): 415-439
@Article{Greiner-Petter19,
  author    = {Andr{\'{e}} Greiner{-}Petter and
               Moritz Schubotz and
               Howard S. Cohl and
               Bela Gipp},
  title     = {Semantic preserving bijective mappings for expressions involving special
               functions between computer algebra systems and document preparation
               systems},
  journal   = {Aslib Journal of Information Management},
  volume    = {71},
  number    = {3},
  pages     = {415--439},
  year      = {2019},
  url       = {https://doi.org/10.1108/AJIM-08-2018-0185},
  doi       = {10.1108/AJIM-08-2018-0185}
}
H. S. Cohl, A. Greiner-Petter, M. Schubotz (2018) "Automated Symbolic and Numerical Testing of DLMF Formulae Using Computer Algebra Systems". In: CICM: 39-52
@InProceedings{Cohl18,
  author    = {Howard S. Cohl and
               Andr{\'{e}} Greiner{-}Petter and
               Moritz Schubotz},
  title     = {Automated Symbolic and Numerical Testing of {DLMF} Formulae Using
               Computer Algebra Systems},
  booktitle = {Intelligent Computer Mathematics - 11th International Conference,
               {CICM} 2018, Hagenberg, Austria, August 13-17, 2018, Proceedings},
  series    = {Lecture Notes in Computer Science},
  volume    = {11006},
  pages     = {39--52},
  publisher = {Springer},
  year      = {2018},
  url       = {https://doi.org/10.1007/978-3-319-96812-4\_4},
  doi       = {10.1007/978-3-319-96812-4\_4}
}
H. S. Cohl, M. Schubotz, A. Youssef, A. Greiner-Petter, J. Gerhard, B. V. Saunders, M. A. McClain, J. Bang, K. Chen (2017) "Semantic Preserving Bijective Mappings of Mathematical Formulae Between Document Preparation Systems and Computer Algebra Systems". In: CICM: 115-131
@InProceedings{Cohl17,
  author    = {Howard S. Cohl and
               Moritz Schubotz and
               Abdou Youssef and
               Andr{\'{e}} Greiner{-}Petter and
               J{\"{u}}rgen Gerhard and
               Bonita V. Saunders and
               Marjorie A. McClain and
               Joon Bang and
               Kevin Chen},
  title     = {Semantic Preserving Bijective Mappings of Mathematical Formulae Between
               Document Preparation Systems and Computer Algebra Systems},
  booktitle = {Intelligent Computer Mathematics - 10th International Conference,
               {CICM} 2017, Edinburgh, UK, July 17-21, 2017, Proceedings},
  series    = {Lecture Notes in Computer Science},
  volume    = {10383},
  pages     = {115--131},
  publisher = {Springer},
  year      = {2017},
  url       = {https://doi.org/10.1007/978-3-319-62075-6\_9},
  doi       = {10.1007/978-3-319-62075-6\_9}
}

How to use our program

The following provides a high-level introduction on how to use the JARs and LaCASt in general. If you want to dive into the source code, we advise you to check our contribution guidelines first for more details on the structure.

The bin directory contains a couple of executable jars. Each of these programs requires lacast.config.yaml. Copy config/template-lacast.config.yaml to the main directory and rename it to lacast.config.yaml. Afterward, update the entries in the copied file to the properties that apply to you. LaCASt tries to load the config by following these rules:

  1. The system variable LACAST_CONFIG specifies the config location, e.g., export LACAST_CONFIG="path/to/lacast.config.yaml".
  2. The config file is in the current working directory.
  3. The default config is loaded from the internal resources in the jar; see the default config in interpreter.common/src/main/resources/

If none of the rules above point to a valid config, LaCASt stops with an error.
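
The lookup rules above can be sketched as a small shell function. This is only an illustration, assuming the rules are tried in the listed order and that a config counts as valid when the file exists. `resolve_lacast_config` is a hypothetical helper, not part of LaCASt; the real lookup happens inside the Java code.

```shell
# Hypothetical helper that mirrors the documented lookup order.
# Assumption: "valid" simply means that the file exists.
resolve_lacast_config() {
  # Rule 1: the LACAST_CONFIG variable points to a config file.
  if [ -n "${LACAST_CONFIG:-}" ] && [ -f "$LACAST_CONFIG" ]; then
    printf '%s\n' "$LACAST_CONFIG"
  # Rule 2: lacast.config.yaml is in the current working directory.
  elif [ -f "./lacast.config.yaml" ]; then
    printf '%s\n' "$PWD/lacast.config.yaml"
  # Rule 3: fall back to the default bundled in the jar.
  # This prints a placeholder text, not a real path.
  else
    printf '%s\n' "<default config bundled in the jar>"
  fi
}
```

Calling `resolve_lacast_config` before starting one of the jars shows which config the documented rules would most likely select in the current shell.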


LaCASt contains several executable JARs as standalone applications. The following list explains the functionality of each JAR in more detail.

latex-to-cas-converter.jar: The forward translator (LaTeX -> CAS)

The executable jar for the translator can be found in the bin subdirectory. A standalone version can be found in the bin/*.zip file. Unzip the archive wherever you want and run the jar from the root folder of the repository:

java -jar bin/latex-to-cas-converter.jar

Without additional information, the jar runs as an interactive program. You can start the program to directly trigger the translation process or set further flags (every flag is optional):

  • -CAS=<NameOfCAS>: Sets the computer algebra system you want to translate to, e.g., -CAS=Maple for Maple;
  • -Expression="<exp>": Sets the expression you want to translate. Double quotation marks are mandatory;
  • --clean or -c: Only returns the translated expression without any other information. (since v1.0.1)
  • --debug or -d: Returns extra information for debugging, such as computation time and list of elements. (--clean overrides this setting).
  • --extra or -x: Shows further information about translation of functions, e.g., branch cuts, DLMF-links and more. (--clean flag overrides this setting)
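
For scripted use, the flags above can be combined into a single non-interactive call. The wrapper below is a hypothetical convenience function: `lacast_translate` and the `JAVA_CMD` override are not part of LaCASt. It assumes that it is run from the root folder of the repository and that -CAS, -Expression, and --clean can be combined as listed above.

```shell
# Hypothetical wrapper around the documented flags; not shipped with LaCASt.
# $1: target CAS (e.g., Maple or Mathematica), $2: LaTeX expression.
# JAVA_CMD is an assumed override that allows a dry run without Java;
# it relies on POSIX sh/bash word splitting of the unquoted variable.
lacast_translate() {
  ${JAVA_CMD:-java} -jar bin/latex-to-cas-converter.jar \
    "-CAS=$1" "-Expression=$2" --clean
}

# Usage (single quotes keep the backslashes of the LaTeX input intact):
#   lacast_translate Maple '\frac{1}{2}'
# Dry run that prints the arguments, one per line, instead of starting Java:
#   JAVA_CMD='printf %s\n' lacast_translate Maple '\frac{1}{2}'
```

The function quotes the expression itself, so the caller only has to protect the LaTeX input from the shell.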

lexicon-creator.jar: Maintain the translation dictionary

This jar is used to maintain the internal translation dictionaries. Once a translation pattern is defined in the CSV files, it must be transformed into the dictionaries. The typical workflow is:

andre@agp:~$ java -jar bin/lexicon-creator.jar 
Welcome, this converter translates given CSV files to lexicon files.
You didn't specified CSV files (do not add DLMFMacro.csv).
Add a new CSV file and hit enter or enter '-end' to stop the adding process.
all
Current list: [CAS_Maple.csv, CAS_Mathematica.csv]
-end

maple-translator.jar: The backward translator for Maple (Maple -> Semantic LaTeX)

This jar requires a licensed Maple installation on the machine! To start the translator, you have to set the environment variables needed to run Maple properly (see "Building and Running a Java OpenMaple Application"). In my case, Maple is installed in /opt/maple2019 and I'm on a Linux machine, which requires setting MAPLE and LD_LIBRARY_PATH. In addition, you have to increase the thread stack size via -Xss50M; otherwise, Maple crashes. Here is an example:

andre@agp:~$ export MAPLE="/opt/maple2019"
andre@agp:~$ export LD_LIBRARY_PATH="/opt/maple2019/bin.X86_64_LINUX"
andre@agp:~$ java -Xss50M -jar bin/maple-translator.jar 

To get the Maple paths, you can start Maple and enter the following commands:

kernelopts( bindir );   <- returns <Maple-BinDir>
kernelopts( mapledir ); <- returns <Maple-Directory>

symbolic-tester.jar: Symbolic verification program

This is only for advanced users! First, set up the properties:

  1. config/symbolic_tests.properties: Critical and required settings are:
# the path to the dataset
dlmf_dataset=/home/andreg-p/Howard/together.txt

# the lines that should be tested in the provided dataset
subset_tests=7209,7483

# the output path
output=/home/andreg-p/Howard/Results/AutoMaple/22-JA-symbolic.txt

# the output path for missing macros
missing_macro_output=/home/andreg-p/Howard/Results/AutoMaple/22-JA-missing.txt
  2. symbolic-tester.jar program arguments:
    • -maple to run the tests with Maple
    • -mathematica to run the tests with Mathematica (you can only specify one at a time, maple or mathematica)
    • -Xmx8g increases the Java heap memory (a JVM option that goes before -jar); this is not required but useful
    • -Xss50M increases the thread stack size (a JVM option that goes before -jar) if you use Maple

Additionally, you have to set environment variables if you work with Maple (see the maple-translator.jar instructions above for more details about required variables).

  3. Since you may want to run evaluations automatically on subsets, you can use scripts/symbolic-evaluator.sh. You need to update the paths in the script first. With config/together-lines.txt you can control which subsets the script evaluates, e.g.,
04-EF: 1465,1994
05-GA: 1994,2179

The second value is exclusive (i.e., 1,2 runs only line 1, not line 2). The example above tests the lines 1465-1994 and 1994-2179 and stores the results in the files 04-EF-symbolic.txt and 05-GA-symbolic.txt.
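
To make the range convention concrete, the following sketch reads one entry in the format shown above and reports which lines would be tested. `describe_subset` is a hypothetical helper written for illustration only; it is not part of the LaCASt scripts and assumes the entry looks exactly like `LABEL: START,END`.

```shell
# Hypothetical helper: explains one "LABEL: START,END" entry.
# START is included and END is excluded, as described above.
describe_subset() {
  label=${1%%:*}    # text before the first colon, e.g., 04-EF
  range=${1#*: }    # text after ": ", e.g., 1465,1994
  start=${range%,*}
  end=${range#*,}
  echo "$label: lines $start to $((end - 1)) ($((end - start)) lines)"
}

describe_subset "04-EF: 1465,1994"
describe_subset "05-GA: 1994,2179"
```

Because the end value is excluded, adjacent subsets such as 04-EF and 05-GA can share the boundary 1994 without testing that line twice.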


numeric-tester.jar: Numeric verification program

This is only for advanced users! First, set up the properties:

  1. config/numerical_tests.properties: Critical and required settings are:
# the path to the dataset
dlmf_dataset=/home/andreg-p/Howard/together.txt

# either you define a subset of lines to test or you define the results file of symbolic evaluation, which is recommended
# subset_tests=7209,7483
symbolic_results_data=/home/andreg-p/Howard/Results/AutoMath/11-ST-symbolic.txt

# the output path
output=/home/andreg-p/Howard/Results/MathNumeric/11-ST-numeric.txt
  2. numeric-tester.jar program arguments:

    • -maple to run the tests with Maple
    • -mathematica to run the tests with Mathematica
    • -Xmx8g increases the Java heap memory (a JVM option that goes before -jar); this is not required but useful
    • -Xss50M increases the thread stack size (a JVM option that goes before -jar) if you use Maple
  3. Since you may want to run evaluations automatically on subsets, you can use scripts/numeric-evaluator.sh. You need to update the paths in the script first. With config/together-lines.txt you can control which subsets the script evaluates, e.g.,

04-EF: 1465,1994
05-GA: 1994,2179

This will automatically load the symbolic result files 04-EF-symbolic.txt and 05-GA-symbolic.txt and start the evaluation.


Update Translation Patterns

The translation patterns are defined in libs/ReferenceData/CSVTables. If you wish to add translation patterns, you need to compile the changes before the translator can use them. To update the translations, use lexicon-creator.jar (see the explanations above).

Update Pre-Processing Replacement Rules

The pre-processing replacement rules are defined in config/replacements.yml and config/dlmf-replacements.yml. Each config contains further explanations on how to add replacement rules. The replacement rules are applied without further compilation. Just change the files to add, modify, or remove rules.

Contributors

Role | Name | Contact
Main Developer | André Greiner-Petter | greinerpetter (at) wuppertal.de
Supervisor | Dr. Howard Cohl | howard.cohl (at) nist.gov
Advisor | Dr. Moritz Schubotz | schubotz (at) uni-wuppertal.de
Advisor | Prof. Abdou Youssef | abdou.youssef (at) nist.gov
Student Developers | Avi Trost, Rajen Dey, Claude, Jagan |