Gaël de Chalendar edited this page Mar 12, 2020 · 51 revisions

LIMA - The Libre Multilingual Analyzer, a Natural Language Processing (NLP) toolkit

LIMA is a multilingual linguistic analyzer developed at CEA LIST [1] by the LASTI laboratory [2] (the French acronym for Text and Image Semantic Analysis Laboratory). LIMA is available under a dual licensing model.

The Free/Libre Open Source (FLOSS) version, available under the Affero General Public License (AGPL), is fully functional and includes the modules and resources needed to analyze English, French and Portuguese texts. You can thus use LIMA for any purpose, as long as any software linked to it or running it through Web services is Free software too.

The commercial version adds, on the one hand, modules useful to some CEA LIST industrial partners and, on the other hand, the modules and resources needed to analyze the other supported languages (Arabic, Chinese, German, Spanish, etc.). The commercial version is available directly from CEA LIST through R&D partnerships, or through our partner ANT'inno [3], whose offers include support and adaptation to your needs.

We welcome external contributions in the form of comments, suggestions, bug reports, bug fixes, resources, etc. However, please note that before merging your contributions, we will ask you to sign a Copyright Assignment Agreement so that the dual licensing model can function properly.

FEATURES

  • simple, easy-to-use GUI;
  • tokenization;
  • morphological analysis including:
    • full-form dictionaries;
    • hyphenated-word splitting;
    • concatenated-word splitting (we're, ...);
    • idiomatic expression recognition;
    • part-of-speech tagging (two taggers are available: the legacy LIMA tagger, which performs slightly worse but is very useful for resource development, and an SVMTool++-based tagger [4]);
  • named entity recognition (standard rule-based and neural network-based);
  • coreference resolution;
  • parsing (surface rule-based dependency parsing and soon neural network-based);
  • semantic analysis (disambiguation and semantic role labeling);
  • manual corpus annotation GUI;
  • regression testing;
  • evaluation tools.

DOWNLOAD and INSTALLATION

We provide a Docker container. We also provide packages for several GNU/Linux versions (Debian and Ubuntu 16.04 and 18.04) and for Microsoft Windows. Finally, there are instructions for building from the source code under GNU/Linux.

LIMA is known to work under Mac OS X, but there is currently no binary package available and there are no detailed build instructions. If you know how to build CMake/C++/Qt/Boost-based software under macOS, you should be able to build it yourself!

DOCUMENTATION

Most of the available documentation is currently spread across the doc folders of the different modules, mostly as DocBook files. Some of these files are still in French and should be translated soon.

Nevertheless, some information is available on this wiki.

CREDITS

LIMA uses several open source libraries and linguistic resources. See the COPYING file for details.

LICENCE

The Free/Libre Open Source (FLOSS) version of LIMA is available under the Affero General Public License (AGPL). A commercial version exists too.

CONTACT

For any discussion, please use the mailing list or open a GitHub issue.

You can also contact [the LIMA maintainer](mailto:gael DOT de-chalendar AT cea DOT fr) directly.

REFERENCES
