Home
Table of Contents generated with DocToc
LIMA is a multilingual linguistic analyzer developed by the CEA LIST [1], LASTI laboratory [2] (French acronym for Text and Image Semantic Analysis Laboratory). LIMA is available under a dual licensing model.
The Free/Libre Open Source (FLOSS) version, available under the Affero General Public License (AGPL), is fully functional and includes modules and resources to analyze English, French and Portuguese texts. You can thus use LIMA for any purpose, as long as any software linked to it or running it through Web services is Free software too.
The commercial version adds, on the one hand, modules useful to some CEA LIST industrial partners and, on the other hand, the modules and resources needed to analyze the other supported languages (Arabic, Chinese, German, Spanish, etc.). It is available directly from CEA LIST through R&D partnerships, or through our partner ANT'inno [3] with offers that include support and adaptation to your needs.
We welcome external contributions in the form of comments, suggestions, bug reports, bug fixes, resources, etc. Please note, however, that before merging your contributions we will ask you to sign a Copyright Assignment Agreement, which is required for the dual licensing model to function properly.
- easy-to-use simple GUI;
- tokenization;
- morphological analysis, including:
  - full-form dictionaries;
  - hyphenated word splitting;
  - concatenated word splitting (we're, ...);
  - idiomatic expression recognition;
- part-of-speech tagging (two taggers are available: the legacy LIMA tagger, which performs slightly worse but is very useful for resource development, and an SVMTool++-based one [4]);
- Named Entity Recognition (standard rule-based and neural network-based);
- coreference resolution;
- parsing (surface rule-based dependency parsing and soon neural network-based);
- semantic analysis (disambiguation and semantic role labeling);
- manual corpus annotation GUI;
- regression testing;
- evaluation tools.
We provide a Docker container, as well as packages for several GNU/Linux versions (Debian and Ubuntu 16.04 and 18.04) and for Microsoft Windows. Finally, there are instructions for building from source code under GNU/Linux:
- Docker container
- Packages for various Linux distributions
- Packages for MS Windows 64
- Building from source code
LIMA is known to work under Mac OS X, but there is currently no binary package and no detailed build instructions. If you know how to build CMake/C++/Qt/Boost-based software under macOS, you should be able to build it yourself!
Most of the available documentation is currently spread across the doc folders of the different modules, usually as DocBook files. Some of it is still in French and should be translated soon.
Some information is nevertheless available on this Wiki:
- The LIMA User Manual;
- Explanation on the Linguistic Processing Steps in LIMA;
- Explanation on Linguistic Processing Steps Not Included in the AGPL version of LIMA.
LIMA uses several open source libraries and linguistic resources. See the COPYING file for details.
The Free/Libre Open Source (FLOSS) version of LIMA is available under the Affero General Public License (AGPL). A commercial version exists too.
For any discussion, please use the mailing list or open a GitHub issue.
You can also contact [the LIMA maintainer](mailto:gael DOT de-chalendar AT cea DOT fr) directly.