CLI and Imported Tool #5

Merged 9 commits on May 24, 2018
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,6 +1,7 @@
*~
train
test
files2

bk.txt
blockIDs.txt
@@ -115,3 +116,4 @@ venv.bak/

# mypy
.mypy_cache/

70 changes: 0 additions & 70 deletions README.md

This file was deleted.

88 changes: 88 additions & 0 deletions README.rst
@@ -0,0 +1,88 @@
``rnlp``
========

*Relational NLP Preprocessing: A Python package and tool for converting text into a set of relational facts.*

.. image:: https://img.shields.io/pypi/pyversions/rnlp.svg?style=flat-square
.. image:: https://img.shields.io/pypi/v/rnlp.svg?style=flat-square
.. image:: https://img.shields.io/pypi/l/rnlp.svg?style=flat-square
.. image:: https://img.shields.io/readthedocs/rnlp/stable.svg?style=flat-square
   :target: http://rnlp.readthedocs.io/en/stable/

**Kaushik Roy** (`@kkroy36`_) and **Alexander L. Hayes** (`@batflyer`_)

Installation
------------

Stable builds on PyPI

.. code-block:: bash

    pip install rnlp

Development builds on GitHub

.. code-block:: bash

    pip install git+https://github.com/starling-lab/rnlp.git

Quick-Start
-----------

``rnlp`` can be used either as a CLI tool or as an imported Python package.

+---------------------------------------+--------------------------------------+
|                **CLI**                |             **Imported**             |
+---------------------------------------+--------------------------------------+
|.. code-block:: bash                   |.. code-block:: python                |
|                                       |                                      |
|  $ python -m rnlp -f files/doi.txt    |  from rnlp.corpus import declaration |
|                                       |  import rnlp                         |
|                                       |                                      |
|                                       |  doi = declaration()                 |
|                                       |  rnlp.converter(doi)                 |
+---------------------------------------+--------------------------------------+

Text is converted into relational facts. The encoded relations are:

- between blocks of size 'n' (for example, n = 2 sentences).

- between blocks of size 'n' and the sentences in those blocks.

- between sentences and the words in those sentences.
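
The block/sentence/word hierarchy described above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names (``split_sentences``, ``make_blocks``, ``split_words``) are hypothetical, the regular expressions are deliberately naive, and none of this is claimed to be how ``rnlp`` itself segments text.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def make_blocks(sentences, n=2):
    """Group consecutive sentences into blocks of at most n sentences."""
    return [sentences[i:i + n] for i in range(0, len(sentences), n)]


def split_words(sentence):
    """Naive tokenizer (assumption): runs of letters, digits, and apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


text = "Thank you for coming. We hold these truths. All men are created equal."
sentences = split_sentences(text)           # three sentences
blocks = make_blocks(sentences, n=2)        # one block of 2, one block of 1
words = [split_words(s) for s in sentences]  # one word list per sentence
```

With ``n=2`` the three sentences fall into two blocks, and each sentence maps to its own list of words, which gives the three levels that the relations connect.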

----

The relationships currently encoded are:

1. ``earlySentenceInBlock`` - sentence occurs within the first third of the block length.

2. ``earlyWordInSentence`` - word occurs within the first third of the sentence length.

3. ``lateSentenceInBlock`` - sentence occurs after two-thirds of the block length.

4. ``midWayWordInSentence`` - word occurs between a third and two-thirds of the sentence length.

5. ``nextSentenceInBlock`` - sentence that follows a sentence in a block.

6. ``nextWordInSentence`` - word that follows a word in a sentence in a block.

7. ``sentenceInBlock`` - sentence occurs in a block.

8. ``wordInSentence`` - word occurs in a sentence.

9. ``wordString`` - the string contained in the word.

10. ``partOfSpeech`` - the part of speech of the word.
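
The positional predicates in the list divide a sentence (or block) into thirds. A minimal sketch of that idea for words follows. The function names, the identifier scheme (``s1_w1``), the argument order, and the handling of the exact boundaries are all assumptions made for illustration; the package's own output may differ.

```python
def third_label(index, length):
    """Classify a 0-based index as 'early', 'midWay', or 'late' by thirds.

    Boundary handling is an assumption: positions up to length/3 are early,
    positions beyond 2*length/3 are late, everything else is midWay.
    """
    position = index + 1  # 1-based position within the sentence
    if position <= length / 3:
        return "early"
    if position > 2 * length / 3:
        return "late"
    return "midWay"


def word_position_facts(sentence_id, words):
    """Emit only the word-level positional predicates named in the list above."""
    predicates = {
        "early": "earlyWordInSentence",
        "midWay": "midWayWordInSentence",
    }
    facts = []
    for i in range(len(words)):
        label = third_label(i, len(words))
        if label in predicates:
            word_id = f"{sentence_id}_w{i + 1}"  # hypothetical identifier scheme
            facts.append(f"{predicates[label]}({sentence_id}, {word_id}).")
    return facts


facts = word_position_facts("s1", ["We", "hold", "these", "truths", "to", "be"])
```

For this six-word sentence, the first two words come out as early and the next two as midway; the last two fall in the late third, for which the list above names no word-level predicate.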

----

The repository contains a toy corpus (``files/``) and an image of a BoostSRL tree that predicts whether a word in a sentence is the word "you".

.. image:: https://raw.githubusercontent.com/starling-lab/rnlp/master/docs/img/output.png

The tree says that if the word string contained in word 'b' is "you", then 'b' is the word "you" (which is trivially true).
A more interesting inference is the False branch: if word 'b' is an early word in sentence 'a', word 'anon12035' is also an early word in sentence 'a', and the word string contained in word 'anon12035' is "Thank", then word 'b' has a decent chance of being the word "you". In other words, the model learned that the word "you" often occurs in the same sentence as the word "Thank" when "Thank" appears early in that sentence.
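
The two branches described above can be read as a small rule check over a set of facts. The sketch below is only a paraphrase of the prose: the tuple encoding of facts, the function name ``which_branch``, and the returned labels are hypothetical, and no probabilities from the learned tree are reproduced.

```python
def which_branch(facts, sentence, word):
    """Report which branch of the described tree applies to `word`.

    `facts` is a set of 3-tuples (an assumed encoding):
      ("wordString", word_id, string)
      ("earlyWordInSentence", sentence_id, word_id)
    """
    # True branch: the word string of `word` is "you".
    if ("wordString", word, "you") in facts:
        return "true branch"

    # False branch: `word` and some word whose string is "Thank"
    # are both early words in the same sentence.
    early = {w for (pred, s, w) in facts
             if pred == "earlyWordInSentence" and s == sentence}
    if word in early and any(("wordString", other, "Thank") in facts
                             for other in early):
        return "decent chance"

    return "no rule matched"


facts = {
    ("earlyWordInSentence", "s1", "w1"),
    ("earlyWordInSentence", "s1", "w2"),
    ("wordString", "w1", "Thank"),
    ("wordString", "w3", "very"),
}
result = which_branch(facts, "s1", "w2")
```

Here ``w2`` has no known word string, so the first test fails, but it shares an early position with "Thank" in sentence ``s1``, which is the situation the False branch describes.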

.. _`@kkroy36`: https://github.com/kkroy36/
.. _`@batflyer`: https://github.com/batflyer/
98 changes: 91 additions & 7 deletions docs/source/index.rst
@@ -3,18 +3,102 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.

Welcome to rnlp's documentation!
================================
``rnlp``
========

*Relational NLP Preprocessing*: A Python package and tool for converting text
into a set of relational facts.

:Authors:
    Kaushik Roy (`@kkroy36 <https://github.com/kkroy36/>`_), Alexander L. Hayes (`@batflyer <https://github.com/batflyer/>`_)

:Index: :ref:`genindex`
:Modules: :ref:`modindex`
:Source: `GitHub <https://github.com/starling-lab/rnlp>`_
:Bugtracker: `GitHub Issues <https://github.com/starling-lab/rnlp/issues/>`_

.. image:: https://img.shields.io/pypi/pyversions/rnlp.svg?style=flat-square
.. image:: https://img.shields.io/pypi/v/rnlp.svg?style=flat-square
.. image:: https://img.shields.io/pypi/l/rnlp.svg?style=flat-square
.. image:: https://img.shields.io/readthedocs/rnlp/stable.svg?style=flat-square
   :target: http://rnlp.readthedocs.io/en/stable/

.. toctree::
   :maxdepth: 2
   :caption: Contents:

Installation
------------

Stable builds on PyPI

.. code-block:: bash

    pip install rnlp

Development builds on GitHub

.. code-block:: bash

    pip install git+https://github.com/starling-lab/rnlp.git

Quick-Start
-----------

``rnlp`` can be used either as a CLI tool or as an imported Python package.

+---------------------------------------+--------------------------------------+
|                **CLI**                |             **Imported**             |
+---------------------------------------+--------------------------------------+
|.. code-block:: bash                   |.. code-block:: python                |
|                                       |                                      |
|  $ python -m rnlp -f files/doi.txt    |  from rnlp.corpus import declaration |
|                                       |  import rnlp                         |
|                                       |                                      |
|                                       |  doi = declaration()                 |
|                                       |  rnlp.converter(doi)                 |
+---------------------------------------+--------------------------------------+

Text is converted into relational facts. The encoded relations are:

- between blocks of size 'n' (for example, n = 2 sentences).

- between blocks of size 'n' and the sentences in those blocks.

- between sentences and the words in those sentences.

----

The relationships currently encoded are:

1. ``earlySentenceInBlock`` - sentence occurs within the first third of the block length.

2. ``earlyWordInSentence`` - word occurs within the first third of the sentence length.

3. ``lateSentenceInBlock`` - sentence occurs after two-thirds of the block length.

4. ``midWayWordInSentence`` - word occurs between a third and two-thirds of the sentence length.

5. ``nextSentenceInBlock`` - sentence that follows a sentence in a block.

6. ``nextWordInSentence`` - word that follows a word in a sentence in a block.

7. ``sentenceInBlock`` - sentence occurs in a block.

8. ``wordInSentence`` - word occurs in a sentence.

9. ``wordString`` - the string contained in the word.

10. ``partOfSpeech`` - the part of speech of the word.
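
The membership and ordering predicates in the list can be illustrated by walking a nested list of blocks, sentences, and words. This is a rough sketch, not the package's implementation: the function name ``structural_facts``, the identifier scheme (``b1_s1_w1``), and the argument order of each predicate are assumptions, and ``partOfSpeech`` is left out because it would need a tagger.

```python
def structural_facts(blocks):
    """Emit membership and ordering facts.

    `blocks` is a list of blocks; each block is a list of sentences;
    each sentence is a list of word strings.
    """
    facts = []
    for b, block in enumerate(blocks, start=1):
        prev_sentence = None
        for s, sentence in enumerate(block, start=1):
            sentence_id = f"b{b}_s{s}"  # hypothetical identifier scheme
            facts.append(f"sentenceInBlock(b{b}, {sentence_id}).")
            if prev_sentence is not None:
                facts.append(
                    f"nextSentenceInBlock(b{b}, {prev_sentence}, {sentence_id}).")
            prev_sentence = sentence_id

            prev_word = None
            for w, string in enumerate(sentence, start=1):
                word_id = f"{sentence_id}_w{w}"
                facts.append(f"wordInSentence({sentence_id}, {word_id}).")
                facts.append(f'wordString({word_id}, "{string}").')
                if prev_word is not None:
                    facts.append(
                        f"nextWordInSentence({sentence_id}, {prev_word}, {word_id}).")
                prev_word = word_id
    return facts


facts = structural_facts([[["Thank", "you"], ["Goodbye"]]])
for fact in facts:
    print(fact)
```

A single block holding a two-word sentence and a one-word sentence yields one fact per membership, one per word string, and one for each adjacent pair of words or sentences.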

----

The repository contains a toy corpus (``files/``) and an image of a BoostSRL tree that predicts whether a word in a sentence is the word "you".

.. image:: https://raw.githubusercontent.com/starling-lab/rnlp/master/docs/img/output.png

Indices and tables
==================
The tree says that if the word string contained in word 'b' is "you", then 'b' is the word "you" (which is trivially true).
A more interesting inference is the False branch: if word 'b' is an early word in sentence 'a', word 'anon12035' is also an early word in sentence 'a', and the word string contained in word 'anon12035' is "Thank", then word 'b' has a decent chance of being the word "you". In other words, the model learned that the word "you" often occurs in the same sentence as the word "Thank" when "Thank" appears early in that sentence.

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
.. _`@kkroy36`: https://github.com/kkroy36/
.. _`@batflyer`: https://github.com/batflyer/
9 changes: 8 additions & 1 deletion docs/source/rnlp.rst
@@ -12,14 +12,21 @@ rnlp\.corpus module
    :undoc-members:
    :show-inheritance:

rnlp\.parseInputCorpus module
rnlp\.parse module
-----------------------------

.. automodule:: rnlp.parse
    :members:
    :undoc-members:
    :show-inheritance:

rnlp\.textprocessing module
-----------------------------

.. automodule:: rnlp.textprocessing
    :members:
    :undoc-members:
    :show-inheritance:

Module contents
---------------