Commit 9aa27cb

Updating docstrings and documentation.
1 parent 08deb68 commit 9aa27cb

6 files changed

+250
-53
lines changed

Diff for: docs/source/index.rst

+91-7
Original file line number | Diff line number | Diff line change
@@ -3,18 +3,102 @@
33
You can adapt this file completely to your liking, but it should at least
44
contain the root `toctree` directive.
55
6-
Welcome to rnlp's documentation!
7-
================================
6+
``rnlp``
7+
========
8+
9+
*Relational NLP Preprocessing*: A Python package and tool for converting text
10+
into a set of relational facts.
11+
12+
:Authors:
13+
Kaushik Roy (`@kkroy36 <https://github.com/kkroy36/>`_), Alexander L. Hayes (`@batflyer <https://github.com/batflyer/>`_)
14+
15+
:Index: :ref:`genindex`
16+
:Modules: :ref:`modindex`
17+
:Source: `GitHub <https://github.com/starling-lab/rnlp>`_
18+
:Bugtracker: `GitHub Issues <https://github.com/starling-lab/rnlp/issues/>`_
19+
20+
.. image:: https://img.shields.io/pypi/pyversions/rnlp.svg?style=flat-square
21+
.. image:: https://img.shields.io/pypi/v/rnlp.svg?style=flat-square
22+
.. image:: https://img.shields.io/pypi/l/rnlp.svg?style=flat-square
23+
.. image:: https://img.shields.io/readthedocs/rnlp/stable.svg?style=flat-square
24+
:target: http://rnlp.readthedocs.io/en/stable/
825

926
.. toctree::
1027
:maxdepth: 2
1128
:caption: Contents:
1229

30+
Installation
31+
------------
32+
33+
Stable builds on PyPI
34+
35+
.. code-block:: bash
36+
37+
pip install rnlp
38+
39+
Development builds on GitHub
40+
41+
.. code-block:: bash
42+
43+
pip install git+https://github.com/starling-lab/rnlp.git
44+
45+
Quick-Start
46+
-----------
47+
48+
``rnlp`` can be used either as a CLI tool or as an imported Python package.
49+
50+
+---------------------------------------+--------------------------------------+
51+
| **CLI** | **Imported** |
52+
+---------------------------------------+--------------------------------------+
53+
|.. code-block:: bash |.. code-block:: python |
54+
| | |
55+
| $ python -m rnlp -f files/doi.txt | from rnlp.corpus import declaration |
56+
| | import rnlp |
57+
| | |
58+
| | doi = declaration() |
59+
| | rnlp.converter(doi) |
60+
+---------------------------------------+--------------------------------------+
61+
62+
Text is converted into relational facts. The relations encoded are:
63+
64+
- between blocks of size 'n' (e.g. 2 sentences).
65+
66+
- between blocks of size 'n' (i.e. 'n' sentences) and the sentences in those blocks.
67+
68+
- between sentences and words in the sentences.
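
The block, sentence, and word hierarchy described above can be illustrated with a minimal sketch. This is not the ``rnlp`` implementation: the function names (``split_sentences``, ``make_blocks``, ``containment_facts``), the identifier scheme, the argument order inside each fact, and the naive regular-expression sentence splitter are all assumptions made for illustration.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): a sentence ends in '.', '!' or '?'."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]


def make_blocks(sentences, block_size=2):
    """Group consecutive sentences into blocks of at most block_size."""
    return [sentences[i:i + block_size]
            for i in range(0, len(sentences), block_size)]


def containment_facts(text, block_size=2):
    """Emit hypothetical sentenceInBlock / wordInSentence facts."""
    facts = []
    blocks = make_blocks(split_sentences(text), block_size)
    for b, block in enumerate(blocks, start=1):
        for s, sentence in enumerate(block, start=1):
            sid = "{0}_{1}".format(b, s)  # illustrative identifier scheme
            facts.append("sentenceInBlock(s{0},b{1}).".format(sid, b))
            for w, _word in enumerate(sentence.split(), start=1):
                facts.append("wordInSentence(w{0}_{1},s{0}).".format(sid, w))
    return facts


if __name__ == "__main__":
    for fact in containment_facts("Thank you. Bye now. Done."):
        print(fact)
```

With ``block_size=2``, three sentences yield two blocks: the first holds two sentences and the second holds the remaining one.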
69+
70+
----
71+
72+
The relationships currently encoded are:
73+
74+
1. earlySentenceInBlock - sentence occurs within the first third of the block length
75+
76+
2. earlyWordInSentence - word occurs within the first third of the sentence length
77+
78+
3. lateSentenceInBlock - sentence occurs after two-thirds of the block length
79+
80+
4. midWayWordInSentence - word occurs between a third and two-thirds of the sentence length
81+
82+
5. nextSentenceInBlock - sentence that follows a sentence in a block
83+
84+
6. nextWordInSentence - word that follows a word in a sentence in a block
85+
86+
7. sentenceInBlock - sentence occurs in a block
87+
88+
8. wordInSentence - word occurs in a sentence.
89+
90+
9. wordString - the string contained in the word.
91+
92+
10. partOfSpeech - the part of speech of the word.
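
The positional relations in this list (early, midway, late) depend on where an item falls relative to the one-third and two-thirds marks. A minimal sketch of such a classification follows. The function name ``position_label``, the 1-based indexing, and the treatment of the boundaries (a position exactly at one third counts as early, and one exactly at two-thirds counts as midway) are assumptions; the package may draw the boundaries differently.

```python
def position_label(index, length):
    """Classify a 1-based position within a sequence of the given length.

    Assumed thresholds: 'early' up to and including one third,
    'late' strictly after two-thirds, 'midWay' otherwise.
    Integer arithmetic avoids floating-point comparisons.
    """
    if not 1 <= index <= length:
        raise ValueError("index must satisfy 1 <= index <= length")
    if 3 * index <= length:
        return "early"
    if 3 * index > 2 * length:
        return "late"
    return "midWay"


def label_words(sentence):
    """Pair each whitespace-separated word with its position label."""
    words = sentence.split()
    return [(word, position_label(i, len(words)))
            for i, word in enumerate(words, start=1)]


if __name__ == "__main__":
    print(label_words("Thank you very much for the help today friend"))
```

The same function applies to sentences within a block by passing the sentence index and the block length.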
93+
94+
----
95+
96+
Files contain a toy corpus (``files/``) and an image of a BoostSRL tree for predicting if a word in a sentence is the word "you".
1397

98+
.. image:: https://raw.githubusercontent.com/starling-lab/rnlp/master/docs/img/output.png
1499

15-
Indices and tables
16-
==================
100+
The tree says that if the word string contained in word 'b' is "you" then 'b' is the word "you". (This is of course true).
101+
A more interesting inference is the False branch: if word 'b' is an early word in sentence 'a', word 'anon12035' is also an early word in sentence 'a', and the word string contained in word 'anon12035' is "Thank", then word 'b' has a decent chance of being the word "you". (The model learned that "you" often occurs in the same sentence as "Thank" when "Thank" appears early in that sentence.)
17102

18-
* :ref:`genindex`
19-
* :ref:`modindex`
20-
* :ref:`search`
103+
.. _`@kkroy36`: https://github.com/kkroy36/
104+
.. _`@batflyer`: https://github.com/batflyer/

Diff for: docs/source/rnlp.rst

+8-1
@@ -12,14 +12,21 @@ rnlp\.corpus module
1212
:undoc-members:
1313
:show-inheritance:
1414

15-
rnlp\.parseInputCorpus module
15+
rnlp\.parse module
1616
-----------------------------
1717

1818
.. automodule:: rnlp.parse
1919
:members:
2020
:undoc-members:
2121
:show-inheritance:
2222

23+
rnlp\.textprocessing module
24+
-----------------------------
25+
26+
.. automodule:: rnlp.textprocessing
27+
:members:
28+
:undoc-members:
29+
:show-inheritance:
2330

2431
Module contents
2532
---------------

Diff for: rnlp/__init__.py

+3-1
@@ -75,7 +75,9 @@
7575
]
7676

7777
from . import parse
78-
from . import textprocessing
78+
79+
from .textprocessing import getBlocks
80+
from .textprocessing import getSentences
7981

8082
def converter(input_string, block_size=2):
8183
"""

Diff for: rnlp/corpus.py

+38-7
@@ -14,33 +14,64 @@
1414
# along with this program (at the base of this repository). If not,
1515
# see <http://www.gnu.org/licenses/>
1616

17+
"""
18+
rnlp.corpus
19+
-----------
20+
21+
Built-in corpus and utilities for reading corpora from files.
22+
23+
.. code-block:: python
24+
25+
# rnlp.corpus is not imported by default.
26+
import rnlp.corpus
27+
"""
28+
1729
from os import listdir
1830
from tqdm import tqdm
1931

20-
def readCorpus(file):
32+
def readCorpus(location):
2133
"""
2234
Returns the contents of a file or a group of files as a string.
2335
24-
:param file: .txt file or a directory to read files from.
25-
:type file: str.
36+
:param location: .txt file or a directory to read files from.
37+
:type location: str.
2638
2739
:returns: A string of all contents joined together.
2840
:rtype: str.
41+
42+
.. note::
43+
44+
This function takes a ``location`` on disk as a parameter. Location is
45+
assumed to be a string representing a text file or a directory. A text
46+
file is further assumed to contain ``.txt`` as a file extension while
47+
a directory may be a path.
48+
49+
Example:
50+
51+
.. code-block:: python
52+
53+
from rnlp.corpus import readCorpus
54+
55+
# If you have a text file:
56+
doi = readCorpus('files/doi.txt')
57+
58+
# If you have multiple files to read from:
59+
corpus = readCorpus('files')
2960
"""
3061
print("Reading corpus from file(s)...")
3162

3263
corpus = ''
3364

34-
if '.txt' in file:
35-
with open(file) as fp:
65+
if '.txt' in location:
66+
with open(location) as fp:
3667
corpus = fp.read()
3768
else:
3869

39-
dirFiles = listdir(file)
70+
dirFiles = listdir(location)
4071
nFiles = len(dirFiles)
4172

4273
for f in tqdm(dirFiles):
43-
with open(file+"/"+f) as fp:
74+
with open(location+"/"+f) as fp:
4475
corpus += fp.read()
4576

4677
return corpus
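
Review note: ``readCorpus`` as committed treats any ``location`` containing the substring ``.txt`` as a file, joins paths with ``"/"``, and leaves ``nFiles`` unused. The following sketch shows one possible alternative and is not part of this commit. The name ``read_corpus_sketch`` is hypothetical, and its behaviour differs from the original in three ways: it checks the filesystem with ``os.path.isfile`` instead of inspecting the name, it reads directory entries in sorted order, and it skips subdirectories. The ``tqdm`` progress bar is omitted to keep the example free of third-party dependencies.

```python
import os


def read_corpus_sketch(location):
    """Hypothetical variant of readCorpus; returns file contents as one string.

    :param location: path to a single file or to a directory of files.
    :type location: str
    :rtype: str
    """
    if os.path.isfile(location):
        paths = [location]
    else:
        # Sorted so the concatenation order is deterministic.
        candidates = [os.path.join(location, name)
                      for name in sorted(os.listdir(location))]
        # Skip subdirectories; only regular files are read.
        paths = [p for p in candidates if os.path.isfile(p)]

    chunks = []
    for path in paths:
        with open(path) as fp:
            chunks.append(fp.read())
    return ''.join(chunks)
```

Called with a single file path it returns that file's contents; called with a directory it returns the contents of every regular file in it, concatenated in name order.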
