Sentence checker #4592

Draft
wants to merge 7 commits into
base: rolling
2 changes: 1 addition & 1 deletion Makefile
@@ -23,7 +23,7 @@ multiversion: Makefile
 	@$(BUILD) -M $@ "$(SOURCE)" "$(OUT)" $(OPTS)

 lint:
-	sphinx-lint source
+	./sphinx-lint-with-ros source

 test:
 	doc8 --ignore D001 --ignore-path build
118 changes: 118 additions & 0 deletions plugins/ros_checkers.py
@@ -0,0 +1,118 @@
#!/usr/bin/python3
from sphinxlint.checkers import checker
from sphinxlint.utils import paragraphs
import re

# Derived from https://stackoverflow.com/a/31505798
DOT = '•'  # placeholder for a '.' that does not end a sentence
QUESTION = '✇'  # placeholder for a non-breaking '?'
EXCLAM = '‼'  # placeholder for a non-breaking '!'
SP_LOOKUP = {
    '.': DOT,
    '?': QUESTION,
    '!': EXCLAM,
}
STOP = '¶'  # marker inserted after every sentence-ending character

# Classes and patterns
ALPHABETIC = r'([A-Za-z])'
DIGITS = r'([0-9])'
NOT_BREAK = r'[^!.?]'
BREAK = r'([!.?])'
PREFIXES = r'(Mr|St|Mrs|Ms|Dr|etc|vol|cf|et al|vs|eg|Proc|Mon|Tue|Wed|Thu|Fri)\.'
SUFFIXES = r'(Inc|Ltd|Jr|Sr|Co)\.'  # trailing \. so words like "Contains" are left alone
WEBSITES = r'\.(com|net|org|io|gov|edu|me|ros)'
EXTENSIONS = r'\.(xml|cfg|py|launch|frame_id|ini|md|log|h|cpp|bash|patch|gz|yaml|txt|NET|js|msg|srv|action|rst|exe|so|iso|stl|idl)'
MULTIPLE_DOTS = r'(\.{2,})([^\.])'
STARTERS = r'(Mr|Mrs|Ms|Dr|Prof|Capt|Cpt|Lt|He\s|She\s|It\s|They\s|Their\s|Our\s|We\s|But\s|However\s|That\s|This\s|Wherever)'
ACRONYMS = r'([A-Z][.][A-Z][.](?:[A-Z][.])?)'
DIGITS_DOT_DIGITS = re.compile(DIGITS + '([.])' + DIGITS)
SINGLE_LETTER = re.compile(r'(\s[A-Za-z])(\.)(\s)')
ON_THE_INSIDE = re.compile(r'()' + BREAK + r'(["\)])')
DOT_PAREN = re.compile(r'\.( \()')

# RST Formatting Patterns
HYPERLINK = re.compile(r'(<[^>.!?]*)' + BREAK + r'([^>]*>)')
BACKTICK = re.compile(r'(`[^`.!?]*)' + BREAK + r'([^`]*`)')
LIST_PREFIX = re.compile(r'^(\s*[\d#]+)(\.)(.*)')
TRAILING_FORMATTING = re.compile(r'()' + BREAK + r'(\s*[\*\)]*\s*)$')


def split_into_sentences(text):
    """
    Split the text into sentences.

    Assumes text does not contain the special characters DOT, QUESTION, EXCLAM or STOP.

    :param text: text to be split into sentences
    :type text: str

    :return: list of sentences
    :rtype: list[str]
    """

    text = ' ' + text + ' '
    text = re.sub('Lu!!', f'Lu{EXCLAM}{EXCLAM}', text)

    # Convert nonbreaking punctuation to special characters
    for pattern in [DIGITS_DOT_DIGITS, SINGLE_LETTER,
                    HYPERLINK, BACKTICK, LIST_PREFIX, TRAILING_FORMATTING,
                    ON_THE_INSIDE,
                    ]:
        m = pattern.search(text)
        while m:
            text = text.replace(m.group(0), m.group(1) + SP_LOOKUP[m.group(2)] + m.group(3))
            m = pattern.search(text)

    text = re.sub(MULTIPLE_DOTS, lambda match: DOT * len(match.group(1)) + match.group(2), text)

    for pattern in [PREFIXES, SUFFIXES]:
        text = re.sub(pattern, '\\1' + DOT, text)
    for pattern in [DOT_PAREN, WEBSITES, EXTENSIONS]:
        text = re.sub(pattern, DOT + '\\1', text)

    text = re.sub(r'Ph\.D\.', f'Ph{DOT}D{DOT}', text)
    text = re.sub(r'i\.e\.', f'i{DOT}e{DOT}', text)

    text = re.sub('Steven!', f'Steven{EXCLAM}', text)
    text = re.sub(r'vd\. Hoorn', f'vd{DOT} Hoorn', text)

    text = re.sub(ACRONYMS + ' ' + STARTERS,
                  f'\\1{STOP} \\2',
                  text)
    text = re.sub(ALPHABETIC + '[.]' + ALPHABETIC + '[.]' + ALPHABETIC + '[.]',
                  f'\\1{DOT}\\2{DOT}\\3' + DOT,
                  text
                  )
    text = re.sub(ALPHABETIC + '[.]' + ALPHABETIC + '[.]', f'\\1{DOT}\\2{DOT}', text)

    # Convert breaking punctuation to include STOP character
    # and convert special characters back to normal
    for stopper, replacement in SP_LOOKUP.items():
        text = text.replace(stopper, stopper + STOP)
        text = text.replace(replacement, stopper)

    sentences = text.split(STOP)
    sentences = [s.strip() for s in sentences]

    return list(filter(None, sentences))


@checker('.rst', '.md')
def check_sentence_count(file, lines, options=None):
    for paragraph_lno, paragraph in paragraphs(lines):
        for special_char in [DOT, STOP, QUESTION, EXCLAM]:
            if special_char in paragraph:
                yield paragraph_lno, f'Contains the special character {special_char}'

        if paragraph.lstrip().startswith('.. '):
            continue

        for i, line in enumerate(paragraph.split('\n')):
            sentences = split_into_sentences(line)
            if len(sentences) <= 1:
                continue

            sentence0_words = ' '.join(sentences[0].split(' ')[-3:])
            sentence1_words = ' '.join(sentences[1].split(' ')[:3])
            yield paragraph_lno + i, f'Each sentence must start on a new line. Break between "{sentence0_words}" and "{sentence1_words}"'
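
Reviewer note: the splitter in `plugins/ros_checkers.py` relies on a placeholder technique. Periods that do not end a sentence are temporarily swapped for a stand-in character, every remaining period is tagged with a stop marker, and the stand-ins are then restored. The following is a minimal, self-contained sketch of that idea only. The function name `split_demo` and the two small patterns are hypothetical and far narrower than the checker's pattern set; it is not the PR's implementation.

```python
import re

DOT = '•'   # stand-in for a period that must not end a sentence
STOP = '¶'  # marker appended after sentence-ending periods

# Hypothetical, deliberately tiny pattern set (the checker uses many more).
DECIMAL = re.compile(r'([0-9])\.([0-9])')
EXTENSION = re.compile(r'\.(py|rst|yaml)\b')


def split_demo(text):
    """Split text into sentences using the placeholder technique."""
    # 1. Hide periods that belong to numbers or file names.
    text = DECIMAL.sub(rf'\1{DOT}\2', text)
    text = EXTENSION.sub(rf'{DOT}\1', text)
    # 2. Any period still present ends a sentence: tag it with STOP.
    text = text.replace('.', '.' + STOP)
    # 3. Restore the hidden periods and split on the marker.
    text = text.replace(DOT, '.')
    return [s.strip() for s in text.split(STOP) if s.strip()]


print(split_demo('Edit conf.py first. Then set the rate to 2.5 Hz.'))
```

A line that yields more than one element is what the checker would report as needing a line break between sentences.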
38 changes: 19 additions & 19 deletions source/Concepts/Basic/About-Interfaces.rst
@@ -74,63 +74,63 @@ Field types can be:
      - `DDS type <https://design.ros2.org/articles/mapping_dds_types.html>`__
    * - bool
      - bool
-     - builtins.bool
+     - ``builtins.bool``
      - boolean
    * - byte
      - uint8_t
-     - builtins.bytes*
+     - ``builtins.bytes*``
      - octet
    * - char
      - char
-     - builtins.int*
+     - ``builtins.int*``
      - char
    * - float32
      - float
-     - builtins.float*
+     - ``builtins.float*``
      - float
    * - float64
      - double
-     - builtins.float*
+     - ``builtins.float*``
      - double
    * - int8
      - int8_t
-     - builtins.int*
+     - ``builtins.int*``
      - octet
    * - uint8
      - uint8_t
-     - builtins.int*
+     - ``builtins.int*``
      - octet
    * - int16
      - int16_t
-     - builtins.int*
+     - ``builtins.int*``
      - short
    * - uint16
      - uint16_t
-     - builtins.int*
+     - ``builtins.int*``
      - unsigned short
    * - int32
      - int32_t
-     - builtins.int*
+     - ``builtins.int*``
      - long
    * - uint32
      - uint32_t
-     - builtins.int*
+     - ``builtins.int*``
      - unsigned long
    * - int64
      - int64_t
-     - builtins.int*
+     - ``builtins.int*``
      - long long
    * - uint64
      - uint64_t
-     - builtins.int*
+     - ``builtins.int*``
      - unsigned long long
    * - string
      - std::string
-     - builtins.str
+     - ``builtins.str``
      - string
    * - wstring
      - std::u16string
-     - builtins.str
+     - ``builtins.str``
      - wstring

 *Every built-in-type can be used to define arrays:*
@@ -144,19 +144,19 @@ Field types can be:
      - `DDS type <https://design.ros2.org/articles/mapping_dds_types.html>`__
    * - static array
      - std::array<T, N>
-     - builtins.list*
+     - ``builtins.list*``
      - T[N]
    * - unbounded dynamic array
      - std::vector
-     - builtins.list
+     - ``builtins.list``
      - sequence
    * - bounded dynamic array
      - custom_class<T, N>
-     - builtins.list*
+     - ``builtins.list*``
      - sequence<T, N>
    * - bounded string
      - std::string
-     - builtins.str*
+     - ``builtins.str*``
      - string

 All types that are more permissive than their ROS definition enforce the ROS constraints in range and length by software.
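
Reviewer note: the last context line says that types marked with `*` are more permissive in Python than in ROS, so range and length are enforced in software. A rough illustration of what such a check could look like is sketched below. The helper names `check_int_field` and `check_bounded_list` are hypothetical and written only for this note; they are not the code the ROS 2 generators actually emit.

```python
# Inclusive value ranges for a few ROS integer field types.
INT_RANGES = {
    'int8': (-2**7, 2**7 - 1),
    'uint8': (0, 2**8 - 1),
    'int16': (-2**15, 2**15 - 1),
    'uint16': (0, 2**16 - 1),
}


def check_int_field(ros_type, value):
    """Hypothetical helper: reject ints outside the ROS type's range."""
    low, high = INT_RANGES[ros_type]
    if not low <= value <= high:
        raise ValueError(f'{value} is out of range for {ros_type}')
    return value


def check_bounded_list(values, max_len):
    """Hypothetical helper: reject lists longer than the declared bound."""
    if len(values) > max_len:
        raise ValueError(f'{len(values)} items exceed the bound of {max_len}')
    return values


print(check_int_field('uint8', 255))  # fits in uint8, so it is accepted
try:
    check_int_field('int8', 200)      # a valid builtins.int, but not a valid int8
except ValueError as err:
    print(err)
```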
31 changes: 23 additions & 8 deletions source/Concepts/Intermediate/About-Logging.rst
@@ -35,8 +35,8 @@ If the node's name is externally remapped to something other than what is define
 Non-node loggers can also be created that use a specific name.

 Logger names represent a hierarchy.
-If the level of a logger named "abc.def" is unset, it will defer to the level of its parent named "abc", and if that level is also unset, the default logger level will be used.
-When the level of logger "abc" is changed, all of its descendants (e.g. "abc.def", "abc.ghi.jkl") will have their level impacted unless their level has been explicitly set.
+If the level of a logger named ``abc.def`` is unset, it will defer to the level of its parent named ``abc``, and if that level is also unset, the default logger level will be used.
+When the level of logger ``abc`` is changed, all of its descendants (e.g. ``abc.def``, ``abc.ghi.jkl``) will have their level impacted unless their level has been explicitly set.

 APIs
 ----
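
Reviewer note: the hierarchy rule in this hunk (an unset logger defers to its parent, then to the default) can be pictured with a short lookup. This is only a sketch of the rule as the text states it. The function `effective_level`, the plain dictionary of levels, and the `'INFO'` default are assumptions made for illustration, not the rcutils implementation.

```python
def effective_level(name, levels, default='INFO'):
    """Return the level for logger `name`, walking up dot-separated parents."""
    while name:
        if name in levels:
            return levels[name]
        # Drop the last dot-separated component to get the parent's name.
        name = name.rpartition('.')[0]
    # No ancestor has an explicit level, so use the default (assumed here).
    return default


levels = {'abc': 'DEBUG'}  # only the parent has an explicit level
print(effective_level('abc.def', levels))
print(effective_level('abc.ghi.jkl', levels))
print(effective_level('xyz', levels))
```

Because the lookup happens on each query, changing the entry for `abc` changes the result for every descendant that has no entry of its own, which matches the behavior the paragraph describes.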
@@ -92,12 +92,27 @@ Environment variables
 The following environment variables control some aspects of the ROS 2 loggers.
 For each of the environment settings, note that this is a process-wide setting, so it applies to all nodes in that process.

-* ``ROS_LOG_DIR`` - Control the logging directory that is used for writing logging messages to disk (if that is enabled). If non-empty, use the exact directory as specified in this variable. If empty, use the contents of the ``ROS_HOME`` environment variable to construct a path of the form ``$ROS_HOME/.log``. In all cases, the ``~`` character is expanded to the user's HOME directory.
-* ``ROS_HOME`` - Control the home directory that is used for various ROS files, including logging and config files. In the context of logging, this variable is used to construct a path to a directory for log files. If non-empty, use the contents of this variable for the ROS_HOME path. In all cases, the ``~`` character is expanded to the users's HOME directory.
-* ``RCUTILS_LOGGING_USE_STDOUT`` - Control what stream output messages go to. If this is unset or 0, use stderr. If this is 1, use stdout.
-* ``RCUTILS_LOGGING_BUFFERED_STREAM`` - Control whether the logging stream (as configured in ``RCUTILS_LOGGING_USE_STDOUT``) should be line buffered or unbuffered. If this is unset, use the default of the stream (generally line buffered for stdout, and unbuffered for stderr). If this is 0, force the stream to be unbuffered. If this is 1, force the stream to be line buffered.
-* ``RCUTILS_COLORIZED_OUTPUT`` - Control whether colors are used when outputting messages. If unset, automatically determine based on the platform and whether the console is a TTY. If 0, force disable using colors for output. If 1, force enable using colors for output.
-* ``RCUTILS_CONSOLE_OUTPUT_FORMAT`` - Control the fields that are output for each log message. The available fields are:
+* ``ROS_LOG_DIR`` - Control the logging directory that is used for writing logging messages to disk (if that is enabled).
+  If non-empty, use the exact directory as specified in this variable.
+  If empty, use the contents of the ``ROS_HOME`` environment variable to construct a path of the form ``$ROS_HOME/.log``.
+  In all cases, the ``~`` character is expanded to the user's HOME directory.
+* ``ROS_HOME`` - Control the home directory that is used for various ROS files, including logging and config files.
+  In the context of logging, this variable is used to construct a path to a directory for log files.
+  If non-empty, use the contents of this variable for the ROS_HOME path.
+  In all cases, the ``~`` character is expanded to the user's HOME directory.
+* ``RCUTILS_LOGGING_USE_STDOUT`` - Control what stream output messages go to.
+  If this is unset or 0, use stderr.
+  If this is 1, use stdout.
+* ``RCUTILS_LOGGING_BUFFERED_STREAM`` - Control whether the logging stream (as configured in ``RCUTILS_LOGGING_USE_STDOUT``) should be line buffered or unbuffered.
+  If this is unset, use the default of the stream (generally line buffered for stdout, and unbuffered for stderr).
+  If this is 0, force the stream to be unbuffered.
+  If this is 1, force the stream to be line buffered.
+* ``RCUTILS_COLORIZED_OUTPUT`` - Control whether colors are used when outputting messages.
+  If unset, automatically determine based on the platform and whether the console is a TTY.
+  If 0, force disable using colors for output.
+  If 1, force enable using colors for output.
+* ``RCUTILS_CONSOLE_OUTPUT_FORMAT`` - Control the fields that are output for each log message.
+  The available fields are:

   * ``{severity}`` - The severity level.
   * ``{name}`` - The name of the logger (may be empty).