
Commit

Fix some typos in documentation
Bodyhealer authored and ThePhD committed Aug 9, 2023
1 parent eb48c0e commit 3ead068
Showing 24 changed files with 39 additions and 39 deletions.
2 changes: 1 addition & 1 deletion documentation/source/api/encodings/execution.rst
@@ -33,7 +33,7 @@ Execution

This is the locale-based, runtime encoding. It uses a number of compile-time and runtime heuristics to eventually be resolved to an implementation-defined encoding. It is not required to work in constant expressions either: for this, use :doc:`ztd::text::literal </api/encodings/literal>`, which represents the compile-time string (e.g. ``"my string"``) encoding.

-Currently, the hierachy of behaviors is like so:
+Currently, the hierarchy of behaviors is like so:

- If the platform is MacOS, then it assumes this is :doc:`UTF-8 </api/encodings/utf8>`;
- Otherwise, if the :term:`cuneicode`, then Cuneicode will be used.
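To make the resolution order above concrete, here is a minimal usage sketch. It assumes the three-argument ``ztd::text::transcode(input, from_encoding, to_encoding)`` overload and the ``ztd::text::compat_utf8`` encoding object; treat those spellings, and the deduced return type, as assumptions rather than verified API for this exact revision.

.. code-block:: cpp

    #include <ztd/text.hpp>

    #include <string>

    int main() {
        // Bytes in whatever encoding the current locale resolves to at runtime.
        std::string locale_text = "hello";
        // Convert locale-encoded text to UTF-8; the heuristics described above
        // (macOS assumes UTF-8, otherwise the cuneicode/locale machinery) pick
        // the real underlying encoding.
        auto utf8_text = ztd::text::transcode(
            locale_text, ztd::text::execution {}, ztd::text::compat_utf8 {});
        return utf8_text.empty() ? 1 : 0;
    }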
4 changes: 2 additions & 2 deletions documentation/source/api/encodings/petscii.rst
@@ -28,10 +28,10 @@
..
.. =============================================================================>
-PETSCII (Shifted & Unhsifted, Combined) / CBM ASCII
+PETSCII (Shifted & Unshifted, Combined) / CBM ASCII
===================================================

-PET Standard Code of Information Interchange (PETSCII) was used for Commodore Buisiness Maschines and then moved into other Commodore machines (and adjacent machines). It has a "shifted" version (when the shift key was held) and an "unshifted" version (when the shift key was not being held).
+PET Standard Code of Information Interchange (PETSCII) was used for Commodore Business Machines and then moved into other Commodore machines (and adjacent machines). It has a "shifted" version (when the shift key was held) and an "unshifted" version (when the shift key was not being held).

The state object for this encoding contains an enumeration that allows the user to select the shifted or unshifted versions at-will.

2 changes: 1 addition & 1 deletion documentation/source/api/encodings/shift_jis_x0208.rst
@@ -33,7 +33,7 @@ SHIFT-JISX0208

The version of SHIFT-JIS that corresponds to standard X0208, one of the more prevalent versions.

-Note that many versions of SHIFT-JIS do not correspond to one standard and often have different interpretations or characteristics. The communities which use them label them, indiscriminatly, as SHIFT-JIS without any kind of specific indicator or even out-of-band modifier. The text community surrounding this is, with all due respect, one gigantic mess. Most industry professionals inside and outside of Japan dealing with such text tend to gravitate towards the SHIFT-JISX0208 release, and simply use replacement characters / invalid indicators for such input text.
+Note that many versions of SHIFT-JIS do not correspond to one standard and often have different interpretations or characteristics. The communities which use them label them, indiscriminately, as SHIFT-JIS without any kind of specific indicator or even out-of-band modifier. The text community surrounding this is, with all due respect, one gigantic mess. Most industry professionals inside and outside of Japan dealing with such text tend to gravitate towards the SHIFT-JISX0208 release, and simply use replacement characters / invalid indicators for such input text.

As such, it is advisable to perhaps attempt to find some out-of-band data to see if a specific data is, indeed, meant to be SHIFT-JISX0208.

2 changes: 1 addition & 1 deletion documentation/source/api/encodings/wide_execution.rst
@@ -33,7 +33,7 @@ Wide Execution

This is the locale-based, wide runtime encoding. It uses a number of compile-time and runtime heuristics to eventually be resolved to an implementation-defined encoding. It is not required to work in constant expressions either: for this, use :doc:`ztd::text::wide_literal </api/encodings/wide_literal>`, which represents the compile-time wide string (e.g. ``L"my string"``) encoding.

-Currently, the hierachy of behaviors is like so:
+Currently, the hierarchy of behaviors is like so:

- If the platform is Windows, then it assumes this is :doc:`UTF-16 </api/encodings/utf16>`;
- If the platform is MacOS or ``__STDC_ISO10646__``, then it assumed this is :doc:`UTF-32 </api/encodings/utf32>` of some kind;
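The same pattern applies to wide text: a short, hedged sketch of normalizing platform-dependent ``wchar_t`` data to UTF-8, again assuming the ``ztd::text::wide_execution`` and ``ztd::text::compat_utf8`` object names and the three-argument ``transcode`` overload.

.. code-block:: cpp

    #include <ztd/text.hpp>

    #include <string>

    int main() {
        // UTF-16 code units on Windows, UTF-32 on most other platforms.
        std::wstring wide_text = L"wide text";
        // One call hides the platform difference by pivoting through code points.
        auto utf8_text = ztd::text::transcode(
            wide_text, ztd::text::wide_execution {}, ztd::text::compat_utf8 {});
        return utf8_text.empty() ? 1 : 0;
    }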
2 changes: 1 addition & 1 deletion documentation/source/api/is_code_points_replaceable.rst
@@ -31,7 +31,7 @@
is_code_points_(maybe\ _)replaceable
====================================

-These two traits detect whether or not the given Encoding type have calls on them which return either a replacement range (``is_code_points_repleacable``) or a ``std::optional`` of a replacement range (``is_code_points_maybe_replaceable``).
+These two traits detect whether or not the given Encoding type have calls on them which return either a replacement range (``is_code_points_replaceable``) or a ``std::optional`` of a replacement range (``is_code_points_maybe_replaceable``).

The former is useful when it is guaranteed that your encoding will have a replacement range on it and does not need the extra cost of an indirection from not knowing. The latter is useful when something like a wrapped encoding may or may not have a replacement sequence.

2 changes: 1 addition & 1 deletion documentation/source/api/is_code_units_replaceable.rst
@@ -31,7 +31,7 @@
is_code_units_(maybe\ _)replaceable
===================================

-These two traits detect whether or not the given Encoding type have calls on them which return either a replacement range (``is_code_units_repleacable``) or a ``std::optional`` of a replacement range (``is_code_units_maybe_replaceable``).
+These two traits detect whether or not the given Encoding type have calls on them which return either a replacement range (``is_code_units_replaceable``) or a ``std::optional`` of a replacement range (``is_code_units_maybe_replaceable``).

The former is useful when it is guaranteed that your encoding will have a replacement range on it and does not need the extra cost of an indirection from not knowing. The latter is useful when something like a wrapped encoding may or may not have a replacement sequence.

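Both trait families above are typically consumed with ``if constexpr``. The sketch below assumes ``_v`` variable templates plus member functions named ``replacement_code_units()`` and ``maybe_replacement_code_units()``, mirroring the trait names; those spellings are assumptions, not verified signatures.

.. code-block:: cpp

    #include <ztd/text.hpp>

    #include <iostream>
    #include <ranges>

    template <typename Encoding>
    void report_replacement(const Encoding& encoding) {
        if constexpr (ztd::text::is_code_units_replaceable_v<Encoding>) {
            // Guaranteed replacement range: no optional indirection needed.
            auto replacement = encoding.replacement_code_units();
            std::cout << "replacement size: " << std::ranges::size(replacement) << "\n";
        }
        else if constexpr (ztd::text::is_code_units_maybe_replaceable_v<Encoding>) {
            // A wrapped encoding may or may not have a replacement at runtime.
            auto maybe_replacement = encoding.maybe_replacement_code_units();
            if (maybe_replacement) {
                std::cout << "replacement size: " << std::ranges::size(*maybe_replacement) << "\n";
            }
        }
        else {
            std::cout << "no statically-known replacement\n";
        }
    }

    int main() {
        report_replacement(ztd::text::ascii {});
        return 0;
    }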
2 changes: 1 addition & 1 deletion documentation/source/benchmarks.rst
@@ -37,7 +37,7 @@ The specification for these benchmarks is as follows:

- The latest of each library was used as of 23 December, 2022.
- Windows 10 Pro machine, general user processes running in the background (but machine not being used).
-- AMD Ryzen 5 3600 6-Core @ 3600 MHz (12 Logcal Processors), 32.0 GB Physical Memory
+- AMD Ryzen 5 3600 6-Core @ 3600 MHz (12 Logical Processors), 32.0 GB Physical Memory
- Clang 15.0.3, latest available Clang at the time of generation with MSVC ABI.
- Entire software stack for every dependency build under default CMake flags (including ICU and libiconv from vcpkg).
- Anywhere from 150 to 10million samples per iteration, with mean (average) of 100 iterations forming transparent dots on graph.
2 changes: 1 addition & 1 deletion documentation/source/bibliography.rst
@@ -71,4 +71,4 @@ These are all the resources that this documentation links to, in alphabetical or
Bob Steagall. "Fast Conversion from UTF-8 with C++, DFAs, and SSE Intrinsics". September 26th, 2019. URL: `https://www.youtube.com/watch?v=5FQ87-Ecb-A <https://www.youtube.com/watch?v=5FQ87-Ecb-A>`_. This presentation demonstrates one of the ways an underlying fast decoder for UTF-8 can be written, rather than just letting the default work. This work can be hooked into the :doc:`conversion function extension points </design/converting>` location.

Fast UTF-8 Validation
-Daniel Lemire. "Ridiculously fast unicode (UTF-8) validation". October 20th, 2020. URL: `https://lemire.me/blog/2020/10/20/ridiculously-fast-unicode-utf-8-validation/ <https://lemire.me/blog/2020/10/20/ridiculously-fast-unicode-utf-8-validation/>`_. This blog post is one of many that presents a faster, more optimized way to validate that UTF-8 is in its correcty form.
+Daniel Lemire. "Ridiculously fast unicode (UTF-8) validation". October 20th, 2020. URL: `https://lemire.me/blog/2020/10/20/ridiculously-fast-unicode-utf-8-validation/ <https://lemire.me/blog/2020/10/20/ridiculously-fast-unicode-utf-8-validation/>`_. This blog post is one of many that presents a faster, more optimized way to validate that UTF-8 is in its correctly form.
2 changes: 1 addition & 1 deletion documentation/source/config.rst
@@ -59,7 +59,7 @@ There are various configuration macros and CMake/build-time switches that will c
.. _config-ZTD_TEXT_UNICODE_SCALAR_VALUE_DISTINCT_TYPE:

- ``ZTD_TEXT_UNICODE_SCALAR_VALUE_DISTINCT_TYPE``
-- Turns ``ztd::text::unicode_scalar_value`` from a type definition to ``char32_t`` to an implemenation-defined class type which enforces the various invariants of being a :term:`unicode scalar value`.
+- Turns ``ztd::text::unicode_scalar_value`` from a type definition to ``char32_t`` to an implementation-defined class type which enforces the various invariants of being a :term:`unicode scalar value`.
- Default: **on**.
- Not turned off by-default under any conditions.

6 changes: 3 additions & 3 deletions documentation/source/definitions.rst
@@ -38,7 +38,7 @@ Occasionally, we may need to use precise language to describe what we want. This
:sorted:

code unit
-A single unit of encoded information. This is typically, 8-, 16-, or 32-bit entites arranged in some sequential fashion that, when read or treated in a certain manner, end up composing higher-level units which make up readable text. Much of the world's most useful encodings that encode text use multiple code units in sequence to give a specific meaning to something, which makes most encodings variable length encodings.
+A single unit of encoded information. This is typically, 8-, 16-, or 32-bit entities arranged in some sequential fashion that, when read or treated in a certain manner, end up composing higher-level units which make up readable text. Much of the world's most useful encodings that encode text use multiple code units in sequence to give a specific meaning to something, which makes most encodings variable length encodings.

code point
A single unit of decoded information. Most typically associated with :term:`unicode code points <unicode code point>`, but they can be other things such as :term:`unicode scalar values <unicode scalar value>` or even a 13-bit value.
@@ -73,7 +73,7 @@ Occasionally, we may need to use precise language to describe what we want. This
Converting from a stream of input, typically code points, to a stream of output, typically code units. The output may be less suitable for general interchange or consumption, or is in a specific interchange format for the interoperation. Frequently, this library expects and works with the goal that any decoding process is producing :term:`unicode code points <unicode code point>` or :term:`unicode scalar values <unicode scalar value>` from some set of :term:`code units <code unit>`.

decode
-Converting from a stream of input, typically code units, to a stream of output, typically code points. The output is generally in a form that is more widely consummable or easier to process than when it started. Frequently, this library expects and works with the goal that any decoding process is producing :term:`unicode code points <unicode code point>` or :term:`unicode scalar values <unicode scalar value>` from some set of :term:`code units <code unit>`.
+Converting from a stream of input, typically code units, to a stream of output, typically code points. The output is generally in a form that is more widely consumable or easier to process than when it started. Frequently, this library expects and works with the goal that any decoding process is producing :term:`unicode code points <unicode code point>` or :term:`unicode scalar values <unicode scalar value>` from some set of :term:`code units <code unit>`.

transcode
Converting from one form of encoded information to another form of encoded information. In the context of this library, it means going from an input in one :term:`encoding <encoding>`'s code units to an output of another encoding's code units. Typically, this is done by invoking the :term:`decode <decode>` of the original encoding to reach a common interchange format (such as :term:`unicode code points <unicode code point>`) before taking that intermediate output and piping it through the :term:`encode <encode>` step of the other encoding. Different transcode operations may not need to go through a common interchange, and may transcode "directly", as a way to improve space utilization, time spent, or both.
@@ -82,7 +82,7 @@ Occasionally, we may need to use precise language to describe what we want. This
An operation which can map all input information to an output. This is used for this library, particularly, to determine whether an operation is lossy (loses information) or not. For example, UTF-8 to UTF-32 is an injective operation because the values in a UTF-8 encoding are preserved in a UTF-32 encoding. UTF-16 to GB18030 is also an injective operation. But, converting something like Latin-1 to ASCII is a lossy operation, or UTF-8 to SHIFT-JIS.

mojibake
-(Japanese: 文字化け Pronunciation: [modʑibake] "unintelligible sequence of characters".) From Japanese 文字 (moji), meaning "character" and 化け (bake), meaning change, is an occurence of incorrect unreadable characters displayed when computer software fails to render text correctly to its associated character encoding.
+(Japanese: 文字化け Pronunciation: [modʑibake] "unintelligible sequence of characters".) From Japanese 文字 (moji), meaning "character" and 化け (bake), meaning change, is an occurrence of incorrect unreadable characters displayed when computer software fails to render text correctly to its associated character encoding.

execution encoding
The locale-based encoding related to "multibyte characters" (C and C++ magic words) processed during program evaluation/execution. It is directly related to the ``std::set_locale(LC_CTYPE, ...)`` calls. Note that this is different from :term:`literal encoding`, which is the encoding of string literals. The two may not be (and many times, are not) the same.
@@ -35,7 +35,7 @@ Counting code units is the action of finding out how many code points will resul

Thusly, we use the algorithm as below to do the work. Given an ``input`` of ``code_unit``\ s with an ``encoding``, an initial ``count`` set at 0, and any necessary additional ``state``, we can generically predict how many code units will result from a decoding operation by running the following loop:

-* ⏩ Is the ``input`` value empty? If so, is the ``state`` finished and have nothing to output? If both are true, return the current results with the the empty ``input``, curent ``count``, and ``state``, everything is okay ✅!
+* ⏩ Is the ``input`` value empty? If so, is the ``state`` finished and have nothing to output? If both are true, return the current results with the the empty ``input``, current ``count``, and ``state``, everything is okay ✅!
* ⏩ Otherwise,

0. Set up an ``intermediate`` storage location of ``code_point``\ s, using the ``max_code_points`` of the input encoding, for the next operations.
@@ -35,7 +35,7 @@ Counting encodable data is the action of finding out how many code units will re

Thusly, we use the algorithm as below to do the work. Given an ``input`` of ``code_unit``\ s with an ``encoding``, an initial ``count`` set at 0, and any necessary additional ``state``, we can generically predict how many code units will result from a decoding operation by running the following loop:

-* ⏩ Is the ``input`` value empty? If so, is the ``state`` finished and have nothing to output? If both are true, return the current results with the the empty ``input``, curent ``count``, and ``state``, everything is okay ✅!
+* ⏩ Is the ``input`` value empty? If so, is the ``state`` finished and have nothing to output? If both are true, return the current results with the the empty ``input``, current ``count``, and ``state``, everything is okay ✅!
* ⏩ Otherwise,

0. Set up an ``intermediate`` storage location of ``code_unit``\ s, using the ``max_code_units`` of the input encoding, for the next operations.
@@ -35,15 +35,15 @@ This operation counts how much text will result from a transcode operation. Esse

Thusly, we use the algorithm as below to do the work. Given an ``input`` of ``code_unit``\ s with an ``encoding``, an initial ``count`` set at 0, and any necessary additional ``state``, we can generically predict how many code units will result from a decoding operation by running the following loop:

-* ⏩ Is the ``input`` value empty? If so, is the ``state`` finished and have nothing to output? If both are true, return the current results with the the empty ``input``, curent ``count``, and ``state``, everything is okay ✅!
+* ⏩ Is the ``input`` value empty? If so, is the ``state`` finished and have nothing to output? If both are true, return the current results with the the empty ``input``, current ``count``, and ``state``, everything is okay ✅!
* ⏩ Otherwise,

0. Set up an ``intermediate`` storage location of ``code_point``\ s (of the input encoding), using the ``max_code_points`` of the input encoding; and, set up an ``intermediate_output`` storage location of ``code_unit``\ s (of the output encoding), for the next operations.
1. Do the ``decode_one`` step from ``input`` (using its ``begin()`` and ``end()``) into the ``intermediate`` ``code_point`` storage location, saving the returned ``intermediate_output`` from the ``decode_one`` call.

* 🛑 If it failed, return with the current ``input`` (unmodified from before this iteration, if possible), current ``count``, and ``state``\ s.

-2. Do the ``encode_one`` step from ``intermdiate`` (using its ``begin()`` and ``end()``) into the ``intermediate_output`` ``code_unit`` storage location, saving the returned ``intermediate_output`` from the ``encode_one`` call.
+2. Do the ``encode_one`` step from ``intermediate`` (using its ``begin()`` and ``end()``) into the ``intermediate_output`` ``code_unit`` storage location, saving the returned ``intermediate_output`` from the ``encode_one`` call.

* 🛑 If it failed, return with the current ``input`` (unmodified from before this iteration, if possible), current ``count``, and ``state``\ s.

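The loop above is the whole trick: perform one indivisible unit of work, count what it would have produced, and advance. The following is a deliberately simplified, self-contained illustration of that structure, counting how many UTF-8 code units a UTF-32 input would transcode into; it omits error handlers and state objects and is not the library's implementation, only the shape of the algorithm.

.. code-block:: cpp

    #include <cstddef>
    #include <optional>
    #include <string_view>

    // "encode_one", reduced to arithmetic: how many UTF-8 code units does one
    // scalar value need? Returns std::nullopt for invalid values (the 🛑 case).
    std::optional<std::size_t> utf8_units_for_one(char32_t code_point) {
        if (code_point <= 0x7F) return 1;
        if (code_point <= 0x7FF) return 2;
        if (code_point >= 0xD800 && code_point <= 0xDFFF) return std::nullopt; // surrogate
        if (code_point <= 0xFFFF) return 3;
        if (code_point <= 0x10FFFF) return 4;
        return std::nullopt;
    }

    struct count_result {
        std::size_t count;
        bool ok;
    };

    // The counting loop: never materializes the UTF-8 output, only sizes it.
    count_result count_as_utf8(std::u32string_view input) {
        std::size_t count = 0;
        while (!input.empty()) {                          // ⏩ empty input: done ✅
            auto one = utf8_units_for_one(input.front()); // one unit of work
            if (!one) {
                return { count, false };                  // 🛑 stop with current count
            }
            count += *one;                                // accumulate would-be output
            input.remove_prefix(1);                       // ⏩ advance past what was read
        }
        return { count, true };
    }

    int main() {
        return count_as_utf8(U"text ✅").ok ? 0 : 1;
    }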
2 changes: 1 addition & 1 deletion documentation/source/design/converting/recode.rst
@@ -52,6 +52,6 @@ Thusly, we use the algorithm as below to do the work. Given an ``input`` of ``co
* ⏩ Update ``input``\ 's ``begin()`` value to point to after what was read by the ``decode_one`` step.
* ⤴️ Go back to the start.

-This fundamental process works for any 2 encoding pairs, and does not require the first encoding ``from_encoding`` to know any details about the second encoding ``to_encoding``! This means a user is only responsible for upholding their end of the bargain with their encoding object, and can thusly interoperate with every other encoding that speaks in the same intermediade, decoded values (i.e. :term:`unicode code points <unicode code point>`).
+This fundamental process works for any 2 encoding pairs, and does not require the first encoding ``from_encoding`` to know any details about the second encoding ``to_encoding``! This means a user is only responsible for upholding their end of the bargain with their encoding object, and can thusly interoperate with every other encoding that speaks in the same intermediate, decoded values (i.e. :term:`unicode code points <unicode code point>`).

Check out the API documentation for :doc:`ztd::text::recode </api/conversions/recode>` to learn more.
2 changes: 1 addition & 1 deletion documentation/source/design/converting/transcode.rst
@@ -59,6 +59,6 @@ Thusly, we use the algorithm as below to do the work. Given an ``input`` of ``co
* ⏩ Update ``input``\ 's ``begin()`` value to point to after what was read by the ``decode_one`` step.
* ⤴️ Go back to the start.

-This fundamental process works for any 2 encoding pairs, and does not require the first encoding ``from_encoding`` to know any details about the second encoding ``to_encoding``! This means a user is only responsible for upholding their end of the bargain with their encoding object, and can thusly interoperate with every other encoding that speaks in the same intermediade, decoded values (i.e. :term:`unicode code points <unicode code point>`).
+This fundamental process works for any 2 encoding pairs, and does not require the first encoding ``from_encoding`` to know any details about the second encoding ``to_encoding``! This means a user is only responsible for upholding their end of the bargain with their encoding object, and can thusly interoperate with every other encoding that speaks in the same intermediate, decoded values (i.e. :term:`unicode code points <unicode code point>`).

Check out the API documentation for :doc:`ztd::text::transcode </api/conversions/transcode>` to learn more.
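A sketch of what that interoperability buys in practice: converting between two encodings that know nothing about each other, with Unicode code points as the pivot. The encoding object names (``ztd::text::shift_jis_x0208``, ``ztd::text::compat_utf8``) and the three-argument ``transcode`` overload are assumed spellings, and the sample bytes are intended (but not verified here) to be the Shift-JIS katakana ハロ.

.. code-block:: cpp

    #include <ztd/text.hpp>

    #include <string>
    #include <string_view>

    int main() {
        // Two bytes per character; intended to be Shift-JIS katakana (ハロ).
        std::string_view shift_jis_bytes = "\x83\x6e\x83\x8d";
        // Neither encoding knows the other; each only speaks Unicode code
        // points to the decode/encode pivot in the middle.
        auto utf8_text = ztd::text::transcode(shift_jis_bytes,
            ztd::text::shift_jis_x0208 {}, ztd::text::compat_utf8 {});
        return utf8_text.empty() ? 1 : 0;
    }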