diff --git a/developing/architecture.rst b/developing/architecture.rst index cddf9ac..9eed0be 100644 --- a/developing/architecture.rst +++ b/developing/architecture.rst @@ -27,7 +27,7 @@ Access to Strider is managed by membership in a GitHub organization or in teams Strider is prepopulated with a build for the instance's control repository that preprocesses and submits site-wide assets to the content service, and automatically creates new content builds based on a list in a configuration file. -The asset preparer process and any content build preparer processes are run in isolated Docker containers, sharing a workspace with Strider by a data volume container. +The asset preparer, content preparer, and submitter processes are run in isolated Docker containers, sharing a workspace with Strider through a data volume container. Components ---------- @@ -39,29 +39,25 @@ Components :term:`metadata envelopes`, each of which contains one page of rendered HTML and associated metadata. - If the current branch is live, the generated envelopes are then submitted to the - :term:`content service` for storage and indexing. Otherwise, a local :term:`presenter` is - invoked to complete a full build of this subtree of the final site, which is then published to - CDN and linked on the pull request. - - There will be one preparer for each supported format of :term:`content repository`; initially, - Sphinx and Jekyll. The preparer will be executed by a CI/CD system on each commit to the + There is one preparer for each supported format of :term:`content repository`; currently, + Sphinx and Jekyll. The preparer is executed by a CI/CD system on each commit to the repository. + submitter + Process responsible for traversing directories populated with :term:`metadata envelopes` and asset files and submitting them to the :term:`content service`. The submitter submits content and assets in bulk transactions and avoids submitting unchanged content. 
+ content service Service that accepts submissions and queries for the most recent :term:`metadata envelope` - associated with a specific :term:`content ID`. Content submitted here will have its structure - validated and indexed. + associated with a specific :term:`content ID`. presenter - Accept HTTP requests from users. Map the requested :term:`presented URL` to :term:`content ID` - using the latest known version of the content mapping within the control repository, then access the requested :term:`metadata envelope` using the :term:`content service`. Inject the envelope into an appropriate :term:`template` and send the final HTML back in an HTTP response. + Accepts HTTP requests from users. Maps the requested :term:`presented URL` to a :term:`content ID` using the latest known version of the content mapping within the control repository, then accesses the requested :term:`metadata envelope` using the :term:`content service`. Injects the envelope into an appropriate :term:`template` and sends the final HTML back in an HTTP response. nginx Reverse proxy that accepts requests from off of the host, terminates TLS, and delegates to the local :term:`presenter` and :term:`content service`. strider - A continuous integration server integrated with Deconst to provide on-cluster preparer runs. + A continuous integration server integrated with Deconst to provide on-cluster preparer and submitter runs. Lifecycle of an HTTP Request ---------------------------- @@ -85,13 +81,16 @@ When a change is merged into the live branch of the :term:`control repository`: #. Once all assets have been published, the preparer sends the latest git commit SHA of the control repository to the :term:`content service`, where it's stored in MongoDB. #. Each entry within the ``content-repositories.json`` file is checked against the list of :term:`strider` builds. If any new entries have been added, a content build is created and configured with a newly issued API key. #. 
During each request, each :term:`presenter` queries its linked :term:`content service` for the active control repository SHA. If it doesn't match last-loaded control repository SHA, the presenter triggers an asynchronous update. -#. If successful, the new content and template mappings, redirects, and templates will be atomically installed. Otherwise, the presenter will log an error with the details and wait for further changes before attempting to reload. +#. If successful, the new content and template mappings, redirects, and templates are atomically installed. Otherwise, the presenter logs an error with the details and waits for further changes before attempting to reload. Lifecycle of a Content Repository Update ---------------------------------------- When a change is merged into the live branch of a :term:`content repository`: -#. A Strider build scans the latest commit of the repository for directories containing ``_deconst.json`` files and executes the appropriate :term:`preparer` within a new Docker container that's given the context of each one. -#. The preparer generates a :term:`metadata envelope` for each page that would be rendered, assigns it a :term:`content ID` using a configured base ID, and submits it to the :term:`content service`. -#. Each static resource (images, mostly) are submitted to the :term:`content service` and published to the CDN as non-global assets. The response includes the CDN URL, which is then used within the generated envelopes. +#. A Strider build scans the latest commit of the repository for directories containing ``_deconst.json`` files and executes the appropriate :term:`preparer` within a Docker container that's given each context. +#. The preparer copies each referenced asset to an asset output directory within the shared workspace container. The offset of the asset reference is saved in an "asset_offsets" map. +#. 
The preparer generates a :term:`metadata envelope` for each page that would be rendered, assigns it a :term:`content ID` using a configured base ID, and writes it to the envelope output directory. +#. The submitter queries the :term:`content service` with the SHA-256 fingerprints of each asset in the asset directory. If any assets are missing or have changed, the submitter bulk-uploads them to the :term:`content service` API. If more than 30MB of assets need to be uploaded, assets are uploaded in batches of just over 30MB to avoid overwhelming the upload process. +#. The submitter inserts the public CDN URLs of each asset into the body of each metadata envelope at the recorded offsets and removes the "asset_offsets" key. +#. The submitter queries the content service with the SHA-256 fingerprint of a stable (key-sorted) representation of each envelope. Any envelopes that have been changed are bulk-uploaded to the content service. diff --git a/developing/envelope.rst b/developing/envelope.rst index 8d8b1d9..feeeefc 100644 --- a/developing/envelope.rst +++ b/developing/envelope.rst @@ -32,6 +32,10 @@ This is an example envelope that demonstrates the full document structure, inclu "previous": { "title": "The previous article", "url": "/blog/previous-article" + }, + "asset_offsets": { + "local/path/image.jpg": [23, 1456], + "other/asset.gif": [451] } } @@ -96,6 +100,9 @@ This is an example envelope that demonstrates the full document structure, inclu If the ``url`` key is absolute (rooted at the document root, like ``/blog/other-post``), the presenter will re-root it based on the current mapping of the content repository. If it's relative, it will be left as-is. + asset_offsets + This key must only be present in the intermediate representation used to communicate between a preparer and the submitter. Its keys are local paths to asset files relative to the asset directory. 
Each value is an array of character offsets into ``body`` at which the placeholder character should be replaced by the full, public URL of the asset. + The documents retrieved from the content store consist of the requested envelope and a number of additional attributes that are derived and injected at retrieval time. The full content document looks like this: .. code-block:: json diff --git a/developing/preparer.rst b/developing/preparer.rst index 7c2ea68..3cb72e7 100644 --- a/developing/preparer.rst +++ b/developing/preparer.rst @@ -7,32 +7,23 @@ If you want to include content from a new :term:`content repository` format, you #. Parse the markup language, configuration files, and other metadata for some content format. When possible, you should use the format's native libraries and tooling to do so. #. Parse the ``_deconst.json`` file. Consult the :ref:`new content repository section ` for its schema. -#. Submit assets (usually images) to the :term:`content service` API's `/asset endpoint `_. Omit the `?named=true` parameter; it's used by the control repository's preparer. The response payload maps each uploaded asset to a final URL that the preparer should remember. -#. Use the markup to produce rendered HTML. The preparer should use the final asset URLs provided before. +#. Copy assets (usually images) to the directory specified by the environment variable ``ASSET_DIR``. It's best to preserve as much of the local directory structure as possible from the source repository, unless two assets in different subdirectories have the same filename. +#. Use the markup to produce rendered HTML. The preparer should use a single-character placeholder for each asset URL. As it does so, it should generate a map that associates the path of each asset relative to ``ASSET_DIR`` with a collection of character offsets within the body text at which that asset is referenced. As a rule, the rendered HTML *should omit any layouts* from the content repository itself and only render the page content, unadorned. 
In Deconst, templates will be applied :ref:`later, from the control repository `. This is important to ensure a consistent look and feel across many content repositories published to the same site, as well as allowing users to take advantage of presenter-implemented features like :ref:`search `. -#. Assemble the content into one or more :term:`metadata envelopes` that match the :ref:`envelope schema `. -#. Submit each prepared envelope to the :term:`content service` API's `/content endpoint `_. - -Each HTTP request sent to the content service should be accompanied by an ``Authorization`` header containing a valid API key: - -.. code-block:: text - - PUT /content/https%3A%2F%2Fgithub.com%2Fsomeuser%2Fsomerepo%2Fsomeid - Authorization: deconst apikey="12345" +#. Assemble the content into one or more :term:`metadata envelopes` that match the :ref:`envelope schema `. If any assets were referenced, include the asset offset map as the ``asset_offsets`` element. Write each completed envelope to the directory specified by the environment variable ``ENVELOPE_DIR`` as a file with the filename pattern ``.json``. Docker Container Protocol ------------------------- -If you're running your preparer in an independent environment (like a non-Deconst continuous integration server), anything that implements the process above will work fine. If you want your preparer to work within the Deconst client or to be available to :ref:`automatically created Strider builds `, you'll need to package your preparer in a Docker container image that obeys the container protocol described here. +If you run your preparer in an independent environment (like a non-Deconst continuous integration server), anything that implements the process above will work fine. If you want your preparer to work within the Deconst client or to be available to :ref:`automatically created Strider builds `, you need to package your preparer in a Docker container image that obeys the container protocol described here. 
Deconst preparer containers should respect the following configuration values: -* ``CONTENT_STORE_URL``: The base URL of the :term:`content service`, with a trailing slash. For example: ``"https://deconst.horse:9000/"``. -* ``CONTENT_STORE_APIKEY``: A valid API key for the content service. -* ``CONTENT_STORE_TLSVERIFY``: If set to ``"false"``, TLS certificate validity should not be checked for content store connections. **Never use this option in production,** as it potentially allows your connection to be subjected to a `man-in-the-middle attack `_. -* ``CONTENT_ID_BASE``: If set, this should *override* the content ID base specified in ``_deconst.json`` for this preparation run, preferably with some kind of message if they differ. -* ``CONTENT_ROOT``: If specified, the preparer should prepare content mounted to a volume at this path within the container. Otherwise, it should default to preparing ``/usr/content-repo``. +* ``ASSET_DIR``: The preparer must copy assets to this directory tree. +* ``ENVELOPE_DIR``: The preparer must write completed envelopes to this directory. +* ``CONTENT_ID_BASE``: *(optional)* If set, this should *override* the content ID base specified in ``_deconst.json`` for this preparation run, preferably with some kind of message if they differ. +* ``CONTENT_ROOT``: *(optional)* If specified, the preparer should prepare content mounted to a volume at this path within the container. Otherwise, it should default to preparing ``/usr/content-repo``. When run with no arguments, the preparer container should prepare the content as described above, then exit with an exit status of 0 if preparation was successful, or nonzero if it was not. 
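The revised preparer steps in ``developing/preparer.rst`` can be illustrated with a minimal Python sketch. It is not taken from any existing preparer. The ``fragments`` input form, the ``"X"`` placeholder character, and the envelope file name (a URL-quoted content ID plus ``.json``) are all assumptions made for illustration; only ``ASSET_DIR``, ``ENVELOPE_DIR``, ``body``, and ``asset_offsets`` come from the text above.

```python
import json
import os
import shutil
import tempfile
from urllib.parse import quote

PLACEHOLDER = "X"  # assumed single-character stand-in for each asset URL


def render_body(fragments):
    """Build the body text and the asset offset map.

    `fragments` is a hypothetical intermediate form: a list of
    ("text", html) or ("asset", path_relative_to_asset_dir) pairs.
    Returns (body, asset_offsets).
    """
    parts, offsets, position = [], {}, 0
    for kind, value in fragments:
        if kind == "asset":
            # Record where the placeholder lands, then emit the placeholder.
            offsets.setdefault(value, []).append(position)
            value = PLACEHOLDER
        parts.append(value)
        position += len(value)
    return "".join(parts), offsets


def prepare(content_root, content_id, fragments, env=None):
    """Copy referenced assets and write one envelope file."""
    env = os.environ if env is None else env
    asset_dir, envelope_dir = env["ASSET_DIR"], env["ENVELOPE_DIR"]
    body, offsets = render_body(fragments)

    # Copy each referenced asset, preserving its relative path.
    for relative_path in offsets:
        target = os.path.join(asset_dir, relative_path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copyfile(os.path.join(content_root, relative_path), target)

    envelope = {"body": body}
    if offsets:
        envelope["asset_offsets"] = offsets

    # Assumed naming scheme; the real file name pattern may differ.
    filename = quote(content_id, safe="") + ".json"
    os.makedirs(envelope_dir, exist_ok=True)
    with open(os.path.join(envelope_dir, filename), "w", encoding="utf-8") as f:
        json.dump(envelope, f)
    return envelope
```

A container entry point built on this sketch would call ``prepare`` for each page and exit with status 0 on success or nonzero on failure, matching the protocol described above.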
diff --git a/writing-docs/author/index.rst b/writing-docs/author/index.rst index 6b31789..2213347 100644 --- a/writing-docs/author/index.rst +++ b/writing-docs/author/index.rst @@ -99,9 +99,7 @@ And you're currently mapped to the ``books/example/`` subpath of *mysite.com* by As you work, you can freely create new pages and directories and they will automatically be available within that subpath. -.. warning:: - - Currently, *deleting* pages doesn't actually remove the content from deconst. An administrator needs to remove documents from Cloud Files manually to delete content. +Content that you delete is also automatically deleted from the site. Be careful! When you rename or delete content, you may break users' existing bookmarks or links from other sites. Consider copying the content to its new path, creating a redirect, then deleting it from its old path to avoid disrupting the site's user experience. Content mapping is determined by :ref:`content mapping configuration files ` within the control repository. Open an issue on the control repository to discuss the addition of new content, or modify the content mapping files yourself in a pull request if you're also a site coordinator.
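The submitter steps added to ``developing/architecture.rst`` earlier in this change (fingerprinting, URL insertion, and batched uploads) can be sketched in Python as follows. This is only an illustration under stated assumptions: the exact JSON serialization used for the fingerprint, the interpretation of 30MB as ``30 * 1024 * 1024`` bytes, the replacement of exactly one placeholder character per offset, and the function names themselves are hypothetical rather than taken from the real submitter.

```python
import hashlib
import json


def fingerprint(envelope):
    """SHA-256 of a stable, key-sorted JSON form of the envelope."""
    # The separators are an assumption; any fixed choice gives a stable form.
    stable = json.dumps(envelope, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(stable.encode("utf-8")).hexdigest()


def insert_asset_urls(envelope, asset_urls):
    """Replace each one-character placeholder with its public URL.

    Returns a copy of the envelope without the "asset_offsets" key.
    """
    result = dict(envelope)
    offsets = result.pop("asset_offsets", {})
    replacements = sorted(
        ((offset, asset_urls[path])
         for path, positions in offsets.items()
         for offset in positions),
        reverse=True,  # work right to left so earlier offsets stay valid
    )
    body = result["body"]
    for offset, url in replacements:
        body = body[:offset] + url + body[offset + 1:]
    result["body"] = body
    return result


def batch_assets(sized_paths, limit=30 * 1024 * 1024):
    """Group (path, size) pairs into batches that end just over `limit`."""
    batches, current, total = [], [], 0
    for path, size in sized_paths:
        current.append(path)
        total += size
        if total > limit:
            # Close the batch as soon as it crosses the limit.
            batches.append(current)
            current, total = [], 0
    if current:
        batches.append(current)
    return batches
```

Sorting keys before hashing means two envelopes with the same content but different key order produce the same fingerprint, which is what lets the submitter skip unchanged envelopes.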