Document the new preparer workflow. #239

Merged
merged 7 commits into from
Apr 27, 2016
33 changes: 16 additions & 17 deletions developing/architecture.rst
@@ -27,7 +27,7 @@ Access to Strider is managed by membership in a GitHub organization or in teams

Strider is prepopulated with a build for the instance's control repository that preprocesses and submits site-wide assets to the content service, and automatically creates new content builds based on a list in a configuration file.

The asset preparer process and any content build preparer processes are run in isolated Docker containers, sharing a workspace with Strider by a data volume container.
The asset preparer, content preparer, and submitter processes run in isolated Docker containers and share a workspace with Strider through a data volume container.

Components
----------
@@ -39,29 +39,25 @@ Components
:term:`metadata envelopes`, each of which contains one page of rendered HTML and associated
metadata.

If the current branch is live, the generated envelopes are then submitted to the
:term:`content service` for storage and indexing. Otherwise, a local :term:`presenter` is
invoked to complete a full build of this subtree of the final site, which is then published to
CDN and linked on the pull request.

There will be one preparer for each supported format of :term:`content repository`; initially,
Sphinx and Jekyll. The preparer will be executed by a CI/CD system on each commit to the
There is one preparer for each supported format of :term:`content repository`; currently,
Sphinx and Jekyll. The preparer is executed by a CI/CD system on each commit to the
repository.

submitter
Process responsible for traversing directories populated with :term:`metadata envelopes` and asset files and submitting them to the :term:`content service`. The submitter submits content and assets in bulk transactions and avoids submitting unchanged content.

content service
Service that accepts submissions and queries for the most recent :term:`metadata envelope`
associated with a specific :term:`content ID`. Content submitted here will have its structure
validated and indexed.
associated with a specific :term:`content ID`.

presenter
Accept HTTP requests from users. Map the requested :term:`presented URL` to :term:`content ID`
using the latest known version of the content mapping within the control repository, then access the requested :term:`metadata envelope` using the :term:`content service`. Inject the envelope into an appropriate :term:`template` and send the final HTML back in an HTTP response.
Accepts HTTP requests from users. Maps the requested :term:`presented URL` to a :term:`content ID` using the latest known version of the content mapping within the control repository, then accesses the requested :term:`metadata envelope` using the :term:`content service`. Injects the envelope into an appropriate :term:`template` and sends the final HTML back in an HTTP response.

nginx
Reverse proxy that accepts requests from outside the host, terminates TLS, and delegates to the local :term:`presenter` and :term:`content service`.

strider
A continuous integration server integrated with Deconst to provide on-cluster preparer runs.
A continuous integration server integrated with Deconst to provide on-cluster preparer and submitter runs.

Lifecycle of an HTTP Request
----------------------------
@@ -85,13 +81,16 @@ When a change is merged into the live branch of the :term:`control repository`:
#. Once all assets have been published, the preparer sends the latest git commit SHA of the control repository to the :term:`content service`, where it's stored in MongoDB.
#. Each entry within the ``content-repositories.json`` file is checked against the list of :term:`strider` builds. If any new entries have been added, a content build is created and configured with a newly issued API key.
#. During each request, each :term:`presenter` queries its linked :term:`content service` for the active control repository SHA. If it doesn't match the last-loaded control repository SHA, the presenter triggers an asynchronous update.
#. If successful, the new content and template mappings, redirects, and templates will be atomically installed. Otherwise, the presenter will log an error with the details and wait for further changes before attempting to reload.
#. If successful, the new content and template mappings, redirects, and templates are atomically installed. Otherwise, the presenter logs an error with the details and waits for further changes before attempting to reload.

Lifecycle of a Content Repository Update
----------------------------------------

When a change is merged into the live branch of a :term:`content repository`:

#. A Strider build scans the latest commit of the repository for directories containing ``_deconst.json`` files and executes the appropriate :term:`preparer` within a new Docker container that's given the context of each one.
#. The preparer generates a :term:`metadata envelope` for each page that would be rendered, assigns it a :term:`content ID` using a configured base ID, and submits it to the :term:`content service`.
#. Each static resource (images, mostly) are submitted to the :term:`content service` and published to the CDN as non-global assets. The response includes the CDN URL, which is then used within the generated envelopes.
#. A Strider build scans the latest commit of the repository for directories containing ``_deconst.json`` files and executes the appropriate :term:`preparer` within a Docker container that's given each context.
#. The preparer copies each referenced asset to an asset output directory within the shared workspace container. The offset of the asset reference is saved in an "asset_offsets" map.
#. The preparer generates a :term:`metadata envelope` for each page that would be rendered, assigns it a :term:`content ID` using a configured base ID, and writes it to the envelope output directory.
#. The submitter queries the :term:`content service` with the SHA-256 fingerprint of each asset in the asset directory. If any assets are missing or have changed, the submitter bulk-uploads them to the :term:`content service` API. If more than 30 MB of assets need to be uploaded, they are uploaded in batches of just over 30 MB to avoid overwhelming the upload process.
#. The submitter inserts the public CDN URL of each asset into the body of each metadata envelope at the recorded offsets and removes the "asset_offsets" key.
#. The submitter queries the content service with the SHA-256 fingerprint of a stable (key-sorted) representation of each envelope. Any envelopes that have been changed are bulk-uploaded to the content service.
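
The fingerprinting and batching steps above can be sketched roughly as follows. This is a minimal illustration, not the submitter's actual code: the function names, the use of compact ``json.dumps`` output as the "stable (key-sorted) representation", and the reading of "30MB" as 30 × 1024 × 1024 bytes are all assumptions, so digests produced here may not match what the real content service expects.

```python
import hashlib
import json

# Assumption: "30MB" is read as 30 * 1024 * 1024 bytes.
BATCH_LIMIT = 30 * 1024 * 1024


def envelope_fingerprint(envelope):
    """Return the SHA-256 hex digest of a key-sorted JSON form of the envelope.

    Hypothetical helper: the exact serialization used by the real submitter
    is not specified in this document.
    """
    stable = json.dumps(envelope, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(stable.encode("utf-8")).hexdigest()


def batch_assets(sizes, limit=BATCH_LIMIT):
    """Group (path, size) pairs into batches that close just past the limit."""
    batches, current, total = [], [], 0
    for path, size in sizes:
        current.append(path)
        total += size
        # Close the batch as soon as it grows past the limit, which gives
        # batches of "just over" the limit.
        if total > limit:
            batches.append(current)
            current, total = [], 0
    if current:
        batches.append(current)
    return batches
```

Sorting keys before hashing means that two envelopes with the same content but a different key order produce the same fingerprint, which is what lets the submitter skip unchanged content.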
7 changes: 7 additions & 0 deletions developing/envelope.rst
@@ -32,6 +32,10 @@ This is an example envelope that demonstrates the full document structure, inclu
"previous": {
"title": "The previous article",
"url": "/blog/previous-article"
},
"asset_offsets": {
"local/path/image.jpg": [23, 1456],
"other/asset.gif": [451]
}
}

@@ -96,6 +100,9 @@ This is an example envelope that demonstrates the full document structure, inclu

If the ``url`` key is absolute (rooted at the document root, like ``/blog/other-post``), the presenter will re-root it based on the current mapping of the content repository. If it's relative, it will be left as-is.

asset_offsets
This key must be present only in the intermediate representation used to communicate between a preparer and the submitter. Its keys are local paths to asset files relative to the asset directory. Each value is an array of character offsets into ``body`` that should be replaced by the full, public URL of the asset.
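
As a rough illustration of how ``asset_offsets`` might be consumed, the sketch below replaces the character at each recorded offset with the asset's public URL and drops the key afterwards. The function name and the ``asset_urls`` mapping are hypothetical, and the sketch assumes that each offset marks exactly one placeholder character in ``body``.

```python
def insert_asset_urls(envelope, asset_urls):
    """Return a copy of the envelope with asset placeholders replaced by URLs.

    `asset_urls` is a hypothetical mapping of local asset path -> public URL.
    Assumes each offset points at a single placeholder character in `body`.
    """
    body = envelope["body"]
    # Flatten to (offset, url) pairs and work from the end of the body
    # backwards, so that earlier offsets stay valid as the text grows.
    replacements = sorted(
        (
            (offset, asset_urls[path])
            for path, offsets in envelope.get("asset_offsets", {}).items()
            for offset in offsets
        ),
        reverse=True,
    )
    for offset, url in replacements:
        body = body[:offset] + url + body[offset + 1:]
    # Copy everything except the intermediate-only key.
    result = {key: value for key, value in envelope.items() if key != "asset_offsets"}
    result["body"] = body
    return result
```

Applying the replacements from the highest offset down is the design choice that matters here: inserting a URL lengthens the body, which would shift every later offset if the work were done front to back.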

The documents retrieved from the content store consist of the requested envelope and a number of additional attributes that are derived and injected at retrieval time. The full content document looks like this:

.. code-block:: json
25 changes: 8 additions & 17 deletions developing/preparer.rst
@@ -7,32 +7,23 @@ If you want to include content from a new :term:`content repository` format, you

#. Parse the markup language, configuration files, and other metadata for some content format. When possible, you should use the format's native libraries and tooling to do so.
#. Parse the ``_deconst.json`` file. Consult the :ref:`new content repository section <adding-new-content-repository>` for its schema.
#. Submit assets (usually images) to the :term:`content service` API's `/asset endpoint <https://github.com/deconst/content-service#post-assetnamedtrue>`_. Omit the `?named=true` parameter; it's used by the control repository's preparer. The response payload maps each uploaded asset to a final URL that the preparer should remember.
#. Use the markup to produce rendered HTML. The preparer should use the final asset URLs provided before.
#. Copy assets (usually images) to the directory specified by the environment variable ``ASSET_DIR``. It's best to preserve as much of the local directory structure as possible from the source repository, unless two assets in different subdirectories have the same filename.
#. Use the markup to produce rendered HTML. The preparer should use a single-character placeholder for each asset URL. As it does so, it should generate a map that associates the path of each asset relative to ``ASSET_DIR`` with a collection of character offsets within the body text at which that asset is referenced.

As a rule, the rendered HTML *should omit any layouts* from the content repository itself and only render the page content, unadorned. In Deconst, templates will be applied :ref:`later, from the control repository <control-template>`. This is important to ensure a consistent look and feel across many content repositories published to the same site, as well as allowing users to take advantage of presenter-implemented features like :ref:`search <control-search>`.

#. Assemble the content into one or more :term:`metadata envelopes` that match the :ref:`envelope schema <envelope-schema>`.
#. Submit each prepared envelope to the :term:`content service` API's `/content endpoint <https://github.com/deconst/content-service#put-contentid>`_.

Each HTTP request sent to the content service should be accompanied by an ``Authorization`` header containing a valid API key:

.. code-block:: text

PUT /content/https%3A%2F%2Fgithub.com%2Fsomeuser%2Fsomerepo%2Fsomeid
Authorization: deconst apikey="12345"
#. Assemble the content into one or more :term:`metadata envelopes` that match the :ref:`envelope schema <envelope-schema>`. If any assets were referenced, include the asset offset map as the ``asset_offsets`` element. Write each completed envelope to the directory specified by the environment variable ``ENVELOPE_DIR`` as a file with the filename pattern ``<content ID, URL-encoded>.json``.
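
The placeholder and envelope-writing steps above could look something like the following sketch. It is only an illustration under stated assumptions: the placeholder character ``X``, the ``("asset", path)`` tuple convention, and both function names are hypothetical, and a real preparer would drive this from its markup library rather than from a hand-built list.

```python
import json
import os
from urllib.parse import quote

# Hypothetical single-character stand-in for an asset URL.
PLACEHOLDER = "X"


def render_with_placeholders(parts):
    """Join HTML chunks into a body, recording an offset per asset reference.

    `parts` holds either literal HTML strings or ("asset", path) tuples,
    where `path` is relative to ASSET_DIR (an assumed convention).
    """
    chunks, offsets, length = [], {}, 0
    for part in parts:
        if isinstance(part, tuple):
            # Remember where this asset's placeholder lands in the body.
            offsets.setdefault(part[1], []).append(length)
            part = PLACEHOLDER
        chunks.append(part)
        length += len(part)
    return "".join(chunks), offsets


def write_envelope(content_id, envelope, envelope_dir=None):
    """Write the envelope to <ENVELOPE_DIR>/<URL-encoded content ID>.json."""
    envelope_dir = envelope_dir or os.environ["ENVELOPE_DIR"]
    # safe="" also encodes "/" so the content ID becomes a single filename.
    filename = quote(content_id, safe="") + ".json"
    path = os.path.join(envelope_dir, filename)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(envelope, handle)
    return path
```

Because the offsets are recorded separately, the placeholder character does not need to be unique within the body; only the recorded positions are meant to be replaced later.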

Docker Container Protocol
-------------------------

If you're running your preparer in an independent environment (like a non-Deconst continuous integration server), anything that implements the process above will work fine. If you want your preparer to work within the Deconst client or to be available to :ref:`automatically created Strider builds <adding-new-content-repository>`, you'll need to package your preparer in a Docker container image that obeys the container protocol described here.
If you run your preparer in an independent environment (like a non-Deconst continuous integration server), anything that implements the process above will work fine. If you want your preparer to work within the Deconst client or to be available to :ref:`automatically created Strider builds <adding-new-content-repository>`, you need to package your preparer in a Docker container image that obeys the container protocol described here.

Deconst preparer containers should respect the following configuration values:

* ``CONTENT_STORE_URL``: The base URL of the :term:`content service`, with a trailing slash. For example: ``"https://deconst.horse:9000/"``.
* ``CONTENT_STORE_APIKEY``: A valid API key for the content service.
* ``CONTENT_STORE_TLSVERIFY``: If set to ``"false"``, TLS certificate validity should not be checked for content store connections. **Never use this option in production,** as it potentially allows your connection to be subjected to a `man-in-the-middle attack <https://en.wikipedia.org/wiki/Man-in-the-middle_attack>`_.
* ``CONTENT_ID_BASE``: If set, this should *override* the content ID base specified in ``_deconst.json`` for this preparation run, preferably with some kind of message if they differ.
* ``CONTENT_ROOT``: If specified, the preparer should prepare content mounted to a volume at this path within the container. Otherwise, it should default to preparing ``/usr/content-repo``.
* ``ASSET_DIR``: The preparer must copy assets to this directory tree.
* ``ENVELOPE_DIR``: The preparer must write completed envelopes to this directory.
* ``CONTENT_ID_BASE``: *(optional)* If set, this should *override* the content ID base specified in ``_deconst.json`` for this preparation run, preferably with some kind of message if they differ.
* ``CONTENT_ROOT``: *(optional)* If specified, the preparer should prepare content mounted to a volume at this path within the container. Otherwise, it should default to preparing ``/usr/content-repo``.

When run with no arguments, the preparer container should prepare the content as described above, then exit with an exit status of 0 if preparation was successful, or nonzero if it was not.
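
A container entry point that follows this protocol might be structured like the sketch below. The ``load_config`` and ``main`` names and the ``prepare`` callback are hypothetical; only the environment variable names, the ``/usr/content-repo`` default, and the zero/nonzero exit status convention come from the text above.

```python
import os
import sys


def load_config(environ=os.environ):
    """Collect preparer settings from the environment (hypothetical helper)."""
    missing = [name for name in ("ASSET_DIR", "ENVELOPE_DIR") if not environ.get(name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return {
        "asset_dir": environ["ASSET_DIR"],
        "envelope_dir": environ["ENVELOPE_DIR"],
        # Optional: overrides the content ID base from _deconst.json when set.
        "content_id_base": environ.get("CONTENT_ID_BASE"),
        # Optional: falls back to the documented default mount point.
        "content_root": environ.get("CONTENT_ROOT") or "/usr/content-repo",
    }


def main(prepare, environ=os.environ):
    """Run `prepare(config)` and translate the outcome into an exit status."""
    try:
        prepare(load_config(environ))
    except Exception as exc:
        print("preparation failed: %s" % exc, file=sys.stderr)
        return 1
    return 0

# A real entry point would end with something like:
#     sys.exit(main(my_prepare_function))
```

Returning the status from ``main`` instead of calling ``sys.exit`` inside it keeps the function easy to exercise in tests without terminating the interpreter.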
4 changes: 1 addition & 3 deletions writing-docs/author/index.rst
@@ -99,9 +99,7 @@ And you're currently mapped to the ``books/example/`` subpath of *mysite.com* by

As you work, you can freely create new pages and directories and they will automatically be available within that subpath.

.. warning::

Currently, *deleting* pages doesn't actually remove the content from deconst. An administrator needs to remove documents from Cloud Files manually to delete content.
Content that you delete is also automatically deleted from the site. Be careful! When you rename or delete content, you may break users' existing bookmarks or links from other sites. Consider copying the content to its new path, creating a redirect, then deleting it from its old path to avoid disrupting the site's user experience.

Content mapping is determined by :ref:`content mapping configuration files <control-map>` within the control repository. Open an issue on the control repository to discuss the addition of new content, or modify the content mapping files yourself in a pull request if you're also a site coordinator.
