diff --git a/docs/images/graph_schema.svg b/docs/images/graph_schema.svg
new file mode 100644
index 000000000..c42c5b78a
--- /dev/null
+++ b/docs/images/graph_schema.svg
@@ -0,0 +1,410 @@
+
+
diff --git a/docs/images/graph_simple.svg b/docs/images/graph_simple.svg
new file mode 100644
index 000000000..464975f7e
--- /dev/null
+++ b/docs/images/graph_simple.svg
@@ -0,0 +1,369 @@
+
+
diff --git a/docs/requirements-doc.txt b/docs/requirements-doc.txt
index e5229f453..bf03f3dcb 100644
--- a/docs/requirements-doc.txt
+++ b/docs/requirements-doc.txt
@@ -18,5 +18,4 @@ mergedeep<1.4
docker<5.1
PyGithub<1.56
semver>=2,<3
-pytest<7.3
-coverage<6.6
+
diff --git a/docs/source/about-dag.md b/docs/source/about-dag.md
new file mode 100644
index 000000000..5778ed4a9
--- /dev/null
+++ b/docs/source/about-dag.md
@@ -0,0 +1,84 @@
+# About the Merlin Directed Acyclic Graph
+
+```{contents}
+---
+depth: 2
+local: true
+backlinks: none
+---
+```
+
+Merlin uses a directed acyclic graph (DAG) to represent operations on data, such as filtering or bucketing, and to represent operations in a recommender system, such as creating an ensemble or filtering candidate items during inference.
+
+Understanding the Merlin DAG is helpful if you want to develop your own operator (Op) or build a recommender system with Merlin.
+
+## Graph Terminology
+
+node
+: A node in the DAG is a group of columns and at least one _operator_.
+ The columns are specified with a _column selector_.
+ A node has an _input schema_ and an _output schema_.
+ Resolution of the schemas is delayed until you run `fit` or `transform` on a dataset.
+
+column selector
+: A column selector specifies the columns to select from a dataset using column names or _tags_.
+
+operator
+: An operator performs a transformation on data and returns a new _node_.
+ The data is identified by the _column selector_.
+ Some simple operators like `+` and `-` add or remove columns.
+ More complex operations are applied by shifting the operators onto the column selector with the `>>` notation.
+
+schema
+: A Merlin schema is metadata that describes the columns in a dataset.
+ Each column has its own schema that identifies the column name and can specify _tags_ and properties.
+
+tag
+: A Merlin tag categorizes information about a column.
+ Adding a tag to a column enables you to select columns for operations by tag rather than name.
+
+ For example, you can add the `USER` and `ITEM` tags to columns.
+ Modeling and inference operations can use that information to act accordingly on the dataset.
+
+## Understanding Operators, Columns, Nodes, and Schema
+
+Merlin enables you to chain together Operators with the `>>` syntax to create feature-processing workflows.
+The `>>` syntax means "take the output of the left-hand side and feed it into the input of the right-hand side."
+
+To run an Operator on specific columns only, specify an explicit list of column names.
+The following code block shows the syntax for explicit column names:
+
+```python
+result = ["col1", "col2"] >> SomeOperator(...)
+```
+
+Or, you can use the `>>` syntax between Operators to run one Operator on all the output columns from the preceding Operator:
+
+```python
+result = AnOperator(...) >> OtherOperator(...)
+```
+
+Chaining Operators together builds a graph.
+The following figure shows how each node in the graph has an Operator.
+
+![A directed graph with two nodes. The first node is a Selection Operator and selects columns "col1" and "col2." The second node receives the two columns as its input. The second node has a fictional SomeOperator Operator.](../images/graph_simple.svg)
+
+Each node in a graph has an input schema and an output schema that describe the columns that go into an Operator and the columns that go out of an Operator.
+The following figure represents an Operator that adds `colB` to a dataset.
+
+![Part of a directed graph that shows the input schema to a fictional SomeOperator Operator as "colA". The fictional Operator adds "colB" and the result is an output schema with "colA" and "colB."](../images/graph_schema.svg)
+
+In practice, when Merlin builds the graph, the workflow does not yet know which columns each Operator processes or produces.
+There are two reasons for this:
+
+1. Merlin enables you to build graphs that process categories of columns.
+ The categories are specified by _tags_ instead of an explicit list of column names.
+
+ For example, you can select the continuous columns from your dataset with code like the following example:
+
+ ```python
+ [Tags.CONTINUOUS] >> Operator(...)
+ ```
+
+1. You can chain Operators together into a graph, such as an NVTabular workflow, before you specify a dataset.
+ The graph, Operators, and schema do not know which columns will be selected by tag until the software accesses the dataset and determines the column names.
\ No newline at end of file
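The `>>` chaining and the deferred column resolution described in `about-dag.md` can be sketched in plain Python. This is a toy illustration, not Merlin's implementation: `ToySelection`, `ToyOperator`, the `adds` parameter, and the string tags are all hypothetical names, and the schema is assumed to be a plain dict that maps each column name to a set of tags.

```python
class ToySelection:
    """Hypothetical selection node: holds column names and/or tags."""

    def __init__(self, items):
        self.items = list(items)

    def output_columns(self, schema):
        # schema: dict mapping column name -> set of tag strings.
        selected = []
        for item in self.items:
            if item in schema:
                selected.append(item)  # explicit column name
            else:
                # Anything that is not a column name is treated as a tag.
                selected += [col for col, tags in schema.items() if item in tags]
        return selected


class ToyOperator:
    """Hypothetical operator node; real Merlin operators are richer."""

    def __init__(self, name, adds=()):
        self.name = name
        self.adds = list(adds)  # columns this operator appends to its input
        self.parent = None      # upstream node, set when chained with >>

    def __rrshift__(self, left):
        # Handles `[...] >> op`: lists do not define >>, so Python falls
        # back to the right-hand operand's reflected method.
        self.parent = ToySelection(left)
        return self

    def __rshift__(self, right):
        # Handles `op >> other_op`: this node's output feeds `right`.
        right.parent = self
        return right

    def output_columns(self, schema):
        # Output schema = input schema plus any columns this operator adds.
        return self.parent.output_columns(schema) + self.adds


# The graph is built first, with no dataset or schema in sight.
graph = ["continuous"] >> ToyOperator("fill_missing") >> ToyOperator("add_flag", adds=["flag"])

# Column names are resolved only once a schema is available.
schema = {"user_id": {"categorical"}, "age": {"continuous"}, "price": {"continuous"}}
columns = graph.output_columns(schema)
print(columns)  # ['age', 'price', 'flag']
```

The same `graph` object can be resolved against a different schema later, which mirrors why Merlin delays schema resolution until `fit` or `transform` runs on a dataset.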
diff --git a/docs/source/about-model-blocks.md b/docs/source/about-model-blocks.md
new file mode 100644
index 000000000..f8850fce7
--- /dev/null
+++ b/docs/source/about-model-blocks.md
@@ -0,0 +1,3 @@
+# About Merlin Model Blocks
+
+FIXME
\ No newline at end of file
diff --git a/docs/source/about-operators.md b/docs/source/about-operators.md
new file mode 100644
index 000000000..9a25e221e
--- /dev/null
+++ b/docs/source/about-operators.md
@@ -0,0 +1,85 @@
+# About Merlin Operators
+
+```{contents}
+---
+depth: 2
+local: true
+backlinks: none
+---
+```
+
+## Understanding Operators
+
+Merlin uses Operators to perform computations on datasets, such as normalizing continuous variables, bucketing continuous variables, and clipping variables between minimum and maximum values.
+
+An Operator implements two key methods:
+
+Fit
+: The `fit` method performs any pre-computation steps that are required before operating on data.
+
+ For example, the `Normalize` Operator standardizes the values of a continuous column to a mean of 0 and a standard deviation of 1.
+ The `fit` method computes the mean and standard deviation of the column.
+
+ The method is optional.
+ For example, the `Bucketize` and `Clip` Operators do not implement the method because you specify the bucket boundaries or the minimum and maximum values for clipping.
+ These Operators do not need to access the data to perform any pre-computation steps.
+
+Transform
+: The `transform` method applies the operation to the data, such as normalizing values, bucketing, or clipping.
+
+Another difference between the two methods is that the `fit` method accepts a Merlin dataset object and the `transform` method accepts a DataFrame object.
+The difference is an implementation detail: the `fit` method must access all the data, whereas the `transform` method processes the dataset one part at a time.
+
+```python
+# Typical signature of a fit method.
+def fit(
+    self,
+    selector: ColumnSelector,
+    dataset: Dataset,
+) -> Any:
+    ...
+
+# Typical signature of a transform method.
+def transform(
+    self,
+    selector: ColumnSelector,
+    df: DataFrame,
+) -> DataFrame:
+    ...
+```
+
+## Operators and Columns: Column Selector
+
+In most cases, you want an Operator to process a subset of the columns in your input dataset.
+Both the `fit` and `transform` methods have a `selector` argument that specifies the columns to operate on.
+Merlin uses a `ColumnSelector` class to represent the columns.
+
+The simplest column selector is a list of strings that specify some column names.
+In the following sample code, `["col1", "col2"]` becomes an instance of the `ColumnSelector` class.
+
+```python
+result = ["col1", "col2"] >> SomeOperator(...)
+```
+
+Column selectors also offer a more powerful and flexible way to specify columns.
+You can specify the input columns to an Operator with tags.
+In the following sample code, the Operator processes all the continuous variables in a dataset.
+
+```python
+result = [Tags.CONTINUOUS] >> SomeOperator(...)
+```
+
+Using tags to create a column selector offers the following advantages:
+
+- Enables you to apply several Operators to the same kind of columns, such as categorical or continuous variables.
+- Reduces code maintenance by enabling your code to automatically operate on newly added columns in a dataset.
+- Simplifies code by avoiding lists of strings for column names.
+
+## How to Build an Operator
+
+Blah.
+
+## Reference Documentation
+
+- {py:class}`merlin.dag.BaseOperator`
+- {py:class}`merlin.dag.ColumnSelector`
+- {py:class}`merlin.schema.Tags`
+- {py:class}`merlin.io.Dataset`
\ No newline at end of file
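The split between `fit` and `transform` described in `about-operators.md` can be illustrated without Merlin. The following is a minimal sketch under stated assumptions: `ToyMinMaxScaler` is a hypothetical operator, a dataset is modeled as a list of partitions, and each partition is a plain dict of column name to list of values rather than a real DataFrame.

```python
class ToyMinMaxScaler:
    """Hypothetical operator that scales values into the range 0 to 1."""

    def __init__(self):
        self.stats = {}  # column name -> (minimum, maximum), filled by fit

    def fit(self, columns, partitions):
        # fit needs a full pass: the minimum and maximum depend on
        # every partition of the dataset.
        for col in columns:
            values = [v for part in partitions for v in part[col]]
            self.stats[col] = (min(values), max(values))

    def transform(self, columns, partition):
        # transform sees one partition at a time and reuses the
        # statistics gathered by fit.
        out = dict(partition)
        for col in columns:
            lo, hi = self.stats[col]
            span = (hi - lo) or 1.0  # avoid dividing by zero for constant columns
            out[col] = [(v - lo) / span for v in partition[col]]
        return out


partitions = [{"age": [20, 30]}, {"age": [40, 60]}]
scaler = ToyMinMaxScaler()
scaler.fit(["age"], partitions)
scaled = [scaler.transform(["age"], part) for part in partitions]
print(scaled)  # [{'age': [0.0, 0.25]}, {'age': [0.5, 1.0]}]
```

An operator whose parameters are supplied up front, such as one that clips to fixed bounds, would need only the `transform` half of this sketch.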
diff --git a/docs/source/about-schema.md b/docs/source/about-schema.md
new file mode 100644
index 000000000..aac112ae9
--- /dev/null
+++ b/docs/source/about-schema.md
@@ -0,0 +1,3 @@
+# About the Merlin Schema
+
+FIXME
\ No newline at end of file
diff --git a/docs/source/conf.py b/docs/source/conf.py
index 779dc587d..a51bd04eb 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -118,6 +118,14 @@
autosummary_generate = True
+intersphinx_mapping = {
+ "python": ("https://docs.python.org/3", None),
+ "merlin-core": ("https://nvidia-merlin.github.io/core/main", None),
+ "merlin-systems": ("https://nvidia-merlin.github.io/systems/main", None),
+ "merlin-models": ("https://nvidia-merlin.github.io/models/main", None),
+ "NVTabular": ("https://nvidia-merlin.github.io/NVTabular/main", None),
+}
+
copydirs_additional_dirs = ["../../examples/", "../../README.md"]
copydirs_file_rename = {
diff --git a/docs/source/technical-concepts.md b/docs/source/technical-concepts.md
new file mode 100644
index 000000000..290d1da7f
--- /dev/null
+++ b/docs/source/technical-concepts.md
@@ -0,0 +1,4 @@
+# Merlin Technical Concepts
+
+The following pages provide a deeper technical understanding of Merlin concepts.
+These concepts can help you to develop your own operator to implement a more sophisticated recommender system.
\ No newline at end of file
diff --git a/docs/source/toc.yaml b/docs/source/toc.yaml
index aa384075a..bcc2d1949 100644
--- a/docs/source/toc.yaml
+++ b/docs/source/toc.yaml
@@ -46,5 +46,13 @@ subtrees:
title: Deploy the HugeCTR Model with Triton
- file: examples/scaling-criteo/04-Triton-Inference-with-Merlin-Models-TensorFlow.ipynb
title: Deploy the TensorFlow Model with Triton
+ - title: Merlin Technical Concepts
+ file: technical-concepts.md
+ entries:
+ - file: about-dag.md
+ title: Graph Concepts
+ - file: about-schema.md
+ - file: about-operators.md
+ - file: about-model-blocks.md
- file: containers.rst
- file: support_matrix/index.rst
\ No newline at end of file
diff --git a/requirements/docs.txt b/requirements/docs.txt
new file mode 100644
index 000000000..e5229f453
--- /dev/null
+++ b/requirements/docs.txt
@@ -0,0 +1,22 @@
+# docs
+ipython==8.2.0
+Sphinx==3.5.4
+jinja2<3.1
+markupsafe==2.0.1
+natsort==8.1.0
+sphinx_rtd_theme
+sphinx_markdown_tables
+sphinx-multiversion@git+https://github.com/mikemckiernan/sphinx-multiversion.git@v0.3.0
+sphinxcontrib-copydirs@git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@v0.3.3
+sphinx-external-toc<0.4
+myst-nb
+linkify-it-py
+Markdown==3.3.7
+
+# smx
+mergedeep<1.4
+docker<5.1
+PyGithub<1.56
+semver>=2,<3
+pytest<7.3
+coverage<6.6
diff --git a/tox.ini b/tox.ini
index 80406dc4f..5a76a130c 100644
--- a/tox.ini
+++ b/tox.ini
@@ -36,14 +36,14 @@ commands =
; Generates documentation with sphinx. There are other steps in the Github Actions workflow
; to publish the documentation on release.
changedir = {toxinidir}
-deps = -rrequirements/docs.txt
+deps = -r requirements/docs.txt
commands =
- python -m sphinx.cmd.build -P -b html docs/source docs/build/html
+ python -m sphinx.cmd.build -P -b {posargs:html} docs/source docs/build/{posargs:html}
[testenv:docs-multi]
; Run the multi-version build that is shown on GitHub Pages.
changedir = {toxinidir}
-deps = -rrequirements/docs.txt
+deps = -r requirements/docs.txt
commands =
sphinx-multiversion --dump-metadata docs/source docs/build/html | jq "keys"
sphinx-multiversion docs/source docs/build/html