Commit 5470be3 (parent 980e297): Add information about the Merlin DAG. Define the important terms of the DAG. The commit changes 3 files with 79 additions and 0 deletions.

# About the Merlin Directed Acyclic Graph

Merlin uses a directed acyclic graph (DAG) to represent operations on data, such as filtering or bucketing, and to represent operations in a recommender system, such as creating an ensemble or filtering candidate items during inference.

Understanding the Merlin DAG is helpful if you want to develop your own operator (Op) or build a recommender system with Merlin.

## DAG Terminology

node
: A node in the DAG is a group of columns and at least one _operator_.
The columns are specified with a _column selector_.
A node has an _input schema_ and an _output schema_.
Resolution of the schemas is delayed until you run `fit` or `transform` on a dataset.

column selector
: A column selector specifies the columns to select from a dataset using column names or _tags_.

operator
: An operator performs a transformation on data and returns a new _node_.
The data is identified by the _column selector_.
Simple operators such as `+` and `-` add or remove columns.
More complex operations are applied by shifting the operators onto the column selector with the `>>` notation.

schema
: A Merlin schema is metadata that describes the columns in a dataset.
Each column has its own schema that identifies the column name and can specify _tags_ and properties.

tag
: A Merlin tag categorizes information about a column.
Adding a tag to a column enables you to select columns for operations by tag rather than name.

For example, you can add the `USER` and `ITEM` tags to columns.
Modeling and inference operations can use that information to act accordingly on the dataset.

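The terms above can be sketched in plain Python to show how they fit together: a schema holds per-column metadata, tags live in that metadata, and a column selector is resolved against a schema to produce concrete column names. This is a minimal illustration, not the Merlin API. `Tags`, `ColumnSchema`, `Schema`, and `ColumnSelector` here are hypothetical stand-ins that only mirror the concepts; the real classes in `merlin.schema` and `merlin.dag` have richer interfaces.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, FrozenSet, List, Sequence


class Tags(Enum):
    # Hypothetical tags modeled on the USER, ITEM, and CATEGORICAL tags named on this page.
    USER = "user"
    ITEM = "item"
    CATEGORICAL = "categorical"


@dataclass
class ColumnSchema:
    # Metadata for one column: its name, optional tags, and free-form properties.
    name: str
    tags: FrozenSet[Tags] = frozenset()
    properties: Dict[str, object] = field(default_factory=dict)


@dataclass
class Schema:
    # A dataset schema: one ColumnSchema per column, in column order.
    columns: List[ColumnSchema]


@dataclass
class ColumnSelector:
    # Names and tags are stored as given; they only become concrete columns via resolve().
    names: Sequence[str] = ()
    tags: Sequence[Tags] = ()

    def resolve(self, schema: Schema) -> List[str]:
        # A column is selected if its name was listed or it carries any requested tag.
        return [
            col.name
            for col in schema.columns
            if col.name in self.names or any(tag in col.tags for tag in self.tags)
        ]


schema = Schema([
    ColumnSchema("user_id", frozenset({Tags.USER})),
    # Properties are free-form metadata; this value is purely illustrative.
    ColumnSchema("user_age", frozenset({Tags.USER}), {"unit": "years"}),
    ColumnSchema("item_id", frozenset({Tags.ITEM})),
    ColumnSchema("item_category", frozenset({Tags.ITEM, Tags.CATEGORICAL})),
])

by_name = ColumnSelector(names=["user_age"])
by_tag = ColumnSelector(tags=[Tags.ITEM])
mixed = ColumnSelector(names=["user_id"], tags=[Tags.CATEGORICAL])

print(by_name.resolve(schema))  # ['user_age']
print(by_tag.resolve(schema))   # ['item_id', 'item_category']
print(mixed.resolve(schema))    # ['user_id', 'item_category']
```

Because a tag-based selector can only yield column names once it sees a schema, the sketch also gives one way to picture why a node's schemas stay unresolved until `fit` or `transform` supplies a dataset.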
## Syntax and Sample Code

The following code block shows the typical syntax for building a workflow that operates on DAG components.

```{rubric} Syntax
```

```python
result = [column_selector, ...] >> op1 >> op2 >> ...
```

The brackets group one or more column selectors, starting with `column_selector`, that identify columns in the input data.

`op1` and `op2` represent operators.
When an operator performs its operation on the input data, the operator returns a node.

The `result` object is the graph.
It contains the sequence of operations to perform.

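The chaining syntax relies on ordinary Python operator overloading. A plain list has no `>>` operator, so Python evaluates `[...] >> op1` by calling `op1.__rrshift__`. Because `>>` is left-associative, each later operator receives the node built by the previous step. The sketch below uses hypothetical `Node` and `Operator` classes, not Merlin's, to show that building `result` only records the graph and that data is processed later, when `transform` walks the nodes from the root.

```python
from typing import Callable, Dict, List, Optional, Union

Columns = Dict[str, list]  # toy in-memory "dataset": column name -> values


class Node:
    """Hypothetical graph node: the selected columns plus the operator applied to them."""

    def __init__(self, selector: List[str], op: Optional["Operator"] = None,
                 parent: Optional["Node"] = None):
        self.selector = selector
        self.op = op
        self.parent = parent

    def steps(self) -> List[str]:
        # Walk back toward the root, then report the recorded operators in execution order.
        chain, node = [], self
        while node is not None and node.op is not None:
            chain.append(node.op.name)
            node = node.parent
        return list(reversed(chain))

    def transform(self, data: Columns) -> Columns:
        # Only here is data touched: parent nodes run first, then this node's operator.
        if self.parent is not None:
            data = self.parent.transform(data)
        if self.op is None:
            return {name: data[name] for name in self.selector}
        return self.op.apply(data)


class Operator:
    """Hypothetical operator that applies a function to every selected column."""

    def __init__(self, name: str, fn: Callable[[list], list]):
        self.name = name
        self.fn = fn

    def apply(self, data: Columns) -> Columns:
        return {col: self.fn(values) for col, values in data.items()}

    def __rrshift__(self, other: Union[List[str], Node]) -> Node:
        # `["a", "b"] >> op`: lists define no `>>`, so Python falls back to op.__rrshift__.
        if isinstance(other, list):
            parent = Node(list(other))
        elif isinstance(other, Node):
            parent = other
        else:
            return NotImplemented
        return Node(parent.selector, op=self, parent=parent)


double = Operator("double", lambda values: [v * 2 for v in values])
negate = Operator("negate", lambda values: [-v for v in values])

result = ["a", "b"] >> double >> negate  # builds the graph; no data is processed yet
print(result.steps())                    # ['double', 'negate']

data = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
print(result.transform(data))            # {'a': [-2, -4], 'b': [-6, -8]}
```

Keeping graph construction separate from execution lets a library inspect or fit the whole graph before it produces any output. The toy `transform` here simply recurses to the root and applies each operator on the way back.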
```{rubric} Sample Code
```

```python
from nvtabular.ops import Categorify, TagAsItemFeatures

# Encode the item columns as integers, then tag them as item features.
item_features = (
    ["item_category", "item_shop", "item_brand"] >> Categorify(dtype="int32") >> TagAsItemFeatures()
)
```

In the sample code, the column selector is created by specifying the item-related column names.

The {py:class}`~nvtabular.ops.Categorify` operator transforms the categorical features into unique integer values, adds the {py:attr}`~merlin.schema.Tags.CATEGORICAL` tag, and returns a node.

The {py:class}`~nvtabular.ops.TagAsItemFeatures` operator applies the {py:attr}`~merlin.schema.Tags.ITEM` tag and returns a node.

When `item_features` is included in a workflow and applied to input data, the nodes are traversed in order, and the data transformation and tagging are applied.
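
The plain-Python sketch below mimics what the sample pipeline does when it runs. A categorify step maps each distinct value to an integer and records a `CATEGORICAL` tag, and a second step records an `ITEM` tag. `ToyCategorify`, `ToyTagAsItemFeatures`, `run_chain`, the dict-based dataset, and the string tags are hypothetical simplifications. NVTabular's `Categorify` learns its mapping during `fit`, and its actual integer codes can differ from this first-seen encoding.

```python
from typing import Dict, List, Set, Tuple

Data = Dict[str, list]         # toy dataset: column name -> values
TagMap = Dict[str, Set[str]]   # toy tag metadata: column name -> set of tag names


class ToyCategorify:
    """Hypothetical stand-in: maps each distinct value to an integer, in first-seen order."""

    def __init__(self) -> None:
        self.mappings: Dict[str, Dict[object, int]] = {}

    def fit(self, data: Data, columns: List[str]) -> None:
        for col in columns:
            mapping: Dict[object, int] = {}
            for value in data[col]:
                mapping.setdefault(value, len(mapping))  # next free integer for new values
            self.mappings[col] = mapping

    def transform(self, data: Data, tags: TagMap, columns: List[str]) -> Tuple[Data, TagMap]:
        out = dict(data)
        new_tags = {col: set(t) for col, t in tags.items()}
        for col in columns:
            out[col] = [self.mappings[col][v] for v in data[col]]
            new_tags.setdefault(col, set()).add("CATEGORICAL")
        return out, new_tags


class ToyTagAsItemFeatures:
    """Hypothetical stand-in: leaves values unchanged and records an ITEM tag."""

    def fit(self, data: Data, columns: List[str]) -> None:
        pass  # nothing to learn

    def transform(self, data: Data, tags: TagMap, columns: List[str]) -> Tuple[Data, TagMap]:
        new_tags = {col: set(t) for col, t in tags.items()}
        for col in columns:
            new_tags.setdefault(col, set()).add("ITEM")
        return dict(data), new_tags


def run_chain(columns: List[str], ops: list, data: Data) -> Tuple[Data, TagMap]:
    # Visit the operators in order: each is fit on, then applied to, the previous output.
    tags: TagMap = {}
    for op in ops:
        op.fit(data, columns)
        data, tags = op.transform(data, tags, columns)
    return {c: data[c] for c in columns}, {c: tags[c] for c in columns}


raw = {
    "item_category": ["shoes", "hats", "shoes"],
    "item_shop": ["north", "north", "south"],
    "item_brand": ["acme", "zenith", "zenith"],
    "user_id": [7, 8, 9],
}
cols = ["item_category", "item_shop", "item_brand"]
values, tags = run_chain(cols, [ToyCategorify(), ToyTagAsItemFeatures()], raw)
print(values)                      # {'item_category': [0, 1, 0], 'item_shop': [0, 0, 1], 'item_brand': [0, 1, 1]}
print(sorted(tags["item_brand"]))  # ['CATEGORICAL', 'ITEM']
```

The ordering mirrors the sample code: the categorify step runs before the tagging step because it appears first in the `>>` chain.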