feat(alias extraction): Add alias-extraction.md

emqx · Apr 22, 2024 · c72bcf3 · c72bcf3
1 parent 0787f37
commit c72bcf3

Showing 6 changed files with 266 additions and 2 deletions.
diff --git a/dir.yaml b/dir.yaml
@@ -576,7 +576,7 @@
           path: gateway/gbt32960
           edition: ee
         #- gateway/tcp
-
+    - alias-extraction/alias-extraction 
 
 - title_en: Tutorials
   title_cn: 实用教程

diff --git a/en_US/access-control/authn/authn.md b/en_US/access-control/authn/authn.md
@@ -180,6 +180,7 @@ EMQX currently supports the following placeholders:
 - `${peerhost}`: It will be replaced with the client's IP address at runtime. EMQX supports [Proxy Protocol](http://www.haproxy.org/download/1.8/doc/proxy-protocol.txt), that is, even if EMQX is deployed behind some TCP proxy or load balancer, users can still use this placeholder to get the real IP address.
 - `${cert_subject}`: It will be replaced by the subject of the client's TLS certificate at runtime, only applicable to TLS connections.
 - `${cert_common_name}`: It will be replaced by the Common Name of the client's TLS certificate at runtime,  only applicable to TLS connections.
+- `${client_attrs.{NAME}}`: It will be replaced at runtime by the value of the custom client attribute `NAME`, which is extracted from the client metadata. For details about attribute extraction, see [MQTT Client Attribute Extraction](../../alias-extraction/alias-extraction.md).
 
 ## Configure Authenticators
 

diff --git a/en_US/access-control/authn/jwt.md b/en_US/access-control/authn/jwt.md
@@ -146,7 +146,7 @@ Follow the instruction below on how to configure:
 2. If configured as `public-key`, indicating that JWT uses the private key to generate the signature, and needs to use the public key to verify the signature (supports RS256, RS384, RS512, ES256, ES384 and ES512 algorithms), we also need to configure:
    - `Public Key`: specifying the public key in PEM format used to verify the signature
 
-**Payload**: Specify additional claims checks that the user wants to perform. Users can define multiple key-value pairs with the **Claim** and **Expacted Value** fields, where the key is used to find the corresponding claim in the JWT, so it needs to have the same name as the JWT claim to be checked, and the value is used to compare with the actual value of the claim. Currently the placeholders supported are `${clientid}` and `${username}`, 
+**Payload**: Specify additional claims checks that the user wants to perform. Users can define multiple key-value pairs with the **Claim** and **Expected Value** fields, where the key is used to find the corresponding claim in the JWT, so it needs to have the same name as the JWT claim to be checked, and the value is used to compare with the actual value of the claim. Currently the placeholders supported are `${clientid}` and `${username}`.
 
 EMQX also supports periodically obtaining the latest JWKS from the JWKS endpoint, which is essentially a set of public keys that will be used to verify any JWT issued by the authorization server and signed using the RSA or ECDSA algorithm. If we want to use this feature, we first need to switch to the **JWKS** configuration page.
 

diff --git a/en_US/alias-extraction/alias-extraction.md b/en_US/alias-extraction/alias-extraction.md
@@ -0,0 +1,132 @@
+# MQTT Client Attribute Extraction
+
+Apart from using predefined names like `clientid` and `username` as MQTT client identifiers, the client attribute extraction feature allows setting custom attributes upon connection for use in functions such as authentication and authorization. It is designed to support flexible templating for MQTT client identification by extracting client attribute values from various client metadata sources. This feature is particularly useful in contexts such as multi-tenancy, personalized client configurations, and streamlined authentication processes.
+
+## Introduction
+
+When a client connects to EMQX, its client attributes are initialized. During initialization, EMQX extracts attributes according to the rules set in the `mqtt.client_attrs_init` configuration. For example, it can extract a substring from the client ID, username, or client certificate common name. The extraction happens before authentication, so the attributes are available in subsequent steps, for example in the HTTP request body template or SQL template used to compose authentication and authorization requests.
+
+Once initialized, client attributes are stored in a field called `client_attrs` within the client's session or connection context. This `client_attrs` info field acts like a dictionary or a map, holding the attributes as key-value pairs. This field is maintained in memory associated with the client's session for quick access during the client's connection lifecycle.
+
+### Extract Client Attributes
+
+This section describes the client metadata sources from which EMQX extracts attributes and how the extraction works.
+
+#### Sources of Client Metadata and Attributes
+
+In EMQX, client metadata and attributes originate from various sources and are stored in specific system properties for use throughout the client's connection lifecycle. The existing client information comes from the following sources:
+
+- **MQTT CONNECT Packet**: When a client connects to EMQX, it sends a CONNECT packet that includes several pieces of information, such as `clientid`, `username`, `password`, and `user properties`.
+- **TLS Certificates**: If the client connects using TLS, the client's TLS certificate can provide additional metadata, such as:
+  - `cn` (Common Name): Part of the certificate that can identify the device or user.
+  - `dn` (Distinguished Name): The full subject field within the certificate that includes several descriptive fields about the certificate owner.
+  - Server Name Indication (SNI): Currently used as multi-tenancy tenant id.
+- **IP Connection Data**: This includes the client’s IP address and port number, which are automatically captured by EMQX when a client connects.
+
+#### Extraction Expressions
+
+The extraction process uses variform expressions that allow function calls and variable references to define how attributes should be extracted and dynamically process the data. However, the expressions are not fully programmable and support only predefined functions and variables.
+
+##### Syntax
+
+`function_call(clientid, another_function_call(username))` can be used to combine or manipulate client data. The configuration example is as follows:
+
+```bash
+mqtt {
+    client_attrs_init = [{expression = "concat([clientid, username])"}]
+}
+```
+
+##### Pre-bound Variables
+
+Pre-bound variables can be directly used in the extraction expressions. A set of variables are pre-bound, including:
+
+- `cn`: Client certificate common name.
+- `dn`: Client certificate distinguished name (Subject).
+- `clientid`
+- `username`
+- `user_property`: The user properties provided in the MQTT v5 CONNECT packet sent by the client.
+- `ip_address`: The source IP of the client.
+- `port`: The source port number of the client.
+- `zone`: The zone name.
+
+##### Pre-defined Functions
+
+EMQX includes a rich set of string, array, random, and hashing functions similar to those available in rule engine string functions. These functions can be used to manipulate and format the extracted data. For instance, `lower()`, `upper()`, and `concat()` help in adjusting the format of extracted strings, while `hash()` and `hash_to_range()` allow for creating hashed or ranged outputs based on the data.
+
+Below are the functions that can be used in the expressions:
+
+- **String functions**: 
+  - [String Operation Functions](../data-integration/rule-sql-builtin-functions.md#string-operation-functions)
+  - A new function any_to_string/1 is also added to convert any intermediate non-string value to a string.
+- **Array functions**: [nth/2](../data-integration/rule-sql-builtin-functions.md#nth-n-integer-array-array-any)
+- **Random functions**: rand_str, rand_int
+- **Schema-less encode/decode functions**:
+  - [bin2hexstr/1](../data-integration/rule-sql-builtin-functions.md#bin2hexstr-data-binary-string)
+  - [hexstr2bin/1](../data-integration/rule-sql-builtin-functions.md#hexstr2bin-data-string-binary)
+  - [base64_decode/1](../data-integration/rule-sql-builtin-functions.md#base64-decode-data-string-bytes-string)
+  - [base64_encode/1](../data-integration/rule-sql-builtin-functions.md#base64-encode-data-string-bytes-string)
+  - int2hexstr/1
+- **Hash functions**:
+  - hash(Algorithm, Data), where Algorithm can be one of: md4 | md5 | sha (or sha1) | sha224 | sha256 | sha384 | sha512 | sha3_224 | sha3_256 | sha3_384 | sha3_512 | shake128 | shake256 | blake2b | blake2s
+  - hash_to_range(Input, Min, Max): Use sha256 to hash the Input data and map the hash to an integer between Min and Max inclusive (Min =< X =< Max).
+  - map_to_range(Input, Min, Max): Map the input to an integer between Min and Max inclusive (Min =< X =< Max).
+
+##### Example Expressions
+
+`nth(1, tokens(clientid, '.'))`: Extract the prefix of a dot-separated client ID.
+
+`substr(username, 0, 5)`: Extract a partial username.
+
+### Merge Authentication Data
+
+EMQX can also merge attributes from different sources such as JSON Web Token (JWT) claims or HTTP authentication responses into the client's attributes. 
+
+- **JWT claims**: If JWTs are used for authentication, they can include `client_attrs` claims that carry additional metadata about the client, such as roles, permissions, or other identifiers.
+- **HTTP Authentication Responses**: If EMQX is configured to use an external HTTP service for authentication, the response from this service might include additional attributes or metadata about the client, which can be configured to be captured and stored within EMQX. For example, if an HTTP response includes `"client_attrs": {"group": "g1"}`, EMQX will incorporate this data into the client's existing attributes, which can then be utilized in authorization requests.
+
+## Application of Client Attributes
+
+The extracted and merged attributes can be used in constructing authentication and authorization requests. The `client_attrs.{NAME}` can be used in authentication and authorization template rendering. If an attribute named `client_attrs.alias` is defined, it can be incorporated into an HTTP request body or SQL query, enhancing the flexibility and specificity of these requests.
+
+For example, for an attribute named `client_attrs.alias`, you can use `${client_attrs.alias}` to build the body of an HTTP authentication request. For more details, see [Authentication Placeholders](../access-control/authn/authn.md#authentication-placeholders).
+
+Other applications include: <!-- Need some descriptions about how it is used -->
+
+- Multi-tenancy tenant ID (more flexible tenant ID assignment)
+- Per client mountpoint
+- Simple match for Authentication (e.g. GOCSP wanted to compare certificate CN with clientid prefix)
+- Data field in ACL rules or requests
+
+## Configure Attribute Extraction
+
+You can configure the attribute extraction feature through the configuration file or Dashboard.
+
+### Configure Attribute Extraction via Configuration File
+
+Configuration example:
+
+```
+<!-- code example -->
+```
+
+Explanations of the configuration items:
+
+
+
+{% emqxce %}
+
+For detailed information about the configuration, see [Configuration Manual](https://www.emqx.io/docs/en/v@CE_VERSION@/hocon/).
+
+{% endemqxce %}
+
+{% emqxee %}
+
+For detailed information about the configuration, see [Configuration Manual](https://docs.emqx.com/en/enterprise/v@EE_VERSION@/hocon/).
+
+{% endemqxee %}
+
+### Configure Attribute Extraction via Dashboard
+
+<!-- Add description after Frontend dev. Complete -->
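
Note on the example expressions in `en_US/alias-extraction/alias-extraction.md`: they can be emulated outside EMQX to see what they are meant to return. The Python sketch below is not EMQX code. The helper names mirror `tokens`, `nth`, `substr`, and `hash_to_range` from the document, but their bodies are assumptions (a 1-based `nth`, a 0-based `substr` start, and a modulo mapping of the SHA-256 digest), and the client ID and username are made up. Hashed values in particular may differ from what EMQX computes.

```python
import hashlib


def tokens(text: str, separator: str) -> list[str]:
    # Assumed behavior: split on the separator and drop empty pieces.
    return [part for part in text.split(separator) if part]


def nth(n: int, items: list[str]) -> str:
    # Assumed 1-based index, so nth(1, ...) returns the first token.
    if n < 1 or n > len(items):
        raise IndexError(f"nth: index {n} out of range")
    return items[n - 1]


def substr(text: str, start: int, length: int) -> str:
    # Assumed 0-based start position followed by a length.
    return text[start:start + length]


def hash_to_range(data: str, low: int, high: int) -> int:
    # SHA-256 the input, then fold the digest into [low, high] inclusive.
    # The modulo folding is an assumption, not EMQX's documented mapping.
    digest = hashlib.sha256(data.encode("utf-8")).digest()
    return low + int.from_bytes(digest, "big") % (high - low + 1)


if __name__ == "__main__":
    # Hypothetical client ID and username, for illustration only.
    print(nth(1, tokens("tenant1.device42", ".")))
    print(substr("alice-admin", 0, 5))
    print(hash_to_range("tenant1.device42", 1, 10))
```

Whatever the exact mapping, `hash_to_range` is deterministic for a given input, which is what makes it usable for assigning a client to a stable bucket.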
+
diff --git a/zh_CN/access-control/authn/authn.md b/zh_CN/access-control/authn/authn.md
@@ -170,6 +170,8 @@ SELECT password_hash, salt FROM mqtt_user where username = 'emqx_u' LIMIT 1
 
 - `${cert_common_name}`: 将在运行时被替换为客户端 TLS 证书的通用名称（Common Name），仅适用于 TLS 连接。
 
+- `${client_attrs.{NAME}}`：将在运行时被从客户端元数据中提取的自定义属性名称所替换。有关属性提取的详细信息，请参见[MQTT 客户端属性提取](../../alias-extraction/alias-extraction.md)。
+
 ## 认证配置方式
 
 EMQX 提供了 3 种使用认证的配置方式，分别为：Dashboard、配置文件和 HTTP API。

diff --git a/zh_CN/alias-extraction/alias-extraction.md b/zh_CN/alias-extraction/alias-extraction.md
@@ -0,0 +1,129 @@
+# MQTT 客户端属性提取
+
+在 EMQX 中，用户除了可以使用预定义的名称，如 `clientid` 和 `username` 作为 MQTT 客户端标识符外，还可以通过客户端属性提取功能在客户端连接时设置自定义属性，并将其用于认证和授权等功能。这一功能旨在通过从客户端元数据的各个来源提取客户端属性值，支持灵活模板化的 MQTT 客户端识别。这一功能在多租户环境、个性化客户端配置和简化的认证过程等场景中尤为有用。
+
+## 功能介绍
+
+当客户端连接到 EMQX 时，其客户端属性会被初始化。在初始化期间，EMQX 根据在 `mqtt.client_attrs_init` 配置中设置的预定义规则提取属性。例如，它可以从客户端 ID、用户名或客户证书的常用名称中提取子串。这种提取在认证过程之前发生，确保属性在后续步骤中能够被使用，如在 HTTP 请求体模板或 SQL 模板中用于组成认证和授权请求时。
+
+一旦初始化，客户端属性就存储在一个称为 `client_attrs` 的字段中，该字段位于客户端的会话或连接上下文中。`client_attrs` 信息字段保存以键值对形式的属性。这个字段保持在与客户端会话相关的内存中，以便在客户端连接生命周期中快速访问。
+
+### 提取客户端属性
+
+本节描述了 EMQX 从哪些客户端属性中提取以及如何提取属性。
+
+#### 客户端元数据和属性的来源
+
+在 EMQX 中，客户端元数据和属性有各种来源，并存储在特定的系统属性中，以便在客户端连接生命周期中使用。以下是这些现有客户端信息的来源和存储位置：
+
+- **MQTT 连接报文**：当客户端连接到 EMQX 时，它发送一个包含多个信息片段的连接报文，如 `clientid`、`username`、`password` 和 `user properties`。
+- **TLS 证书**：如果客户端使用 TLS 连接，客户的 TLS 证书可以提供额外的元数据，例如：
+  - `cn`（常用名称）：证书的一部分，可以识别设备或用户。
+  - `dn`（区分名称）：证书中包含关于证书持有者的几个描述性字段的完整主题字段。
+  - 服务器名称指示（SNI）：当前用作多租户租户 ID。
+- **IP 连接数据**：包括客户端的 IP 地址和端口号，这些都是客户端连接时 EMQX 自动捕获的。
+
+#### 属性表达式
+
+提取过程使用允许函数调用和变量引用的多样形式表达式来定义如何提取属性，并动态处理数据。然而，这些表达式不是完全可编程的，只支持预定义的函数和变量。
+
+##### 语法
+
+可以使用 `function_call(clientid, another_function_call(username))` 来组合或操作客户端数据。配置示例如下：
+
+```bash
+mqtt {
+    client_attrs_init = [{expression = "concat([clientid, username])"}]
+}
+```
+
+##### 预绑定变量
+
+预绑定变量可以直接在提取表达式中使用。包括以下预绑定变量：
+
+- `cn`：客户证书常用名称。
+- `dn`：客户证书区分名称（主题）。
+- `clientid`
+- `username`
+- `user_property`：客户在 MQTT v5 连接包中提供的用户属性。
+- `ip_address`：客户端的源 IP 地址。
+- `port`：客户端的源端口号。
+- `zone`：区域名称
+
+##### 预定义函数
+
+EMQX 包含一系列丰富的字符串、数组、随机和散列函数，类似于规则引擎字符串函数中可用的那些。这些函数可以用来操作和格式化提取的数据。例如，`lower()`、`upper()` 和 `concat()` 可以帮助调整提取字符串的格式，而 `hash()` 和 `hash_to_range()` 可以基于数据创建散列或范围输出。
+
+以下是可以在表达式中使用的函数：
+
+- **字符串函数**：
+  - [字符串操作函数](../data-integration/rule-sql-builtin-functions.md#string-operation-functions)
+  - 还添加了一个新函数 any_to_string/1，用于将任何中间非字符串值转换为字符串。
+- **数组函数**：[nth/2](../data-integration/rule-sql-builtin-functions.md#nth-n-integer-array-array-any)
+- **随机函数**：rand_str, rand_int
+- **无模式编码/解码函数**：
+  - [bin2hexstr/1](../data-integration/rule-sql-builtin-functions.md#bin2hexstr-data-binary-string)
+  - [hexstr2bin/1](../data-integration/rule-sql-builtin-functions.md#hexstr2bin-data-string-binary)
+  - [base64_decode/1](../data-integration/rule-sql-builtin-functions.md#base64-decode-data-string-bytes-string)
+  - [base64_encode/1](../data-integration/rule-sql-builtin-functions.md#base64-encode-data-string-bytes-string)
+  - int2hexstr/1
+- **散列函数**：
+  - hash(算法, 数据)，其中算法可以是以下之一：md4 | md5 | sha (或 sha1) | sha224 | sha256 | sha384 | sha512 | sha3_224 | sha3_256 | sha3_384 | sha3_512 | shake128 | shake256 | blake2b | blake2s
+  - hash_to_range(输入, 最小值, 最大值)：使用 sha256 散列输入数据，并将散列映射到最小值和最大值之间的整数（包括最小值和最大值）。
+  - map_to_range(输入, 最小值, 最大值)：将输入映射到最小值和最大值之间的整数（包括最小值和最大值）。
+
+##### 示例表达式
+
+`nth(1, tokens(clientid, '.'))`：提取以点分隔的客户端ID的前缀。
+
+`substr(username, 0, 5)`：提取部分用户名。
+
+### 合并认证数据
+
+EMQX 还可以将来自不同来源的属性合并到客户的属性中，如 JSON Web Token (JWT) 声明或 HTTP 认证响应。
+
+- **JWT 声明**：如果使用 JWT 进行认证，它们可以包含 `client_attrs` 声明，携带有关客户的额外元数据，如角色、权限或其他标识符。
+- **HTTP 认证响应**：如果 EMQX 配置为使用外部 HTTP 服务进行认证，此服务的响应可能包含有关客户的额外属性或元数据，可以配置为在 EMQX 中捕获并存储。例如，如果 HTTP 响应包括 `"client_attrs": {"group": "g1"}`，EMQX 将把这些数据合并到客户现有的属性中，然后可以在授权请求中使用这些属性。
+
+## 客户端属性的应用
+
+提取和合并的属性可以用于构建认证和授权请求。 `client_attrs.{NAME}` 可以用于认证和授权模板渲染。如果定义了名为 `client_attrs.alias` 的属性，可以将其合并到 HTTP 请求体或 SQL 查询中，增强这些请求的灵活性和特异性。
+
+例如，对于名为 `client_attrs.alias` 的属性，您可以使用 `${client_attrs.alias}` 来构建作为HTTP认证请求的HTTP请求体。有关更多详情，请参见[认证占位符](../access-control/authn/authn.md#authentication-placeholders)。
+
+其他应用包括： <!-- 需要一些描述它是如何被使用的 -->
+
+- 多租户租户 ID（更灵活的租户 ID 分配）
+- 每个客户的挂载点
+- 简单匹配认证（例如，GOCSP 希望将证书 CN 与客户端 ID 前缀进行比较）
+- ACL 规则或请求中的数据字段
+
+## 配置属性提取
+
+您可以通过配置文件或仪表板配置属性提取功能。
+
+### 通过配置文件配置属性提取
+
+配置示例：
+
+```
+<!-- 代码示例 -->
+```
+
+配置项解释：
+
+{% emqxce %}
+
+有关配置的详细信息，请参见[配置手册](https://www.emqx.io/docs/en/v@CE_VERSION@/hocon/)。
+
+{% endemqxce %}
+
+{% emqxee %}
+
+有关配置的详细信息，请参见[配置手册](https://docs.emqx.com/en/enterprise/v@EE_VERSION@/hocon/)。
+
+{% endemqxee %}
+
+### 通过 Dashboard 配置属性提取
+
+<!-- 在前端开发完成后添加描述 -->
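
Note on the `${client_attrs.{NAME}}` placeholder added to `authn.md` in this commit: the sketch below shows, in plain Python, how such a placeholder could be resolved against a client context, and how a `"client_attrs": {"group": "g1"}` object from an HTTP authentication response could be merged into the existing attributes. It is an illustration only. The function names, the context layout, the empty-string result for a missing attribute, and the rule that the response overrides existing keys are all assumptions, not EMQX behavior confirmed by the docs in this diff.

```python
import re

# Matches placeholders such as ${username} or ${client_attrs.alias}.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def lookup(context: dict, dotted_path: str):
    # Walk nested dictionaries following the dot-separated path.
    value = context
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def render(template: str, context: dict) -> str:
    # Assumption: an unknown placeholder renders as an empty string.
    def replace(match: re.Match) -> str:
        value = lookup(context, match.group(1))
        return "" if value is None else str(value)

    return PLACEHOLDER.sub(replace, template)


def merge_client_attrs(context: dict, auth_response: dict) -> dict:
    # Assumption: attributes from the response override existing ones.
    merged = dict(context)
    merged["client_attrs"] = {
        **context.get("client_attrs", {}),
        **auth_response.get("client_attrs", {}),
    }
    return merged


if __name__ == "__main__":
    # Hypothetical client context after attribute extraction.
    ctx = {"clientid": "t1.dev", "username": "alice",
           "client_attrs": {"alias": "t1"}}
    print(render('{"alias": "${client_attrs.alias}"}', ctx))
    ctx = merge_client_attrs(ctx, {"client_attrs": {"group": "g1"}})
    print(render("group=${client_attrs.group}", ctx))
```

Keeping the merge as a separate step mirrors the order described in the document: extraction runs before authentication, and authentication data is merged afterwards for use in authorization requests.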