Skip to content

Commit

Permalink
[doc][connector][hive] Add hive chinese doc (#8169)
Browse files Browse the repository at this point in the history
  • Loading branch information
zhangshenghang authored Nov 29, 2024
1 parent df2fee0 commit ea4234a
Show file tree
Hide file tree
Showing 4 changed files with 949 additions and 37 deletions.
74 changes: 50 additions & 24 deletions docs/en/connector-v2/sink/Hive.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ Write data to Hive.

:::tip

In order to use this connector, You must ensure your spark/flink cluster already integrated hive. The tested hive version is 2.3.9.
In order to use this connector, You must ensure your spark/flink cluster already integrated hive. The tested hive version is 2.3.9 and 3.1.3 .

If you use SeaTunnel Engine, You need put seatunnel-hadoop3-3.1.4-uber.jar and hive-exec-3.1.3.jar and libfb303-0.9.3.jar in $SEATUNNEL_HOME/lib/ dir.
:::
Expand Down Expand Up @@ -204,6 +204,55 @@ Description:
- `kerberos_keytab_path`: The keytab file path for Kerberos authentication.
- `krb5_path`: The path to the `krb5.conf` file used for Kerberos authentication.

Run the case:

```bash
env {
parallelism = 1
job.mode = "BATCH"
}

source {
FakeSource {
schema = {
fields {
pk_id = bigint
name = string
score = int
}
primaryKey {
name = "pk_id"
columnNames = [pk_id]
}
}
rows = [
{
kind = INSERT
fields = [1, "A", 100]
},
{
kind = INSERT
fields = [2, "B", 100]
},
{
kind = INSERT
fields = [3, "C", 100]
}
]
}
}

sink {
Hive {
table_name = "default.test_hive_sink_on_hdfs_with_kerberos"
metastore_uri = "thrift://metastore:9083"
hive_site_path = "/tmp/hive-site.xml"
kerberos_principal = "hive/[email protected]"
kerberos_keytab_path = "/tmp/hive.keytab"
krb5_path = "/tmp/krb5.conf"
}
}
```

## Hive on s3

Expand Down Expand Up @@ -418,26 +467,3 @@ sink {
}
}
```

## Changelog

### 2.2.0-beta 2022-09-26

- Add Hive Sink Connector

### 2.3.0-beta 2022-10-20

- [Improve] Hive Sink supports automatic partition repair ([3133](https://github.com/apache/seatunnel/pull/3133))

### 2.3.0 2022-12-30

- [BugFix] Fixed the following bugs that failed to write data to files ([3258](https://github.com/apache/seatunnel/pull/3258))
- When field from upstream is null it will throw NullPointerException
- Sink columns mapping failed
- When restore writer from states getting transaction directly failed

### Next version

- [Improve] Support kerberos authentication ([3840](https://github.com/apache/seatunnel/pull/3840))
- [Improve] Added partition_dir_expression validation logic ([3886](https://github.com/apache/seatunnel/pull/3886))

79 changes: 66 additions & 13 deletions docs/en/connector-v2/source/Hive.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ Read data from Hive.

:::tip

In order to use this connector, You must ensure your spark/flink cluster already integrated hive. The tested hive version is 2.3.9.
In order to use this connector, You must ensure your spark/flink cluster already integrated hive. The tested hive version is 2.3.9 and 3.1.3 .

If you use SeaTunnel Engine, You need put seatunnel-hadoop3-3.1.4-uber.jar and hive-exec-3.1.3.jar and libfb303-0.9.3.jar in $SEATUNNEL_HOME/lib/ dir.
:::
Expand Down Expand Up @@ -180,6 +180,71 @@ Description:
- `kerberos_keytab_path`: The keytab file path for Kerberos authentication.
- `krb5_path`: The path to the `krb5.conf` file used for Kerberos authentication.

Run the case:

```bash
env {
parallelism = 1
job.mode = "BATCH"
}

source {
Hive {
table_name = "default.test_hive_sink_on_hdfs_with_kerberos"
metastore_uri = "thrift://metastore:9083"
hive.hadoop.conf-path = "/tmp/hadoop"
result_table_name = hive_source
hive_site_path = "/tmp/hive-site.xml"
kerberos_principal = "hive/[email protected]"
kerberos_keytab_path = "/tmp/hive.keytab"
krb5_path = "/tmp/krb5.conf"
}
}

sink {
Assert {
source_table_name = hive_source
rules {
row_rules = [
{
rule_type = MAX_ROW
rule_value = 3
}
],
field_rules = [
{
field_name = pk_id
field_type = bigint
field_value = [
{
rule_type = NOT_NULL
}
]
},
{
field_name = name
field_type = string
field_value = [
{
rule_type = NOT_NULL
}
]
},
{
field_name = score
field_type = int
field_value = [
{
rule_type = NOT_NULL
}
]
}
]
}
}
}
```

## Hive on s3

### Step 1
Expand Down Expand Up @@ -307,15 +372,3 @@ sink {
}
}
```

## Changelog

### 2.2.0-beta 2022-09-26

- Add Hive Source Connector

### Next version

- [Improve] Support kerberos authentication ([3840](https://github.com/apache/seatunnel/pull/3840))
- Support user-defined partitions ([3842](https://github.com/apache/seatunnel/pull/3842))

Loading

0 comments on commit ea4234a

Please sign in to comment.