Feat/update docs (#349)
* Update doc dependency

* Update docs content

* Update docs content
NicoLaval authored Jul 10, 2024
1 parent cbcf12a commit 78b670c
Showing 19 changed files with 672 additions and 428 deletions.
121 changes: 5 additions & 116 deletions docs/blog/2024-06-25-trevas-sdmx.mdx
@@ -7,6 +7,7 @@ tags: [Trevas, SDMX]

import useBaseUrl from '@docusaurus/useBaseUrl';
import ThemedImage from '@theme/ThemedImage';
import Link from '@theme/Link';

### News

@@ -28,128 +29,16 @@ It also allows executing the VTL TransformationSchemes to obtain the resulting
/>
</div>

Trevas supports the above SDMX message elements. Only the VtlMappingSchemes element is optional.

The elements in box 1 are used to produce Trevas DataStructures, filling the VTL component attributes: name, role, type, nullable and valuedomain.

The elements in box 2 are used to generate the VTL code (rulesets & transformations).

#### Tools available

SDMX Trevas tools are documented <Link label={"here"} href={useBaseUrl('/developer-guide/spark-mode/data-sources/sdmx')} />.

#### Troubleshooting

Have a look at <Link label={"this section"} href={useBaseUrl('/developer-guide/spark-mode/data-sources/sdmx#troubleshooting')} />.
@@ -39,6 +39,12 @@ It is thus strongly recommended to use this format.
</div>
</div>
<div className="row">
<div className="col">
<Card
title="SDMX"
page={useBaseUrl('/developer-guide/spark-mode/data-sources/sdmx')}
/>
</div>
<div className="col">
<Card
title="Others"
156 changes: 156 additions & 0 deletions docs/docs/developer-guide/spark-mode/data-sources/sdmx.mdx
@@ -0,0 +1,156 @@
---
id: sdmx
title: Spark mode - SDMX source
sidebar_label: SDMX
slug: /developer-guide/spark-mode/data-sources/sdmx
custom_edit_url: null
---

The `vtl-sdmx` module exposes the following utilities.
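
To use them, add the module to your build. A minimal sketch of the Maven declaration, assuming the standard `fr.insee.trevas` group id (the version shown is illustrative, check the latest Trevas release):

```xml
<dependency>
    <groupId>fr.insee.trevas</groupId>
    <artifactId>vtl-sdmx</artifactId>
    <!-- illustrative version, check the latest Trevas release -->
    <version>1.4.1</version>
</dependency>
```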

### `buildStructureFromSDMX3` utility

`TrevasSDMXUtils.buildStructureFromSDMX3` builds a Trevas DataStructure from an SDMX 3.0 structure message.

Providing the corresponding data, you can then build a Trevas Dataset.

```java
// Build the structure of dataflow "STRUCT_ID" from an SDMX 3.0 message
Structured.DataStructure structure = TrevasSDMXUtils.buildStructureFromSDMX3("path/sdmx_file.xml", "STRUCT_ID");

// Wrap the corresponding CSV data and the structure in a Trevas Dataset
SparkDataset ds = new SparkDataset(
        spark.read()
                .option("header", "true")
                .option("delimiter", ";")
                .option("quote", "\"")
                .csv("path"),
        structure
);
```

### `SDMXVTLWorkflow` object

The `SDMXVTLWorkflow` constructor takes 3 arguments:

- a `ScriptEngine` (Trevas or another)
- a `ReadableDataLocation` to handle an SDMX message
- a map of names / Datasets

```java
// Create a local Spark session, used by the "spark" processing engine
SparkSession.builder()
        .appName("test")
        .master("local")
        .getOrCreate();

// Obtain the Trevas VTL engine and select the Spark processing engine
ScriptEngineManager mgr = new ScriptEngineManager();
ScriptEngine engine = mgr.getEngineByExtension("vtl");
engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark");

// Point a ReadableDataLocation at the SDMX message
ReadableDataLocation rdl = new ReadableDataLocationTmp("src/test/resources/DSD_BPE_CENSUS.xml");

SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
```

This object then gives you access to the following three functions.

### SDMXVTLWorkflow `run` function - Preview mode

The `run` function can be called in preview mode, without any attached data.

```java
ScriptEngineManager mgr = new ScriptEngineManager();
ScriptEngine engine = mgr.getEngineByExtension("vtl");
engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark");

ReadableDataLocation rdl = new ReadableDataLocationTmp("src/test/resources/DSD_BPE_CENSUS.xml");

SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());

// Instead of using TrevasSDMXUtils.buildStructureFromSDMX3 and data sources
// to build Trevas Datasets, sdmxVtlWorkflow.getEmptyDatasets() handles the
// SDMX message structures to produce Trevas Datasets with the metadata
// defined in this message and empty data
Map<String, Dataset> emptyDatasets = sdmxVtlWorkflow.getEmptyDatasets();
engine.getBindings(ScriptContext.ENGINE_SCOPE).putAll(emptyDatasets);

Map<String, PersistentDataset> result = sdmxVtlWorkflow.run();
```

Preview mode allows you to check the conformity of the SDMX file and the metadata of the output datasets.
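
Continuing the example above, here is a minimal sketch of such a metadata check, assuming the returned `PersistentDataset` values expose the `Dataset` metadata API (`getDataStructure`):

```java
// Print the structure of each output dataset produced in preview mode
// (assumes PersistentDataset exposes getDataStructure() via Dataset)
result.forEach((name, dataset) -> {
    System.out.println("Dataset: " + name);
    dataset.getDataStructure().values().forEach(component ->
            System.out.println(" - " + component.getName()
                    + " (" + component.getRole() + ", " + component.getType() + ")"));
});
```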

### SDMXVTLWorkflow `run` function

Once an `SDMXVTLWorkflow` is built, it is easy to run the VTL validations and transformations defined in the SDMX file.

```java
// Build the input dataset structure and data
Structured.DataStructure structure = TrevasSDMXUtils.buildStructureFromSDMX3("path/sdmx_file.xml", "ds1");

SparkDataset ds1 = new SparkDataset(
        spark.read()
                .option("header", "true")
                .option("delimiter", ";")
                .option("quote", "\"")
                .csv("path/data.csv"),
        structure
);

// Obtain the Trevas VTL engine and select the Spark processing engine
ScriptEngineManager mgr = new ScriptEngineManager();
ScriptEngine engine = mgr.getEngineByExtension("vtl");
engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark");

// Bind the input datasets by name
Map<String, Dataset> inputs = Map.of("ds1", ds1);

ReadableDataLocation rdl = new ReadableDataLocationTmp("path/sdmx_file.xml");

// Run the VTL validations and transformations defined in the SDMX message
SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, inputs);

Map<String, PersistentDataset> bindings = sdmxVtlWorkflow.run();
```

As a result, you will receive all the datasets defined as persistent in the `TransformationSchemes` definition.
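
A minimal sketch of consuming this result, using only the `java.util.Map` API (how you persist each dataset afterwards depends on your target storage):

```java
// List the persistent datasets produced by the TransformationSchemes
bindings.forEach((name, persistent) ->
        System.out.println("Persistent dataset produced: " + name));
```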

### SDMXVTLWorkflow `getTransformationsVTL` function

Returns the VTL code corresponding to the SDMX TransformationSchemes definition.

```java
SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
String vtl = sdmxVtlWorkflow.getTransformationsVTL();
```

### SDMXVTLWorkflow `getRulesetsVTL` function

Returns the VTL code corresponding to the SDMX ruleset definitions.

```java
SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
String dprs = sdmxVtlWorkflow.getRulesetsVTL();
```
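
A possible way to reuse the generated code is to evaluate it directly with the Trevas engine, which is a JSR-223 `ScriptEngine` (a sketch, assuming the input datasets have already been bound to the engine as shown above):

```java
// Evaluate the generated rulesets first, then the transformations
try {
    engine.eval(sdmxVtlWorkflow.getRulesetsVTL());
    engine.eval(sdmxVtlWorkflow.getTransformationsVTL());
} catch (ScriptException e) {
    throw new RuntimeException("VTL evaluation failed", e);
}
```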

## Troubleshooting

### Hadoop client

Integrating `vtl-modules` with `hadoop-client` can cause dependency issues.

In particular, `hadoop-client` pulls in `com.fasterxml.woodstox:woodstox-core` in a version that is incompatible with a `vtl-sdmx` sub-dependency.

A way to fix this is to exclude the `woodstox-core` dependency from `hadoop-client` and import a newer version in your `pom.xml`:

```xml
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.3.4</version>
    <exclusions>
        <exclusion>
            <groupId>com.fasterxml.woodstox</groupId>
            <artifactId>woodstox-core</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>com.fasterxml.woodstox</groupId>
    <artifactId>woodstox-core</artifactId>
    <version>6.5.1</version>
</dependency>
```
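
After applying the exclusion, you can check which `woodstox-core` version is effectively resolved with `mvn dependency:tree -Dincludes=com.fasterxml.woodstox:woodstox-core`.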
