Feat/update docs (#349)

* Update doc dependency * Update docs content * Update docs content
InseeFr · Jul 10, 2024 · 78b670c · 78b670c
1 parent cbcf12a
commit 78b670c
Show file tree

Hide file tree

Showing 19 changed files with 672 additions and 428 deletions.
diff --git a/docs/blog/2024-06-25-trevas-sdmx.mdx b/docs/blog/2024-06-25-trevas-sdmx.mdx
@@ -7,6 +7,7 @@ tags: [Trevas, SDMX]
 
 import useBaseUrl from '@docusaurus/useBaseUrl';
 import ThemedImage from '@theme/ThemedImage';
+import Link from '@theme/Link';
 
 ### News
 
@@ -28,128 +29,16 @@ It also allows to execute the VTL TransformationSchemes to obtain the resulting
 	/>
 </div>
 
-Trevas supports the above SDMX message elements. Only the VtlMappingSchemes attribute is optional.
+Trevas supports the above SDMX message elements. Only the VtlMappingSchemes element is optional.
 
 The elements in box 1 are used to produce Trevas DataStructures, filling VTL components attributes name, role, type, nullable and valuedomain.
 
 The elements in box 2 are used to generate the VTL code (rulesets & transformations).
 
 #### Tools available
 
-#### `buildStructureFromSDMX3` utility
+SDMX Trevas tools are documented <Link label={"here"} href={useBaseUrl('/developer-guide/spark-mode/data-sources/sdmx')} />.
 
-`TrevasSDMXUtils.buildStructureFromSDMX3` allows to obtain a Trevas DataStructure.
+#### Troubleshooting
 
-Providing corresponding data, you can build a Trevas Dataset.
-
-```java
-Structured.DataStructure structure = TrevasSDMXUtils.buildStructureFromSDMX3("path/sdmx_file.xml", "STRUCT_ID");
-
-SparkDataset ds = new SparkDataset(
-        spark.read()
-                .option("header", "true")
-                .option("delimiter", ";")
-                .option("quote", "\"")
-                .csv("path"),
-        structure
-);
-```
-
-#### `SDMXVTLWorkflow` object
-
-The `SDMXVTLWorkflow` constructor takes 3 arguments:
-
-- a `ScriptEngine` (Trevas or another)
-- a `ReadableDataLocation` to handle an SDMX message
-- a map of names / Datasets
-
-```java
-SparkSession.builder()
-                .appName("test")
-                .master("local")
-                .getOrCreate();
-
-ScriptEngineManager mgr = new ScriptEngineManager();
-ScriptEngine engine = mgr.getEngineByExtension("vtl");
-engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark");
-
-ReadableDataLocation rdl = new ReadableDataLocationTmp("src/test/resources/DSD_BPE_CENSUS.xml");
-
-SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
-```
-
-This object then allows you to activate the following 3 functions.
-
-#### SDMXVTLWorkflow `run` function - Preview mode
-
-The `run` function can easily be called in a preview mode, without attached data.
-
-```java
-ScriptEngineManager mgr = new ScriptEngineManager();
-ScriptEngine engine = mgr.getEngineByExtension("vtl");
-engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark");
-
-ReadableDataLocation rdl = new ReadableDataLocationTmp("src/test/resources/DSD_BPE_CENSUS.xml");
-
-SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
-
-// instead of using TrevasSDMXUtils.buildStructureFromSDMX3 and data sources
-// to build Trevas Datasets, sdmxVtlWorkflow.getEmptyDatasets()
-// will handle SDMX message structures to produce Trevas Datasets
-// with metadata defined in this message, and adding empty data
-Map<String, Dataset> emptyDatasets = sdmxVtlWorkflow.getEmptyDatasets();
-engine.getBindings(ScriptContext.ENGINE_SCOPE).putAll(emptyDatasets);
-
-Map<String, PersistentDataset> result = sdmxVtlWorkflow.run();
-```
-
-The preview mode allows to check the conformity of the SDMX file and the metadata of the output datasets.
-
-#### SDMXVTLWorkflow `run` function
-
-Once an `SDMXVTLWorkflow` is built, it is easy to run the VTL validations and transformations defined in the SDMX file.
-
-```java
-Structured.DataStructure structure = TrevasSDMXUtils.buildStructureFromSDMX3("path/sdmx_file.xml", "ds1");
-
-SparkDataset ds1 = new SparkDataset(
-        spark.read()
-                .option("header", "true")
-                .option("delimiter", ";")
-                .option("quote", "\"")
-                .csv("path/data.csv"),
-        structure
-);
-
-ScriptEngineManager mgr = new ScriptEngineManager();
-ScriptEngine engine = mgr.getEngineByExtension("vtl");
-engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark");
-
-Map<String, Dataset> inputs = Map.of("ds1", ds1);
-
-ReadableDataLocation rdl = new ReadableDataLocationTmp("path/sdmx_file.xml");
-
-SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, inputs);
-
-Map<String, PersistentDataset> bindings = sdmxVtlWorkflow.run();
-```
-
-As a result, one will receive all the dataset defined as persistent in the `TransformationSchemes` definition.
-
-#### SDMXVTLWorkflow `getTransformationsVTL` function
-
-Gets the VTL code corresponding to the SDMX TransformationSchemes definition.
-
-```java
-SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
-String vtl = sdmxVtlWorkflow.getTransformationsVTL();
-```
-
-#### SDMXVTLWorkflow `getRulesetsVTL` function
-
-Gets the VTL code corresponding to the SDMX TransformationSchemes definition.
-
-```java
-SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
-String dprs = sdmxVtlWorkflow.getRulesetsVTL();
-```
+Have a look to <Link label={"this section"} href={useBaseUrl('/developer-guide/spark-mode/data-sources/sdmx#troubleshooting')} />.
diff --git a/docs/docs/developer-guide/spark-mode/data-sources/index-data-sources.mdx b/docs/docs/developer-guide/spark-mode/data-sources/index-data-sources.mdx
@@ -39,6 +39,12 @@ It is thus strongly recommended to use this format.
 		</div>
 	</div>
 	<div className="row">
+		<div className="col">
+			<Card
+				title="SDMX"
+				page={useBaseUrl('/developer-guide/spark-mode/data-sources/sdmx')}
+			/>
+		</div>
 		<div className="col">
 			<Card
 				title="Others"

diff --git a/docs/docs/developer-guide/spark-mode/data-sources/sdmx.mdx b/docs/docs/developer-guide/spark-mode/data-sources/sdmx.mdx
@@ -0,0 +1,156 @@
+---
+id: sdmx
+title: Spark mode - SDMX source
+sidebar_label: SDMX
+slug: /developer-guide/spark-mode/data-sources/sdmx
+custom_edit_url: null
+---
+
+`vtl-sdmx` module exposes the following utilities.
+
+### `buildStructureFromSDMX3` utility
+
+`TrevasSDMXUtils.buildStructureFromSDMX3` allows to obtain a Trevas DataStructure.
+
+Providing corresponding data, you can build a Trevas Dataset.
+
+```java
+Structured.DataStructure structure = TrevasSDMXUtils.buildStructureFromSDMX3("path/sdmx_file.xml", "STRUCT_ID");
+
+SparkDataset ds = new SparkDataset(
+        spark.read()
+                .option("header", "true")
+                .option("delimiter", ";")
+                .option("quote", "\"")
+                .csv("path"),
+        structure
+);
+```
+
+### `SDMXVTLWorkflow` object
+
+The `SDMXVTLWorkflow` constructor takes 3 arguments:
+
+- a `ScriptEngine` (Trevas or another)
+- a `ReadableDataLocation` to handle an SDMX message
+- a map of names / Datasets
+
+```java
+SparkSession.builder()
+                .appName("test")
+                .master("local")
+                .getOrCreate();
+
+ScriptEngineManager mgr = new ScriptEngineManager();
+ScriptEngine engine = mgr.getEngineByExtension("vtl");
+engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark");
+
+ReadableDataLocation rdl = new ReadableDataLocationTmp("src/test/resources/DSD_BPE_CENSUS.xml");
+
+SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
+```
+
+This object then allows you to activate the following 3 functions.
+
+### SDMXVTLWorkflow `run` function - Preview mode
+
+The `run` function can easily be called in a preview mode, without attached data.
+
+```java
+ScriptEngineManager mgr = new ScriptEngineManager();
+ScriptEngine engine = mgr.getEngineByExtension("vtl");
+engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark");
+
+ReadableDataLocation rdl = new ReadableDataLocationTmp("src/test/resources/DSD_BPE_CENSUS.xml");
+
+SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
+
+// instead of using TrevasSDMXUtils.buildStructureFromSDMX3 and data sources
+// to build Trevas Datasets, sdmxVtlWorkflow.getEmptyDatasets()
+// will handle SDMX message structures to produce Trevas Datasets
+// with metadata defined in this message, and adding empty data
+Map<String, Dataset> emptyDatasets = sdmxVtlWorkflow.getEmptyDatasets();
+engine.getBindings(ScriptContext.ENGINE_SCOPE).putAll(emptyDatasets);
+
+Map<String, PersistentDataset> result = sdmxVtlWorkflow.run();
+```
+
+The preview mode allows to check the conformity of the SDMX file and the metadata of the output datasets.
+
+### SDMXVTLWorkflow `run` function
+
+Once an `SDMXVTLWorkflow` is built, it is easy to run the VTL validations and transformations defined in the SDMX file.
+
+```java
+Structured.DataStructure structure = TrevasSDMXUtils.buildStructureFromSDMX3("path/sdmx_file.xml", "ds1");
+
+SparkDataset ds1 = new SparkDataset(
+        spark.read()
+                .option("header", "true")
+                .option("delimiter", ";")
+                .option("quote", "\"")
+                .csv("path/data.csv"),
+        structure
+);
+
+ScriptEngineManager mgr = new ScriptEngineManager();
+ScriptEngine engine = mgr.getEngineByExtension("vtl");
+engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark");
+
+Map<String, Dataset> inputs = Map.of("ds1", ds1);
+
+ReadableDataLocation rdl = new ReadableDataLocationTmp("path/sdmx_file.xml");
+
+SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, inputs);
+
+Map<String, PersistentDataset> bindings = sdmxVtlWorkflow.run();
+```
+
+As a result, one will receive all the datasets defined as persistent in the `TransformationSchemes` definition.
+
+### SDMXVTLWorkflow `getTransformationsVTL` function
+
+Gets the VTL code corresponding to the SDMX TransformationSchemes definition.
+
+```java
+SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
+String vtl = sdmxVtlWorkflow.getTransformationsVTL();
+```
+
+### SDMXVTLWorkflow `getRulesetsVTL` function
+
+Gets the VTL code corresponding to the SDMX TransformationSchemes definition.
+
+```java
+SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
+String dprs = sdmxVtlWorkflow.getRulesetsVTL();
+```
+
+## Troubleshooting
+
+### Hadoop client
+
+The integration of `vtl-modules` with `hadoop-client` can cause dependency issues.
+
+It was noted that `com.fasterxml.woodstox.woodstox-core` is imported by `hadoop-client`, with an incompatible version for a `vtl-sdmx` sub-dependency.
+
+A way to fix this is to exclude `com.fasterxml.woodstox.woodstox-core` dependency from `hadoop-client` and import a newest version in your `pom.xml`:
+
+```xml
+<dependency>
+    <groupId>org.apache.hadoop</groupId>
+    <artifactId>hadoop-client</artifactId>
+    <version>3.3.4</version>
+    <exclusions>
+        <exclusion>
+            <groupId>com.fasterxml.woodstox</groupId>
+            <artifactId>woodstox-core</artifactId>
+        </exclusion>
+    </exclusions>
+</dependency>
+<dependency>
+    <groupId>com.fasterxml.woodstox</groupId>
+    <artifactId>woodstox-core</artifactId>
+    <version>6.5.1</version>
+</dependency>
+```