diff --git a/docs/blog/2024-06-25-trevas-sdmx.mdx b/docs/blog/2024-06-25-trevas-sdmx.mdx index 3c9e3d913..07902db4d 100644 --- a/docs/blog/2024-06-25-trevas-sdmx.mdx +++ b/docs/blog/2024-06-25-trevas-sdmx.mdx @@ -7,6 +7,7 @@ tags: [Trevas, SDMX] import useBaseUrl from '@docusaurus/useBaseUrl'; import ThemedImage from '@theme/ThemedImage'; +import Link from '@theme/Link'; ### News @@ -28,7 +29,7 @@ It also allows to execute the VTL TransformationSchemes to obtain the resulting /> -Trevas supports the above SDMX message elements. Only the VtlMappingSchemes attribute is optional. +Trevas supports the above SDMX message elements. Only the VtlMappingSchemes element is optional. The elements in box 1 are used to produce Trevas DataStructures, filling VTL components attributes name, role, type, nullable and valuedomain. @@ -36,120 +37,8 @@ The elements in box 2 are used to generate the VTL code (rulesets & transformati #### Tools available -#### `buildStructureFromSDMX3` utility +SDMX Trevas tools are documented . -`TrevasSDMXUtils.buildStructureFromSDMX3` allows to obtain a Trevas DataStructure. +#### Troubleshooting -Providing corresponding data, you can build a Trevas Dataset. - -```java -Structured.DataStructure structure = TrevasSDMXUtils.buildStructureFromSDMX3("path/sdmx_file.xml", "STRUCT_ID"); - -SparkDataset ds = new SparkDataset( - spark.read() - .option("header", "true") - .option("delimiter", ";") - .option("quote", "\"") - .csv("path"), - structure -); -``` - -#### `SDMXVTLWorkflow` object - -The `SDMXVTLWorkflow` constructor takes 3 arguments: - -- a `ScriptEngine` (Trevas or another) -- a `ReadableDataLocation` to handle an SDMX message -- a map of names / Datasets - -```java -SparkSession.builder() - .appName("test") - .master("local") - .getOrCreate(); - -ScriptEngineManager mgr = new ScriptEngineManager(); -ScriptEngine engine = mgr.getEngineByExtension("vtl"); -engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark"); - -ReadableDataLocation rdl = new ReadableDataLocationTmp("src/test/resources/DSD_BPE_CENSUS.xml"); - -SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of()); -``` - -This object then allows you to activate the following 3 functions. - -#### SDMXVTLWorkflow `run` function - Preview mode - -The `run` function can easily be called in a preview mode, without attached data. - -```java -ScriptEngineManager mgr = new ScriptEngineManager(); -ScriptEngine engine = mgr.getEngineByExtension("vtl"); -engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark"); - -ReadableDataLocation rdl = new ReadableDataLocationTmp("src/test/resources/DSD_BPE_CENSUS.xml"); - -SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of()); - -// instead of using TrevasSDMXUtils.buildStructureFromSDMX3 and data sources -// to build Trevas Datasets, sdmxVtlWorkflow.getEmptyDatasets() -// will handle SDMX message structures to produce Trevas Datasets -// with metadata defined in this message, and adding empty data -Map emptyDatasets = sdmxVtlWorkflow.getEmptyDatasets(); -engine.getBindings(ScriptContext.ENGINE_SCOPE).putAll(emptyDatasets); - -Map result = sdmxVtlWorkflow.run(); -``` - -The preview mode allows to check the conformity of the SDMX file and the metadata of the output datasets. - -#### SDMXVTLWorkflow `run` function - -Once an `SDMXVTLWorkflow` is built, it is easy to run the VTL validations and transformations defined in the SDMX file. 
-
-```java
-Structured.DataStructure structure = TrevasSDMXUtils.buildStructureFromSDMX3("path/sdmx_file.xml", "ds1");
-
-SparkDataset ds1 = new SparkDataset(
-    spark.read()
-        .option("header", "true")
-        .option("delimiter", ";")
-        .option("quote", "\"")
-        .csv("path/data.csv"),
-    structure
-);
-
-ScriptEngineManager mgr = new ScriptEngineManager();
-ScriptEngine engine = mgr.getEngineByExtension("vtl");
-engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark");
-
-Map inputs = Map.of("ds1", ds1);
-
-ReadableDataLocation rdl = new ReadableDataLocationTmp("path/sdmx_file.xml");
-
-SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, inputs);
-
-Map bindings = sdmxVtlWorkflow.run();
-```
-
-As a result, one will receive all the dataset defined as persistent in the `TransformationSchemes` definition.
-
-#### SDMXVTLWorkflow `getTransformationsVTL` function
-
-Gets the VTL code corresponding to the SDMX TransformationSchemes definition.
-
-```java
-SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
-String vtl = sdmxVtlWorkflow.getTransformationsVTL();
-```
-
-#### SDMXVTLWorkflow `getRulesetsVTL` function
-
-Gets the VTL code corresponding to the SDMX TransformationSchemes definition.
-
-```java
-SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
-String dprs = sdmxVtlWorkflow.getRulesetsVTL();
-```
+Have a look at .
diff --git a/docs/docs/developer-guide/spark-mode/data-sources/index-data-sources.mdx b/docs/docs/developer-guide/spark-mode/data-sources/index-data-sources.mdx
index fc6bd913b..af624072f 100644
--- a/docs/docs/developer-guide/spark-mode/data-sources/index-data-sources.mdx
+++ b/docs/docs/developer-guide/spark-mode/data-sources/index-data-sources.mdx
@@ -39,6 +39,12 @@ It is thus strongly recommended to use this format.
+
+ +
+Map<String, Dataset> emptyDatasets = sdmxVtlWorkflow.getEmptyDatasets();
+engine.getBindings(ScriptContext.ENGINE_SCOPE).putAll(emptyDatasets);
+
+Map<String, Dataset> result = sdmxVtlWorkflow.run();
+```
+
+The preview mode allows you to check the conformity of the SDMX file and the metadata of the output datasets.
+
+### SDMXVTLWorkflow `run` function
+
+Once an `SDMXVTLWorkflow` is built, it is easy to run the VTL validations and transformations defined in the SDMX file.
+
+```java
+Structured.DataStructure structure = TrevasSDMXUtils.buildStructureFromSDMX3("path/sdmx_file.xml", "ds1");
+
+SparkDataset ds1 = new SparkDataset(
+    spark.read()
+        .option("header", "true")
+        .option("delimiter", ";")
+        .option("quote", "\"")
+        .csv("path/data.csv"),
+    structure
+);
+
+ScriptEngineManager mgr = new ScriptEngineManager();
+ScriptEngine engine = mgr.getEngineByExtension("vtl");
+engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark");
+
+Map<String, Dataset> inputs = Map.of("ds1", ds1);
+
+ReadableDataLocation rdl = new ReadableDataLocationTmp("path/sdmx_file.xml");
+
+SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, inputs);
+
+Map<String, Dataset> bindings = sdmxVtlWorkflow.run();
+```
+
+As a result, one will receive all the datasets defined as persistent in the `TransformationSchemes` definition.
+
+### SDMXVTLWorkflow `getTransformationsVTL` function
+
+Gets the VTL code corresponding to the SDMX TransformationSchemes definition.
+
+```java
+SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
+String vtl = sdmxVtlWorkflow.getTransformationsVTL();
+```
+
+### SDMXVTLWorkflow `getRulesetsVTL` function
+
+Gets the VTL code corresponding to the ruleset definitions in the SDMX file.
+
+```java
+SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
+String dprs = sdmxVtlWorkflow.getRulesetsVTL();
+```
+
+## Troubleshooting
+
+### Hadoop client
+
+The integration of `vtl-modules` with `hadoop-client` can cause dependency issues.
+
+It was noted that `hadoop-client` imports `com.fasterxml.woodstox:woodstox-core` in a version that is incompatible with a `vtl-sdmx` sub-dependency.
+
+A way to fix this is to exclude the `com.fasterxml.woodstox:woodstox-core` dependency from `hadoop-client` and import a newer version in your `pom.xml`:
+
+```xml
+<dependency>
+    <groupId>org.apache.hadoop</groupId>
+    <artifactId>hadoop-client</artifactId>
+    <version>3.3.4</version>
+    <exclusions>
+        <exclusion>
+            <groupId>com.fasterxml.woodstox</groupId>
+            <artifactId>woodstox-core</artifactId>
+        </exclusion>
+    </exclusions>
+</dependency>
+<dependency>
+    <groupId>com.fasterxml.woodstox</groupId>
+    <artifactId>woodstox-core</artifactId>
+    <version>6.5.1</version>
+</dependency>
+```
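+
+To confirm that the exclusion works, it can help to check which `woodstox-core` version Maven actually resolves. A minimal check (a suggestion, not part of the original troubleshooting notes) uses the standard `dependency:tree` goal:
+
+```bash
+# List the resolved woodstox-core artifact; a single 6.5.1 entry is expected
+mvn dependency:tree -Dincludes=com.fasterxml.woodstox:woodstox-core
+```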
diff --git a/docs/i18n/fr/docusaurus-plugin-content-blog/2024-06-25-trevas-sdmx.mdx b/docs/i18n/fr/docusaurus-plugin-content-blog/2024-06-25-trevas-sdmx.mdx
index 612be29d3..3168a3191 100644
--- a/docs/i18n/fr/docusaurus-plugin-content-blog/2024-06-25-trevas-sdmx.mdx
+++ b/docs/i18n/fr/docusaurus-plugin-content-blog/2024-06-25-trevas-sdmx.mdx
@@ -7,6 +7,7 @@ tags: [Trevas, SDMX]

 import useBaseUrl from '@docusaurus/useBaseUrl';
 import ThemedImage from '@theme/ThemedImage';
+import Link from '@theme/Link';

 ### Nouveautés

@@ -28,128 +29,16 @@ Il permet également d'exécuter les VTL TransformationSchemes pour obtenir les
 />

-Trevas prend en charge les éléments de message SDMX ci-dessus. Seul l'attribut VtlMappingSchemes est facultatif.
+Trevas prend en charge les éléments de message SDMX ci-dessus. Seul l'élément VtlMappingSchemes est facultatif.

 Les éléments de la case 1 sont utilisés pour produire des Trevas DataStructures, en valorisant les attributs des composants VTL : name, role, type, nullable et valuedomain.

 Les éléments de la case 2 sont utilisés pour générer le code VTL (ensembles de règles et transformations).

-#### Outils disponibles
+#### Utilitaires disponibles

-#### Utilitaire `buildStructureFromSDMX3`
+Les utilitaires de Trevas SDMX sont documentés .

-`TrevasSDMXUtils.buildStructureFromSDMX3` permet d'obtenir une Trevas DataStructure.
+#### Dépannage

-En fournissant les données correspondantes, vous pouvez créer un Trevas Dataset.
-
-```java
-Structured.DataStructure structure = TrevasSDMXUtils.buildStructureFromSDMX3("path/sdmx_file.xml", "STRUCT_ID");
-
-SparkDataset ds = new SparkDataset(
-    spark.read()
-        .option("header", "true")
-        .option("delimiter", ";")
-        .option("quote", "\"")
-        .csv("path"),
-    structure
-);
-```
-
-#### Objet `SDMXVTLWorkflow`
-
-Le constructeur `SDMXVTLWorkflow` contient 3 arguments:
-
-- un `ScriptEngine` (Trevas ou autre)
-- un `ReadableDataLocation`pour prendre en charge un message SDMX
-- une correspondance entre noms et Datasets
-
-```java
-SparkSession.builder()
-    .appName("test")
-    .master("local")
-    .getOrCreate();
-
-ScriptEngineManager mgr = new ScriptEngineManager();
-ScriptEngine engine = mgr.getEngineByExtension("vtl");
-engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark");
-
-ReadableDataLocation rdl = new ReadableDataLocationTmp("src/test/resources/DSD_BPE_CENSUS.xml");
-
-SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
-```
-
-Cet objet permet alors d'accéder aux 3 fonctions suivantes.
-
-#### Fonction SDMXVTLWorkflow `run` - Mode aperçu
-
-La fonction `run` peut facilement être appelée en mode aperçu, sans données jointes.
-
-```java
-ScriptEngineManager mgr = new ScriptEngineManager();
-ScriptEngine engine = mgr.getEngineByExtension("vtl");
-engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark");
-
-ReadableDataLocation rdl = new ReadableDataLocationTmp("src/test/resources/DSD_BPE_CENSUS.xml");
-
-SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
-
-// instead of using TrevasSDMXUtils.buildStructureFromSDMX3 and data sources
-// to build Trevas Datasets, sdmxVtlWorkflow.getEmptyDatasets()
-// will handle SDMX message structures to produce Trevas Datasets
-// with metadata defined in this message, and adding empty data
-Map emptyDatasets = sdmxVtlWorkflow.getEmptyDatasets();
-engine.getBindings(ScriptContext.ENGINE_SCOPE).putAll(emptyDatasets);
-
-Map result = sdmxVtlWorkflow.run();
-```
-
-Le mode aperçu permet de vérifier la conformité du fichier SDMX et des métadonnées des jeux de données en sortie.
-
-#### Fonction SDMXVTLWorkflow `run`
-
-Une fois qu'un `SDMXVTLWorkflow` est construit, il est facile d'exécuter les validations et transformations VTL définies dans le fichier SDMX.
- -```java -Structured.DataStructure structure = TrevasSDMXUtils.buildStructureFromSDMX3("path/sdmx_file.xml", "ds1"); - -SparkDataset ds1 = new SparkDataset( - spark.read() - .option("header", "true") - .option("delimiter", ";") - .option("quote", "\"") - .csv("path/data.csv"), - structure -); - -ScriptEngineManager mgr = new ScriptEngineManager(); -ScriptEngine engine = mgr.getEngineByExtension("vtl"); -engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark"); - -Map inputs = Map.of("ds1", ds1); - -ReadableDataLocation rdl = new ReadableDataLocationTmp("path/sdmx_file.xml"); - -SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, inputs); - -Map bindings = sdmxVtlWorkflow.run(); -``` - -En conséquence, on recevra l'ensemble des données définies comme persistantes dans la définition `TransformationSchemes`. - -#### Fonction SDMXVTLWorkflow `getTransformationsVTL` - -Permet d'obtenir le code VTL correspondant à la définition SDMX TransformationSchemes. - -```java -SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of()); -String vtl = sdmxVtlWorkflow.getTransformationsVTL(); -``` - -#### Fonction SDMXVTLWorkflow `getRulesetsVTL` - -Permet d'obtenir le code VTL correspondant à la définition SDMX TransformationSchemes. - -```java -SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of()); -String dprs = sdmxVtlWorkflow.getRulesetsVTL(); -``` +Voir . diff --git a/docs/i18n/fr/docusaurus-plugin-content-docs/current/developer-guide/spark-mode/data-sources/index-data-sources.mdx b/docs/i18n/fr/docusaurus-plugin-content-docs/current/developer-guide/spark-mode/data-sources/index-data-sources.mdx index 41804bb87..3d9a2b7c9 100644 --- a/docs/i18n/fr/docusaurus-plugin-content-docs/current/developer-guide/spark-mode/data-sources/index-data-sources.mdx +++ b/docs/i18n/fr/docusaurus-plugin-content-docs/current/developer-guide/spark-mode/data-sources/index-data-sources.mdx @@ -45,6 +45,12 @@ import Card from '@theme/Card';
+
+ +
+Map<String, Dataset> emptyDatasets = sdmxVtlWorkflow.getEmptyDatasets();
+engine.getBindings(ScriptContext.ENGINE_SCOPE).putAll(emptyDatasets);
+
+Map<String, Dataset> result = sdmxVtlWorkflow.run();
+```
+
+Le mode aperçu permet de vérifier la conformité du fichier SDMX et des métadonnées des jeux de données en sortie.
+
+#### Fonction SDMXVTLWorkflow `run`
+
+Une fois qu'un `SDMXVTLWorkflow` est construit, il est facile d'exécuter les validations et transformations VTL définies dans le fichier SDMX.
+
+```java
+Structured.DataStructure structure = TrevasSDMXUtils.buildStructureFromSDMX3("path/sdmx_file.xml", "ds1");
+
+SparkDataset ds1 = new SparkDataset(
+    spark.read()
+        .option("header", "true")
+        .option("delimiter", ";")
+        .option("quote", "\"")
+        .csv("path/data.csv"),
+    structure
+);
+
+ScriptEngineManager mgr = new ScriptEngineManager();
+ScriptEngine engine = mgr.getEngineByExtension("vtl");
+engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark");
+
+Map<String, Dataset> inputs = Map.of("ds1", ds1);
+
+ReadableDataLocation rdl = new ReadableDataLocationTmp("path/sdmx_file.xml");
+
+SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, inputs);
+
+Map<String, Dataset> bindings = sdmxVtlWorkflow.run();
+```
+
+En conséquence, on recevra l'ensemble des jeux de données définis comme persistants dans la définition `TransformationSchemes`.
+
+#### Fonction SDMXVTLWorkflow `getTransformationsVTL`
+
+Permet d'obtenir le code VTL correspondant à la définition SDMX TransformationSchemes.
+
+```java
+SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
+String vtl = sdmxVtlWorkflow.getTransformationsVTL();
+```
+
+#### Fonction SDMXVTLWorkflow `getRulesetsVTL`
+
+Permet d'obtenir le code VTL correspondant aux définitions d'ensembles de règles du fichier SDMX.
+
+```java
+SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
+String dprs = sdmxVtlWorkflow.getRulesetsVTL();
+```
+
+## Dépannage
+
+### Hadoop client
+
+L'intégration de `vtl-modules` avec `hadoop-client` peut poser des problèmes de dépendances.
+
+Il a été remarqué que `hadoop-client` importe `com.fasterxml.woodstox:woodstox-core` dans une version incompatible avec une sous-dépendance de `vtl-sdmx`.
+
+Une façon de résoudre ce problème est d'exclure la dépendance `com.fasterxml.woodstox:woodstox-core` de `hadoop-client` et d'importer une version plus récente dans votre `pom.xml` :
+
+```xml
+<dependency>
+    <groupId>org.apache.hadoop</groupId>
+    <artifactId>hadoop-client</artifactId>
+    <version>3.3.4</version>
+    <exclusions>
+        <exclusion>
+            <groupId>com.fasterxml.woodstox</groupId>
+            <artifactId>woodstox-core</artifactId>
+        </exclusion>
+    </exclusions>
+</dependency>
+<dependency>
+    <groupId>com.fasterxml.woodstox</groupId>
+    <artifactId>woodstox-core</artifactId>
+    <version>6.5.1</version>
+</dependency>
+```
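+
+Pour confirmer que l'exclusion fonctionne, on peut vérifier la version de `woodstox-core` réellement résolue par Maven. Une vérification minimale (suggestion, absente des notes de dépannage d'origine) utilise le goal standard `dependency:tree` :
+
+```bash
+# Lister l'artefact woodstox-core résolu ; une seule entrée 6.5.1 est attendue
+mvn dependency:tree -Dincludes=com.fasterxml.woodstox:woodstox-core
+```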
diff --git a/docs/i18n/no/docusaurus-plugin-content-blog/2023-07-01-v1-trevas-jupyter-0.3.2.mdx b/docs/i18n/no/docusaurus-plugin-content-blog/2023-07-01-v1-trevas-jupyter-0.3.2.mdx
deleted file mode 100644
index 8467aaee0..000000000
--- a/docs/i18n/no/docusaurus-plugin-content-blog/2023-07-01-v1-trevas-jupyter-0.3.2.mdx
+++ /dev/null
@@ -1,37 +0,0 @@
----
-slug: /trevas-jupyter-0.3.2
-title: Trevas Jupyter 0.3.2
-authors: [nicolas]
-tags: [Trevas Jupyter]
----
-
-import useBaseUrl from '@docusaurus/useBaseUrl';
-import Link from '@theme/Link';
-
-[Trevas Jupyter](https://github.com/InseeFrLab/Trevas-Jupyter) `0.3.2` uses version `1.0.2` of [Trevas](https://github.com/InseeFr/Trevas).
-
-### News
-
-In addition to the greatly increased since the publication of Trevas 1.x.x, Trevas Jupyter offers 1 new connector:
-
-- SAS files (via the `loadSas` method)
-
-### Launch
-
-#### Manually adding the Trevas Kernel to an existing Jupyter instance
-
-- Trevas Jupyter compiler
-- Copy the `kernel.json` file and the `bin` and `repo` folders to a new kernel folder.
-- Edit the `kernel.json` file
-- Launch Jupyter
-
-#### Docker
-
-```bash
-docker pull inseefrlab/trevas-jupyter:0.3.2
-docker run -p 8888:8888 inseefrlab/trevas-jupyter:0.3.2
-```
-
-#### Helm
-
-The Trevas Jupyter docker image can be instantiated via the `jupyter-pyspark` Helm contract from [InseeFrLab](https://github.com/InseeFrLab/helm-charts-interactive-services/tree/main).
diff --git a/docs/i18n/no/docusaurus-plugin-content-blog/2023-07-01-v1-trevas-lab-0.3.3.mdx b/docs/i18n/no/docusaurus-plugin-content-blog/2023-07-01-v1-trevas-lab-0.3.3.mdx
deleted file mode 100644
index 22981caab..000000000
--- a/docs/i18n/no/docusaurus-plugin-content-blog/2023-07-01-v1-trevas-lab-0.3.3.mdx
+++ /dev/null
@@ -1,24 +0,0 @@
----
-slug: /trevas-lab-0.3.3
-title: Trevas Lab 0.3.3
-authors: [nicolas]
-tags: [Trevas Lab]
----
-
-import useBaseUrl from '@docusaurus/useBaseUrl';
-import Link from '@theme/Link';
-
-[Trevas Lab](https://github.com/InseeFrLab/Trevas-Lab) `0.3.3` uses version `1.0.2` of [Trevas](https://github.com/InseeFr/Trevas).
-
-### News
-
-In addition to the greatly increased since the publication of Trevas 1.x.x, Trevas Lab offers 2 new connectors:
-
-- SAS files
-- JDBC MariaDB
-
-### Launch
-
-#### Kubernetes
-
-Sample Kubernetes objects are available in the `.kubernetes` folders of [Trevas Lab](https://github.com/InseeFrLab/Trevas-Lab/tree/master/.kubernetes) and [Trevas Lab UI](https://github.com/InseeFrLab/Trevas-Lab-UI/tree/master/.kubernetes).
diff --git a/docs/i18n/no/docusaurus-plugin-content-blog/2023-07-02-trevas-batch-0.1.1.mdx b/docs/i18n/no/docusaurus-plugin-content-blog/2023-07-02-trevas-batch-0.1.1.mdx
deleted file mode 100644
index 75804f834..000000000
--- a/docs/i18n/no/docusaurus-plugin-content-blog/2023-07-02-trevas-batch-0.1.1.mdx
+++ /dev/null
@@ -1,32 +0,0 @@
----
-slug: /trevas-batch-0.1.1
-title: Trevas Batch 0.1.1
-authors: [nicolas]
-tags: [Trevas Batch]
----
-
-import useBaseUrl from '@docusaurus/useBaseUrl';
-import Link from '@theme/Link';
-
-[Trevas Batch](https://github.com/Making-Sense-Info/Trevas-Batch) `0.1.1` uses version `1.0.2` of [Trevas](https://github.com/InseeFr/Trevas).
-
-This Java batch provides Trevas execution metrics in Spark mode.
-
-The configuration file to fill in is described in the [README](https://github.com/Making-Sense-Info/Trevas-Batch/tree/main#readme) of the project.
-Launching the batch will produce a Markdown file as output.
-
-### Launch
-
-#### Local
-
-```java
-java -jar trevas-batch-0.1.1.jar -Dconfig.path="..." -Dreport.path="..."
-```
-
-The java execution will be done in local Spark.
-
-#### Kubernetes
-
-Default Kubernetes objects are defined in the [.kubernetes](https://github.com/Making-Sense-Info/Trevas-Batch/tree/main/.kubernetes) folder.
-
-Feed the `config-map.yml` file then launch the job in your cluster.
diff --git a/docs/i18n/no/docusaurus-plugin-content-docs/current/developer-guide/spark-mode/data-sources/index-data-sources.mdx b/docs/i18n/no/docusaurus-plugin-content-docs/current/developer-guide/spark-mode/data-sources/index-data-sources.mdx index f785d571a..1e58063bd 100644 --- a/docs/i18n/no/docusaurus-plugin-content-docs/current/developer-guide/spark-mode/data-sources/index-data-sources.mdx +++ b/docs/i18n/no/docusaurus-plugin-content-docs/current/developer-guide/spark-mode/data-sources/index-data-sources.mdx @@ -15,8 +15,8 @@ import Card from '@theme/Card'; Apache Parquet - format er den eneste måten å lagre og administrere VTL-metadata når Trevas-motoren - instansieres i Spark-modus. + format er den eneste måten å lagre og administrere VTL-metadata når + Trevas-motoren instansieres i Spark-modus.

Det anbefales derfor sterkt å bruke dette formatet.

@@ -44,6 +44,12 @@ import Card from '@theme/Card';
+
+ +
+Map<String, Dataset> emptyDatasets = sdmxVtlWorkflow.getEmptyDatasets();
+engine.getBindings(ScriptContext.ENGINE_SCOPE).putAll(emptyDatasets);
+
+Map<String, Dataset> result = sdmxVtlWorkflow.run();
+```
+
+The preview mode allows you to check the conformity of the SDMX file and the metadata of the output datasets.
+
+### SDMXVTLWorkflow `run` function
+
+Once an `SDMXVTLWorkflow` is built, it is easy to run the VTL validations and transformations defined in the SDMX file.
+
+```java
+Structured.DataStructure structure = TrevasSDMXUtils.buildStructureFromSDMX3("path/sdmx_file.xml", "ds1");
+
+SparkDataset ds1 = new SparkDataset(
+    spark.read()
+        .option("header", "true")
+        .option("delimiter", ";")
+        .option("quote", "\"")
+        .csv("path/data.csv"),
+    structure
+);
+
+ScriptEngineManager mgr = new ScriptEngineManager();
+ScriptEngine engine = mgr.getEngineByExtension("vtl");
+engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark");
+
+Map<String, Dataset> inputs = Map.of("ds1", ds1);
+
+ReadableDataLocation rdl = new ReadableDataLocationTmp("path/sdmx_file.xml");
+
+SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, inputs);
+
+Map<String, Dataset> bindings = sdmxVtlWorkflow.run();
+```
+
+As a result, one will receive all the datasets defined as persistent in the `TransformationSchemes` definition.
+
+### SDMXVTLWorkflow `getTransformationsVTL` function
+
+Gets the VTL code corresponding to the SDMX TransformationSchemes definition.
+
+```java
+SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
+String vtl = sdmxVtlWorkflow.getTransformationsVTL();
+```
+
+### SDMXVTLWorkflow `getRulesetsVTL` function
+
+Gets the VTL code corresponding to the ruleset definitions in the SDMX file.
+
+```java
+SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
+String dprs = sdmxVtlWorkflow.getRulesetsVTL();
+```
+
+## Troubleshooting
+
+### Hadoop client
+
+The integration of `vtl-modules` with `hadoop-client` can cause dependency issues.
+
+It was noted that `hadoop-client` imports `com.fasterxml.woodstox:woodstox-core` in a version that is incompatible with a `vtl-sdmx` sub-dependency.
+
+A way to fix this is to exclude the `com.fasterxml.woodstox:woodstox-core` dependency from `hadoop-client` and import a newer version in your `pom.xml`:
+
+```xml
+<dependency>
+    <groupId>org.apache.hadoop</groupId>
+    <artifactId>hadoop-client</artifactId>
+    <version>3.3.4</version>
+    <exclusions>
+        <exclusion>
+            <groupId>com.fasterxml.woodstox</groupId>
+            <artifactId>woodstox-core</artifactId>
+        </exclusion>
+    </exclusions>
+</dependency>
+<dependency>
+    <groupId>com.fasterxml.woodstox</groupId>
+    <artifactId>woodstox-core</artifactId>
+    <version>6.5.1</version>
+</dependency>
+```
diff --git a/docs/i18n/zh-CN/docusaurus-plugin-content-blog/2023-07-01-v1-trevas-jupyter-0.3.2.mdx b/docs/i18n/zh-CN/docusaurus-plugin-content-blog/2023-07-01-v1-trevas-jupyter-0.3.2.mdx
deleted file mode 100644
index 8467aaee0..000000000
--- a/docs/i18n/zh-CN/docusaurus-plugin-content-blog/2023-07-01-v1-trevas-jupyter-0.3.2.mdx
+++ /dev/null
@@ -1,37 +0,0 @@
----
-slug: /trevas-jupyter-0.3.2
-title: Trevas Jupyter 0.3.2
-authors: [nicolas]
-tags: [Trevas Jupyter]
----
-
-import useBaseUrl from '@docusaurus/useBaseUrl';
-import Link from '@theme/Link';
-
-[Trevas Jupyter](https://github.com/InseeFrLab/Trevas-Jupyter) `0.3.2` uses version `1.0.2` of [Trevas](https://github.com/InseeFr/Trevas).
-
-### News
-
-In addition to the greatly increased since the publication of Trevas 1.x.x, Trevas Jupyter offers 1 new connector:
-
-- SAS files (via the `loadSas` method)
-
-### Launch
-
-#### Manually adding the Trevas Kernel to an existing Jupyter instance
-
-- Trevas Jupyter compiler
-- Edit the `kernel.json` file -- Launch Jupyter - -#### Docker - -```bash -docker pull inseefrlab/trevas-jupyter:0.3.2 -docker run -p 8888:8888 inseefrlab/trevas-jupyter:0.3.2 -``` - -#### Helm - -The Trevas Jupyter docker image can be instantiated via the `jupyter-pyspark` Helm contract from [InseeFrLab](https://github.com/InseeFrLab/helm-charts-interactive-services/tree/main). diff --git a/docs/i18n/zh-CN/docusaurus-plugin-content-blog/2023-07-01-v1-trevas-lab-0.3.3.mdx b/docs/i18n/zh-CN/docusaurus-plugin-content-blog/2023-07-01-v1-trevas-lab-0.3.3.mdx deleted file mode 100644 index 22981caab..000000000 --- a/docs/i18n/zh-CN/docusaurus-plugin-content-blog/2023-07-01-v1-trevas-lab-0.3.3.mdx +++ /dev/null @@ -1,24 +0,0 @@ ---- -slug: /trevas-lab-0.3.3 -title: Trevas Lab 0.3.3 -authors: [nicolas] -tags: [Trevas Lab] ---- - -import useBaseUrl from '@docusaurus/useBaseUrl'; -import Link from '@theme/Link'; - -[Trevas Lab](https://github.com/InseeFrLab/Trevas-Lab) `0.3.3` uses version `1.0.2` of [Trevas](https://github.com/InseeFr/Trevas). - -### News - -In addition to the greatly increased since the publication of Trevas 1.x.x, Trevas Lab offers 2 new connectors: - -- SAS files -- JDBC MariaDB - -### Launch - -#### Kubernetes - -Sample Kubernetes objects are available in the `.kubernetes` folders of [Trevas Lab](https://github.com/InseeFrLab/Trevas-Lab/tree/master/.kubernetes) and [Trevas Lab UI](https://github.com/InseeFrLab/Trevas-Lab-UI/tree/master/.kubernetes). diff --git a/docs/i18n/zh-CN/docusaurus-plugin-content-blog/2023-07-02-trevas-batch-0.1.1.mdx b/docs/i18n/zh-CN/docusaurus-plugin-content-blog/2023-07-02-trevas-batch-0.1.1.mdx deleted file mode 100644 index 75804f834..000000000 --- a/docs/i18n/zh-CN/docusaurus-plugin-content-blog/2023-07-02-trevas-batch-0.1.1.mdx +++ /dev/null @@ -1,32 +0,0 @@ ---- -slug: /trevas-batch-0.1.1 -title: Trevas Batch 0.1.1 -authors: [nicolas] -tags: [Trevas Batch] ---- - -import useBaseUrl from '@docusaurus/useBaseUrl'; -import Link from '@theme/Link'; - -[Trevas Batch](https://github.com/Making-Sense-Info/Trevas-Batch) `0.1.1` uses version `1.0.2` of [Trevas](https://github.com/InseeFr/Trevas). - -This Java batch provides Trevas execution metrics in Spark mode. - -The configuration file to fill in is described in the [README](https://github.com/Making-Sense-Info/Trevas-Batch/tree/main#readme) of the project. -Launching the batch will produce a Markdown file as output. - -### Launch - -#### Local - -```java -java -jar trevas-batch-0.1.1.jar -Dconfig.path="..." -Dreport.path="..." -``` - -The java execution will be done in local Spark. - -#### Kubernetes - -Default Kubernetes objects are defined in the [.kubernetes](https://github.com/Making-Sense-Info/Trevas-Batch/tree/main/.kubernetes) folder. - -Feed the `config-map.yml` file then launch the job in your cluster. diff --git a/docs/i18n/zh-CN/docusaurus-plugin-content-docs/current/developer-guide/spark-mode/data-sources/index-data-sources.mdx b/docs/i18n/zh-CN/docusaurus-plugin-content-docs/current/developer-guide/spark-mode/data-sources/index-data-sources.mdx index e31185100..9bb131cdf 100644 --- a/docs/i18n/zh-CN/docusaurus-plugin-content-docs/current/developer-guide/spark-mode/data-sources/index-data-sources.mdx +++ b/docs/i18n/zh-CN/docusaurus-plugin-content-docs/current/developer-guide/spark-mode/data-sources/index-data-sources.mdx @@ -15,7 +15,8 @@ import Card from '@theme/Card'; 当 Trevas 引擎在 Spark 模式下运行时, Apache Parquet - 格式是唯一允许存储和管理 VTL 元数据的格式。 + {' '} + 格式是唯一允许存储和管理 VTL 元数据的格式。

因此强烈建议使用`Apache Parquet`这种格式。

@@ -43,6 +44,12 @@ import Card from '@theme/Card';
+
+ +
+Map<String, Dataset> emptyDatasets = sdmxVtlWorkflow.getEmptyDatasets();
+engine.getBindings(ScriptContext.ENGINE_SCOPE).putAll(emptyDatasets);
+
+Map<String, Dataset> result = sdmxVtlWorkflow.run();
+```
+
+The preview mode allows you to check the conformity of the SDMX file and the metadata of the output datasets.
+
+### SDMXVTLWorkflow `run` function
+
+Once an `SDMXVTLWorkflow` is built, it is easy to run the VTL validations and transformations defined in the SDMX file.
+
+```java
+Structured.DataStructure structure = TrevasSDMXUtils.buildStructureFromSDMX3("path/sdmx_file.xml", "ds1");
+
+SparkDataset ds1 = new SparkDataset(
+    spark.read()
+        .option("header", "true")
+        .option("delimiter", ";")
+        .option("quote", "\"")
+        .csv("path/data.csv"),
+    structure
+);
+
+ScriptEngineManager mgr = new ScriptEngineManager();
+ScriptEngine engine = mgr.getEngineByExtension("vtl");
+engine.put(VtlScriptEngine.PROCESSING_ENGINE_NAMES, "spark");
+
+Map<String, Dataset> inputs = Map.of("ds1", ds1);
+
+ReadableDataLocation rdl = new ReadableDataLocationTmp("path/sdmx_file.xml");
+
+SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, inputs);
+
+Map<String, Dataset> bindings = sdmxVtlWorkflow.run();
+```
+
+As a result, one will receive all the datasets defined as persistent in the `TransformationSchemes` definition.
+
+### SDMXVTLWorkflow `getTransformationsVTL` function
+
+Gets the VTL code corresponding to the SDMX TransformationSchemes definition.
+
+```java
+SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
+String vtl = sdmxVtlWorkflow.getTransformationsVTL();
+```
+
+### SDMXVTLWorkflow `getRulesetsVTL` function
+
+Gets the VTL code corresponding to the ruleset definitions in the SDMX file.
+
+```java
+SDMXVTLWorkflow sdmxVtlWorkflow = new SDMXVTLWorkflow(engine, rdl, Map.of());
+String dprs = sdmxVtlWorkflow.getRulesetsVTL();
+```
+
+## Troubleshooting
+
+### Hadoop client
+
+The integration of `vtl-modules` with `hadoop-client` can cause dependency issues.
+
+It was noted that `hadoop-client` imports `com.fasterxml.woodstox:woodstox-core` in a version that is incompatible with a `vtl-sdmx` sub-dependency.
+
+A way to fix this is to exclude the `com.fasterxml.woodstox:woodstox-core` dependency from `hadoop-client` and import a newer version in your `pom.xml`:
+
+```xml
+<dependency>
+    <groupId>org.apache.hadoop</groupId>
+    <artifactId>hadoop-client</artifactId>
+    <version>3.3.4</version>
+    <exclusions>
+        <exclusion>
+            <groupId>com.fasterxml.woodstox</groupId>
+            <artifactId>woodstox-core</artifactId>
+        </exclusion>
+    </exclusions>
+</dependency>
+<dependency>
+    <groupId>com.fasterxml.woodstox</groupId>
+    <artifactId>woodstox-core</artifactId>
+    <version>6.5.1</version>
+</dependency>
+```
diff --git a/docs/package.json b/docs/package.json
index fa086c0a0..636ddf742 100644
--- a/docs/package.json
+++ b/docs/package.json
@@ -71,7 +71,7 @@
     "eslint-plugin-simple-import-sort": "^10.0.0",
     "prettier": "^3.3.2",
     "prettier-linter-helpers": "^1.0.0",
-    "typescript": "^5.5.2"
+    "typescript": "^5.5.3"
   },
   "browserslist": {
     "production": [
diff --git a/docs/sidebars.js b/docs/sidebars.js
index 44657d612..fb61af251 100644
--- a/docs/sidebars.js
+++ b/docs/sidebars.js
@@ -126,9 +126,10 @@ module.exports = {
         label: 'Data sources',
         items: [
           'developer-guide/spark-mode/data-sources/index-data-sources',
-          'developer-guide/spark-mode/data-sources/parquet',
           'developer-guide/spark-mode/data-sources/csv',
           'developer-guide/spark-mode/data-sources/jdbc',
+          'developer-guide/spark-mode/data-sources/parquet',
+          'developer-guide/spark-mode/data-sources/sdmx',
           'developer-guide/spark-mode/data-sources/others',
         ],
       },
diff --git a/docs/yarn.lock b/docs/yarn.lock
index 0f33dd4da..751f14619 100644
--- a/docs/yarn.lock
+++ b/docs/yarn.lock
@@ -9034,10 +9034,10 @@ typedarray-to-buffer@^3.1.5:
   dependencies:
     is-typedarray "^1.0.0"

-typescript@^5.5.2:
-  version "5.5.2"
-  resolved "https://registry.yarnpkg.com/typescript/-/typescript-5.5.2.tgz#c26f023cb0054e657ce04f72583ea2d85f8d0507"
-  integrity sha512-NcRtPEOsPFFWjobJEtfihkLCZCXZt/os3zf8nTxjVH3RvTSxjrCamJpbExGvYOF+tFHc3pA65qpdwPbzjohhew==
+typescript@^5.5.3:
+  version "5.5.3"
+  resolved "https://registry.yarnpkg.com/typescript/-/typescript-5.5.3.tgz#e1b0a3c394190838a0b168e771b0ad56a0af0faa"
+  integrity sha512-/hreyEujaB0w76zKo6717l3L0o/qEUtRgdvUBvlkhoWeOVMjMuHNHk0BRBzikzuGDqNmPQbg5ifMEqsHLiIUcQ==

 ua-parser-js@^0.7.30:
   version "0.7.33"