From b0c0336dcc9654b81120285c5f6af5902bd2c750 Mon Sep 17 00:00:00 2001
From: Nick Jackson
Date: Mon, 23 Oct 2023 14:43:50 +0100
Subject: [PATCH] Add TDE files for stats extraction

This forms the basis of bulk data extraction for analysis; two new
Template Driven Extraction templates will populate the `documents.summary`
and `documents.propertysummary` tables, which can be queried together
using an SQL `JOIN` to extract a table of key information about documents.

These are a first pass, and will likely change as we get a better handle
on the stats we need to extract.

Fields which have `<invalid-values>` set to `ignore` and `<nullable>` set
to `true` will create rows with `NULL` values where a value cannot be
found. Rows where this isn't the case will be dropped, as they will upset
the `JOIN`.

| Field | Value | Ignore invalid? |
| --- | --- | --- |
| `uri` | `xdmp:node-uri(.)` | No |
| `ncn` | `sem:iri(//uk:cite//text())` | Yes |
| `name` | `sem:iri(//akn:FRBRWork/akn:FRBRname/@value)` | Yes |

| Field | Value | Ignore invalid? |
| --- | --- | --- |
| `uri` | `xdmp:node-uri(.)` | No |
| `doc_uri` | `dls:version/dls:document-uri` | Yes |
| `version_number` | `dls:version/dls:version-id` | Yes |
| `modified` | `prop:last-modified` | Yes |
| `published` | `published` | Yes |
---
 .../ml-schemas/tde/sql-document-extract.xsd   | 51 ++++++++++++++++
 .../tde/sql-document-properties-extract.xsd   | 59 +++++++++++++++++++
 2 files changed, 110 insertions(+)
 create mode 100644 src/main/ml-schemas/tde/sql-document-extract.xsd
 create mode 100644 src/main/ml-schemas/tde/sql-document-properties-extract.xsd

diff --git a/src/main/ml-schemas/tde/sql-document-extract.xsd b/src/main/ml-schemas/tde/sql-document-extract.xsd
new file mode 100644
index 0000000..bf1c2c8
--- /dev/null
+++ b/src/main/ml-schemas/tde/sql-document-extract.xsd
@@ -0,0 +1,51 @@
+
diff --git a/src/main/ml-schemas/tde/sql-document-properties-extract.xsd b/src/main/ml-schemas/tde/sql-document-properties-extract.xsd
new file mode 100644
index 0000000..3d4daed
--- /dev/null
+++ b/src/main/ml-schemas/tde/sql-document-properties-extract.xsd
@@ -0,0 +1,59 @@
+
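
As a rough illustration of the `JOIN` described in the commit message, the sketch below recreates the two tables in an in-memory SQLite database and joins them. It makes several assumptions that are not stated in the patch: that `uri` is the join key, that an inner join is intended, and that SQLite stands in for the MarkLogic SQL endpoint. All sample rows and URIs are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create in-memory stand-ins for the two TDE-backed tables.

    Column names follow the tables in the commit message; the rows are
    hypothetical sample data, not real documents.
    """
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so the tables can carry the
    # `documents.` schema prefix used in the commit message.
    conn.execute("ATTACH DATABASE ':memory:' AS documents")
    conn.execute(
        "CREATE TABLE documents.summary (uri TEXT, ncn TEXT, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE documents.propertysummary ("
        "uri TEXT, doc_uri TEXT, version_number TEXT, "
        "modified TEXT, published TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents.summary VALUES (?, ?, ?)",
        [
            ("/example/1.xml", "[2023] EXAMPLE 1", "Example v Example"),
            # A nullable field with no value is stored as NULL.
            ("/example/2.xml", None, "Another Example"),
        ],
    )
    conn.executemany(
        "INSERT INTO documents.propertysummary VALUES (?, ?, ?, ?, ?)",
        [
            ("/example/1.xml", "/example/1.xml", "3",
             "2023-10-20T09:00:00", "true"),
            ("/example/2.xml", "/example/2.xml", "1",
             "2023-10-21T10:30:00", "false"),
        ],
    )
    return conn


# Assumption: the two tables are joined on their shared `uri` column.
JOIN_QUERY = """
SELECT s.uri, s.ncn, s.name, p.version_number, p.modified, p.published
FROM documents.summary AS s
JOIN documents.propertysummary AS p ON p.uri = s.uri
ORDER BY s.uri
"""


def joined_rows(conn):
    """Return one combined row per document present in both tables."""
    return conn.execute(JOIN_QUERY).fetchall()


rows = joined_rows(build_demo_db())
for row in rows:
    print(row)
```

The second sample document has no `ncn`, so its row survives the join with a `NULL` in that column. A document missing from either table would not appear in the result at all.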