Merge pull request #202 from zazuko/move-cube
cube package
tpluscode authored Nov 22, 2023
2 parents a5ad883 + bbc237d commit f45559f
Showing 52 changed files with 1,396 additions and 504 deletions.
5 changes: 5 additions & 0 deletions .changeset/clean-owls-think.md
@@ -0,0 +1,5 @@
---
"barnard59-core": minor
---

Add support for "late errors" where step authors can call `context.error()` to avoid immediately breaking the pipeline
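
A minimal sketch of the "late error" pattern, with illustrative names (the actual wiring lives in `packages/core`): a step reports a problem through `context.error()` and keeps processing, while the pipeline records only the first error and rethrows it after the run.

```javascript
// Illustrative sketch, not the actual barnard59-core implementation.
// The pipeline records the first late error instead of failing immediately.
function createPipeline() {
  const pipeline = { error: null }
  const context = {
    error(err) {
      // remember only the first error
      if (!pipeline.error) {
        pipeline.error = err
      }
    },
  }
  return { pipeline, context }
}

// A step author reports a recoverable problem without breaking the stream.
function processRecords(context, records) {
  const results = []
  for (const record of records) {
    if (record.value < 0) {
      context.error(new Error(`invalid record: ${record.id}`))
      continue // keep processing the remaining records
    }
    results.push(record.value)
  }
  return results
}

const { pipeline, context } = createPipeline()
const results = processRecords(context, [
  { id: 'a', value: 1 },
  { id: 'b', value: -1 },
  { id: 'c', value: 2 },
])
// After the run, the recorded error is rethrown, as in packages/core/lib/run.js:
// if (pipeline.error) throw pipeline.error
```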
5 changes: 5 additions & 0 deletions .changeset/five-cups-wash.md
@@ -0,0 +1,5 @@
---
"barnard59-sparql": patch
---

fix code link in manifest
22 changes: 22 additions & 0 deletions .changeset/silver-humans-joke.md
@@ -0,0 +1,22 @@
---
"barnard59-cube": major
"barnard59-rdf": major
---

Move cube operations from package `barnard59-rdf` to the new package `barnard59-cube`.


```diff
<#toObservation> a p:Step;
code:implementedBy [ a code:EcmaScriptModule;
- code:link <node:barnard59-rdf/cube.js#toObservation>
+ code:link <node:barnard59-cube/cube.js#toObservation>
].

<#buildCubeShape> a p:Step;
code:implementedBy [ a code:EcmaScriptModule;
- code:link <node:barnard59-rdf/cube.js#buildCubeShape>
+ code:link <node:barnard59-cube/cube.js#buildCubeShape>
].

```
5 changes: 5 additions & 0 deletions .changeset/soft-peaches-brake.md
@@ -0,0 +1,5 @@
---
"barnard59-env": minor
---

Added `cube` and `meta` namespaces
5 changes: 5 additions & 0 deletions .changeset/strong-lions-wait.md
@@ -0,0 +1,5 @@
---
"barnard59": patch
---

include peer dependencies in manifest discovery
1 change: 1 addition & 0 deletions .github/workflows/ci.yaml
@@ -12,6 +12,7 @@ jobs:
package:
- base
- core
- cube
- csvw
- formats
- ftp
1 change: 1 addition & 0 deletions codecov.yml
@@ -6,6 +6,7 @@ flag_management:
- name: barnard59-base
- name: barnard59-core
- name: barnard59-csvw
- name: barnard59-cube
- name: barnard59-formats
- name: barnard59-ftp
- name: barnard59-graph-store
101 changes: 95 additions & 6 deletions package-lock.json

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion packages/cli/lib/discoverManifests.js
@@ -10,6 +10,7 @@ const require = module.createRequire(import.meta.url)
export default async function * () {
const packages = findPlugins({
includeDev: true,
includePeer: true,
filter({ pkg }) {
return packagePattern.test(pkg.name) && hasManifest(pkg.name)
},
@@ -19,7 +20,7 @@ export default async function * () {
if (hasManifest(dir)) {
const { name, version } = require(`${dir}/package.json`)
yield {
- name,
+ name: packagePattern.test(name) ? name.match(packagePattern)[1] : name,
manifest: rdf.clownface({ dataset: await rdf.dataset().import(rdf.fromFile(`${dir}/manifest.ttl`)) }),
version,
}
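
The renaming logic above can be sketched in isolation. `packagePattern` here is an assumed stand-in for the CLI's actual pattern: plugin packages named `barnard59-*` are reported under their short suffix, while other names pass through unchanged.

```javascript
// Illustrative sketch of the manifest-name normalization; the real
// packagePattern lives in the barnard59 CLI and may differ in detail.
const packagePattern = /^barnard59-(.+)$/

function manifestName(name) {
  return packagePattern.test(name) ? name.match(packagePattern)[1] : name
}

manifestName('barnard59-cube') // → 'cube'
manifestName('my-plugin')      // → 'my-plugin'
```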
2 changes: 1 addition & 1 deletion packages/cli/lib/pipeline.js
@@ -34,7 +34,7 @@ export const desugar = async (dataset, { logger, knownOperations } = {}) => {
const [quad] = step.dataset.match(step.term)
const knownStep = knownOperations.get(quad?.predicate)
if (!knownStep) {
- logger?.warn(`Operation <${quad?.predicate.value}> not found in known manifests. Have you added the right \`branard59-*\` package as dependency?`)
+ logger?.warn(`Operation <${quad?.predicate.value}> not found in known manifests. Have you added the right \`barnard59-*\` package as dependency?`)
continue
}

13 changes: 10 additions & 3 deletions packages/core/lib/factory/pipeline.js
@@ -6,8 +6,8 @@ import { VariableMap } from '../VariableMap.js'
import createStep from './step.js'
import createVariables from './variables.js'

-async function createPipelineContext(ptr, { basePath, context, logger, variables }) {
-  return { ...context, basePath, logger, variables }
+async function createPipelineContext(ptr, { basePath, context, logger, variables, error }) {
+  return { error, ...context, basePath, logger, variables }
}

async function createPipelineVariables(ptr, { basePath, context, loaderRegistry, logger, variables }) {
@@ -35,8 +35,15 @@ function createPipeline(ptr, {
ptr = context.env.clownface({ dataset: ptr.dataset, term: ptr.term })

const onInit = async pipeline => {
function error(err) {
logger.error(err)
if (!pipeline.error) {
pipeline.error = err
}
}

variables = await createPipelineVariables(ptr, { basePath, context, loaderRegistry, logger, variables })
- context = await createPipelineContext(ptr, { basePath, context, logger, variables })
+ context = await createPipelineContext(ptr, { basePath, context, logger, variables, error })

logVariables(ptr, context, variables)

3 changes: 3 additions & 0 deletions packages/core/lib/run.js
@@ -20,6 +20,9 @@ async function run(pipeline, { end = false, resume = false } = {}) {
pipeline.logger.on('finish', () => resolve())
})

if (pipeline.error) {
throw pipeline.error
}
pipeline.logger.end()
await p
} catch (err) {
85 changes: 85 additions & 0 deletions packages/cube/README.md
@@ -0,0 +1,85 @@
# barnard59-cube

This package provides operations and commands for RDF cubes in Barnard59 Linked Data pipelines.
The `manifest.ttl` file contains a full list of all operations included in this package.

## Operations

### `cube/buildCubeShape`

TBD

### `cube/toObservation`

TBD


## Commands

## Cube validation

The following pipelines retrieve and validate cube observations and their constraints.

### fetch constraint

Pipeline `fetch-constraint` queries a given SPARQL endpoint to retrieve
a [concise bounded description](https://docs.stardog.com/query-stardog/#describe-queries) of the `cube:Constraint` part of a given cube.

```bash
npx barnard59 cube fetch-constraint \
--cube https://agriculture.ld.admin.ch/agroscope/PRIFm8t15/2 \
--endpoint https://int.lindas.admin.ch/query
```


This pipeline is mainly useful for cubes published with [cube creator](https://github.com/zazuko/cube-creator); if the cube definition is crafted manually, it is likely already available as a local file.


### check constraint

Pipeline `check-constraint` validates the input constraint against the shapes provided with the `profile` variable (the default profile is https://cube.link/latest/shape/standalone-constraint-constraint).

The pipeline reads the constraint from `stdin`, so the input can come from a local file (as in the following example) or from the output of the `fetch-constraint` pipeline. In most cases it is useful to keep the constraint in a local file, because it is also needed for the `check-observations` pipeline.

```bash
cat myConstraint.ttl \
| npx barnard59 cube check-constraint \
--profile https://cube.link/v0.1.0/shape/standalone-constraint-constraint
```
SHACL reports for violations are written to `stdout`.


### fetch observations

Pipeline `fetch-observations` queries a given SPARQL endpoint to retrieve the observations of a given cube.

```bash
npx barnard59 cube fetch-observations \
--cube https://agriculture.ld.admin.ch/agroscope/PRIFm8t15/2 \
--endpoint https://int.lindas.admin.ch/query
```
Results are written to `stdout`.

### check observations

Pipeline `check-observations` validates the input observations against the shapes provided with the `constraint` variable.

The pipeline reads the observations from `stdin`, so the input can come from a local file (as in the following example) or from the output of the `fetch-observations` pipeline.

```bash
cat myObservations.ttl \
| npx barnard59 cube check-observations \
--constraint myConstraint.ttl
```

To enable validation, the pipeline adds a `sh:targetClass` property with value `cube:Observation` to the constraint, which requires that each observation has an explicit `rdf:type`.
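
This step can be sketched as follows; the sketch works on plain triple objects rather than the pipeline's actual RDF/JS API, and the selection of node shapes is an illustrative simplification.

```javascript
// Illustrative sketch: give every sh:NodeShape in the constraint a
// sh:targetClass of cube:Observation so SHACL validation selects the
// observations. Not the pipeline's actual code.
const RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
const SH_NODE_SHAPE = 'http://www.w3.org/ns/shacl#NodeShape'
const SH_TARGET_CLASS = 'http://www.w3.org/ns/shacl#targetClass'
const CUBE_OBSERVATION = 'https://cube.link/Observation'

function addObservationTarget(triples) {
  // find the subjects typed as node shapes
  const shapes = triples
    .filter(t => t.predicate === RDF_TYPE && t.object === SH_NODE_SHAPE)
    .map(t => t.subject)
  // add one sh:targetClass triple per shape
  const added = shapes.map(subject => ({
    subject,
    predicate: SH_TARGET_CLASS,
    object: CUBE_OBSERVATION,
  }))
  return triples.concat(added)
}
```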

To leverage streaming, the input is split and validated in small batches of adjustable size (the default is 50, which is appropriate in most cases). This allows very large cubes to be validated, because the observations are never loaded into memory all at once. To ensure that triples for the same observation are adjacent (and hence processed in the same batch), the input is sorted by subject; for large inputs, the sorting step relies on temporary local files.
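
The batching idea can be sketched in a few lines; this is an illustrative in-memory version (the actual pipeline streams and may sort via temporary files), which never splits one subject across two batches.

```javascript
// Illustrative sketch of subject-wise batching, not the pipeline's code:
// sort triples by subject, then cut batches of roughly `batchSize` triples,
// closing a batch only at a subject boundary.
function batchBySubject(triples, batchSize = 50) {
  const sorted = [...triples].sort((a, b) => a.subject.localeCompare(b.subject))
  const batches = []
  let current = []
  for (let i = 0; i < sorted.length; i++) {
    current.push(sorted[i])
    const next = sorted[i + 1]
    const subjectEnds = !next || next.subject !== sorted[i].subject
    if (current.length >= batchSize && subjectEnds) {
      batches.push(current)
      current = []
    }
  }
  if (current.length > 0) batches.push(current)
  return batches
}
```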

SHACL reports for violations are written to `stdout`.

To limit the output size, the `maxViolations` option stops validation once the given number of violations is reached.

### Known issues

Command `check-constraint` may fail if there are `sh:in` constraints with too many values.
File renamed without changes.
