Data dictionary capability for datasets (#4015)
dafeder authored Oct 6, 2023
1 parent f285310 commit a148554
Showing 28 changed files with 1,169 additions and 181 deletions.
3 changes: 2 additions & 1 deletion composer.json
@@ -28,7 +28,8 @@
"oomphinc/composer-installers-extender": "^2.0",
"ramsey/uuid" : "^3.8.0",
"stolt/json-merge-patch": "^2.0",
"getdkan/pdlt": "~0.1"
"getdkan/pdlt": "~0.1",
"symfony/polyfill-php80": "^1.27"
},
"require-dev": {
"getdkan/mock-chain": "^1.3.0",
4 changes: 1 addition & 3 deletions docs/source/components/dkan_metastore.rst
@@ -58,7 +58,7 @@ The term "data dictionary" is fairly broad, and can refer to anything from a PDF
machine-readable table schema.

While users are free to integrate data dictionaries into their metadata schemas in any way they choose
in DKAN, we are introducing our own native data dictionary concept. Data dictionaries in DKAN are JSON
in DKAN, DKAN also has its own native data dictionary type. Data dictionaries in DKAN are JSON
metadata items managed in the metastore in the same way that datasets and distributions are. They are,
however, less flexible than other metadata schema, which can be completely overridden/replaced in your
instance of DKAN. To use DKAN's new native data dictionary features, you must use the `data-dictionary`
@@ -70,8 +70,6 @@ Data dictionaries can have three different relationships with your catalog:
2. You may define a set of domain-specific data dictionaries for your catalog, which you can choose between when creating a dataset.
3. You may define one data dictionary for every dataset, or even every distribution, in your catalog.

*(Note that only option #1 above has been implemented in the current version of DKAN.)*

Data dictionaries will affect the behavior of the :doc:`Datastore <dkan_datastore>`.

By default, all data imported into a datastore will be stored as strings.
85 changes: 77 additions & 8 deletions docs/source/user-guide/guide_data_dictionaries.rst
@@ -96,7 +96,13 @@ Tutorial I: Catalog-wide data dictionary

Creating a data dictionary via the API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The simplest way to use data dictionaries on your site is to create one for the entire catalog. To do this, let's first create a new dictionary using the API. We will define a list of fields based on the example header row below.

.. note:: Note
Data dictionaries through the UI are still a work in progress!

The simplest way to use data dictionaries on your site is to create one for the entire catalog. To
do this, let's first create a new dictionary using the API. We will define a list of fields based
on the example header row below.

.. list-table::
:widths: 16 16 16 16 16 16
@@ -117,7 +123,7 @@ The simplest way to use data dictionaries on your site is to create one for the

----

.. code-block::
.. code-block:: http
POST http://mydomain.com/api/1/metastore/schemas/data-dictionary/items
Authorization: Basic username:password
@@ -166,13 +172,15 @@ The simplest way to use data dictionaries on your site is to create one for the
We get a response that tells us the identifier for the new dictionary is `7fd6bb1f-2752-54de-9a33-81ce2ea0feb2`.

We now need to set the data dictionary mode to *sitewide*, and the sitewide data dictionary to this identifier. For now, we must do this through drush:

.. code-block::
We now need to set the data dictionary mode to *sitewide*, and the sitewide data dictionary to this identifier.

drush -y config:set metastore.settings data_dictionary_mode 1
drush -y config:set metastore.settings data_dictionary_sitewide 7fd6bb1f-2752-54de-9a33-81ce2ea0feb2
1. Go to admin/dkan/data-dictionary/settings
2. Set "Dictionary Mode" to "Sitewide".
3. Set "Sitewide Dictionary ID" to `7fd6bb1f-2752-54de-9a33-81ce2ea0feb2`.

.. image:: images/dictionary-settings.png
:alt: Data dictionary settings admin page, with select input for "Dictionary Mode" set to "Sitewide" and text
input for "Sitewide Dictionary ID" containing the identifier 7fd6bb1f-2752-54de-9a33-81ce2ea0feb2.

Creating a data dictionary via the UI
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -188,5 +196,66 @@ Creating a data dictionary via the UI

Adding indexes
^^^^^^^^^^^^^^
The same process is used for adding indexes to the datastores.
Data dictionaries can be used to describe indexes that should be applied when importing to a database.
Learn more about this in :doc:`guide_indexes`.

Tutorial II: Assign a data dictionary to a dataset
--------------------------------------------------

Datasets can reference specific data dictionaries as well. Follow the last tutorial and create a data dictionary
with ID `7fd6bb1f-2752-54de-9a33-81ce2ea0feb2`.

Now, let's use the UI to set the data dictionary mode to "distribution reference".

.. note::
Distribution reference mode for data dictionaries means that DKAN will look for links to data dictionaries in the
`describedBy` field of the distribution that a data file is described in. It will look for a URL to a data dictionary
in the metastore. The `describedByType` must also be `application/vnd.tableschema+json` to signal the correct data
dictionary format.

1. Go to admin/dkan/data-dictionary/settings
2. Set "Dictionary Mode" to "Distribution reference".
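
The rule described in the note above can be illustrated with a small standalone sketch. This is not DKAN's discovery code: the function name `findDictionaryReference` is hypothetical, the distribution is a plain PHP array shaped like a DCAT distribution, and the sketch does not check that the referenced dictionary actually exists in the metastore.

```php
<?php

// Hypothetical helper, not part of DKAN: returns the data dictionary
// reference for a distribution, or NULL when the distribution does not
// satisfy the rule described in the note.
function findDictionaryReference(array $distribution): ?string {
  $expectedType = 'application/vnd.tableschema+json';
  $type = $distribution['describedByType'] ?? NULL;
  $reference = $distribution['describedBy'] ?? NULL;
  // Both fields are required: a link, and the type that marks the link
  // as a table schema.
  if ($type !== $expectedType || !is_string($reference) || $reference === '') {
    return NULL;
  }
  return $reference;
}

$distribution = [
  'downloadURL' => 'https://example.com/projects.csv',
  'describedBy' => 'dkan://metastore/schemas/data-dictionary/items/7fd6bb1f-2752-54de-9a33-81ce2ea0feb2',
  'describedByType' => 'application/vnd.tableschema+json',
];

var_dump(findDictionaryReference($distribution));
```

A distribution whose `describedBy` points at, say, a PDF with a different `describedByType` would return `NULL` here, which corresponds to no dictionary being applied.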

Now let's link a dataset to a data dictionary. Again, let's use the API for now.

.. code-block:: http
POST http://mydomain.com/api/1/metastore/schemas/dataset/items
Authorization: Basic username:password
{
"@type": "dcat:Dataset",
"accessLevel": "public",
"contactPoint": {
"fn": "Jane Doe",
"hasEmail": "mailto:[email protected]"
},
"title": "Project list",
"description": "Example dataset.",
"distribution": [
{
"@type": "dcat:Distribution",
"downloadURL": "https://example.com/projects.csv",
"mediaType": "text\/csv",
"format": "csv",
"title": "Projects",
"describedBy": "dkan://metastore/schemas/data-dictionary/items/7fd6bb1f-2752-54de-9a33-81ce2ea0feb2",
"describedByType": "application/vnd.tableschema+json"
}
],
"issued": "2016-06-22",
"license": "http://opendatacommons.org/licenses/by/1.0/",
"modified": "2016-06-22",
"publisher": {
"@type": "org:Organization",
"name": "Data publisher"
},
"keyword":["tag1"]
}
Note the special URL used to point to the data dictionary. The full URL, e.g.
http://mydomain.com/api/1/metastore/schemas/data-dictionary/items/7fd6bb1f-2752-54de-9a33-81ce2ea0feb2,
could also be used, and would be converted to an internal `dkan://` URL on save.

This data dictionary will now be used to modify the datastore table after import. If we were to
request the dataset back from the API, it would show us the absolute URL as well.
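
To make the relationship between the two URL forms concrete, here is a minimal sketch of the mapping. It assumes only that `dkan://` targets correspond to paths under `/api/1/`; the helper names `toDkanUri` and `toAbsoluteUrl` are hypothetical and this is not DKAN's own conversion code, which may apply further checks (for example, on the host).

```php
<?php

// Path prefix of the version 1 API routes used throughout this guide.
const API_BASE = '/api/1/';

// Hypothetical helper: turn an absolute API URL into a dkan:// URI.
// Returns the input unchanged when the path is not under API_BASE.
// Note: this sketch ignores the host entirely.
function toDkanUri(string $url): string {
  $path = parse_url($url, PHP_URL_PATH);
  if (!is_string($path) || strpos($path, API_BASE) !== 0) {
    return $url;
  }
  return 'dkan://' . substr($path, strlen(API_BASE));
}

// Hypothetical helper: expand a dkan:// URI against a site base URL.
// Returns the input unchanged when it is not a dkan:// URI.
function toAbsoluteUrl(string $uri, string $siteBase): string {
  $scheme = 'dkan://';
  if (strpos($uri, $scheme) !== 0) {
    return $uri;
  }
  return rtrim($siteBase, '/') . API_BASE . substr($uri, strlen($scheme));
}

$full = 'http://mydomain.com/api/1/metastore/schemas/data-dictionary/items/7fd6bb1f-2752-54de-9a33-81ce2ea0feb2';
$internal = toDkanUri($full);
echo $internal, PHP_EOL;
echo toAbsoluteUrl($internal, 'http://mydomain.com'), PHP_EOL;
```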
9 changes: 5 additions & 4 deletions modules/common/src/StreamWrapper/DkanStreamWrapper.php
@@ -14,7 +14,7 @@ class DkanStreamWrapper extends LocalReadOnlyStream implements StreamWrapperInte

use StringTranslationTrait;

const DKAN_API_VERSION = 1;
const DKAN_API_URL_BASE = "/api/1/";

/**
* {@inheritdoc}
@@ -41,15 +41,16 @@ public static function getType() {
* {@inheritdoc}
*/
public function getExternalUrl() {
$url = Url::fromUserInput("/api/1/" . $this->getTarget(), ['absolute' => TRUE]);
return $url->toString();
$url = Url::fromUserInput(self::DKAN_API_URL_BASE . $this->getTarget(), ['absolute' => TRUE]);
$return = $url->toString(TRUE);
return $return->getGeneratedUrl();
}

/**
* {@inheritdoc}
*/
public function getDirectoryPath() {
$url = Url::fromUserInput("/api/1/", ['absolute' => TRUE]);
$url = Url::fromUserInput(self::DKAN_API_URL_BASE, ['absolute' => TRUE]);
return $url->toString();
}

54 changes: 54 additions & 0 deletions modules/common/src/UrlHostTokenResolver.php
@@ -66,4 +66,58 @@ public static function resolveFilePath(string $resourceUrl): string {
));
}

/**
* Substitute the host for local URLs with a custom localhost token.
*
* @param string $resourceUrl
* The URL of the resource being substituted.
*
* @return string
* The resource URL with the custom localhost token.
*/
public static function hostify(string $resourceUrl): string {
// Get HTTP server public files URL and extract the host.
$serverPublicFilesUrl = self::getServerPublicFilesUrl();
$serverPublicFilesUrl = isset($serverPublicFilesUrl) ? parse_url($serverPublicFilesUrl) : NULL;
$serverHost = $serverPublicFilesUrl['host'] ?? \Drupal::request()->getHost();
// Determine whether the resource URL has the same host as this server.
$resourceParsedUrl = parse_url($resourceUrl);
if (isset($resourceParsedUrl['host']) && $resourceParsedUrl['host'] == $serverHost) {
// Swap out the host portion of the resource URL with the localhost token.
$resourceParsedUrl['host'] = UrlHostTokenResolver::TOKEN;
$resourceUrl = self::unparseUrl($resourceParsedUrl);
}
return $resourceUrl;
}

/**
* Rebuild a URL string from an array of parse_url() components.
*/
private static function unparseUrl($parsedUrl) {
$url = '';
$urlParts = [
'scheme',
'host',
'port',
'user',
'pass',
'path',
'query',
'fragment',
];

foreach ($urlParts as $part) {
if (!isset($parsedUrl[$part])) {
continue;
}
$url .= ($part == "port") ? ':' : '';
$url .= ($part == "query") ? '?' : '';
$url .= ($part == "fragment") ? '#' : '';
$url .= $parsedUrl[$part];
$url .= ($part == "scheme") ? '://' : '';
}

return $url;
}

}
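
The host-swapping idea behind `hostify()` can be tried outside Drupal with the standalone sketch below. Everything in it is an assumption made for illustration: `HOST_TOKEN` is a placeholder rather than the real value of `UrlHostTokenResolver::TOKEN`, the server host is passed in as an argument instead of being read from the public files URL or the current request, and the function names are hypothetical. Unlike the order of the `$urlParts` list above, the sketch places user and password before the host, following the generic URL syntax of RFC 3986.

```php
<?php

// Placeholder token for this sketch only; the real value is whatever
// UrlHostTokenResolver::TOKEN defines.
const HOST_TOKEN = 'TOKEN';

// Rebuild a URL from parse_url() components in RFC 3986 order:
// scheme://user:pass@host:port/path?query#fragment
function buildUrl(array $p): string {
  $url = isset($p['scheme']) ? $p['scheme'] . '://' : '';
  if (isset($p['user'])) {
    $url .= $p['user'] . (isset($p['pass']) ? ':' . $p['pass'] : '') . '@';
  }
  $url .= $p['host'] ?? '';
  $url .= isset($p['port']) ? ':' . $p['port'] : '';
  $url .= $p['path'] ?? '';
  $url .= isset($p['query']) ? '?' . $p['query'] : '';
  $url .= isset($p['fragment']) ? '#' . $p['fragment'] : '';
  return $url;
}

// Swap the host for the token when it matches the server's own host;
// URLs on other hosts (or unparseable input) are returned unchanged.
function hostifySketch(string $resourceUrl, string $serverHost): string {
  $parts = parse_url($resourceUrl);
  if (is_array($parts) && ($parts['host'] ?? NULL) === $serverHost) {
    $parts['host'] = HOST_TOKEN;
    return buildUrl($parts);
  }
  return $resourceUrl;
}

echo hostifySketch('http://example.com/tmp/mycsv.csv', 'example.com'), PHP_EOL;
```

The example call mirrors the `HOST` and `FILE_PATH` values used in the unit test for this class.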
39 changes: 35 additions & 4 deletions modules/common/tests/src/Unit/UrlHostTokenResolverTest.php
@@ -6,18 +6,29 @@
use Drupal\Core\StreamWrapper\StreamWrapperManager;

use Drupal\common\UrlHostTokenResolver;

use Drupal\Core\StreamWrapper\PublicStream;
use MockChain\Chain;
use MockChain\Options;
use PHPUnit\Framework\TestCase;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\RequestStack;

/**
*
*/
class UrlHostTokenResolverTest extends TestCase {

/**
* HTTP host protocol and domain for testing download URL.
*
* @var string
*/
public const HOST = 'http://example.com';

/**
* HTTP file path for testing download URL.
*
* @var string
*/
public const FILE_PATH = 'tmp/mycsv.csv';

/**
*
*/
@@ -40,4 +51,24 @@ public function test() {
$this->assertEquals('blahj do bla da bla replacement after token.', $newString);
}


/**
* Test the `UrlHostTokenResolver::hostify()` method.
*/
public function testHostify(): void {
// Initialize `\Drupal::container`.
$options = (new Options())
->add('stream_wrapper_manager', StreamWrapperManager::class)
->index(0);
$container_chain = (new Chain($this))
->add(Container::class, 'get', $options)
->add(PublicStream::class, 'getExternalUrl', self::HOST)
->add(StreamWrapperManager::class, 'getViaUri', PublicStream::class);
\Drupal::setContainer($container_chain->getMock());
// Ensure the hostify method is properly resolving the supplied URL.
$this->assertEquals(
'http://' . UrlHostTokenResolver::TOKEN . '/' . self::FILE_PATH,
UrlHostTokenResolver::hostify(self::HOST . '/' . self::FILE_PATH));
}

}
7 changes: 3 additions & 4 deletions modules/datastore/src/Service/ResourceLocalizer.php
@@ -3,18 +3,17 @@
namespace Drupal\datastore\Service;

use Contracts\FactoryInterface;
use Drupal\common\LoggerTrait;
use Drupal\common\DataResource;
use Drupal\common\EventDispatcherTrait;
use Drupal\common\LoggerTrait;
use Drupal\common\Storage\JobStoreFactory;
use Drupal\common\UrlHostTokenResolver;
use Drupal\common\Util\DrupalFiles;
use Drupal\Core\File\FileSystemInterface;
use Drupal\metastore\Exception\AlreadyRegistered;
use Drupal\metastore\Reference\Referencer;
use Drupal\metastore\ResourceMapper;
use FileFetcher\FileFetcher;
use Procrastinator\Result;
use Drupal\common\EventDispatcherTrait;

/**
* Resource localizer.
@@ -131,7 +130,7 @@ private function registerNewPerspectives(DataResource $resource, FileFetcher $fi
$public_dir = 'file://' . $this->drupalFiles->getPublicFilesDirectory();
$localFileDrupalUri = str_replace($public_dir, 'public://', $localFilePath);
$localUrl = $this->drupalFiles->fileCreateUrl($localFileDrupalUri);
$localUrl = Referencer::hostify($localUrl);
$localUrl = UrlHostTokenResolver::hostify($localUrl);

$new = $resource->createNewPerspective(self::LOCAL_FILE_PERSPECTIVE, $localFilePath);

12 changes: 11 additions & 1 deletion modules/metastore/metastore.services.yml
@@ -40,7 +40,7 @@ services:
arguments:
- '@config.factory'
- '@dkan.metastore.storage'
- '@dkan.metastore.resource_mapper'
- '@dkan.metastore.url_generator'
calls:
- [setLoggerFactory, ['@logger.factory']]

@@ -108,7 +108,17 @@ services:
- '@cache_tags.invalidator'
- '@module_handler'

dkan.metastore.url_generator:
class: \Drupal\metastore\Reference\MetastoreUrlGenerator
arguments:
- '@stream_wrapper_manager'
- '@dkan.metastore.service'
- '@request_stack'

dkan.metastore.data_dictionary_discovery:
class: \Drupal\metastore\DataDictionary\DataDictionaryDiscovery
arguments:
- '@config.factory'
- '@dkan.metastore.service'
- '@dkan.metastore.reference_lookup'
- '@dkan.metastore.url_generator'