Resolve identifier URIs to MarkLogic URIs #794
Merged
@@ -0,0 +1,43 @@
import json
from typing import NamedTuple

from caselawclient.models.documents import DocumentURIString
from caselawclient.xquery_type_dicts import MarkLogicDocumentURIString


class IdentifierResolutions(list["IdentifierResolution"]):
    """
    A list of candidate MarkLogic documents which correspond to a Public UI uri

    MarkLogic returns a list of dictionaries; IdentifierResolution handles a single dictionary
    which corresponds to a single identifier to MarkLogic document mapping.

    see `xquery/resolve_from_identifier.xqy` and `resolve_from_identifier` in `Client.py`
    """

    @staticmethod
    def from_marklogic_output(table: list[str]) -> "IdentifierResolutions":
        return IdentifierResolutions(list(IdentifierResolution.from_marklogic_output(row) for row in table))

    def published(self) -> "IdentifierResolutions":
        "Filter the list so that only published documents are returned"
        return IdentifierResolutions(list(x for x in self if x.document_published))


class IdentifierResolution(NamedTuple):
    """A single response from MarkLogic about a single identifier / document mapping"""

    identifier_uuid: str
    document_uri: MarkLogicDocumentURIString
    identifier_slug: DocumentURIString
    document_published: bool

    @staticmethod
    def from_marklogic_output(raw_row: str) -> "IdentifierResolution":
        row = json.loads(raw_row)
        return IdentifierResolution(
            identifier_uuid=row["documents.compiled_url_slugs.identifier_uuid"],
            document_uri=MarkLogicDocumentURIString(row["documents.compiled_url_slugs.document_uri"]),
            identifier_slug=DocumentURIString(row["documents.compiled_url_slugs.identifier_slug"]),
            document_published=row["documents.compiled_url_slugs.document_published"] == "true",
        )

Review comment on `row = json.loads(raw_row)`: This would be a candidate for refactoring in future if we do much more with TDE; possibly one to touch on when we improve how EUI reports on things in various pending states, or when we do exception reporting.
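The `== "true"` comparison in `IdentifierResolution.from_marklogic_output` matters because, in the sample rows used by this PR's tests, the published flag arrives as a JSON string rather than a JSON boolean. A minimal standard-library sketch of the difference follows; the shortened key `document_published` and the helper `parse_published` are illustrative assumptions, not part of the PR.

```python
import json

# A row shaped like the test fixture, with a shortened key for readability.
raw_row = '{"document_published": "false"}'
row = json.loads(raw_row)


def parse_published(value: str) -> bool:
    """Hypothetical helper: treat only the exact string "true" as published."""
    return value == "true"


# bool() on any non-empty string is True, so it would misreport this row.
naive = bool(row["document_published"])
strict = parse_published(row["document_published"])

print(naive, strict)  # prints: True False
```

The strict comparison also means any unexpected value (for example `"True"` or an empty string) is treated as unpublished, which is arguably the safer default for a public-facing lookup.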
@@ -0,0 +1,17 @@
xquery version "1.0-ml";

declare namespace xdmp="http://marklogic.com/xdmp";
declare variable $identifier_uri as xs:string external;
declare variable $published_only as xs:int? external := 1;

let $published_query := if ($published_only) then " AND document_published = 'true'" else ""
let $query := "SELECT * from compiled_url_slugs WHERE (identifier_slug = @uri)" || $published_query

return xdmp:sql(
    $query,
    "map",
    map:new((
        map:entry("uri", $identifier_uri)
    ))
)
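The query above concatenates only a fixed SQL fragment (the optional published filter) and passes the caller-supplied identifier as the bound parameter `@uri`. As a rough analogy only, the same pattern can be sketched in Python with the standard-library `sqlite3` module. This is not MarkLogic and not how the client calls the XQuery; the `resolve` function, the in-memory table, and the sample rows are all made up for illustration.

```python
import sqlite3


def resolve(conn: sqlite3.Connection, identifier_uri: str, published_only: bool = True) -> list[str]:
    """Hypothetical analogue of the XQuery: return document URIs for a slug."""
    # Only a fixed, trusted fragment is concatenated into the SQL text.
    published_query = " AND document_published = 'true'" if published_only else ""
    query = (
        "SELECT document_uri FROM compiled_url_slugs WHERE (identifier_slug = :uri)"
        + published_query
        + " ORDER BY document_uri"
    )
    # The caller's value is bound as a parameter, never spliced into the string.
    rows = conn.execute(query, {"uri": identifier_uri}).fetchall()
    return [row[0] for row in rows]


# Made-up sample data: one published and one unpublished document per slug.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compiled_url_slugs ("
    "identifier_uuid TEXT, document_uri TEXT, identifier_slug TEXT, document_published TEXT)"
)
conn.executemany(
    "INSERT INTO compiled_url_slugs VALUES (?, ?, ?, ?)",
    [
        ("id-1", "/doc/a.xml", "slug/one", "true"),
        ("id-2", "/doc/b.xml", "slug/one", "false"),
    ],
)

published = resolve(conn, "slug/one")
everything = resolve(conn, "slug/one", published_only=False)
hostile = resolve(conn, "slug/one' OR '1'='1")
```

Because the identifier is bound rather than concatenated, the hostile-looking input is compared as a literal slug and matches nothing.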
@@ -0,0 +1,31 @@
from caselawclient.identifier_resolution import IdentifierResolutions

raw_marklogic_resolutions = [
    """
    {"documents.compiled_url_slugs.identifier_uuid":"24b9a384-8bcf-4f20-996a-5c318f8dc657",
    "documents.compiled_url_slugs.document_uri":"/ewca/civ/2003/547.xml",
    "documents.compiled_url_slugs.identifier_slug":"ewca/civ/2003/54721",
    "documents.compiled_url_slugs.document_published":"false"}
    """,
    """
    {"documents.compiled_url_slugs.identifier_uuid":"x",
    "documents.compiled_url_slugs.document_uri":"x",
    "documents.compiled_url_slugs.identifier_slug":"x",
    "documents.compiled_url_slugs.document_published":"true"}
    """,
]


def test_decoded_identifier():
    decoded_resolutions = IdentifierResolutions.from_marklogic_output(raw_marklogic_resolutions)
    res = decoded_resolutions[0]
    assert res.identifier_uuid == "24b9a384-8bcf-4f20-996a-5c318f8dc657"
    assert res.document_uri == "/ewca/civ/2003/547.xml"
    assert res.identifier_slug == "ewca/civ/2003/54721"
    assert res.document_published == False  # noqa: E712


def test_published():
    decoded_resolutions = IdentifierResolutions.from_marklogic_output(raw_marklogic_resolutions)
    assert len(decoded_resolutions.published()) == 1
    assert decoded_resolutions.published()[0] == decoded_resolutions[1]
I'm a bit conflicted about where this file lives -- I think this is better than being in `models.identifiers`. Maybe it should be in `models.identifier_resolution`?
I agree this isn't a data model - current location seems fine to me.