All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog 1.0.0.
- FCL-568: add new class for Press Summary identifiers
- deps: update dependency mypy-boto3-s3 to v1.35.92
- deps: update dependency boto3 to v1.35.91
- deps: update dependency boto3 to v1.35.88
- deps: update dependency charset-normalizer to v3.4.1
- deps: update dependency boto3 to v1.35.87
- deps: update dependency boto3 to v1.35.85
- Identifiers: preferred identifier now correctly handles case where there are none of type
- Identifiers: fix case where unpacking unknown identifier type would raise an exception
- deps: update dependency mypy-boto3-s3 to v1.35.81
- deps: update dependency boto3 to v1.35.82
- Methods which were previously guaranteed to return a Neutral Citation may now return `None`.
- FCL-533: getting scored or preferred identifiers can now be done by type
- FCL-533: modify human identifier to rely on identifiers framework
- FCL-533: add scoring to Identifiers
- IdentifierSchema: use hasattr instead of getattr with a default when testing required attributes
- FCL-532: assign FCLIDs on document publication
- FCL-532: add ability to retrieve identifiers by type
- FCL-499: add new FCLID identifier class
- FCL-499: add method to get next sequence number from MarkLogic
- deps: update dependency certifi to >=2024.12.14,<2024.13.0
- deps: update dependency boto3 to v1.35.80
- FCL-309: identifier UUIDs are now prefixed with 'id-'
- FCL-309: identifiers can compile URL slugs
- FCL-309: identifiers can now be saved to and retrieved from MarkLogic
- FCL-309: add functionality for packing and unpacking XML representations of identifiers
- FCL-309: add stub for defining identifier schemas, and a Neutral Citation schema
- FCL-309: add ability to add, delete and update identifiers
- deps: update boto packages to v1.35.69
- deps: update dependency ds-caselaw-utils to v2.0.1
- deps: update dependency mypy-boto3-sns to v1.35.68
- deps: update boto packages to v1.35.67
- deps: update dependency boto3 to v1.35.64
- deps: update boto packages to v1.35.61
- deps: update dependency boto3 to v1.35.77
- deps: update dependency mypy-boto3-s3 to v1.35.76
- deps: update dependency boto3 to v1.35.75
- deps: update boto packages to v1.35.72
- Code which provided unsanitised URIs when initialising `DocumentURIStrings` will now cause `InvalidDocumentURIException`s to be raised.
- Document can no longer be initialised with a string as the `uri`; it must be a `DocumentURIString`.
- Validate strings when creating a new DocumentURIString
- deps: update dependency boto3 to v1.35.58
- deps: update dependency boto3 to v1.35.56
- Document: initialising a Document now requires a DocumentURIString, not a str
- tests: simpler test changes to pass type checking
- Require documents to be published before bulk enrichment will enrich them
- Add logging of xquery commands and values passed to them if DEBUG environment set
- FCL-386: search query can now be passed to get_document_by_uri
- FCL-396: query highlighting is now done as a function of requesting the Document
- deps: update dependency boto3 to v1.35.48
- deps: update dependency mypy-boto3-s3 to v1.35.45
- deps: update dependency boto3 to v1.35.45
- FCL-396: tidy up API implementation for search query highlighting change
- Feature: Add native XSLT transformations to the API
- Allow things on doc.body to be called from doc with a warning
- client.checkout_judgment now accepts a `timeout_seconds` parameter
- Allow test failures for Python 3.13/3.14
- Ensure Judgment- and PressSummaryFactory have working NCNs
- Fix `.content_as_html` on Document Factory
- Remove `Document.overwrite` and `MarkLogicApiClient.overwrite`
- The `models.documents.body.CourtIdentifierString` type has been replaced with the more specific `courts.CourtCode` type from ds-caselaw-utils.
- NeutralCitationMixin: use ABC to flag abstract methods properly
- deps: update dependency boto3 to v1.35.33
- deps: update dependency mypy-boto3-s3 to v1.35.32
- deps: update dependency boto3 to v1.35.30
- SearchResponse: total now returns an int, not a str
- SearchResult: update behaviour to meet type checking
- deps: update dependency ds-caselaw-utils to v1.7.0
- deps: update dependency boto3 to v1.35.28
- deps: update dependency ds-caselaw-utils to v1.5.7
- FCL-331: move api_client, xml and html params to build method signature instead of kwargs
- types: typing improvements around NeutralCitationString
- Document: remove unused overwrite method
- DocumentBody: replace CourtIdentifierString with utils.courts.CourtCode
- Multiple methods which used to be within `Document` are now in `Document.body`
- FCL-268: break functions which rely on the document body into their own subclass
- FCL-268: update factory behaviour to match new document body model
- FCL-268: use real date when testing if document date should be sent in reparse payload
- deps: update dependency boto3 to v1.35.23
- FCL-268: move document statuses to their own submodule
- FCL-268: move document exceptions into their own submodule
- FCL-268: move XML manipulation into its own file
- FCL-268: move the documents module in readiness for better code separation
- Breaking: Remove xml_tools
- Multiple stylistic improvements, and enabling ruff to allow us to keep standards up in future
- Truncate reparse references to avoid overlong step function names in TRE
- Always set last sent date to parser, even on failed parses
- [FCL-176] Tooling configuration audit
- [FCL-195] Skip pre-commit branch check in CI
- Make enrichment date maths not care about timezones
- Remove explicit urllib3 v1 dependency, rely on implicit dependency only
- Remove fclex_id prefix from UUID of reparse execution ID
- Implement handling of facets received from MarkLogic search results
- Add an `enriched_recently` property
- Add a `validates_against_schema` property
- Add a `can_enrich` property
- Only enrich if not recently enriched and valid against current schema
- Allow fetching linked documents for `Judgment`s and `PressSummary`s
s - Add function to check if the docx exists for a judgment
- Add a method to allow fetching press summaries for a given document
- Ensure that we log a warning and do not error when a judgment has an unrecognised jurisdiction
- Expose court jurisdictions in search results
- Breaking: `Client.get_pending_enrichment_for_version` now requires both a target enrichment version and a target parser version, and will not include documents which have not been parsed with the target version.
- Feature: Add accessors for judgment jurisdiction
- Feature: New `Client.get_pending_parse_for_version` and `Client.get_highest_parser_version` methods to help find documents in need of re-parsing.
- Breaking: `Client.get_pending_enrichment_for_version` now accepts a tuple of `(major_version, minor_version)` rather than a single major version.
- Add support for quoted phrase prioritisation in result snippets
- Breaking: `Client.set_published` no longer has a default argument; you must always be explicit.
- Feature: New `Client.get_pending_enrichment_for_version` method finds documents which are not yet enriched with a given version, and which haven't recently been sent for enrichment.
- Breaking: Fully remove the deprecated `caselawclient.api_client` instance.
- Breaking: Remove top-level methods for interacting with a document's XML representation. These are now all encapsulated in `document.xml`, which is an instance of `Document.XML`.
- Feature: New `Document.xml_root_element` function to replace `get_judgment_root`
- Feature: Documents which are not valid XML are now identified by the raising of a new `Document.NonXMLDocumentError` exception
- Feature: Add method to return document's lock status and message.
- Feature: `Document.enrich()` method will send a message to the announce SNS, requesting that a document be enriched.
- `document.content_as_html` now takes an optional `query=` string parameter, which, when supplied, highlights instances of the query within the document with `<mark>` tags, each of which has a numbered id indicating its sequence in the document.
- `document.number_of_mentions` method which takes a `query=` string parameter, and returns the number of highlighted mentions in the html.
- New `Client.get_combined_stats_table` method to run a combined statistics query against MarkLogic.
- BREAKING: `VersionAnnotation` now requires a statement of whether the action is automated or not
- `VersionAnnotation` can now accept an optional dict of structured `payload` data
- `VersionAnnotation` can now record a user agent string
- New versions of a document created with `insert_document_xml` can now be annotated
- BREAKING: Renamed `save_judgment_xml` to `update_document_xml`
- BREAKING: All annotations for versions are now mandatory instances of the new `VersionAnnotation` class
- Expose the creation date of a version
- Get version annotation for a single document
- Expose the type of the latest manifestation date of a document
- Search results for press summaries now include NCNs
- Search results now correctly include document status information
- Latest manifestation datetime is available for documents (including versions)
- Bugfix: document_date_as_date shouldn't fail hard if we can't parse it.
- Changed `is_failure` to rely on `failed_to_parse`, rather than `failure` in the URI.
- Added `transformation_datetime` to `Document`
- Added `enrichment_datetime` to `Document`
- Added `get_manifestation_datetimes` to `Document`
- Added `get_latest_manifestation_datetime` to `Document`
- Added `versions_as_documents` to `Document`
- Added `is_version` to `Document`
- Added `version_number` to `Document`
- Add default user agent string
- Add functions for overwriting and moving judgments
- Fixed `neutral_citation` property to look within `preface` tag rather than `mainBody` for press summaries, due to updated parsing resulting in updated press summary xml structure.
- Added `python-dotenv` as a poetry `dev` dependency to be able to run the new `smoketest.py` file that connects to a MarkLogic instance.
- Fixed `Client.set_document_court` method
- Fixed `Client.get_document_type_from_uri` method
- Breaking: Removed `document.is_editable` in favour of the more descriptive and better-tested `document.failed_to_parse`.
- Add new `Document.delete()` method.
- Generalised the set judgment metadata methods to set document metadata methods specifically for name, court and date.
- Fix issues blocking push to PyPI
- Add a "Best human identifier" to Documents
- Added `get_judgment_xml_bytestring` and `content_as_xml_bytestring` to `Client`
- Fixed `content_as_xml_tree` by making it use `content_as_xml_bytestring`
- Made `Document` class' `name`, `court`, `document_date_as_string` and `document_date_as_date` work for Press Summaries also.
- Added `neutral_citation` property and validation to `PressSummary` class.
- Significant improvements to inline documentation of the code.
- Deprecated: The `caselawclient.api_client` instance should be considered deprecated. Projects should instead initialise their own instance.
- supplemental/anonymous/sensitive getters/setters removed
- XQueries which return multiple responses will raise an error
- Refactored `Document` class' `name`, `court`, `document_date_as_string` and `document_date_as_date` (previously judgmentdate...) on Document class and neutral_citation on Judgment class making use of the new cached `content_as_xml_tree` property.
- Renamed `judgment_date_as_string` and `judgment_date_as_date` to `document_date_as_string` and `document_date_as_date` respectively.
- Added `content_as_xml_tree` cached property to `Document` class
- Changed the `Document` class' `content_as_xml` to be a cached_property also. [Note: this changelog line previously mistakenly referred to `content_as_html`.]
- Removed `get_judgment_name`, `get_judgment_citation`, `get_judgment_court`, `get_judgment_work_date` from the `Client` class and associated `.xqy` files.
- Add a new `MarklogicApiClient.get_document_by_uri` method to retrieve a document (of any type) by URI.
- New `get_document_by_uri` method on API client returning unique types for `Judgment`s and `PressSummary`s.
- New `Document.enrich()` method to trigger enrichment
- Breaking: Renamed `Judgment` to `Document`
- Breaking: `Document.judgment_exists` is now `Document.document_exists`
- Check for a valid court, rather than a present one
- Trim whitespace when trying to set an NCN
- Breaking: Renamed `copy_judgment` to `copy_document`
- `copy_document` now adds the document to the appropriate collection based on the uri.
- `Judgment.validation_failure_messages` method for retrieving a list of strings with reasons a judgment cannot be published.
- Fixed `insert_document_xml` to pattern match uris with, and add documents to, `press-summary`, not `press_summary`.
- BREAKING: Renamed `insert_judgment_xml` to `insert_document_xml` and enhanced it to place a document in the appropriate collection (`press_summary` or `judgment`)
- BREAKING: Changed `SearchParameters` dataclass field from `q` to `query`
- Added `search_helpers` module to allow clients to search and process document search responses in one go.
- Added `SearchParameters` dataclass for use with search functions using the legacy kwargs from `Client.advanced_search` and new `collections` field for filtering by collections
- BREAKING: Changed `Client.advanced_search` interface to take in `SearchParameters` as opposed to the legacy kwargs.
- Added `search_and_decode_response` and `search_judgments_and_decode_response` methods to `Client`
- Added `SearchResponse`, `SearchResult`, `SearchResultMetadata` classes to encapsulate and process document search responses.
- BREAKING: Instantiating a `Judgment` object will now raise a `caselawclient.errors.JudgmentNotFoundError` if the uri passed in does not correspond to a valid Judgment, rather than attempting (and failing) to return a `MarklogicResourceNotFoundError`
- Added `judgment_exists` method to `Client` class
- Make version_uri optional in Judgment.content_as_html
- Ensure XSLT_IMAGE_LOCATION existing doesn't break tests
- Improve detection of when a judgment doesn't exist
- Unlock judgment on Judgment.unpublish() so editors can unpublish immediately after a publish
- `Judgment.publish` method will now reject publication in more invalid states (must have a name, must have a valid NCN, must have a court code).
- Less strict version pinning of dependencies to give downstream package users more flexibility in resolving.
- Significantly more type annotations on `Client` and `Judgment` methods, including some which are stricter than before. This is potentially a breaking change if implementations have been relying on duck typing.
- Automatic generation of strict typing for XQuery files which run against MarkLogic.
- Improvements to the methods used in content hashing, which will be breaking changes if these are used downstream.
- Correct import location used in Judgment model, so it's usable when packaged
- Fix broken build process
Release 5.3.0 (Yanked)
- Dependabot now updates dependencies for all new versions, not just security updates
- Use Poetry for dependency management, to improve robustness
- Add a `Judgment` class (copied from Editor Interface) to begin the process of harmonising how various services interface with the data
- Add code coverage reporting to CI
- Make a PEP-561 declaration of typing
- Re-add the code that was pointing the XSLT to the assets
- This release had a bug, fixed by 5.2.5
- HTML view: Do not default to current version if the version doesn't exist (cause an error instead)
- Add content hash validation when we save a locked judgment
- Bug fix: setting court was not valid XQuery in eval context
- Improvements to code linting
- Expose hash of judgment content
- Unset the court tag where the court is an empty string
- Clarify release process documentation
- Add pypi version badge and libraries.io dependency shield
- Expose MarklogicValidationFailed exception
- Validate against a schema when priv API document is uploaded
- Add CodeQL configuration
- Add a check for secrets
- Bump certifi from 2021.10.8 to 2022.12.7
- Don't crash if multipart data is actually an empty bytestring.
**This release had a bug where the Editor UI was unusable.**
- Ensure Work Date and Court values are returned as text
- Get properties for a range of URIs for use in search results
- Remove a debug `print()` statement that was missed
- Admin users can't read unpublished judgments
- Deprecate XMLTools methods
- Fix `TypeError: 'type' object is not subscriptable`
- Breaking change: passes a list of zero-or-more courts, rather than a string that might be empty.
- Search queries: pages less than one are treated as one
- Add linting
- Ensure only people who are allowed to view unpublished judgments can view them
- Refactor tests
- Break judgment checkout
- Methods & XQueries to get & set all metadata
- DRY up some aspects of the API Client
- Support renaming of the XSL Transformation files
- Speed up privilege checking
- Add user_has_privilege method & XQuery to check if a user has a privilege
- Use `user_has_privilege` to check if a user can see unpublished documents
- Move error message codes and messages into this client
- New errors handled from Marklogic
- New function to save XML for a locked judgment
- Fix: add external declaration to XQuery parameter
- Bump version of requests to 2.28.1
- Raise error if unpublished document is not returned
- Use -1 as value meaning 'lock forever' in checkout_judgment
- Return none if the judgment is not locked, rather than an empty string
- Add optional annotation parameter to `checkout_judgment` method
- Add method to get the lock/checkout status of a judgment
- Judgment checkout may optionally expire at midnight
- Gracefully handle a null, empty or unexpected error response from Marklogic
- Rename set_judgment_date to set_judgment_work_expression_date
- Update the FRBRWork and FRBRExpression dates and @name attributes
- Fix a typo in setting the internal URI of a judgment
- Change the XQuery delete method from xdmp:document-delete to dls:document-delete
- Change the behaviour of 'last-modified' dates to use prop:last-modified rather than xdmp:document-timestamp
- Set the judgment's internal URI (`FRBRthis` and `FRBRuri` nodes)
- Allow the xsl filename used in the judgment transformation to vary. We have two xsls available in Marklogic: `judgment2` (the accessible version) and `judgment0` (the "as handed down" version). Add two helper methods `accessible_judgment_transformation()` and `original_judgment_transformation()` to call these transformations without specifying the xsl filename.
- Copy judgment from URI to URI
- Adds a new `delete_judgment` endpoint, for deleting a judgment from marklogic
- Create new `akn:FRBRdate`, `uk:cite` and `uk:court` nodes for the judgment metadata, if they do not exist
- Create a new `akn:FRBRname` node for the judgment metadata name, if one does not exist
- Patch release to update `setup.cfg`, which was missed from v4.5.0
- Allow metadata elements (name, date, court and citation) to be edited in the XML via XQuery, not by deserialising and serialising the XML in the implementing client code.
- If an element doesn't exist in a document, `xml_tools.get_element` tries to return an empty element with the same name as the desired element.
- Add function to retrieve the last time a document or its properties were updated
- Add neutral citation and specific keyword search parameters to advanced search
- Use error code from eval response body to throw MarklogicResourceNotFound errors
- Parameterize the location of images in the XSLT transformation
- Use the `invoke` endpoint, and the `search.xqy` stored on Marklogic, to search
- Remove the `database` parameter in `eval`, it's not required; the db associated with the REST server is used
- Add flag to advanced_search to enable filtering out published documents from the search
- Add function to get and set text properties on a document
- Fix insert_document xquery to call the document-insert-and-manage function
- Replace LXML with standard library xml for wider compatibility and reduced build times
- Add a new anonymised content flag
- Fix intermittently failing XSLT transforms
- Refactor save_judgment_xml to use the eval endpoint, so that we can introduce versioning via an XQuery.
- List all versions of a managed judgment
- Get a version of a managed judgment
- Restrict search to managed judgments only
- Set properties on a judgment using the dls namespace, not xdmp
- Insert & manage a new document
- Check in and check out a document for editing
- Use document properties on the "original" version of the judgment, not its version, to see if a judgment is published
- Minor bugfixes
- Refactored property accessor methods
- BREAKING CHANGE `is_document_published` changed to `get_published` and `publish_document` changed to `set_published`.
- Initial tagged release