Releases: Sage-Bionetworks/schematic
Schematic Release (v24.1.1)
What's Changed
We have added a Data Model Validator that currently runs during the convert step to check that key assumptions about the data model are fulfilled and that the model is converted properly to the graph network. We plan to expand the validator in the future and allow it to be run directly.
Users can now supply the CSV data model itself as the data model, instead of the JSONLD.
Breaking Changes: Users will be expected to regenerate their JSON-LD using the new refactor release. Previously generated JSON-LDs will not currently work with the refactored schematic.
Other Changes:
- Schema Refactor: Unit Tests for TestDataModelGraph by @mialy-defelice in #1309
- Tests for DataModelRelationships class by @andrewelamb in #1293
- Schema Refactor: Develop unit tests DataModelJsonSchema by @mialy-defelice in #1307
- Schema Refactor: Unit Tests for DataModelJSONLDParser by @mialy-defelice in #1308
- Refactor Schemas: Unit tests for DataModelJSONLD FDS-1064 by @mialy-defelice in #1313
- Refactor Schemas: Response to review comments to original PR by @mialy-defelice in #1315
- Schema Refactor: Unit Tests for DataModelParser by @mialy-defelice in #1304
- Refactor Schemas: unit tests FDS-1061 TestDataModelNodes by @mialy-defelice in #1310
- Schema Refactor: Tests for Data Model CSV Parser by @mialy-defelice in #1305
- Schema Refactor: Create tests for DataModelEdges by @GiaJordan in #1306
- Changes to address JSONLD processing by @mialy-defelice in #1321
- Revert changes related to workflow by @linglp in #1328
- Refactor Schemas: Initial Release BugFix, incorrect name causing issues with Manifest generation -- FDS-1442 by @mialy-defelice in #1338
- Revert "Refactor Schemas: Initial Release BugFix, incorrect name causing issues with Manifest generation -- FDS-1442" by @mialy-defelice in #1343
- Re-add change to change parentOf to subClassOf by @mialy-defelice in #1344
- Refactor Schemas: JSONLD Data Model Parsing, change from using labels to display names, merge develop by @mialy-defelice in #1348
- Refactor Schemas - Latest with Fixes. by @mialy-defelice in #1350
- Schematic Schemas Refactor: DM Parser by @mialy-defelice in #1126
- Schematic release v24.1.1 by @linglp in #1354
Bug Fixes:
- Bug fix: Added missing import statements by @linglp in #1327
- Bug Fix: Fix issue with trailing comma creating an empty node by @mialy-defelice in #1329
- Bug fix: Fixed tests in test_api.py by @linglp in #1352
Full Changelog: v23.12.1...v24.1.1
Schematic release (v23.12.1)
What's Changed
Bug fixes
- Update dockerfile poetry versions FDS-1325 by @GiaJordan in #1325
- Updated poetry version when building schematic docker image for AWS deployment by @linglp in #1326
- Fixed typing that API uses by @andrewelamb in #1332
- Added fit=true parameter when setting a dataframe by @linglp in #1318
- Resolved issues with cross-manifest validation that occurred when a manifest was only a single row by @GiaJordan in #1337
- Resolved an issue where a visualization endpoint intermittently returned the wrong information by @GiaJordan in #1336
- Make access token optional to fix CLI manifest submission by @mialy-defelice in #1340
- Remove great_expectations/expectations/Manifest_test_suite.json before running validation tests by @linglp in #1342
Features
- Reduce execution time for Test Schematic workflow by @GiaJordan in #1320
Others
- Move methods of creating multiple manifests from API to manifest generator by @linglp in #1333
- Update deprecated GX v0.15.x functions by @GiaJordan in #1335
- Update manifest unit test for new manifest behavior by @GiaJordan in #1334
- Update pyproject.toml notation by @GiaJordan in #1331
Full Changelog: v23.11.1...v23.12.1
v23.11.1
What's Changed
Bug Fixes:
- [BugFix]: pypi publish workflow (v23.9.3) by @andrewelamb in #1301
- [BugFix]: Addresses HTAN-250 & HTAN-258 issues: Add functions to return a display name or requirement if not available. by @mialy-defelice in #1303
- [BugFix]: Address an issue where the asset view was not being set correctly for operations involving manifest validation by @GiaJordan in #1312
New Features:
- Raise a helpful error when a datatype cannot be found in a schema by @GiaJordan in #1302
- Move API access_token parameter to request header by @GiaJordan in #1288
- Add access_token to /model/validate header by @GiaJordan in #1311
- Update store/synapse.updateDB to take sg and se by @mialy-defelice in #1291
- Add default value for access_token to another validation method FDS-1248 by @GiaJordan in #1319
Technical Debt:
- Redid poetry install in publish workflow by @andrewelamb in #1299
- Update minimum Poetry version required to 1.3.0 FDS-1218 by @GiaJordan in #1316
- Add --durations flag to pytest by @GiaJordan in #1223
- Add Component as a required attribute to the example data model by @mialy-defelice in #1259
- Project scope manifest validation tests by @GiaJordan in #1314
- Update schematic dependencies FDS-1312 by @GiaJordan in #1322
- Schematic v23.11.1 by @GiaJordan in #1317
Full Changelog: v23.9.3...v23.11.1
v23.9.3
What's Changed
- BugFix: is_class_in_schema no longer errors out when a class is not in schema by @GiaJordan in #1287
- BugFix: Fix error when use_annotations=True for record based metadata where there are no existing annotations by @GiaJordan in #1285
- Bug Fix: Adding annotations to files with -mrt file_only parameter by @mialy-defelice in #1290
- Develop markers for api tests according to credentials required FDS-1026 by @GiaJordan in #1289
Full Changelog: v23.9.2...v23.9.3
v23.9.1
What's Changed
- [bug fix] Renamed function related to setting background color of required columns by @linglp in #1276
- [bug fix] Added includesType parameter when using walk function by @linglp in #1281
- [bug fix] Do not pull annotations when use_annotations set to False and there is no existing manifests by @linglp in #1278
- Add unit test for add_root_to_component function by @linglp in #1282
- [bug fix] Skipped submission when running API tests remotely by @linglp in #1280
- Refactored logic when pulling annotations by @linglp in #1279
- [bug fix] Fix issues when generating an existing manifest as an excel sheet by @linglp in #1284
- Schematic release 23.9.1 by @andrewelamb in #1283
Full Changelog: v23.8.1...v23.9.1
v23.8.1
What's Changed
- [bug fix] Updated docker file when building docker images for schematic AWS deployment by @linglp in #1269
- update channel id to fair-data-team channel by @GiaJordan in #1272
- [bug fix] Updated manifest download endpoint to avoid querying file view by @linglp in #1270
- [bug fix] Fixed typo in manifest generator by @linglp in #1275
- Add API endpoint that returns the current version of schematic by @GiaJordan in #1277
- [bug fix] Updated .synapseCache, functions to calculate cache, and cleared manifests before each download by @linglp in #1268
- replace issue templates by @allaway in #1274
- Schematic v23.8.1 FDS-807 by @GiaJordan in #1273
Full Changelog: v23.7.1...v23.8.1
v23.7.1
Release Notes
New Features & Improvements:
- Allow users to set strictness of google sheet/Excel regex validation via the Schematic API. Previously the default schematic configuration of strict_validation=true did not allow a user to proceed if an incorrect value was entered. By exposing the strict_validation option in the manifest generation api, users are now able to select True or False. strict_validation=false will allow users to enter an incorrect value, but be served a warning. In either case, incorrect entries will not pass manifest validation.
- Breaking change for 23.7.1 - Simplified Schematic configuration - this will require config changes as of 23.7.1
- Please adjust your config file following the documentation example and Readme:
- https://github.com/Sage-Bionetworks/schematic/blob/develop/config_example.yml
- Readme (see Configure config.yml File section)
- Optimized performance in table upsert backwards-compatibility scenarios by adding functionality to the synapseStorage object to make calls to the Synapse REST API by way of the functionality exposed in the synapsePythonClient.
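
The strict_validation behaviour described in the first item above can be illustrated with a small Python sketch. This is not schematic's implementation: the helper names `check_cell_entry` and `passes_manifest_validation`, the example pattern, and the use of `re.search` are all assumptions made for illustration. It only models the stated rule: strict mode blocks a non-matching entry, non-strict mode accepts it with a warning, and manifest validation fails the entry either way.

```python
import re
from typing import Tuple


def check_cell_entry(value: str, pattern: str, strict_validation: bool) -> Tuple[bool, str]:
    """Model what a spreadsheet cell does with a value under a regex rule.

    Hypothetical helper, not part of schematic. Returns (accepted, message).
    """
    if re.search(pattern, value):
        return True, ""
    if strict_validation:
        # Strict mode: the user cannot proceed with a non-matching value.
        return False, "rejected: value does not match the pattern"
    # Non-strict mode: the value is accepted, but a warning is served.
    return True, "warning: value does not match the pattern"


def passes_manifest_validation(value: str, pattern: str) -> bool:
    """Manifest validation fails a non-matching value regardless of strictness."""
    return re.search(pattern, value) is not None


# Hypothetical rule: two capital letters followed by three digits.
pattern = r"^[A-Z]{2}\d{3}$"

strict_result = check_cell_entry("bad", pattern, strict_validation=True)
lenient_result = check_cell_entry("bad", pattern, strict_validation=False)
valid_result = check_cell_entry("AB123", pattern, strict_validation=True)
```

In this sketch the lenient setting only changes what happens at data-entry time; `passes_manifest_validation("bad", pattern)` is false in both modes.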
Bug Fixes:
- Addressed regex search validation limitations in Excel and updated documentation on use.
- Ensure that entity annotations are always compliant with Synapse regardless of column name format to avoid causing an error.
- Addressed bugs to ensure that submission and annotation are occurring as expected
Technical Debt:
- Code doesn't escape the 2nd law of thermodynamics. We put energy into refactoring handling of validation rules and interactions with Synapse (so that adding features and avoiding bugs is easier later); catching errors and exceptions more robustly and specifically (so that users and clients know what's causing a problem and can handle, report, or fix it more effectively); improving coverage of automated testing (so that we reduce the likelihood of letting bugs into released versions of schematic).
For more details on specific changes, please refer to the changelog below.
Full Changelog: v23.6.3...v23.7.1
What's Changed
- Replaced synapse API calls with synapse Python client call by @linglp in #1235
- Feature fds 273 coniguration by @andrewelamb in #1219
- Expose strict_validation option for manifest/generate endpoint by @mialy-defelice in #1253
- Optimize performance in table upsert backwards compatibility scenarios by @GiaJordan in #1229
- Fix df_utils/load_df so it more accurately captures integers to prevent Regex Errors. by @mialy-defelice in #1240
- Use FAIR Data service desk for issues by @afwillia in #1257
- Update Black to 2023 version: 23.7.0 by @GiaJordan in #1262
- Update mypy to latest version by @GiaJordan in #1263
- Updated cli help text for parameter json_schema by @linglp in #1264
- Modified Nginx parameters to fix submission issue by @linglp in #1255
- modify tag by @linglp in #1251
- BugFix: Fixed an issue where files on synapse were not being annotated correctly by @GiaJordan in #1254
- [bug fix] Fixed errors when calling get_empty_manifest by @linglp in #1260
- [bug fix] Updated parameters being used when generating existing manifests in test_api.py by @linglp in #1261
- Update schematic_db minimum version to v0.0.29 by @GiaJordan in #1266
- Fix Google Sheet column info mismatch FDS-675 by @mialy-defelice in #1265
- Only add validation rules to google sheets by @mialy-defelice in #1252
- Make all entity annotations comply with synapse FDS-481 FDS-726 by @GiaJordan in #1267
- Schematic 23.7.1 - FDS-728 by @linglp in #1271
Full Changelog: v23.6.3...v23.7.1
Minor release v23.6.3
What's Changed
- [Bug fix]: Fix regex in docker build workflow by @linglp in #1246
- Minor Release v23.6.3 by @linglp in #1247
- Revert "[Bug fix]: Fix regex in docker build workflow" by @linglp in #1248
- [Bug fix]: Fix regex in docker build workflow by @linglp in #1249
- Minor Release v23.6.3 by @linglp in #1250
Full Changelog: v23.6.2...v23.6.3
v23.6.2
Created a minor release
- Feat: allowed production tag, staging tag, and manual run on existing tags to trigger building docker images for AWS deployment by @linglp in #1241
- Added hide-blanks parameter when submitting manifest through API by @linglp in #1242
- Revert "Feat: allowed production tag, staging tag, and manual run on existing tags to trigger building docker images for AWS deployment" by @linglp in #1243
- feat: allowed production tag, staging tag, and manual run on existing tags to trigger building docker images for AWS deployment by @linglp in #1244
- Minor release v23.6.2 by @linglp in #1245
Full Changelog: v23.6.1...v23.6.2
Release v23.6.1
Release notes
New Features and Enhancements
- Update and insert (upsert) rows in Synapse tables. This feature allows piece-wise updates to a table in Synapse: a user only needs a csv manifest containing new or changed data/metadata. Given a manifest csv file and a dataset folder on Synapse, schematic finds the associated metadata table for this dataset folder. For each row in the manifest file, schematic checks whether the row is already present in the Synapse metadata table. If the row is present, schematic updates it with values from the corresponding manifest row. If the row is not present, schematic inserts it as a new row in the Synapse metadata table. Instructions for using the upsert features via the schematic CLI are here. Note: this feature works differently than the existing table replace option (the default table manipulation option) in schematic. A table replace substitutes the full content of an existing table with the content of a manifest csv file, which allows removing rows from the existing table. The upsert feature does not remove existing rows in the table. This feature does not impact users that only work with csv manifest files and do not store metadata in Synapse tables.
- Added a parameter controlling whether to execute the validation rules that are part of the Great Expectations (GX) suite. GX is great, but some rules take a while to load and execute. This is undesirable in certain situations (e.g. a large number of data records that need to be validated in real time). A user can now turn off GX validation rules.
- Standardizing validation error format: previously, different types of data validation errors may have had a different 'look and feel' (in addition to different structure). Validation error format and structure are now standardized, which allows users & client apps to reliably process them.
- New REST API endpoints:
- Retrieving validation rules associated with an attribute in a data model schema: if a schema attribute has a validation rule specifying its type (e.g. int, string, etc.), this endpoint allows retrieving the validation rule and determining the type of the attribute via the schematic REST API. The endpoint retrieves any other validation rules associated with an attribute as well.
- Retrieving the display name associated with an attribute in a data model: aside from machine-friendly labels, attributes in data-model schemas have human-friendly names (aka display names); this endpoint allows retrieving the display name of an attribute given its label.
- Checking if an entity is within an asset view (aka fileview in Synapse): this is useful when a user is uncertain whether a dataset has been deleted; users can provide the dataset ID and schematic checks whether a dataset with this ID is present.
These endpoints can be accessed by running the schematic REST API locally or through cloud deployments, using schematic v23.6.1 or later.
- Updated REST API web server: previously schematic used the default Flask web server. That was suitable for development, but unreliable for production deployments. The new schematic REST API server (uWSGI) remedies security and performance issues.
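
To make the difference between upsert and table replace described above concrete, here is a minimal Python sketch on plain lists of dictionaries. It is an illustration only: schematic performs these operations on Synapse tables, and the function names `upsert_rows` and `replace_rows`, the key column `id`, and the sample rows are all hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def upsert_rows(table: List[Row], manifest: List[Row], key: str) -> List[Row]:
    """Return a new table with manifest rows merged in by `key`.

    Rows whose key already exists are updated with the manifest values,
    rows with a new key are appended, and rows absent from the manifest are kept.
    """
    merged = {row[key]: dict(row) for row in table}
    for row in manifest:
        if row[key] in merged:
            merged[row[key]].update(row)  # update the existing row
        else:
            merged[row[key]] = dict(row)  # insert a new row
    return list(merged.values())


def replace_rows(table: List[Row], manifest: List[Row]) -> List[Row]:
    """Return a table whose full content is the manifest; old rows are dropped."""
    return [dict(row) for row in manifest]


existing = [
    {"id": "a", "tissue": "lung"},
    {"id": "b", "tissue": "liver"},
]
manifest = [
    {"id": "b", "tissue": "kidney"},  # changed row
    {"id": "c", "tissue": "heart"},   # new row
]

upserted = upsert_rows(existing, manifest, key="id")
replaced = replace_rows(existing, manifest)
```

With these inputs, the upserted table keeps row `a`, updates row `b`, and gains row `c`, while the replaced table contains only the two manifest rows, so row `a` is gone.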
Performance improvements:
- Loading manifest (or other) csv files now takes advantage of multiple processors, speeding up the loading of large files if the user's machine has multiple cores (the more cores, the larger the speed-up).
- REST API calls are profiled and benchmarked against a standard set of inputs (e.g. data models, csv manifests, etc.).
- Validation rules are benchmarked against a standard set of inputs (e.g. data models, csv manifests, etc.).
These benchmarks allow us to detect when feature performance is degraded (or improved) due to an update; they'd also allow us to maintain guarantees on performance in the future.
Bug fixes:
- Data template formatting: catching edge cases and ensuring column headers are aligned with column values; ensuring conditional formatting works as expected in both Excel and Google Sheets templates.
- Ensuring properties of attributes in the data-model schema are properly loaded in schematic: the same property can be reused in multiple attributes (e.g. if the property represents the same concept: name, diagnosis); previously, a property would only be added to one schema attribute. This allows setting up data models for Relational Databases (RDB) where different tables may have columns with the same name (e.g. both the Patient and Biospecimen tables can have a column 'name').
Security fixes:
Updated dependencies and hardened handling of access tokens, among other security and reliability fixes, allowing schematic to be deployed in secure production environments handling PHI data.
Technical debt:
Code doesn't escape the 2nd law of thermodynamics. We put energy into refactoring handling of validation rules and interactions with Synapse (so that adding features and avoiding bugs is easier later); catching errors and exceptions more robustly and specifically (so that users and clients know what's causing a problem and can handle, report, or fix it more effectively); improving coverage of automated testing (so that we reduce the likelihood of letting bugs into released versions of schematic).
For more details on specific changes, please refer to the changelog below.
What's Changed
- Skip api tests when rule combination tests are run by @GiaJordan in #1068
- Added workflow to deploy schematic docker container in Github container registry by @linglp in #1062
- Remove schematic support for Python v3.7 and v3.8 by @GiaJordan in #1090
- Refactor table operations structure in asset store by @GiaJordan in #1069
- Added input_token as a parameter for /manifest/get endpoint to fix credential issues when getting an existing manifest on AWS by @linglp in #1080
- Fixed getProjectManifests function in synapse storage by @linglp in #1084
- Develop api node display names by @mialy-defelice in #1094
- Create API endpoint for get_node_validation_rules by @mialy-defelice in #1095
- Update schematic dependencies by @GiaJordan in #1092
- Raise errors for wrong schema errors by @GiaJordan in #1073
- Set default of "table_manipulation" as "replace" in API endpoint when users enter None and updated tests by @linglp in #1115
- Update synapseClient dependency and api for manifest table uploads by @GiaJordan in #1101
- set pyopenssl = "^23.0.0" by @andrewelamb in #1125
- added date GE rule by @andrewelamb in #1103
- Implement table upsert feature by using schematic-db by @GiaJordan in #1081
- Add use_schema_label parameter to manifest submission endpoint, separate manifest submission and table upsert tests by @GiaJordan in #1129
- Delete GE checkpoint after completion of GE validation by @GiaJordan in #1136
- Remove try: catch: block from manifest submission command function by @GiaJordan in #1130
- Save all properties that are Included in the domain of a Class by @mialy-defelice in #1134
- Display exceptions raised during validation with Great expectations, allow exclusion of upper bound OR lower bound for inRange rule by @GiaJordan in #1131
- Update Documentation - python/package versions and POCs by @GiaJordan in #1139
- Increase buffer size to a higher limit to deal with long token by @linglp in #1144
- lock schematic-db to version 0.0.6 by @GiaJordan in #1145
- use try: finally: to delete checkpoint even if running the checkpoint fails or errors out by @GiaJordan in #1155
- Allowed CORS on given routes instead of all routes by @linglp in #1168
- Added restrict rules param to manifest/validate by @linglp in #1178
- Bug Fix: remedy negation of table manipulation specification by @GiaJordan in #1186
- Added an endpoint to check entity type on Synapse and an endpoint to check if an entity is in the asset view by @linglp in #1078
- add restrict rules control to manifest validate by @linglp in #1189
- Added a parameter to control if GE gets used when using manifest/validate endpoint by @linglp in #1177
- Propagate logger level entered in from command line to other schematic submodules by @GiaJordan in https://github.com/Sage-Bion...