From 2840c662a6878bd873bb176d7b953eab91d174b1 Mon Sep 17 00:00:00 2001
From: Jenny Medina
Date: Tue, 24 Sep 2024 15:48:56 -0400
Subject: [PATCH 01/34] initial edits

---
 README.md | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index cf1cd96f6..ecdc7a0e6 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,10 @@
 # Schematic
 [![Build Status](https://img.shields.io/endpoint.svg?url=https%3A%2F%2Factions-badge.atrox.dev%2FSage-Bionetworks%2Fschematic%2Fbadge%3Fref%3Ddevelop&style=flat)](https://actions-badge.atrox.dev/Sage-Bionetworks/schematic/goto?ref=develop) [![Documentation Status](https://readthedocs.org/projects/sage-schematic/badge/?version=develop)](https://sage-schematic.readthedocs.io/en/develop/?badge=develop) [![PyPI version](https://badge.fury.io/py/schematicpy.svg)](https://badge.fury.io/py/schematicpy)

-# Table of contents
+# TLDR
+Under Construction.
+
+# Table of Contents
 - [Schematic](#schematic)
 - [Table of contents](#table-of-contents)
 - [Introduction](#introduction)
@@ -36,16 +39,19 @@ SCHEMATIC is an acronym for _Schema Engine for Manifest Ingress and Curation_. T

 Note: Our credential policy for Google credentials in order to create Google sheet files from Schematic, see tutorial ['HERE'](https://scribehow.com/shared/Get_Credentials_for_Google_Drive_and_Google_Sheets_APIs_to_use_with_schematicpy__yqfcJz_rQVeyTcg0KQCINA). If you plan to use `config.yml`, please ensure that the path of `schematic_service_account_creds.json` is indicated there (see `google_sheets > service_account_creds` section)

-## Installation guide for Schematic CLI users
-1. **Verifying Python Version Compatibility**
+## Installation Guide: For Schematic CLI users
+
+The instructions below assume you have already installed [python](https://www.python.org/downloads/), with the release version meeting the constraints set in the [Installation-Requirements]().
-To ensure compatibility with Schematic, please follow these steps:
+### Verify your python version

-Check your own Python version:
+Ensure your python version meets the requirements from the [Installation-Requirements]() using the following command:
 ```
 python3 --version
 ```
+> !Note:
+> You can double-check the current supported python version by opening up the [pyproject.toml](https://github.com/Sage-Bionetworks/schematic/blob/main/pyproject.toml#L39) file in this repository.

 Check the Supported Python Version: Open the pyproject.toml file in the Schematic repository to find the version of Python that is supported. You can view this file directly on GitHub [here](https://github.com/Sage-Bionetworks/schematic/blob/main/pyproject.toml#L39).

 Switching Python Versions: If your current Python version is not supported by Schematic, you can switch to the supported version using tools like [pyenv](https://github.com/pyenv/pyenv?tab=readme-ov-file#switch-between-python-versions). Follow the instructions in the pyenv documentation to install and switch between Python versions easily.
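
Reviewer note on the version check in this patch: `python3 --version` only prints the version, so the reader still has to compare it against the supported range by hand. A minimal sketch of an automated check follows. It assumes the range 3.9.0 ≤ version < 3.11.0 quoted in the README's Installation Requirements; the helper name `is_supported` and the two constants are hypothetical and not part of `schematic`. The `pyproject.toml` file remains the source of truth.

```python
import sys

# Assumed bounds, copied from the README's Installation Requirements
# (3.9.0 <= version < 3.11.0). Hypothetical constants, not schematic API.
MIN_VERSION = (3, 9, 0)
MAX_VERSION_EXCLUSIVE = (3, 11, 0)


def is_supported(version=None):
    """Return True if a (major, minor, micro) tuple lies in the assumed range."""
    if version is None:
        # Default to the interpreter running this script.
        version = sys.version_info[:3]
    return MIN_VERSION <= tuple(version) < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "within" if is_supported() else "outside"
    print(f"Python {current} is {status} the assumed supported range")
```

Run it with the same interpreter you intend to use for the virtual environment, since the check reflects whichever `python3` executes the script.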
From 693e4735b949d291409a8107348876eab8b02e22 Mon Sep 17 00:00:00 2001
From: Jenny V Medina
Date: Tue, 24 Sep 2024 16:11:31 -0400
Subject: [PATCH 02/34] Restructure: Moving things under contribution guidelines

---
 README.md | 67 ++++++++++++++++++++++++++-----------------------------
 1 file changed, 32 insertions(+), 35 deletions(-)

diff --git a/README.md b/README.md
index ecdc7a0e6..463353762 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # Schematic
 [![Build Status](https://img.shields.io/endpoint.svg?url=https%3A%2F%2Factions-badge.atrox.dev%2FSage-Bionetworks%2Fschematic%2Fbadge%3Fref%3Ddevelop&style=flat)](https://actions-badge.atrox.dev/Sage-Bionetworks/schematic/goto?ref=develop) [![Documentation Status](https://readthedocs.org/projects/sage-schematic/badge/?version=develop)](https://sage-schematic.readthedocs.io/en/develop/?badge=develop) [![PyPI version](https://badge.fury.io/py/schematicpy.svg)](https://badge.fury.io/py/schematicpy)

-# TLDR
+# TL;DR
 Under Construction.

 # Table of Contents
@@ -39,7 +39,7 @@ SCHEMATIC is an acronym for _Schema Engine for Manifest Ingress and Curation_. T

 Note: Our credential policy for Google credentials in order to create Google sheet files from Schematic, see tutorial ['HERE'](https://scribehow.com/shared/Get_Credentials_for_Google_Drive_and_Google_Sheets_APIs_to_use_with_schematicpy__yqfcJz_rQVeyTcg0KQCINA). If you plan to use `config.yml`, please ensure that the path of `schematic_service_account_creds.json` is indicated there (see `google_sheets > service_account_creds` section)

-## Installation Guide: For Schematic CLI users
+## Installation Guide For: Schematic CLI users

 The instructions below assume you have already installed [python](https://www.python.org/downloads/), with the release version meeting the constraints set in the [Installation-Requirements]().
@@ -56,7 +56,7 @@ Check the Supported Python Version: Open the pyproject.toml file in the Schemati Switching Python Versions: If your current Python version is not supported by Schematic, you can switch to the supported version using tools like [pyenv](https://github.com/pyenv/pyenv?tab=readme-ov-file#switch-between-python-versions). Follow the instructions in the pyenv documentation to install and switch between Python versions easily. -2. **Setting Up the Virtual Environment** +### 2. Set up your virtual environment After switching to the version of Python supported by Schematic, please activate a virtual environment within which you can install the package: ``` @@ -65,7 +65,7 @@ source .venv/bin/activate ``` Note: Python 3 has built-in support for virtual environments with the venv module, so you no longer need to install virtualenv. -3. **Installing Schematic** +### 3. Install `schematic` dependencies Install the package using [pip](https://pip.pypa.io/en/stable/quickstart/): @@ -79,7 +79,7 @@ If you run into error: Failed building wheel for numpy, the error might be able pip3 install --upgrade pip ``` -## Installation guide for developers/contributors +## Installation Guide For: Contributors When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with the owners of this repository before making a change. @@ -214,30 +214,6 @@ $ pre-commit install pre-commit installed at .git/hooks/pre-commit ``` -### Development process instruction - -For new features, bugs, enhancements - -1. Pull the latest code from [develop branch in the upstream repo](https://github.com/Sage-Bionetworks/schematic) -2. Checkout a new branch develop- from the develop branch -3. Do development on branch develop- - a. may need to ensure that schematic poetry toml and lock files are compatible with your local environment -4. 
Add changed files for tracking and commit changes using [best practices](https://www.perforce.com/blog/vcs/git-best-practices-git-commit) -5. Have granular commits: not “too many” file changes, and not hundreds of code lines of changes -6. Commits with work in progress are encouraged: - a. add WIP to the beginning of the commit message for “Work In Progress” commits -7. Keep commit messages descriptive but less than a page long, see best practices -8. Push code to develop- in upstream repo -9. Branch out off develop- if needed to work on multiple features associated with the same code base -10. After feature work is complete and before creating a PR to the develop branch in upstream - a. ensure that code runs locally - b. test for logical correctness locally - c. wait for git workflow to complete (e.g. tests are run) on github -11. Create a PR from develop- into the develop branch of the upstream repo -12. Request a code review on the PR -13. Once code is approved merge in the develop branch -14. Delete the develop- branch - *Note*: Make sure you have the latest version of the `develop` branch on your local machine. ### Example For REST API
@@ -296,7 +272,30 @@ docker run -v %cd%:/schematic \ -c config.yml validate -mp tests/data/mock_manifests/inValid_Test_Manifest.csv -dt MockComponent -js /schematic/data/example.model.jsonld ``` -# Other Contribution Guidelines +# Contribution Guidelines +### Development process instruction + +For new features, bugs, enhancements + +1. Pull the latest code from [develop branch in the upstream repo](https://github.com/Sage-Bionetworks/schematic) +2. Checkout a new branch develop- from the develop branch +3. Do development on branch develop- + a. may need to ensure that schematic poetry toml and lock files are compatible with your local environment +4. Add changed files for tracking and commit changes using [best practices](https://www.perforce.com/blog/vcs/git-best-practices-git-commit) +5. Have granular commits: not “too many” file changes, and not hundreds of code lines of changes +6. Commits with work in progress are encouraged: + a. add WIP to the beginning of the commit message for “Work In Progress” commits +7. Keep commit messages descriptive but less than a page long, see best practices +8. Push code to develop- in upstream repo +9. Branch out off develop- if needed to work on multiple features associated with the same code base +10. After feature work is complete and before creating a PR to the develop branch in upstream + a. ensure that code runs locally + b. test for logical correctness locally + c. wait for git workflow to complete (e.g. tests are run) on github +11. Create a PR from develop- into the develop branch of the upstream repo +12. Request a code review on the PR +13. Once code is approved merge in the develop branch +14. Delete the develop- branch ## Updating readthedocs documentation 1. `cd docs` 2. After making relevant changes, you could run the `make html` command to re-generate the `build` folder. 
@@ -345,9 +344,7 @@ schematic model -c /path/to/config.yml submit -mp -d Date: Wed, 25 Sep 2024 16:19:30 -0400 Subject: [PATCH 03/34] Update README.md --- README.md | 152 +++++++++++++++++++++++------------------------------- 1 file changed, 64 insertions(+), 88 deletions(-) diff --git a/README.md b/README.md index 463353762..fc933b58d 100644 --- a/README.md +++ b/README.md @@ -34,36 +34,35 @@ SCHEMATIC is an acronym for _Schema Engine for Manifest Ingress and Curation_. T # Installation ## Installation Requirements -* Python version 3.9.0≤x<3.11.0 +* Your installed python version must be 3.9.0 ≤ version < 3.11.0 * You need to be a registered and certified user on [`synapse.org`](https://www.synapse.org/) -Note: Our credential policy for Google credentials in order to create Google sheet files from Schematic, see tutorial ['HERE'](https://scribehow.com/shared/Get_Credentials_for_Google_Drive_and_Google_Sheets_APIs_to_use_with_schematicpy__yqfcJz_rQVeyTcg0KQCINA). If you plan to use `config.yml`, please ensure that the path of `schematic_service_account_creds.json` is indicated there (see `google_sheets > service_account_creds` section) +> [!NOTE] +> To create Google Sheets files from Schematic, please follow our credential policy for Google credentials. You can find a detailed tutorial [here](https://scribehow.com/shared/Get_Credentials_for_Google_Drive_and_Google_Sheets_APIs_to_use_with_schematicpy__yqfcJz_rQVeyTcg0KQCINA). +> If you're using config.yml, make sure to specify the path to `schematic_service_account_creds.json` (see the `google_sheets > service_account_creds` section for more information). ## Installation Guide For: Schematic CLI users -The instructions below assume you have already installed [python](https://www.python.org/downloads/), with the release version meeting the constraints set in the [Installation-Requirements](). 
+The instructions below assume you have already installed [python](https://www.python.org/downloads/), with the release version meeting the constraints set in the [Installation Requirements](#installation-requirements) section. -### Verify your python version +### 1. Verify your python version -Ensure your python version meets the requirements from the [Installation-Requirements]() using the following command: +Ensure your python version meets the requirements from the [Installation Requirements](#installation-requirements) section using the following command: ``` python3 --version ``` +If your current Python version is not supported by Schematic, you can switch to the supported version using a tool like [pyenv](https://github.com/pyenv/pyenv?tab=readme-ov-file#switch-between-python-versions). Follow the instructions in the pyenv documentation to install and switch between Python versions easily. -> !Note: -> You can double-check the current supported python version by opening up the [pyproject.toml](https://github.com/Sage-Bionetworks/schematic/blob/main/pyproject.toml#L39) file in this repository. -Check the Supported Python Version: Open the pyproject.toml file in the Schematic repository to find the version of Python that is supported. You can view this file directly on GitHub [here](https://github.com/Sage-Bionetworks/schematic/blob/main/pyproject.toml#L39). - -Switching Python Versions: If your current Python version is not supported by Schematic, you can switch to the supported version using tools like [pyenv](https://github.com/pyenv/pyenv?tab=readme-ov-file#switch-between-python-versions). Follow the instructions in the pyenv documentation to install and switch between Python versions easily. +> [!NOTE] +> You can double-check the current supported python version by opening up the [pyproject.toml](https://github.com/Sage-Bionetworks/schematic/blob/main/pyproject.toml#L39) file in this repository and find the supported versions of python in the script. 
### 2. Set up your virtual environment -After switching to the version of Python supported by Schematic, please activate a virtual environment within which you can install the package: +Once you are working with a python version supported by Schematic, please activate a virtual environment within which you can install the package. Python 3 has built-in support for virtual environments with the `venv` module, so you no longer need to install `virtualenv`: ``` python3 -m venv .venv source .venv/bin/activate ``` -Note: Python 3 has built-in support for virtual environments with the venv module, so you no longer need to install virtualenv. ### 3. Install `schematic` dependencies @@ -73,7 +72,7 @@ Install the package using [pip](https://pip.pypa.io/en/stable/quickstart/): python3 -m pip install schematicpy ``` -If you run into error: Failed building wheel for numpy, the error might be able to resolve by upgrading pip. Please try to upgrade pip by: +If you run into `ERROR: Failed building wheel for numpy`, the error might be able to resolve by upgrading pip. Please try to upgrade pip by: ``` pip3 install --upgrade pip @@ -85,92 +84,51 @@ When contributing to this repository, please first discuss the change you wish t Please note we have a [code of conduct](CODE_OF_CONDUCT.md), please follow it in all your interactions with the project. -### Development environment setup -1. Clone the `schematic` package repository. +The instructions below assume you have already installed [python](https://www.python.org/downloads/), with the release version meeting the constraints set in the [Installation Requirements](#installation-requirements) section. + +### 1. Clone the `schematic` package repository + +For development, you will be working with the latest version of `schematic` on the repository to ensure compatibility between its latest state and your changes. 
Ensure your current working directory is where +you would like to store your local fork before running the following command: + ``` git clone https://github.com/Sage-Bionetworks/schematic.git ``` -2. Install `poetry` (version 1.3.0 or later) using either the [official installer](https://python-poetry.org/docs/#installing-with-the-official-installer) or [pipx](https://python-poetry.org/docs/#installing-with-pipx). If you have an older installation of Poetry, we recommend uninstalling it first. -3. Start the virtual environment by doing: +### 2. Install `poetry` + +Install `poetry` (version 1.3.0 or later) using either the [official installer](https://python-poetry.org/docs/#installing-with-the-official-installer) or [pipx](https://python-poetry.org/docs/#installing-with-pipx). If you have an older installation of Poetry, we recommend uninstalling it first. + +### 3. Start the virtual environment + +Initialize the virtual environment using the following command with `poetry`: + ``` poetry shell ``` -4. Install the dependencies by doing: -``` -poetry install --all-extras -``` -This command will install the dependencies based on what we specify in poetry.lock. If this step is taking a long time, try to go back to step 2 and check your version of poetry. Alternatively, you could also try deleting the lock file and regenerate it by doing `poetry install` (Please note this method should be used as a last resort because this would force other developers to change their development environment) - -5. Fill in credential files: -*Note*: If you won't interact with Synapse, please ignore this section. +### 4. Install `schematic` dependencies -There are two main configuration files that need to be edited: -- config.yml -- [synapseConfig](https://raw.githubusercontent.com/Sage-Bionetworks/synapsePythonClient/master/synapseclient/.synapseConfig) +The following command will install the dependencies based on what we specify in the `poetry.lock` file of this repository. 
If this step is taking a long time, try to go back to Step 2 and check your version of `poetry`. Alternatively, you can try deleting the lock file and regenerate it by doing `poetry install` (Please note this method should be used as a last resort because this would force other developers to change their development environment) -Configure .synapseConfig File +``` +poetry install --all-extras +``` -Download a copy of the ``.synapseConfig`` file, open the file in the editor of your -choice and edit the `username` and `authtoken` attribute under the `authentication` -section. **Note:** You must place the file at the root of the project like -`{project_root}/.synapseConfig` in order for any authenticated tests to work. +### 5. Set up configuration files -*Note*: You could also visit [configparser](https://docs.python.org/3/library/configparser.html#module-configparser>) doc to see the format that `.synapseConfig` must have. For instance: ->[authentication]
username = ABC
authtoken = abc +The following section will walk through setting up your configuration files with your credentials to allow for communication between `schematic` and the Synapse API. -Configure config.yml File +There are two main configuration files that need to be created + modified: +- `config.yml` +- [.synapseConfig](https://raw.githubusercontent.com/Sage-Bionetworks/synapsePythonClient/master/synapseclient/.synapseConfig) -There are some defaults in schematic that can be configured. These fields are in ``config_example.yml``: +**Create and modify the `config.yml`** -```text +In this repository there is a `config_example.yml` file with default configurations to various components that are required before running `schematic`, +such as the Synapse ID of the main file view containing all your project assets, the base name of your manifest files, etc. -# This is an example config for Schematic. -# All listed values are those that are the default if a config is not used. -# Save this as config.yml, this will be gitignored. -# Remove any fields in the config you don't want to change -# Change the values of any fields you do want to change - - -# This describes where assets such as manifests are stored -asset_store: - # This is when assets are stored in a synapse project - synapse: - # Synapse ID of the file view listing all project data assets. 
- master_fileview_id: "syn23643253" - # Path to the synapse config file, either absolute or relative to this file - config: ".synapseConfig" - # Base name that manifest files will be saved as - manifest_basename: "synapse_storage_manifest" - -# This describes information about manifests as it relates to generation and validation -manifest: - # Location where manifests will saved to - manifest_folder: "manifests" - # Title or title prefix given to generated manifest(s) - title: "example" - # Data types of manifests to be generated or data type (singular) to validate manifest against - data_type: - - "Biospecimen" - - "Patient" - -# Describes the location of your schema -model: - # Location of your schema jsonld, it must be a path relative to this file or absolute - location: "tests/data/example.model.jsonld" - -# This section is for using google sheets with Schematic -google_sheets: - # Path to the synapse config file, either absolute or relative to this file - service_acct_creds: "schematic_service_account_creds.json" - # When doing google sheet validation (regex match) with the validation rules. - # true is alerting the user and not allowing entry of bad values. - # false is warning but allowing the entry on to the sheet. - strict_validation: true -``` - -If you want to change any of these copy ``config_example.yml`` to ``config.yml``, change any fields you want to, and remove any fields you don't. +Copy-paste the contents of `config_example.yml` into a new file called `config.yml` and modify its contents according to your use case. For example if you wanted to change the folder where manifests are downloaded your config should look like: @@ -180,17 +138,35 @@ manifest: manifest_folder: "my_manifest_folder_path" ``` -_Note_: `config.yml` is ignored by git. +> [!NOTE] +> `config.yml` is ignored by git. -_Note_: Paths can be specified relative to the `config.yml` file or as absolute paths. 
+> [!NOTE] +> Paths can be specified relative to the `config.yml` file or as absolute paths. + +**Create and modify the `.synapseConfig`** + +The `.synapseConfig` file is what enables communication between `schematic` and the Synapse API using your credentials. +Download a copy of the `.synapseConfig` file from [here](https://raw.githubusercontent.com/Sage-Bionetworks/synapsePythonClient/master/synapseclient/.synapseConfig), open the file in the editor of your +choice and edit the `username` and `authtoken` attribute under the `authentication` section. + +> [!IMPORTANT] +> You must place the file at the root of the project like so: +> ``` +> {project_root}/.synapseConfig +> ``` +> In order for any tests that involve Synapse authentication to work. + +*Note*: You can also visit [configparser](https://docs.python.org/3/library/configparser.html#module-configparser>) doc to see the format that `.synapseConfig` must have. For instance: +>[authentication]
username = ABC
authtoken = abc -6. Login to Synapse by using the command line +### 6. Login to Synapse by using the command line On the CLI in your virtual environment, run the following command: ``` synapse login -u -p --rememberMe ``` -7. Obtain Google credential Files +### 7. Obtain Google credential files Running `schematic init` is no longer supported due to security concerns. To obtain `schematic_service_account_creds.json`, please follow the instructions [here](https://scribehow.com/shared/Enable_Google_Drive_and_Google_Sheets_APIs_for_project__yqfcJz_rQVeyTcg0KQCINA). > As v22.12.1 version of schematic, using `token` mode of authentication (in other words, using `token.pickle` and `credentials.json`) is no longer supported due to Google's decision to move away from using OAuth out-of-band (OOB) flow. Click [here](https://developers.google.com/identity/protocols/oauth2/resources/oob-migration) to learn more. @@ -206,7 +182,7 @@ Most Google sheet functionality could be authenticated with service account. How requires token-based authentication. As browser support that requires the token-based authentication diminishes, we are hoping to deprecate token-based authentication and keep only service account authentication in the future. -8. Set up pre-commit hooks +### 8. Set up pre-commit hooks This repository is configured to utilize pre-commit hooks as part of the development process. 
To enable these hooks, please run the following command and look for the following success message: ``` From 2f7e97d8c80d4a5c86fbca1c3394ca8d93f0ac51 Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Tue, 1 Oct 2024 10:51:02 -0400 Subject: [PATCH 04/34] Small restructure for devs and usage cases --- README.md | 146 +++++++++++++++++++++++++++++++++++++++--------------- 1 file changed, 107 insertions(+), 39 deletions(-) diff --git a/README.md b/README.md index fc933b58d..f5b02bbc5 100644 --- a/README.md +++ b/README.md @@ -78,6 +78,74 @@ If you run into `ERROR: Failed building wheel for numpy`, the error might be abl pip3 install --upgrade pip ``` +### 4. Set up configuration files + +The following section will walk through setting up your configuration files with your credentials to allow for communication between `schematic` and the Synapse API. + +There are two main configuration files that need to be created + modified: +- `config.yml` +- [.synapseConfig](https://raw.githubusercontent.com/Sage-Bionetworks/synapsePythonClient/master/synapseclient/.synapseConfig) + +**Create and modify the `config.yml`** + +In this repository there is a `config_example.yml` file with default configurations to various components that are required before running `schematic`, +such as the Synapse ID of the main file view containing all your project assets, the base name of your manifest files, etc. + +Copy-paste the contents of `config_example.yml` into a new file called `config.yml` and modify its contents according to your use case. + +For example if you wanted to change the folder where manifests are downloaded your config should look like: + +```text + +manifest: + manifest_folder: "my_manifest_folder_path" +``` + +> [!NOTE] +> `config.yml` is ignored by git. + +> [!NOTE] +> Paths can be specified relative to the `config.yml` file or as absolute paths. 
+ +**Create and modify the `.synapseConfig`** + +The `.synapseConfig` file is what enables communication between `schematic` and the Synapse API using your credentials. +Download a copy of the `.synapseConfig` file from [here](https://raw.githubusercontent.com/Sage-Bionetworks/synapsePythonClient/master/synapseclient/.synapseConfig), open the file in the editor of your +choice and edit the `username` and `authtoken` attribute under the `authentication` section. + +> [!IMPORTANT] +> You must place the file at the root of the project like so: +> ``` +> {project_root}/.synapseConfig +> ``` +> In order for any tests that involve Synapse authentication to work. + +*Note*: You can also visit [configparser](https://docs.python.org/3/library/configparser.html#module-configparser>) doc to see the format that `.synapseConfig` must have. For instance: +>[authentication]
username = ABC
authtoken = abc + +### 5. Login to Synapse by using the command line +On the CLI in your virtual environment, run the following command: +``` +synapse login -u -p --rememberMe +``` + +### 6. Obtain Google credential files +Running `schematic init` is no longer supported due to security concerns. To obtain `schematic_service_account_creds.json`, please follow the instructions [here](https://scribehow.com/shared/Enable_Google_Drive_and_Google_Sheets_APIs_for_project__yqfcJz_rQVeyTcg0KQCINA). + +> As v22.12.1 version of schematic, using `token` mode of authentication (in other words, using `token.pickle` and `credentials.json`) is no longer supported due to Google's decision to move away from using OAuth out-of-band (OOB) flow. Click [here](https://developers.google.com/identity/protocols/oauth2/resources/oob-migration) to learn more. + +*Notes*: Use the ``schematic_service_account_creds.json`` file for the service +account mode of authentication (*for Google services/APIs*). Service accounts +are special Google accounts that can be used by applications to access Google APIs +programmatically via OAuth2.0, with the advantage being that they do not require +human authorization. + +*Background*: schematic uses Google’s API to generate google sheet templates that users fill in to provide (meta)data. +Most Google sheet functionality could be authenticated with service account. However, more complex Google sheet functionality +requires token-based authentication. As browser support that requires the token-based authentication diminishes, we are hoping to deprecate +token-based authentication and keep only service account authentication in the future. + + ## Installation Guide For: Contributors When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with the owners of this repository before making a change. 
@@ -192,6 +260,35 @@ pre-commit installed at .git/hooks/pre-commit *Note*: Make sure you have the latest version of the `develop` branch on your local machine. +# Command Line Usage +1. Generate a new manifest as a google sheet + +``` +schematic manifest -c /path/to/config.yml get -dt -s +``` + +2. Grab an existing manifest from synapse + +``` +schematic manifest -c /path/to/config.yml get -dt -d -s +``` + +3. Validate a manifest + +``` +schematic model -c /path/to/config.yml validate -dt -mp +``` + +4. Submit a manifest as a file + +``` +schematic model -c /path/to/config.yml submit -mp -d -vc -mrt file_only +``` + +Please visit more documentation [here](https://sage-schematic.readthedocs.io/en/develop/cli_reference.html) for more information. + +# Docker Usage + ### Example For REST API
#### Use file path of `config.yml` to run API endpoints: @@ -220,8 +317,6 @@ docker run --rm -p 3001:3001 \ sagebionetworks/schematic \ python /usr/src/app/run_api.py ``` - - ### Example For Schematic on mac/linux
To run example below, first clone schematic into your home directory `git clone https://github.com/sage-bionetworks/schematic ~/schematic`
 Then update .synapseConfig with your credentials
@@ -285,42 +380,7 @@ For new features, bugs, enhancements
 ## Update toml file and lock file
 If you install external libraries by using `poetry add `, please make sure that you include `pyproject.toml` and `poetry.lock` file in your commit.

-## Reporting bugs or feature requests
-You can **create bug and feature requests** through [Sage Bionetwork's FAIR Data service desk](https://sagebionetworks.jira.com/servicedesk/customer/portal/5/group/8). Providing enough details to the developers to verify and troubleshoot your issue is paramount:
-- **Provide a clear and descriptive title as well as a concise summary** of the issue to identify the problem.
-- **Describe the exact steps which reproduce the problem** in as many details as possible.
-- **Describe the behavior you observed after following the steps** and point out what exactly is the problem with that behavior.
-- **Explain which behavior you expected to see** instead and why.
-- **Provide screenshots of the expected or actual behaviour** where applicable.
-
-# Command Line Usage
-1. Generate a new manifest as a google sheet
-
-```
-schematic manifest -c /path/to/config.yml get -dt -s
-```
-
-2. Grab an existing manifest from synapse
-
-```
-schematic manifest -c /path/to/config.yml get -dt -d -s
-```
-
-3. Validate a manifest
-
-```
-schematic model -c /path/to/config.yml validate -dt -mp
-```
-
-4. Submit a manifest as a file
-
-```
-schematic model -c /path/to/config.yml submit -mp -d -vc -mrt file_only
-```
-
-Please visit more documentation [here](https://sage-schematic.readthedocs.io/en/develop/cli_reference.html) for more information.
-
-### Testing
+## Testing

 All code added to the client must have tests. The Python client uses pytest to run tests. The test code is located in the [tests](https://github.com/Sage-Bionetworks/schematic/tree/develop-docs-update/tests) subdirectory.

@@ -339,12 +399,20 @@ pytest -vs tests/
 5.
Once the PR is merged, leave the original copies on Synapse to maintain support for feature branches that were forked from `develop` before your update. - If the old copies are problematic and need to be removed immediately (_e.g._ contain sensitive data), proceed with the deletion and alert the other contributors that they need to merge the latest `develop` branch into their feature branches for their tests to work. -### Code style +## Code style * Please consult the [Google Python style guide](http://google.github.io/styleguide/pyguide.html) prior to contributing code to this project. * Be consistent and follow existing code conventions and spirit. +# Reporting bugs or feature requests +You can **create bug and feature requests** through [Sage Bionetwork's FAIR Data service desk](https://sagebionetworks.jira.com/servicedesk/customer/portal/5/group/8). Providing enough details to the developers to verify and troubleshoot your issue is paramount: +- **Provide a clear and descriptive title as well as a concise summary** of the issue to identify the problem. +- **Describe the exact steps which reproduce the problem** in as many details as possible. +- **Describe the behavior you observed after following the steps** and point out what exactly is the problem with that behavior. +- **Explain which behavior you expected to see** instead and why. +- **Provide screenshots of the expected or actual behaviour** where applicable. 
+ # Contributors Main contributors and developers: From 2f7327caafd650f6e282083636c0b870def90774 Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Tue, 1 Oct 2024 11:57:34 -0400 Subject: [PATCH 05/34] Update instructions for making .synapseConfig file --- README.md | 39 ++++++++++++++++++++------------------- 1 file changed, 20 insertions(+), 19 deletions(-) diff --git a/README.md b/README.md index f5b02bbc5..cf0d3e09d 100644 --- a/README.md +++ b/README.md @@ -84,19 +84,18 @@ The following section will walk through setting up your configuration files with There are two main configuration files that need to be created + modified: - `config.yml` -- [.synapseConfig](https://raw.githubusercontent.com/Sage-Bionetworks/synapsePythonClient/master/synapseclient/.synapseConfig) +- `.synapseConfig` **Create and modify the `config.yml`** In this repository there is a `config_example.yml` file with default configurations to various components that are required before running `schematic`, such as the Synapse ID of the main file view containing all your project assets, the base name of your manifest files, etc. -Copy-paste the contents of `config_example.yml` into a new file called `config.yml` and modify its contents according to your use case. +Download the `config_example.yml` as a new file called `config.yml` and modify its contents according to your use case. For example if you wanted to change the folder where manifests are downloaded your config should look like: ```text - manifest: manifest_folder: "my_manifest_folder_path" ``` @@ -110,26 +109,28 @@ manifest: **Create and modify the `.synapseConfig`** The `.synapseConfig` file is what enables communication between `schematic` and the Synapse API using your credentials. 
-Download a copy of the `.synapseConfig` file from [here](https://raw.githubusercontent.com/Sage-Bionetworks/synapsePythonClient/master/synapseclient/.synapseConfig), open the file in the editor of your -choice and edit the `username` and `authtoken` attribute under the `authentication` section. +You can automatically generate a `.synapseConfig` file by running the following in your command line and following the prompts: -> [!IMPORTANT] -> You must place the file at the root of the project like so: -> ``` -> {project_root}/.synapseConfig -> ``` -> In order for any tests that involve Synapse authentication to work. +``` +synapse config +``` -*Note*: You can also visit [configparser](https://docs.python.org/3/library/configparser.html#module-configparser>) doc to see the format that `.synapseConfig` must have. For instance: ->[authentication]
username = ABC
authtoken = abc +You can generate a new authentication token on the Synapse website by going to `Account Settings` > `Personal Access Tokens`. + +After following the prompts, a new Synapse configuration file will exist in your home directory which you can access with the following command: -### 5. Login to Synapse by using the command line -On the CLI in your virtual environment, run the following command: ``` -synapse login -u -p --rememberMe +ls ~/.synapseConfig ``` -### 6. Obtain Google credential files +> [!IMPORTANT] +> To start working with the CLI, your `.synapseConfig` should be in your current working directory. + +> [!NOTE] +> You can also visit [configparser](https://docs.python.org/3/library/configparser.html#module-configparser>) doc to see the format that `.synapseConfig` must have. For instance: +> [authentication]
username = ABC
authtoken = abc + +### 5. Obtain Google credential files Running `schematic init` is no longer supported due to security concerns. To obtain `schematic_service_account_creds.json`, please follow the instructions [here](https://scribehow.com/shared/Enable_Google_Drive_and_Google_Sheets_APIs_for_project__yqfcJz_rQVeyTcg0KQCINA). > As v22.12.1 version of schematic, using `token` mode of authentication (in other words, using `token.pickle` and `credentials.json`) is no longer supported due to Google's decision to move away from using OAuth out-of-band (OOB) flow. Click [here](https://developers.google.com/identity/protocols/oauth2/resources/oob-migration) to learn more. @@ -148,12 +149,12 @@ token-based authentication and keep only service account authentication in the f ## Installation Guide For: Contributors +The instructions below assume you have already installed [python](https://www.python.org/downloads/), with the release version meeting the constraints set in the [Installation Requirements](#installation-requirements) section. + When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with the owners of this repository before making a change. Please note we have a [code of conduct](CODE_OF_CONDUCT.md), please follow it in all your interactions with the project. -The instructions below assume you have already installed [python](https://www.python.org/downloads/), with the release version meeting the constraints set in the [Installation Requirements](#installation-requirements) section. - ### 1. Clone the `schematic` package repository For development, you will be working with the latest version of `schematic` on the repository to ensure compatibility between its latest state and your changes. 
Ensure your current working directory is where From 4b155a10dad080efdb857d0c80969eedf6b6f264 Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Tue, 1 Oct 2024 12:02:28 -0400 Subject: [PATCH 06/34] Update Google credentials section --- README.md | 41 +++++++++++++++++++++-------------------- 1 file changed, 21 insertions(+), 20 deletions(-) diff --git a/README.md b/README.md index cf0d3e09d..76d345de8 100644 --- a/README.md +++ b/README.md @@ -131,21 +131,22 @@ ls ~/.synapseConfig > [authentication]
username = ABC
authtoken = abc ### 5. Obtain Google credential files -Running `schematic init` is no longer supported due to security concerns. To obtain `schematic_service_account_creds.json`, please follow the instructions [here](https://scribehow.com/shared/Enable_Google_Drive_and_Google_Sheets_APIs_for_project__yqfcJz_rQVeyTcg0KQCINA). - -> As v22.12.1 version of schematic, using `token` mode of authentication (in other words, using `token.pickle` and `credentials.json`) is no longer supported due to Google's decision to move away from using OAuth out-of-band (OOB) flow. Click [here](https://developers.google.com/identity/protocols/oauth2/resources/oob-migration) to learn more. -*Notes*: Use the ``schematic_service_account_creds.json`` file for the service -account mode of authentication (*for Google services/APIs*). Service accounts -are special Google accounts that can be used by applications to access Google APIs -programmatically via OAuth2.0, with the advantage being that they do not require -human authorization. - -*Background*: schematic uses Google’s API to generate google sheet templates that users fill in to provide (meta)data. +Running `schematic init` is no longer supported due to security concerns. To obtain `schematic_service_account_creds.json`, please follow the instructions [here](https://scribehow.com/shared/Enable_Google_Drive_and_Google_Sheets_APIs_for_project__yqfcJz_rQVeyTcg0KQCINA). +schematic uses Google’s API to generate google sheet templates that users fill in to provide (meta)data. Most Google sheet functionality could be authenticated with service account. However, more complex Google sheet functionality requires token-based authentication. As browser support that requires the token-based authentication diminishes, we are hoping to deprecate token-based authentication and keep only service account authentication in the future. 
+> As v22.12.1 version of schematic, using `token` mode of authentication (in other words, using `token.pickle` and `credentials.json`) is no longer supported due to Google's decision to move away from using OAuth out-of-band (OOB) flow. Click [here](https://developers.google.com/identity/protocols/oauth2/resources/oob-migration) to learn more. + +> [!NOTE] +> Use the ``schematic_service_account_creds.json`` file for the service +> account mode of authentication (*for Google services/APIs*). Service accounts +> are special Google accounts that can be used by applications to access Google APIs +> programmatically via OAuth2.0, with the advantage being that they do not require +> human authorization. + ## Installation Guide For: Contributors @@ -237,20 +238,20 @@ synapse login -u -p --rememberMe ### 7. Obtain Google credential files Running `schematic init` is no longer supported due to security concerns. To obtain `schematic_service_account_creds.json`, please follow the instructions [here](https://scribehow.com/shared/Enable_Google_Drive_and_Google_Sheets_APIs_for_project__yqfcJz_rQVeyTcg0KQCINA). - -> As v22.12.1 version of schematic, using `token` mode of authentication (in other words, using `token.pickle` and `credentials.json`) is no longer supported due to Google's decision to move away from using OAuth out-of-band (OOB) flow. Click [here](https://developers.google.com/identity/protocols/oauth2/resources/oob-migration) to learn more. - -*Notes*: Use the ``schematic_service_account_creds.json`` file for the service -account mode of authentication (*for Google services/APIs*). Service accounts -are special Google accounts that can be used by applications to access Google APIs -programmatically via OAuth2.0, with the advantage being that they do not require -human authorization. - -*Background*: schematic uses Google’s API to generate google sheet templates that users fill in to provide (meta)data. 
+schematic uses Google’s API to generate google sheet templates that users fill in to provide (meta)data. Most Google sheet functionality could be authenticated with service account. However, more complex Google sheet functionality requires token-based authentication. As browser support that requires the token-based authentication diminishes, we are hoping to deprecate token-based authentication and keep only service account authentication in the future. +> As v22.12.1 version of schematic, using `token` mode of authentication (in other words, using `token.pickle` and `credentials.json`) is no longer supported due to Google's decision to move away from using OAuth out-of-band (OOB) flow. Click [here](https://developers.google.com/identity/protocols/oauth2/resources/oob-migration) to learn more. + +> [!NOTE] +> Use the ``schematic_service_account_creds.json`` file for the service +> account mode of authentication (*for Google services/APIs*). Service accounts +> are special Google accounts that can be used by applications to access Google APIs +> programmatically via OAuth2.0, with the advantage being that they do not require +> human authorization. + ### 8. Set up pre-commit hooks This repository is configured to utilize pre-commit hooks as part of the development process. 
To enable these hooks, please run the following command and look for the following success message: From 8624b47333adac7a19a7cbe807c62306fceaf76f Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Tue, 1 Oct 2024 12:08:32 -0400 Subject: [PATCH 07/34] Remove step 6 from contributor installation --- README.md | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 76d345de8..09f042d67 100644 --- a/README.md +++ b/README.md @@ -230,13 +230,7 @@ choice and edit the `username` and `authtoken` attribute under the `authenticati *Note*: You can also visit [configparser](https://docs.python.org/3/library/configparser.html#module-configparser>) doc to see the format that `.synapseConfig` must have. For instance: >[authentication]
username = ABC
authtoken = abc -### 6. Login to Synapse by using the command line -On the CLI in your virtual environment, run the following command: -``` -synapse login -u -p --rememberMe -``` - -### 7. Obtain Google credential files +### 6. Obtain Google credential files Running `schematic init` is no longer supported due to security concerns. To obtain `schematic_service_account_creds.json`, please follow the instructions [here](https://scribehow.com/shared/Enable_Google_Drive_and_Google_Sheets_APIs_for_project__yqfcJz_rQVeyTcg0KQCINA). schematic uses Google’s API to generate google sheet templates that users fill in to provide (meta)data. Most Google sheet functionality could be authenticated with service account. However, more complex Google sheet functionality @@ -252,7 +246,7 @@ token-based authentication and keep only service account authentication in the f > programmatically via OAuth2.0, with the advantage being that they do not require > human authorization. -### 8. Set up pre-commit hooks +### 7. Set up pre-commit hooks This repository is configured to utilize pre-commit hooks as part of the development process. To enable these hooks, please run the following command and look for the following success message: ``` From 2a22f54efb7ed3e1841413ecdd7381f81245e5d6 Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Tue, 1 Oct 2024 15:56:16 -0400 Subject: [PATCH 08/34] Remove tests/ hyperlink --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 09f042d67..61cbc277f 100644 --- a/README.md +++ b/README.md @@ -378,7 +378,7 @@ If you install external libraries by using `poetry add `, pleas ## Testing -All code added to the client must have tests. The Python client uses pytest to run tests. The test code is located in the [tests](https://github.com/Sage-Bionetworks/schematic/tree/develop-docs-update/tests) subdirectory. +All code added to the client must have tests. The Python client uses pytest to run tests. 
The test code is located in the `tests/` subdirectory. You can run the test suite in the following way: From 8e8cd412a16f7d61d2676ea06ca536ec04d5fa73 Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Tue, 1 Oct 2024 16:00:38 -0400 Subject: [PATCH 09/34] Fix link to CLI docs --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 61cbc277f..b7ca2e390 100644 --- a/README.md +++ b/README.md @@ -281,7 +281,7 @@ schematic model -c /path/to/config.yml validate -dt -mp -d -vc -mrt file_only ``` -Please visit more documentation [here](https://sage-schematic.readthedocs.io/en/develop/cli_reference.html) for more information. +Please visit more documentation [here](https://sage-schematic.readthedocs.io/en/stable/cli_reference.html#) for more information. # Docker Usage From 8a498205fb3ef2a66c6116ce4382fda029ecb939 Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Tue, 1 Oct 2024 16:09:51 -0400 Subject: [PATCH 10/34] Point users to service desk --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index b7ca2e390..c6f601f65 100644 --- a/README.md +++ b/README.md @@ -152,7 +152,7 @@ token-based authentication and keep only service account authentication in the f The instructions below assume you have already installed [python](https://www.python.org/downloads/), with the release version meeting the constraints set in the [Installation Requirements](#installation-requirements) section. -When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with the owners of this repository before making a change. +When contributing to this repository, please first discuss the change you wish to make via the [service desk](https://sagebionetworks.jira.com/servicedesk/customer/portal/5/group/8) so that we may track these changes. 
Please note we have a [code of conduct](CODE_OF_CONDUCT.md), please follow it in all your interactions with the project. From 5cd91ddff906fd7aa9fd253a5ce3322dfa78c163 Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Tue, 1 Oct 2024 16:28:29 -0400 Subject: [PATCH 11/34] Update code style section with how to run black --- README.md | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index c6f601f65..4acd85eae 100644 --- a/README.md +++ b/README.md @@ -397,8 +397,18 @@ pytest -vs tests/ ## Code style -* Please consult the [Google Python style guide](http://google.github.io/styleguide/pyguide.html) prior to contributing code to this project. -* Be consistent and follow existing code conventions and spirit. +To ensure consistent code formatting across the project, we use [`black`](https://black.readthedocs.io/en/stable/), the Python code formatter. + +You can apply `black` to the code in this repository by running the following command, for example like so: + +``` +poetry run black schematic tests schematic_api +``` + +When run at the root of the repository, this ensures all your scripts in the `schematic/` `tests/` and `schematic_api/` are formatted consistently. + +Further, please consult the [Google Python style guide](http://google.github.io/styleguide/pyguide.html) prior to contributing code to this project. +Be consistent and follow existing code conventions and spirit. 
# Reporting bugs or feature requests From 3ee58ad960d016c9c3ba23f9713223944d4486ea Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Tue, 1 Oct 2024 16:34:34 -0400 Subject: [PATCH 12/34] mention pre-commit in code style isntead --- README.md | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 4acd85eae..f1986fd82 100644 --- a/README.md +++ b/README.md @@ -397,16 +397,12 @@ pytest -vs tests/ ## Code style -To ensure consistent code formatting across the project, we use [`black`](https://black.readthedocs.io/en/stable/), the Python code formatter. - -You can apply `black` to the code in this repository by running the following command, for example like so: +To ensure consistent code formatting across the project, we use the `pre-commit` hook. You can manually run `pre-commit` across the repository before making a pull request like so: ``` -poetry run black schematic tests schematic_api +pre-commit run --all-files ``` -When run at the root of the repository, this ensures all your scripts in the `schematic/` `tests/` and `schematic_api/` are formatted consistently. - Further, please consult the [Google Python style guide](http://google.github.io/styleguide/pyguide.html) prior to contributing code to this project. Be consistent and follow existing code conventions and spirit. From 023bb0a89a22e16dff9d1cf83ae412cdbdcf2fa4 Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Tue, 1 Oct 2024 16:57:25 -0400 Subject: [PATCH 13/34] Revisit Set up configuration files --- README.md | 91 +++++++++++++++++++++++++++---------------------------- 1 file changed, 45 insertions(+), 46 deletions(-) diff --git a/README.md b/README.md index f1986fd82..3270be228 100644 --- a/README.md +++ b/README.md @@ -83,28 +83,8 @@ pip3 install --upgrade pip The following section will walk through setting up your configuration files with your credentials to allow for communication between `schematic` and the Synapse API.
There are two main configuration files that need to be created + modified: -- `config.yml` - `.synapseConfig` - -**Create and modify the `config.yml`** - -In this repository there is a `config_example.yml` file with default configurations to various components that are required before running `schematic`, -such as the Synapse ID of the main file view containing all your project assets, the base name of your manifest files, etc. - -Download the `config_example.yml` as a new file called `config.yml` and modify its contents according to your use case. - -For example if you wanted to change the folder where manifests are downloaded your config should look like: - -```text -manifest: - manifest_folder: "my_manifest_folder_path" -``` - -> [!NOTE] -> `config.yml` is ignored by git. - -> [!NOTE] -> Paths can be specified relative to the `config.yml` file or as absolute paths. +- `config.yml` **Create and modify the `.synapseConfig`** @@ -123,12 +103,28 @@ After following the prompts, a new Synapse configuration file will exist in your ls ~/.synapseConfig ``` +> [!NOTE] +> !!TODO!! You will notice a new `.synapseCache` folder is created alongside the `.synapseConfig` file. This is where all your non-manifest assets will be stored(?) + +**Create and modify the `config.yml`** + +In this repository there is a `config_example.yml` file with default configurations to various components that are required before running `schematic`, +such as the Synapse ID of the main file view containing all your project assets, the base name of your manifest files, etc. + +Download the `config_example.yml` as a new file called `config.yml` and modify its contents according to your use case. + +For example if you wanted to change the folder where manifests are downloaded your config should look like: + +```text +manifest: + manifest_folder: "my_manifest_folder_path" +``` + > [!IMPORTANT] -> To start working with the CLI, your `.synapseConfig` should be in your current working directory. 
+> Be sure to update your `config.yml` with the location of your `.synapseConfig` created in the step above, to avoid authentication errors. Paths can be specified relative to the `config.yml` file or as absolute paths. > [!NOTE] -> You can also visit [configparser](https://docs.python.org/3/library/configparser.html#module-configparser>) doc to see the format that `.synapseConfig` must have. For instance: -> [authentication]
username = ABC
authtoken = abc +> `config.yml` is ignored by git. ### 5. Obtain Google credential files @@ -190,45 +186,48 @@ poetry install --all-extras The following section will walk through setting up your configuration files with your credentials to allow for communication between `schematic` and the Synapse API. There are two main configuration files that need to be created + modified: +- `.synapseConfig` - `config.yml` -- [.synapseConfig](https://raw.githubusercontent.com/Sage-Bionetworks/synapsePythonClient/master/synapseclient/.synapseConfig) + +**Create and modify the `.synapseConfig`** + +The `.synapseConfig` file is what enables communication between `schematic` and the Synapse API using your credentials. +You can automatically generate a `.synapseConfig` file by running the following in your command line and following the prompts: + +``` +synapse config +``` + +You can generate a new authentication token on the Synapse website by going to `Account Settings` > `Personal Access Tokens`. + +After following the prompts, a new Synapse configuration file will exist in your home directory which you can access with the following command: + +``` +ls ~/.synapseConfig +``` + +> [!NOTE] +> !!TODO!! You will notice a new `.synapseCache` folder is created alongside the `.synapseConfig` file. This is where all your non-manifest assets will be stored(?) **Create and modify the `config.yml`** In this repository there is a `config_example.yml` file with default configurations to various components that are required before running `schematic`, such as the Synapse ID of the main file view containing all your project assets, the base name of your manifest files, etc. -Copy-paste the contents of `config_example.yml` into a new file called `config.yml` and modify its contents according to your use case. +Download the `config_example.yml` as a new file called `config.yml` and modify its contents according to your use case. 
For example if you wanted to change the folder where manifests are downloaded your config should look like: ```text - manifest: manifest_folder: "my_manifest_folder_path" ``` -> [!NOTE] -> `config.yml` is ignored by git. +> [!IMPORTANT] +> Be sure to update your `config.yml` with the location of your `.synapseConfig` created in the step above, to avoid authentication errors. Paths can be specified relative to the `config.yml` file or as absolute paths. > [!NOTE] -> Paths can be specified relative to the `config.yml` file or as absolute paths. - -**Create and modify the `.synapseConfig`** - -The `.synapseConfig` file is what enables communication between `schematic` and the Synapse API using your credentials. -Download a copy of the `.synapseConfig` file from [here](https://raw.githubusercontent.com/Sage-Bionetworks/synapsePythonClient/master/synapseclient/.synapseConfig), open the file in the editor of your -choice and edit the `username` and `authtoken` attribute under the `authentication` section. - -> [!IMPORTANT] -> You must place the file at the root of the project like so: -> ``` -> {project_root}/.synapseConfig -> ``` -> In order for any tests that involve Synapse authentication to work. - -*Note*: You can also visit [configparser](https://docs.python.org/3/library/configparser.html#module-configparser>) doc to see the format that `.synapseConfig` must have. For instance: ->[authentication]
username = ABC
authtoken = abc +> `config.yml` is ignored by git. ### 6. Obtain Google credential files Running `schematic init` is no longer supported due to security concerns. To obtain `schematic_service_account_creds.json`, please follow the instructions [here](https://scribehow.com/shared/Enable_Google_Drive_and_Google_Sheets_APIs_for_project__yqfcJz_rQVeyTcg0KQCINA). From 90930cd876e88970ddea0ee1c0e28df371d823f2 Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Wed, 2 Oct 2024 11:44:41 -0400 Subject: [PATCH 14/34] .synapseCache --- README.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 3270be228..72dfeb691 100644 --- a/README.md +++ b/README.md @@ -104,7 +104,8 @@ ls ~/.synapseConfig ``` > [!NOTE] -> !!TODO!! You will notice a new `.synapseCache` folder is created alongside the `.synapseConfig` file. This is where all your non-manifest assets will be stored(?) +> You will notice a new `.synapseCache` folder is created alongside the `.synapseConfig` file. if your `config.yml` does not specify `.synapseCache` +> as the location in which to store your manifests, assets retrieved in ways other than through the CLI will be stored in this cache folder. **Create and modify the `config.yml`** @@ -134,7 +135,7 @@ Most Google sheet functionality could be authenticated with service account. How requires token-based authentication. As browser support that requires the token-based authentication diminishes, we are hoping to deprecate token-based authentication and keep only service account authentication in the future. -> As v22.12.1 version of schematic, using `token` mode of authentication (in other words, using `token.pickle` and `credentials.json`) is no longer supported due to Google's decision to move away from using OAuth out-of-band (OOB) flow. Click [here](https://developers.google.com/identity/protocols/oauth2/resources/oob-migration) to learn more. 
+> As of `schematic` v22.12.1, using `token` mode of authentication (in other words, using `token.pickle` and `credentials.json`) is no longer supported due to Google's decision to move away from using OAuth out-of-band (OOB) flow. Click [here](https://developers.google.com/identity/protocols/oauth2/resources/oob-migration) to learn more. > [!NOTE] > Use the ``schematic_service_account_creds.json`` file for the service @@ -207,7 +208,8 @@ ls ~/.synapseConfig ``` > [!NOTE] -> !!TODO!! You will notice a new `.synapseCache` folder is created alongside the `.synapseConfig` file. This is where all your non-manifest assets will be stored(?) +> You will notice a new `.synapseCache` folder is created alongside the `.synapseConfig` file. if your `config.yml` does not specify `.synapseCache` +> as the location in which to store your manifests, assets retrieved in ways other than through the CLI will be stored in this cache folder. **Create and modify the `config.yml`** @@ -236,7 +238,7 @@ Most Google sheet functionality could be authenticated with service account. How requires token-based authentication. As browser support that requires the token-based authentication diminishes, we are hoping to deprecate token-based authentication and keep only service account authentication in the future. -> As v22.12.1 version of schematic, using `token` mode of authentication (in other words, using `token.pickle` and `credentials.json`) is no longer supported due to Google's decision to move away from using OAuth out-of-band (OOB) flow. Click [here](https://developers.google.com/identity/protocols/oauth2/resources/oob-migration) to learn more. +> As of `schematic` v22.12.1, using `token` mode of authentication (in other words, using `token.pickle` and `credentials.json`) is no longer supported due to Google's decision to move away from using OAuth out-of-band (OOB) flow. Click [here](https://developers.google.com/identity/protocols/oauth2/resources/oob-migration) to learn more. 
> [!NOTE] > Use the ``schematic_service_account_creds.json`` file for the service From b1cb1ded232a08fa45f497d0b9ce6c9973d1dc71 Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Wed, 2 Oct 2024 12:17:38 -0400 Subject: [PATCH 15/34] .synapseConfig location --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 72dfeb691..961b3ace6 100644 --- a/README.md +++ b/README.md @@ -227,6 +227,7 @@ manifest: > [!IMPORTANT] > Be sure to update your `config.yml` with the location of your `.synapseConfig` created in the step above, to avoid authentication errors. Paths can be specified relative to the `config.yml` file or as absolute paths. +> If you are interacting with `schematic` with `python` directly and not through the CLI, the `.synapseConfig` needs to be in your current working directory to avoid authentication errors. > [!NOTE] > `config.yml` is ignored by git. From 5751505bf10aad27631e398d967883fc2f4051d0 Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Wed, 2 Oct 2024 13:41:48 -0400 Subject: [PATCH 16/34] Update README.md --- README.md | 37 ++++++++++++++++++------------------- 1 file changed, 18 insertions(+), 19 deletions(-) diff --git a/README.md b/README.md index 961b3ace6..6561375b5 100644 --- a/README.md +++ b/README.md @@ -89,23 +89,23 @@ There are two main configuration files that need to be created + modified: **Create and modify the `.synapseConfig`** The `.synapseConfig` file is what enables communication between `schematic` and the Synapse API using your credentials. -You can automatically generate a `.synapseConfig` file by running the following in your command line and following the prompts: +You can automatically generate a `.synapseConfig` file by running the following in your command line and following the prompts. + +>[!TIP] +>You can generate a new authentication token on the Synapse website by going to `Account Settings` > `Personal Access Tokens`. 
``` synapse config ``` -You can generate a new authentication token on the Synapse website by going to `Account Settings` > `Personal Access Tokens`. - -After following the prompts, a new Synapse configuration file will exist in your home directory which you can access with the following command: +After following the prompts, a new `.synapseConfig` file and `.synapseCache` folder will be created in your home directory. You can view these hidden +assets in your home directory with the following command: ``` -ls ~/.synapseConfig +ls -a ~ ``` -> [!NOTE] -> You will notice a new `.synapseCache` folder is created alongside the `.synapseConfig` file. if your `config.yml` does not specify `.synapseCache` -> as the location in which to store your manifests, assets retrieved in ways other than through the CLI will be stored in this cache folder. +The `.synapseConfig` is used to log into Synapse if you are not using an environment variable (i.e. `SYNAPSE_ACCESS_TOKEN`) for authentication, and the `.synapseCache` is where your assets are stored if you are not working with the CLI and/or you have specified `.synapseCache` as the location in which to store your manifests, in your `config.yml` (more on the `config.yml` below). **Create and modify the `config.yml`** @@ -114,7 +114,7 @@ such as the Synapse ID of the main file view containing all your project assets, Download the `config_example.yml` as a new file called `config.yml` and modify its contents according to your use case. -For example if you wanted to change the folder where manifests are downloaded your config should look like: +For example, if you wanted to change the folder where manifests are downloaded your config should look like: ```text manifest: @@ -193,23 +193,23 @@ There are two main configuration files that need to be created + modified: **Create and modify the `.synapseConfig`** The `.synapseConfig` file is what enables communication between `schematic` and the Synapse API using your credentials.
-You can automatically generate a `.synapseConfig` file by running the following in your command line and following the prompts: +You can automatically generate a `.synapseConfig` file by running the following in your command line and following the prompts. + +>[!TIP] +>You can generate a new authentication token on the Synapse website by going to `Account Settings` > `Personal Access Tokens`. ``` synapse config ``` -You can generate a new authentication token on the Synapse website by going to `Account Settings` > `Personal Access Tokens`. - -After following the prompts, a new Synapse configuration file will exist in your home directory which you can access with the following command: +After following the prompts, a new `.synapseConfig` file and `.synapseCache` folder will be created in your home directory. You can view these hidden +assets in your home directory with the following command: ``` -ls ~/.synapseConfig +ls -a ~ ``` -> [!NOTE] -> You will notice a new `.synapseCache` folder is created alongside the `.synapseConfig` file. if your `config.yml` does not specify `.synapseCache` -> as the location in which to store your manifests, assets retrieved in ways other than through the CLI will be stored in this cache folder. +The `.synapseConfig` is used to log into Synapse if you are not using an environment variable (i.e. `SYNAPSE_ACCESS_TOKEN`) for authentication, and the `.synapseCache` is where your assets are stored if you are not working with the CLI and/or you have specified `.synapseCache` as the location in which to store your manfiests, in your `config.yml` (more on the `config.yml` below). **Create and modify the `config.yml`** @@ -218,7 +218,7 @@ such as the Synapse ID of the main file view containing all your project assets, Download the `config_example.yml` as a new file called `config.yml` and modify its contents according to your use case. 
-For example if you wanted to change the folder where manifests are downloaded your config should look like: +For example, if you wanted to change the folder where manifests are downloaded your config should look like: ```text manifest: @@ -227,7 +227,6 @@ manifest: > [!IMPORTANT] > Be sure to update your `config.yml` with the location of your `.synapseConfig` created in the step above, to avoid authentication errors. Paths can be specified relative to the `config.yml` file or as absolute paths. -> If you are interacting with `schematic` with `python` directly and not through the CLI, the `.synapseConfig` needs to be in your current working directory to avoid authentication errors. > [!NOTE] > `config.yml` is ignored by git. From d1428395b4a628054748156b779bd737c60282ea Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Wed, 2 Oct 2024 14:02:59 -0400 Subject: [PATCH 17/34] Update README.md --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 6561375b5..697e792db 100644 --- a/README.md +++ b/README.md @@ -144,6 +144,7 @@ token-based authentication and keep only service account authentication in the f > programmatically via OAuth2.0, with the advantage being that they do not require > human authorization. +After running this step, your setup is complete, and you can test it on a `python` instance or by running a command based on the examples in the [Command Line Usage](#command-line-usage) section. 
## Installation Guide For: Contributors From 95b68ce7902c7f7c855ee73e44fe99705f6e79e7 Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Wed, 2 Oct 2024 15:41:13 -0400 Subject: [PATCH 18/34] second pass through developer instructions --- README.md | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 697e792db..b20eed86a 100644 --- a/README.md +++ b/README.md @@ -148,7 +148,7 @@ After running this step, your setup is complete, and you can test it on a `pytho ## Installation Guide For: Contributors -The instructions below assume you have already installed [python](https://www.python.org/downloads/), with the release version meeting the constraints set in the [Installation Requirements](#installation-requirements) section. +The instructions below assume you have already installed [python](https://www.python.org/downloads/), with the release version meeting the constraints set in the [Installation Requirements](#installation-requirements) section. For development, we recommend working with versions > python 3.9 to avoid issues with `pre-commit`'s default hook configuration. When contributing to this repository, please first discuss the change you wish to make via the [service desk](https://sagebionetworks.jira.com/servicedesk/customer/portal/5/group/8) so that we may track these changes. @@ -169,7 +169,7 @@ Install `poetry` (version 1.3.0 or later) using either the [official installer]( ### 3. Start the virtual environment -Initialize the virtual environment using the following command with `poetry`: +`cd` into your cloned `schematic` repository, and initialize the virtual environment using the following command with `poetry`: ``` poetry shell @@ -177,6 +177,8 @@ poetry shell ### 4. Install `schematic` dependencies +Before you begin, make sure you are in the latest `develop` of the repository. 
+ The following command will install the dependencies based on what we specify in the `poetry.lock` file of this repository. If this step is taking a long time, try to go back to Step 2 and check your version of `poetry`. Alternatively, you can try deleting the lock file and regenerate it by doing `poetry install` (Please note this method should be used as a last resort because this would force other developers to change their development environment) ``` @@ -212,6 +214,9 @@ ls -a ~ The `.synapseConfig` is used to log into Synapse if you are not using an environment variable (i.e. `SYNAPSE_ACCESS_TOKEN`) for authentication, and the `.synapseCache` is where your assets are stored if you are not working with the CLI and/or you have specified `.synapseCache` as the location in which to store your manfiests, in your `config.yml` (more on the `config.yml` below). +> [!IMPORTANT] +> When developing on `schematic`, keep your `.synapseConfig` in your current working directory to avoid authentication errors. + **Create and modify the `config.yml`** In this repository there is a `config_example.yml` file with default configurations to various components that are required before running `schematic`, @@ -256,7 +261,13 @@ $ pre-commit install pre-commit installed at .git/hooks/pre-commit ``` -*Note*: Make sure you have the latest version of the `develop` branch on your local machine. +You can run `pre-commit` manually across the entire repository like so: + +``` +pre-commit run --all-files +``` + +After running this step, your setup is complete, and you can test it on a python instance or by running a command based on the examples in the [Command Line Usage](#command-line-usage) section. # Command Line Usage 1. 
Generate a new manifest as a google sheet From 5c9ecfe889f5bf7c882affcd043aaa91151f6d13 Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Wed, 2 Oct 2024 15:44:33 -0400 Subject: [PATCH 19/34] Update README.md --- README.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index b20eed86a..3f1d82692 100644 --- a/README.md +++ b/README.md @@ -165,7 +165,11 @@ git clone https://github.com/Sage-Bionetworks/schematic.git ### 2. Install `poetry` -Install `poetry` (version 1.3.0 or later) using either the [official installer](https://python-poetry.org/docs/#installing-with-the-official-installer) or [pipx](https://python-poetry.org/docs/#installing-with-pipx). If you have an older installation of Poetry, we recommend uninstalling it first. +Install `poetry` (version 1.3.0 or later) using either the [official installer](https://python-poetry.org/docs/#installing-with-the-official-installer) or `pip`. If you have an older installation of Poetry, we recommend uninstalling it first. + +``` +pip install poetry +``` ### 3. Start the virtual environment From f2e7661bc0b5229a2d7de1b7279b2c32fb03bbb7 Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Wed, 2 Oct 2024 16:03:47 -0400 Subject: [PATCH 20/34] update Docker Usage section --- README.md | 31 ++++++++++++++++++++++++------- 1 file changed, 24 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index 3f1d82692..7dcee7f5f 100644 --- a/README.md +++ b/README.md @@ -302,9 +302,15 @@ Please visit more documentation [here](https://sage-schematic.readthedocs.io/en/ # Docker Usage -### Example For REST API
+Here we will demonstrate how to run `schematic` with Docker, with different use-cases for running API endpoints, validating the manifests, and +how to use `schematic` based on your OS (macOS/Linux). -#### Use file path of `config.yml` to run API endpoints: +### Running the REST API
+ +Use the Docker image to run `schematic`'s REST API. You can either use the file path for the `config.yml` created using the installation instructions, +or set up authentication with environment variables. + +#### Example 1: Using the `config.yml` path ``` docker run --rm -p 3001:3001 \ -v $(pwd):/schematic -w /schematic --name schematic \ @@ -314,7 +320,7 @@ docker run --rm -p 3001:3001 \ python /usr/src/app/run_api.py ``` -#### Use content of `config.yml` and `schematic_service_account_creds.json`as an environment variable to run API endpoints: +#### Example 2: Use environment variables 1. save content of `config.yml` as to environment variable `SCHEMATIC_CONFIG_CONTENT` by doing: `export SCHEMATIC_CONFIG_CONTENT=$(cat /path/to/config.yml)` 2. Similarly, save the content of `schematic_service_account_creds.json` as `SERVICE_ACCOUNT_CREDS` by doing: `export SERVICE_ACCOUNT_CREDS=$(cat /path/to/schematic_service_account_creds.json)` @@ -330,9 +336,18 @@ docker run --rm -p 3001:3001 \ sagebionetworks/schematic \ python /usr/src/app/run_api.py ``` -### Example For Schematic on mac/linux
-To run example below, first clone schematic into your home directory `git clone https://github.com/sage-bionetworks/schematic ~/schematic`
-Then update .synapseConfig with your credentials +### Running `schematic` to Validate Manifests
+You can also use Docker to run `schematic` commands like validating manifests. Below are examples for different platforms. + +#### Example for macOS/Linux + +1. Clone the repository: +``` +git clone https://github.com/sage-bionetworks/schematic ~/schematic +``` +2. Update the `.synapseConfig` with your credentials. See the installation instructions for how to do this. + +3. Run Docker: ``` docker run \ -v ~/schematic:/schematic \ @@ -346,7 +361,9 @@ docker run \ -js /schematic/tests/data/example.model.jsonld ``` -### Example For Schematic on Windows
+#### Example for Windows + +Run the following command to validate manifests: ``` docker run -v %cd%:/schematic \ -w /schematic \ From c4f9fcc53587358bc50a67304ee278bcd0fda682 Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Wed, 2 Oct 2024 16:16:37 -0400 Subject: [PATCH 21/34] Update contribution guideline --- README.md | 58 ++++++++++++++++++++++++++++++++----------------------- 1 file changed, 34 insertions(+), 24 deletions(-) diff --git a/README.md b/README.md index 7dcee7f5f..2723043b5 100644 --- a/README.md +++ b/README.md @@ -376,33 +376,42 @@ docker run -v %cd%:/schematic \ # Contribution Guidelines ### Development process instruction -For new features, bugs, enhancements - -1. Pull the latest code from [develop branch in the upstream repo](https://github.com/Sage-Bionetworks/schematic) -2. Checkout a new branch develop- from the develop branch -3. Do development on branch develop- - a. may need to ensure that schematic poetry toml and lock files are compatible with your local environment -4. Add changed files for tracking and commit changes using [best practices](https://www.perforce.com/blog/vcs/git-best-practices-git-commit) -5. Have granular commits: not “too many” file changes, and not hundreds of code lines of changes -6. Commits with work in progress are encouraged: - a. add WIP to the beginning of the commit message for “Work In Progress” commits -7. Keep commit messages descriptive but less than a page long, see best practices -8. Push code to develop- in upstream repo -9. Branch out off develop- if needed to work on multiple features associated with the same code base -10. After feature work is complete and before creating a PR to the develop branch in upstream +For new features, bugs, enhancements: + +#### 1. Branch Setup +* Pull the latest code from the develop branch in the upstream repository. +* Checkout a new branch formatted like so: `develop-` from the develop branch + +#### 2. Development Workflow +* Develop on your new branch. 
+* Ensure pyproject.toml and poetry.lock files are compatible with your environment. +* Add changed files for tracking and commit changes using [best practices](https://www.perforce.com/blog/vcs/git-best-practices-git-commit) +* Have granular commits: not “too many” file changes, and not hundreds of code lines of changes +* You can choose to create a draft PR if you prefer to develop this way + +#### 3. Branch Management +* Push code to `develop-` in upstream repo: + ``` + git push develop- + ``` +* Branch off `develop-` if you need to work on multiple features associated with the same code base +* After feature work is complete and before creating a PR to the develop branch in upstream a. ensure that code runs locally b. test for logical correctness locally c. wait for git workflow to complete (e.g. tests are run) on github -11. Create a PR from develop- into the develop branch of the upstream repo -12. Request a code review on the PR -13. Once code is approved merge in the develop branch -14. Delete the develop- branch + +#### 4. Pull Request and Review +* Create a PR from `develop-` into the develop branch of the upstream repo +* Request a code review on the PR +* Once code is approved merge in the develop branch. We recommend squashing your commits for a cleaner commit history. +* Once the actions pass on the main branch, delete the `develop-` branch + ## Updating readthedocs documentation -1. `cd docs` -2. After making relevant changes, you could run the `make html` command to re-generate the `build` folder. -3. Please contact the dev team to publish your updates +1. Navigate to the docs directory. +2. Run make html to regenerate the build after changes. +3. Contact the development team to publish the updates. -*Other helpful resources*: +*Helpful resources*: 1. [Getting started with Sphinx](https://haha.readthedocs.io/en/latest/intro/getting-started-with-sphinx.html) 2. 
[Installing Sphinx](https://haha.readthedocs.io/en/latest/intro/getting-started-with-sphinx.html) @@ -412,10 +421,11 @@ If you install external libraries by using `poetry add `, pleas ## Testing -All code added to the client must have tests. The Python client uses pytest to run tests. The test code is located in the `tests/` subdirectory. +* All new code must include tests. -You can run the test suite in the following way: +* Tests are written using pytest and are located in the tests/ subdirectory. +* Run tests with: ``` pytest -vs tests/ ``` From 2c8c72a54bc6c29076b5e0301be390ff50231959 Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Wed, 2 Oct 2024 16:30:49 -0400 Subject: [PATCH 22/34] tldr --- README.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 2723043b5..1a58848f6 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,12 @@ [![Build Status](https://img.shields.io/endpoint.svg?url=https%3A%2F%2Factions-badge.atrox.dev%2FSage-Bionetworks%2Fschematic%2Fbadge%3Fref%3Ddevelop&style=flat)](https://actions-badge.atrox.dev/Sage-Bionetworks/schematic/goto?ref=develop) [![Documentation Status](https://readthedocs.org/projects/sage-schematic/badge/?version=develop)](https://sage-schematic.readthedocs.io/en/develop/?badge=develop) [![PyPI version](https://badge.fury.io/py/schematicpy.svg)](https://badge.fury.io/py/schematicpy) # TL;DR -Under Construction. + +* `schematic` (Schema Engine for Manifest Ingress and Curation) is a python-based software tool that streamlines the retrieval, validation, and submission of metadata for biomedical datasets hosted on Sage Bionetworks' Synapse platform. +* Users can work with `schematic` in several ways, including through the CLI (see [Command Line Usage](#command-line-usage) for examples), through Docker (see [Docker Usage](#docker-usage) for examples), or with python. 
+* In order to communicate with Synapse, users will need to set up their credentials for authentication with Synapse and the Google Sheets API. Setup instructions are available in the Installation Guides: + * [Installation Guide For: Schematic CLI users](#installation-guide-for-schematic-cli-users) + * [Installation Guide For: Contributors](#installation-guide-for-contributors) # Table of Contents - [Schematic](#schematic) From ce064226f8745fc32ac112b6a43aebe0f1dd7a7f Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Wed, 2 Oct 2024 17:01:29 -0400 Subject: [PATCH 23/34] address comments from FDS-1172 --- README.md | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 1a58848f6..5d6742028 100644 --- a/README.md +++ b/README.md @@ -5,7 +5,7 @@ * `schematic` (Schema Engine for Manifest Ingress and Curation) is a python-based software tool that streamlines the retrieval, validation, and submission of metadata for biomedical datasets hosted on Sage Bionetworks' Synapse platform. * Users can work with `schematic` in several ways, including through the CLI (see [Command Line Usage](#command-line-usage) for examples), through Docker (see [Docker Usage](#docker-usage) for examples), or with python. -* In order to communicate with Synapse, users will need to set up their credentials for authentication with Synapse and the Google Sheets API. Setup instructions are available in the Installation Guides: +* `schematic` needs to communicate with Synapse and Google Sheets in order for its processes to work. In order for this to happen, users will need to set up their credentials for authentication with Synapse and the Google Sheets API. 
Setup instructions are available in the Installation Guides: * [Installation Guide For: Schematic CLI users](#installation-guide-for-schematic-cli-users) * [Installation Guide For: Contributors](#installation-guide-for-contributors) @@ -48,7 +48,7 @@ SCHEMATIC is an acronym for _Schema Engine for Manifest Ingress and Curation_. T ## Installation Guide For: Schematic CLI users -The instructions below assume you have already installed [python](https://www.python.org/downloads/), with the release version meeting the constraints set in the [Installation Requirements](#installation-requirements) section. +The instructions below assume you have already installed [python](https://www.python.org/downloads/), with the release version meeting the constraints set in the [Installation Requirements](#installation-requirements) section, and do not have an environment already active (e.g. with `pyenv`). ### 1. Verify your python version @@ -153,7 +153,7 @@ After running this step, your setup is complete, and you can test it on a `pytho ## Installation Guide For: Contributors -The instructions below assume you have already installed [python](https://www.python.org/downloads/), with the release version meeting the constraints set in the [Installation Requirements](#installation-requirements) section. For development, we recommend working with versions > python 3.9 to avoid issues with `pre-commit`'s default hook configuration. +The instructions below assume you have already installed [python](https://www.python.org/downloads/), with the release version meeting the constraints set in the [Installation Requirements](#installation-requirements) section, and do not have an environment already active (e.g. with `pyenv`). For development, we recommend working with versions > python 3.9 to avoid issues with `pre-commit`'s default hook configuration. 
When contributing to this repository, please first discuss the change you wish to make via the [service desk](https://sagebionetworks.jira.com/servicedesk/customer/portal/5/group/8) so that we may track these changes. @@ -176,6 +176,12 @@ Install `poetry` (version 1.3.0 or later) using either the [official installer]( pip install poetry ``` +Check to make sure your version of poetry is >= v1.3.0 + +``` +poetry --version +``` + ### 3. Start the virtual environment `cd` into your cloned `schematic` repository, and initialize the virtual environment using the following command with `poetry`: @@ -184,6 +190,12 @@ pip install poetry poetry shell ``` +To make sure your poetry version and python version are consistent with the versions you expect, you can run the following command: + +``` +poetry debug info +``` + ### 4. Install `schematic` dependencies Before you begin, make sure you are in the latest `develop` of the repository. @@ -231,9 +243,13 @@ The `.synapseConfig` is used to log into Synapse if you are not using an environ In this repository there is a `config_example.yml` file with default configurations to various components that are required before running `schematic`, such as the Synapse ID of the main file view containing all your project assets, the base name of your manifest files, etc. -Download the `config_example.yml` as a new file called `config.yml` and modify its contents according to your use case. +Copy the contents of the `config_example.yml` (located in the base directory of the cloned `schematic` repo) into a new file called `config.yml` -For example, if you wanted to change the folder where manifests are downloaded your config should look like: +```` +cp config_example.yml config.yml +``` + +Once you've copied the file, modify its contents according to your use case.
For example, if you wanted to change the folder where manifests are downloaded your config should look like: ```text manifest: From 6c4193c819241003c3ec37136ad0087d577a99ef Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Wed, 2 Oct 2024 17:08:44 -0400 Subject: [PATCH 24/34] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 5d6742028..45030d136 100644 --- a/README.md +++ b/README.md @@ -245,7 +245,7 @@ such as the Synapse ID of the main file view containing all your project assets, Copy the contents of the `config_example.yml` (located in the base directory of the cloned `schematic` repo) into a new file called `config.yml` -```` +``` cp config_example.yml config.yml ``` From e12d7eb70008afd35f43cbfb020a67a91525cebe Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Fri, 4 Oct 2024 16:44:58 -0400 Subject: [PATCH 25/34] Consolidate confluence docs --- README.md | 97 ++++++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 85 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index 45030d136..ac5e2d56f 100644 --- a/README.md +++ b/README.md @@ -6,7 +6,7 @@ * `schematic` (Schema Engine for Manifest Ingress and Curation) is a python-based software tool that streamlines the retrieval, validation, and submission of metadata for biomedical datasets hosted on Sage Bionetworks' Synapse platform. * Users can work with `schematic` in several ways, including through the CLI (see [Command Line Usage](#command-line-usage) for examples), through Docker (see [Docker Usage](#docker-usage) for examples), or with python. * `schematic` needs to communicate with Synapse and Google Sheets in order for its processes to work. In order for this to happen, users will need to set up their credentials for authentication with Synapse and the Google Sheets API. 
Setup instructions are available in the Installation Guides: - * [Installation Guide For: Schematic CLI users](#installation-guide-for-schematic-cli-users) + * [Installation Guide For: Schematic CLI users](#installation-guide-for-users) * [Installation Guide For: Contributors](#installation-guide-for-contributors) # Table of Contents @@ -46,9 +46,9 @@ SCHEMATIC is an acronym for _Schema Engine for Manifest Ingress and Curation_. T > To create Google Sheets files from Schematic, please follow our credential policy for Google credentials. You can find a detailed tutorial [here](https://scribehow.com/shared/Get_Credentials_for_Google_Drive_and_Google_Sheets_APIs_to_use_with_schematicpy__yqfcJz_rQVeyTcg0KQCINA). > If you're using config.yml, make sure to specify the path to `schematic_service_account_creds.json` (see the `google_sheets > service_account_creds` section for more information). -## Installation Guide For: Schematic CLI users +## Installation Guide For: Users -The instructions below assume you have already installed [python](https://www.python.org/downloads/), with the release version meeting the constraints set in the [Installation Requirements](#installation-requirements) section, and do not have an environment already active (e.g. with `pyenv`). +The instructions below assume you have already installed [python](https://www.python.org/downloads/), with the release version meeting the constraints set in the [Installation Requirements](#installation-requirements) section, and do not have a Python environment already active. ### 1. Verify your python version @@ -63,12 +63,44 @@ If your current Python version is not supported by Schematic, you can switch to ### 2. Set up your virtual environment -Once you are working with a python version supported by Schematic, please activate a virtual environment within which you can install the package. 
Python 3 has built-in support for virtual environments with the `venv` module, so you no longer need to install `virtualenv`: +Once you are working with a python version supported by Schematic, please activate a virtual environment within which you can install the package. You can +set up your virtual environment with either `venv` or `conda`, as described below. + +#### 2a. Set up your virtual environment with `venv` + +Python 3 has built-in support for virtual environments with the `venv` module, so you no longer need to install `virtualenv`: + ``` python3 -m venv .venv source .venv/bin/activate ``` +#### 2b. Set up your virtual environment with `conda` + +`conda` is a powerful package and environment management tool that allows users to create isolated environments used particularly in data science and machine learning workflows. If you would like to manage your environments with `conda`, continue reading: + +1. **Download your preferred `conda` installer**: Begin by [installing `conda`](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html). We personally recommend working with `Miniconda` which is a lightweight installer for `conda` that includes only `conda` and its dependencies. + +2. **Execute the `conda` installer**: Once you have downloaded your preferred installer, execute it using `bash` or `zsh`, depending on the shell configured for your terminal environment. For example: + + ``` + bash Miniconda3-latest-MacOSX-arm64.sh + ``` + +3. **Verify your `conda` setup**: Follow the prompts to complete your setup. Then verify your setup by running the `conda` command. + +4. **Create your `schematic` environment**: Begin by creating a fresh `conda` environment for `schematic` like so: + + ``` + conda create --name 'schematicpy' python=3.10 + ``` + +5.
**Activate the environment**: Once your environment is set up, you can now activate your new environment with `conda`: + + ``` + conda activate schematicpy + ``` + ### 3. Install `schematic` dependencies Install the package using [pip](https://pip.pypa.io/en/stable/quickstart/): @@ -132,16 +164,38 @@ manifest: > [!NOTE] > `config.yml` is ignored by git. -### 5. Obtain Google credential files +### 5. Get your data model as a `JSON-LD` schema file + +Now you need a schema file, e.g. `model.jsonld`, to have a data model that schematic can work with. While you can download a super basic example data model [here](https://raw.githubusercontent.com/Sage-Bionetworks/schematic/refs/heads/develop/tests/data/example.model.jsonld), you’ll probably be working with a DCC-specific data model. For non-Sage employees/contributors using the CLI, you might care only about the minimum needed artifact, which is the `.jsonld`; locate and download only that from the right repo. + +Here are some example repos with schema files: +* https://github.com/ncihtan/data-models/ +* https://github.com/nf-osi/nf-metadata-dictionary/ + +> [!IMPORTANT] +> Your local working directory would typically have `model.jsonld` and `config.yml` side-by-side. The path to your data model should match what is in `config.yml` + +### 6. Obtain Google credential files + +Any function that interacts with a google sheet (such as `schematic manifest get`) requires google cloud credentials. + +1. **Option 1**: [Here](https://scribehow.com/shared/Get_Credentials_for_Google_Drive_and_Google_Sheets_APIs_to_use_with_schematicpy__yqfcJz_rQVeyTcg0KQCINA?referrer=workspace)’s a step-by-step guide on how to create these credentials in Google Cloud. + * Depending on your institution's policies, your institutional Google account may or may not have the required permissions to complete this. A possible workaround is to use a personal or temporary Google account. 
+ +> [!WARNING] +> At the time of writing, Sage Bionetworks employees do not have the appropriate permissions to create projects with their Sage Bionetworks Google accounts. You would follow instructions using a personal Google account. + +2. **Option 2**: Ask your DCC/development team if they have credentials previously set up with a service account. + +Once you have obtained credentials, be sure that the json file generated is named in the same way as the `service_acct_creds` parameter in your `config.yml` file. -Running `schematic init` is no longer supported due to security concerns. To obtain `schematic_service_account_creds.json`, please follow the instructions [here](https://scribehow.com/shared/Enable_Google_Drive_and_Google_Sheets_APIs_for_project__yqfcJz_rQVeyTcg0KQCINA). +> [!NOTE] +> Running `schematic init` is no longer supported due to security concerns. To obtain `schematic_service_account_creds.json`, please follow the instructions [here](https://scribehow.com/shared/Enable_Google_Drive_and_Google_Sheets_APIs_for_project__yqfcJz_rQVeyTcg0KQCINA). schematic uses Google’s API to generate google sheet templates that users fill in to provide (meta)data. Most Google sheet functionality could be authenticated with service account. However, more complex Google sheet functionality requires token-based authentication. As browser support that requires the token-based authentication diminishes, we are hoping to deprecate token-based authentication and keep only service account authentication in the future. -> As of `schematic` v22.12.1, using `token` mode of authentication (in other words, using `token.pickle` and `credentials.json`) is no longer supported due to Google's decision to move away from using OAuth out-of-band (OOB) flow. Click [here](https://developers.google.com/identity/protocols/oauth2/resources/oob-migration) to learn more. 
- > [!NOTE] > Use the ``schematic_service_account_creds.json`` file for the service > account mode of authentication (*for Google services/APIs*). Service accounts @@ -149,7 +203,8 @@ token-based authentication and keep only service account authentication in the f > programmatically via OAuth2.0, with the advantage being that they do not require > human authorization. -After running this step, your setup is complete, and you can test it on a `python` instance or by running a command based on the examples in the [Command Line Usage](#command-line-usage) section. +### 7. Verify your setup +After running the steps above, your setup is complete, and you can test it on a `python` instance or by running a command based on the examples in the [Command Line Usage](#command-line-usage) section. ## Installation Guide For: Contributors @@ -263,14 +318,29 @@ manifest: > `config.yml` is ignored by git. ### 6. Obtain Google credential files -Running `schematic init` is no longer supported due to security concerns. To obtain `schematic_service_account_creds.json`, please follow the instructions [here](https://scribehow.com/shared/Enable_Google_Drive_and_Google_Sheets_APIs_for_project__yqfcJz_rQVeyTcg0KQCINA). + +Any function that interacts with a google sheet (such as `schematic manifest get`) requires google cloud credentials. + +1. **Option 1**: [Here](https://scribehow.com/shared/Get_Credentials_for_Google_Drive_and_Google_Sheets_APIs_to_use_with_schematicpy__yqfcJz_rQVeyTcg0KQCINA?referrer=workspace)’s a step-by-step guide on how to create these credentials in Google Cloud. + * Depending on your institution's policies, your institutional Google account may or may not have the required permissions to complete this. A possible workaround is to use a personal or temporary Google account. + +> [!WARNING] +> At the time of writing, Sage Bionetworks employees do not have the appropriate permissions to create projects with their Sage Bionetworks Google accounts. 
You would follow instructions using a personal Google account. + +2. **Option 2**: Ask your DCC/development team if they have credentials previously set up with a service account. + +Once you have obtained credentials, be sure that the json file generated is named in the same way as the `service_acct_creds` parameter in your `config.yml` file. + +> [!IMPORTANT] +> For testing, make sure there is no environment variable `SCHEMATIC_SERVICE_ACCOUNT_CREDS`. Check the file `.env` to ensure this is not set. Also, check that config files used for testing, such as `config_example.yml` do not contain service_acct_creds_synapse_id. + +> [!NOTE] +> Running `schematic init` is no longer supported due to security concerns. To obtain `schematic_service_account_creds.json`, please follow the instructions [here](https://scribehow.com/shared/Enable_Google_Drive_and_Google_Sheets_APIs_for_project__yqfcJz_rQVeyTcg0KQCINA). schematic uses Google’s API to generate google sheet templates that users fill in to provide (meta)data. Most Google sheet functionality could be authenticated with service account. However, more complex Google sheet functionality requires token-based authentication. As browser support that requires the token-based authentication diminishes, we are hoping to deprecate token-based authentication and keep only service account authentication in the future. -> As of `schematic` v22.12.1, using `token` mode of authentication (in other words, using `token.pickle` and `credentials.json`) is no longer supported due to Google's decision to move away from using OAuth out-of-band (OOB) flow. Click [here](https://developers.google.com/identity/protocols/oauth2/resources/oob-migration) to learn more. - > [!NOTE] > Use the ``schematic_service_account_creds.json`` file for the service > account mode of authentication (*for Google services/APIs*). 
Service accounts @@ -294,6 +364,9 @@ pre-commit run --all-files After running this step, your setup is complete, and you can test it on a python instance or by running a command based on the examples in the [Command Line Usage](#command-line-usage) section. +### 8. Verify your setup +After running the steps above, your setup is complete, and you can test it on a `python` instance or by running a command based on the examples in the [Command Line Usage](#command-line-usage) section. + # Command Line Usage 1. Generate a new manifest as a google sheet From eec2c8891f939d68e7ad6fe8de5bb14e4b4830a8 Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Fri, 4 Oct 2024 16:49:19 -0400 Subject: [PATCH 26/34] Update table of contents --- README.md | 50 +++++++++++++++++++++++++++++++++++--------------- 1 file changed, 35 insertions(+), 15 deletions(-) diff --git a/README.md b/README.md index ac5e2d56f..5f17b0b48 100644 --- a/README.md +++ b/README.md @@ -11,29 +11,49 @@ # Table of Contents - [Schematic](#schematic) -- [Table of contents](#table-of-contents) +- [TL;DR](#tldr) +- [Table of Contents](#table-of-contents) - [Introduction](#introduction) - [Installation](#installation) - [Installation Requirements](#installation-requirements) - - [Installation guide for Schematic CLI users](#installation-guide-for-schematic-cli-users) - - [Installation guide for developers/contributors](#installation-guide-for-developerscontributors) - - [Development environment setup](#development-environment-setup) - - [Development process instruction](#development-process-instruction) - - [Example For REST API ](#example-for-rest-api-) - - [Use file path of `config.yml` to run API endpoints:](#use-file-path-of-configyml-to-run-api-endpoints) - - [Use content of `config.yml` and `schematic_service_account_creds.json`as an environment variable to run API endpoints:](#use-content-of-configyml-and-schematic_service_account_credsjsonas-an-environment-variable-to-run-api-endpoints) - - [Example For 
Schematic on mac/linux ](#example-for-schematic-on-maclinux-) - - [Example For Schematic on Windows ](#example-for-schematic-on-windows-) -- [Other Contribution Guidelines](#other-contribution-guidelines) + - [Installation Guide For: Users](#installation-guide-for-users) + - [1. Verify your python version](#1-verify-your-python-version) + - [2. Set up your virtual environment](#2-set-up-your-virtual-environment) + - [2a. Set up your virtual environment with `venv`](#2a-set-up-your-virtual-environment-with-venv) + - [2b. Set up your virtual environment with `conda`](#2b-set-up-your-virtual-environment-with-conda) + - [3. Install `schematic` dependencies](#3-install-schematic-dependencies) + - [4. Set up configuration files](#4-set-up-configuration-files) + - [5. Get your data model as a `JSON-LD` schema file](#5-get-your-data-model-as-a-json-ld-schema-file) + - [6. Obtain Google credential files](#6-obtain-google-credential-files) + - [7. Verify your setup](#7-verify-your-setup) + - [Installation Guide For: Contributors](#installation-guide-for-contributors) + - [1. Clone the `schematic` package repository](#1-clone-the-schematic-package-repository) + - [2. Install `poetry`](#2-install-poetry) + - [3. Start the virtual environment](#3-start-the-virtual-environment) + - [4. Install `schematic` dependencies](#4-install-schematic-dependencies) + - [5. Set up configuration files](#5-set-up-configuration-files) + - [6. Obtain Google credential files](#6-obtain-google-credential-files) + - [7. Set up pre-commit hooks](#7-set-up-pre-commit-hooks) + - [8. 
Verify your setup](#8-verify-your-setup) +- [Command Line Usage](#command-line-usage) +- [Docker Usage](#docker-usage) + - [Running the REST API](#running-the-rest-api) + - [Example 1: Using the `config.yml` path](#example-1-using-the-configyml-path) + - [Example 2: Use environment variables](#example-2-use-environment-variables) + - [Running `schematic` to Validate Manifests](#running-schematic-to-validate-manifests) + - [Example for macOS/Linux](#example-for-macoslinux) + - [Example for Windows](#example-for-windows) +- [Contribution Guidelines](#contribution-guidelines) + - [Development process instruction](#development-process-instruction) - [Updating readthedocs documentation](#updating-readthedocs-documentation) - [Update toml file and lock file](#update-toml-file-and-lock-file) - - [Reporting bugs or feature requests](#reporting-bugs-or-feature-requests) -- [Command Line Usage](#command-line-usage) - [Testing](#testing) - [Updating Synapse test resources](#updating-synapse-test-resources) - [Code style](#code-style) +- [Reporting bugs or feature requests](#reporting-bugs-or-feature-requests) - [Contributors](#contributors) + # Introduction SCHEMATIC is an acronym for _Schema Engine for Manifest Ingress and Curation_. The Python based infrastructure provides a _novel_ schema-based, metadata ingress ecosystem, that is meant to streamline the process of biomedical dataset annotation, metadata validation and submission to a data repository for various data contributors. @@ -399,7 +419,7 @@ Please visit more documentation [here](https://sage-schematic.readthedocs.io/en/ Here we will demonstrate how to run `schematic` with Docker, with different use-cases for running API endpoints, validating the manifests, and using how to use `schematic` based on your OS (macOS/Linux). -### Running the REST API
+### Running the REST API Use the Docker image to run `schematic`s REST API. You can either use the file path for the `config.yml` created using the installation instructions, or set up authentication with environment variables. @@ -430,7 +450,7 @@ docker run --rm -p 3001:3001 \ sagebionetworks/schematic \ python /usr/src/app/run_api.py ``` -### Running `schematic` to Validate Manifests
+### Running `schematic` to Validate Manifests You can also use Docker to run `schematic` commands like validating manifests. Below are examples for different platforms. #### Example for macOS/Linux From 18e154907408fc9b0732c96aa5f4d7134a0d0daf Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Tue, 15 Oct 2024 13:13:16 -0700 Subject: [PATCH 27/34] Update README.md --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 5f17b0b48..6bb57662a 100644 --- a/README.md +++ b/README.md @@ -5,7 +5,8 @@ * `schematic` (Schema Engine for Manifest Ingress and Curation) is a python-based software tool that streamlines the retrieval, validation, and submission of metadata for biomedical datasets hosted on Sage Bionetworks' Synapse platform. * Users can work with `schematic` in several ways, including through the CLI (see [Command Line Usage](#command-line-usage) for examples), through Docker (see [Docker Usage](#docker-usage) for examples), or with python. -* `schematic` needs to communicate with Synapse and Google Sheets in order for its processes to work. In order for this to happen, users will need to set up their credentials for authentication with Synapse and the Google Sheets API. Setup instructions are available in the Installation Guides: +* `schematic` needs to communicate with Synapse and Google Sheets in order for its processes to work. As such, users will need to set up their credentials for authentication with Synapse and the Google Sheets API. +* To get started with `schematic`, follow one of the Installation Guides depending on your use case: * [Installation Guide For: Schematic CLI users](#installation-guide-for-users) * [Installation Guide For: Contributors](#installation-guide-for-contributors) @@ -83,8 +84,7 @@ If your current Python version is not supported by Schematic, you can switch to ### 2. 
Set up your virtual environment -Once you are working with a python version supported by Schematic, please activate a virtual environment within which you can install the package. You can -set up your virtual environment. Below we will instruct how to creat your virtual environment with `venv` and with `conda`. +Once you are working with a python version supported by `schematic`, you will need to activate a virtual environment within which you can install the package. Below we will show how to create your virtual environment either with `venv` or with `conda`. #### 2a. Set up your virtual environment with `venv` From 954aa543ec874bd20eb07e414f19dbe52729a3ca Mon Sep 17 00:00:00 2001 From: BryanFauble <17128019+BryanFauble@users.noreply.github.com> Date: Tue, 15 Oct 2024 13:27:00 -0700 Subject: [PATCH 28/34] [FDS-2502] Walk back up directory tree to get location in project (#1518) * Walk back up directory tree to get location in project --- schematic/store/synapse.py | 62 +++++++++- ...nested_manifest_table_and_file_replace.csv | 2 + tests/integration/test_submit_manifest.py | 112 ++++++++++++++++++ tests/integration/test_validate_attribute.py | 25 ++-- tests/test_api.py | 6 +- tests/test_manifest.py | 6 +- tests/test_store.py | 14 ++- 7 files changed, 197 insertions(+), 30 deletions(-) create mode 100644 tests/data/mock_manifests/TestManifestOperation_test_submit_nested_manifest_table_and_file_replace.csv create mode 100644 tests/integration/test_submit_manifest.py diff --git a/schematic/store/synapse.py b/schematic/store/synapse.py index 7ccb810cf..861789374 100644 --- a/schematic/store/synapse.py +++ b/schematic/store/synapse.py @@ -23,6 +23,7 @@ from schematic_db.rdb.synapse_database import SynapseDatabase from synapseclient import ( Column, + Entity, EntityViewSchema, EntityViewType, File, @@ -33,6 +34,7 @@ as_table_columns, ) from synapseclient.api import get_entity_id_bundle2 +from synapseclient.core.constants.concrete_types import PROJECT_ENTITY from 
synapseclient.core.exceptions import ( SynapseAuthenticationError, SynapseHTTPError, @@ -566,6 +568,55 @@ def getFilesInStorageDataset( self.syn, datasetId, includeTypes=["folder", "file"] ) + current_entity_location = self.syn.get(entity=datasetId, downloadFile=False) + + def walk_back_to_project( + current_location: Entity, location_prefix: str, skip_entry: bool + ) -> str: + """ + Recursively walk back up the project structure to get the paths of the + names of each of the directories where we started the walk function. + + Args: + current_location (Entity): The current entity location in the project structure. + location_prefix (str): The prefix to prepend to the path. + skip_entry (bool): Whether to skip the current entry in the path. When + this is True it means we are looking at our starting point. If our + starting point is the project itself we can go ahead and return + back the project as the prefix. + + Returns: + str: The path of the names of each of the directories up to the project root. 
+ """ + if ( + skip_entry + and "concreteType" in current_location + and current_location["concreteType"] == PROJECT_ENTITY + ): + return f"{current_location.name}/{location_prefix}" + + updated_prefix = ( + location_prefix + if skip_entry + else f"{current_location.name}/{location_prefix}" + ) + if ( + "concreteType" in current_location + and current_location["concreteType"] == PROJECT_ENTITY + ): + return updated_prefix + return walk_back_to_project( + current_location=self.syn.get(entity=current_location["parentId"]), + location_prefix=updated_prefix, + skip_entry=False, + ) + + prefix = walk_back_to_project( + current_location=current_entity_location, + location_prefix="", + skip_entry=True, + ) + project = self.getDatasetProject(datasetId) project_name = self.syn.get(project, downloadFile=False).name file_list = [] @@ -585,17 +636,16 @@ def getFilesInStorageDataset( if fullpath: # append directory path to filename if dirpath[0].startswith(f"{project_name}/"): + path_without_project_prefix = ( + dirpath[0] + "/" + ).removeprefix(f"{project_name}/") path_filename = ( - dirpath[0] + "/" + path_filename[0], + prefix + path_without_project_prefix + path_filename[0], path_filename[1], ) else: path_filename = ( - project_name - + "/" - + dirpath[0] - + "/" - + path_filename[0], + prefix + dirpath[0] + "/" + path_filename[0], path_filename[1], ) diff --git a/tests/data/mock_manifests/TestManifestOperation_test_submit_nested_manifest_table_and_file_replace.csv b/tests/data/mock_manifests/TestManifestOperation_test_submit_nested_manifest_table_and_file_replace.csv new file mode 100644 index 000000000..6bcb468c6 --- /dev/null +++ b/tests/data/mock_manifests/TestManifestOperation_test_submit_nested_manifest_table_and_file_replace.csv @@ -0,0 +1,2 @@ +Filename,Sample ID,File Format,Component,Genome Build,Genome FASTA,Year of Birth,author,confidence,date,eTag,IsImportantBool,IsImportantText,impact,entityId,RandomizedAnnotation +schematic - 
main/TestDatasets/TestDataset-Annotations-nested-submit/Sample_C.txt,some sample id,FASTQ,BulkRNA-seqAssay,,,,,,,0bf00691-a6e4-4487-9cab-851e22416ed2,FALSE,FALSE,,syn63646199, \ No newline at end of file diff --git a/tests/integration/test_submit_manifest.py b/tests/integration/test_submit_manifest.py new file mode 100644 index 000000000..cc14de487 --- /dev/null +++ b/tests/integration/test_submit_manifest.py @@ -0,0 +1,112 @@ +import io +import logging +import uuid +from typing import Dict, Generator + +import flask +import pytest +from flask.testing import FlaskClient + +from schematic.store.synapse import SynapseStorage +from schematic_api.api import create_app +from tests.conftest import Helpers + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + +DATA_MODEL_JSON_LD = "https://raw.githubusercontent.com/Sage-Bionetworks/schematic/develop/tests/data/example.model.jsonld" + + +@pytest.fixture(scope="class") +def app() -> flask.Flask: + app = create_app() + return app + + +@pytest.fixture(scope="class") +def client(app: flask.Flask) -> Generator[FlaskClient, None, None]: + app.config["SCHEMATIC_CONFIG"] = None + + with app.test_client() as client: + yield client + + +@pytest.fixture +def request_headers(syn_token: str) -> Dict[str, str]: + headers = {"Authorization": "Bearer " + syn_token} + return headers + + +@pytest.mark.schematic_api +class TestManifestSubmission: + @pytest.mark.synapse_credentials_needed + @pytest.mark.submission + def test_submit_nested_manifest_table_and_file_replace( + self, + client: FlaskClient, + request_headers: Dict[str, str], + helpers: Helpers, + synapse_store: SynapseStorage, + ) -> None: + # GIVEN the parameters to submit a manifest + params = { + "schema_url": DATA_MODEL_JSON_LD, + "data_type": "BulkRNA-seqAssay", + "restrict_rules": False, + "manifest_record_type": "table_and_file", + "asset_view": "syn63646213", + "dataset_id": "syn63646197", + "table_manipulation": "replace", + 
"data_model_labels": "class_label", + "table_column_names": "display_name", + } + + # AND a test manifest with a nested file entity + nested_manifest_replace_csv = helpers.get_data_path( + "mock_manifests/TestManifestOperation_test_submit_nested_manifest_table_and_file_replace.csv" + ) + + # AND a randomized annotation we can verify was added + df = helpers.get_data_frame(path=nested_manifest_replace_csv) + randomized_annotation_content = str(uuid.uuid4()) + df["RandomizedAnnotation"] = randomized_annotation_content + csv_file = io.BytesIO() + df.to_csv(csv_file, index=False) + csv_file.seek(0) # Rewind the buffer to the beginning + + # WHEN I submit that manifest + response_csv = client.post( + "http://localhost:3001/v1/model/submit", + query_string=params, + data={"file_name": (csv_file, "test.csv")}, + headers=request_headers, + ) + + # THEN the submission should be successful + assert response_csv.status_code == 200 + + # AND the file should be uploaded to Synapse with the new annotation + modified_file = synapse_store.syn.get(df["entityId"][0], downloadFile=False) + assert modified_file is not None + assert modified_file["RandomizedAnnotation"][0] == randomized_annotation_content + + # AND the manifest should exist in the dataset folder + manifest_synapse_id = synapse_store.syn.findEntityId( + name="synapse_storage_manifest_bulkrna-seqassay.csv", parent="syn63646197" + ) + assert manifest_synapse_id is not None + synapse_manifest_entity = synapse_store.syn.get( + entity=manifest_synapse_id, downloadFile=False + ) + assert synapse_manifest_entity is not None + assert ( + synapse_manifest_entity["_file_handle"]["fileName"] + == "synapse_storage_manifest_bulkrna-seqassay.csv" + ) + + # AND the manifest table is created + expected_table_name = "bulkrna-seqassay_synapse_storage_manifest_table" + synapse_id = synapse_store.syn.findEntityId( + parent="syn23643250", name=expected_table_name + ) + assert synapse_id is not None diff --git 
a/tests/integration/test_validate_attribute.py b/tests/integration/test_validate_attribute.py index b6d3b74b1..f00de7fde 100644 --- a/tests/integration/test_validate_attribute.py +++ b/tests/integration/test_validate_attribute.py @@ -74,15 +74,16 @@ def test_url_validation_invalid_url(self, dmge: DataModelGraphExplorer) -> None: [], ) - def test__get_target_manifest_dataframes( - self, dmge: DataModelGraphExplorer - ) -> None: - """ - This test checks that the method successfully returns manifests from Synapse - - """ - validator = ValidateAttribute(dmge=dmge) - manifests = validator._get_target_manifest_dataframes( # pylint:disable= protected-access - "patient", project_scope=["syn54126707"] - ) - assert list(manifests.keys()) == ["syn54126997", "syn54127001"] + # See slack discussion, to turn test back on at a later time: https://sagebionetworks.jira.com/browse/FDS-2509 + # def test__get_target_manifest_dataframes( + # self, dmge: DataModelGraphExplorer + # ) -> None: + # """ + # This test checks that the method successfully returns manifests from Synapse + + # """ + # validator = ValidateAttribute(dmge=dmge) + # manifests = validator._get_target_manifest_dataframes( # pylint:disable= protected-access + # "patient", project_scope=["syn54126707"] + # ) + # assert list(manifests.keys()) == ["syn54126997", "syn54127001"] diff --git a/tests/test_api.py b/tests/test_api.py index 08e0bd4a6..0a27b5c73 100644 --- a/tests/test_api.py +++ b/tests/test_api.py @@ -780,9 +780,9 @@ def test_generate_manifest_file_based_annotations( # make sure Filename, entityId, and component get filled with correct value assert google_sheet_df["Filename"].to_list() == [ - "schematic - main/TestDataset-Annotations-v3/Sample_A.txt", - "schematic - main/TestDataset-Annotations-v3/Sample_B.txt", - "schematic - main/TestDataset-Annotations-v3/Sample_C.txt", + "schematic - main/TestDatasets/TestDataset-Annotations-v3/Sample_A.txt", + "schematic - 
main/TestDatasets/TestDataset-Annotations-v3/Sample_B.txt", + "schematic - main/TestDatasets/TestDataset-Annotations-v3/Sample_C.txt", ] assert google_sheet_df["entityId"].to_list() == [ "syn25614636", diff --git a/tests/test_manifest.py b/tests/test_manifest.py index 06bd7b168..ade80fbe9 100644 --- a/tests/test_manifest.py +++ b/tests/test_manifest.py @@ -213,9 +213,9 @@ def test_get_manifest_first_time(self, manifest): # Confirm contents of Filename column assert output["Filename"].tolist() == [ - "schematic - main/TestDataset-Annotations-v3/Sample_A.txt", - "schematic - main/TestDataset-Annotations-v3/Sample_B.txt", - "schematic - main/TestDataset-Annotations-v3/Sample_C.txt", + "schematic - main/TestDatasets/TestDataset-Annotations-v3/Sample_A.txt", + "schematic - main/TestDatasets/TestDataset-Annotations-v3/Sample_B.txt", + "schematic - main/TestDatasets/TestDataset-Annotations-v3/Sample_C.txt", ] # Test dimensions of data frame diff --git a/tests/test_store.py b/tests/test_store.py index f79761b28..717b4542e 100644 --- a/tests/test_store.py +++ b/tests/test_store.py @@ -11,14 +11,14 @@ import uuid from contextlib import nullcontext as does_not_raise from typing import Any, Callable, Generator -from unittest.mock import AsyncMock, MagicMock, patch +from unittest.mock import AsyncMock, patch import pandas as pd import pytest from pandas.testing import assert_frame_equal from synapseclient import EntityViewSchema, Folder from synapseclient.core.exceptions import SynapseHTTPError -from synapseclient.entity import File +from synapseclient.entity import File, Project from synapseclient.models import Annotations from synapseclient.models import Folder as FolderModel @@ -406,7 +406,7 @@ def test_getDatasetAnnotations(self, dataset_id, synapse_store, force_batch): expected_df = pd.DataFrame.from_records( [ { - "Filename": "schematic - main/TestDataset-Annotations-v3/Sample_A.txt", + "Filename": "schematic - main/TestDatasets/TestDataset-Annotations-v3/Sample_A.txt", 
"author": "bruno, milen, sujay", "impact": "42.9", "confidence": "high", @@ -416,13 +416,13 @@ def test_getDatasetAnnotations(self, dataset_id, synapse_store, force_batch): "IsImportantText": "TRUE", }, { - "Filename": "schematic - main/TestDataset-Annotations-v3/Sample_B.txt", + "Filename": "schematic - main/TestDatasets/TestDataset-Annotations-v3/Sample_B.txt", "confidence": "low", "FileFormat": "csv", "date": "2020-02-01", }, { - "Filename": "schematic - main/TestDataset-Annotations-v3/Sample_C.txt", + "Filename": "schematic - main/TestDatasets/TestDataset-Annotations-v3/Sample_C.txt", "FileFormat": "fastq", "IsImportantBool": "False", "IsImportantText": "FALSE", @@ -490,7 +490,9 @@ def test_getFilesInStorageDataset(self, synapse_store, full_path, expected): return_value="syn23643250", ) as mock_project_id_patch, patch( "synapseclient.entity.Entity.__getattr__", return_value="schematic - main" - ) as mock_project_name_patch: + ) as mock_project_name_patch, patch.object( + synapse_store.syn, "get", return_value=Project(name="schematic - main") + ): file_list = synapse_store.getFilesInStorageDataset( datasetId="syn_mock", fileNames=None, fullpath=full_path ) From 49c701c1edef7e67eea1ae21216ab9e168c137e5 Mon Sep 17 00:00:00 2001 From: BryanFauble <17128019+BryanFauble@users.noreply.github.com> Date: Tue, 15 Oct 2024 15:07:40 -0700 Subject: [PATCH 29/34] [FDS-2497] Wrap google API execute calls with a 5 attempt retry (#1513) * Wrap google API execute calls with a 5 attempt retry --- schematic/manifest/generator.py | 72 ++++++++++++++++++----------- schematic/utils/google_api_utils.py | 47 +++++++++++++++---- 2 files changed, 83 insertions(+), 36 deletions(-) diff --git a/schematic/manifest/generator.py b/schematic/manifest/generator.py index d954506a5..47acad4b4 100644 --- a/schematic/manifest/generator.py +++ b/schematic/manifest/generator.py @@ -27,6 +27,7 @@ build_service_account_creds, execute_google_api_requests, export_manifest_drive_service, + 
google_api_execute_wrapper, ) from schematic.utils.schema_utils import ( DisplayLabelType, @@ -190,11 +191,11 @@ def _gdrive_copy_file(self, origin_file_id, copy_title): copied_file = {"name": copy_title} # return new copy sheet ID - return ( + return google_api_execute_wrapper( self.drive_service.files() .copy(fileId=origin_file_id, body=copied_file) - .execute()["id"] - ) + .execute + )["id"] def _create_empty_manifest_spreadsheet(self, title: str) -> str: """ @@ -215,12 +216,11 @@ def _create_empty_manifest_spreadsheet(self, title: str) -> str: else: spreadsheet_body = {"properties": {"title": title}} - spreadsheet_id = ( + spreadsheet_id = google_api_execute_wrapper( self.sheet_service.spreadsheets() .create(body=spreadsheet_body, fields="spreadsheetId") - .execute() - .get("spreadsheetId") - ) + .execute + ).get("spreadsheetId") return spreadsheet_id @@ -265,7 +265,7 @@ def callback(request_id, response, exception): fields="id", ) ) - batch.execute() + google_api_execute_wrapper(batch.execute) def _store_valid_values_as_data_dictionary( self, column_id: int, valid_values: list, spreadsheet_id: str @@ -297,7 +297,7 @@ def _store_valid_values_as_data_dictionary( + str(len(values) + 1) ) valid_values = [{"userEnteredValue": "=" + target_range}] - response = ( + response = google_api_execute_wrapper( self.sheet_service.spreadsheets() .values() .update( @@ -306,7 +306,7 @@ def _store_valid_values_as_data_dictionary( valueInputOption="RAW", body=body, ) - .execute() + .execute ) return valid_values @@ -560,15 +560,31 @@ def _gs_add_and_format_columns(self, required_metadata_fields, spreadsheet_id): range = "Sheet1!A1:" + str(end_col_letter) + "1" # adding columns - self.sheet_service.spreadsheets().values().update( - spreadsheetId=spreadsheet_id, range=range, valueInputOption="RAW", body=body - ).execute() + google_api_execute_wrapper( + self.sheet_service.spreadsheets() + .values() + .update( + spreadsheetId=spreadsheet_id, + range=range, + valueInputOption="RAW", 
+ body=body, + ) + .execute + ) # adding columns to 2nd sheet that can be used for storing data validation ranges (this avoids limitations on number of dropdown items in excel and openoffice) range = "Sheet2!A1:" + str(end_col_letter) + "1" - self.sheet_service.spreadsheets().values().update( - spreadsheetId=spreadsheet_id, range=range, valueInputOption="RAW", body=body - ).execute() + google_api_execute_wrapper( + self.sheet_service.spreadsheets() + .values() + .update( + spreadsheetId=spreadsheet_id, + range=range, + valueInputOption="RAW", + body=body, + ) + .execute + ) # format column header row header_format_body = { @@ -612,10 +628,10 @@ def _gs_add_and_format_columns(self, required_metadata_fields, spreadsheet_id): ] } - response = ( + response = google_api_execute_wrapper( self.sheet_service.spreadsheets() .batchUpdate(spreadsheetId=spreadsheet_id, body=header_format_body) - .execute() + .execute ) return response, ordered_metadata_fields @@ -664,13 +680,13 @@ def _gs_add_additional_metadata( "data": data, } - response = ( + response = google_api_execute_wrapper( self.sheet_service.spreadsheets() .values() .batchUpdate( spreadsheetId=spreadsheet_id, body=batch_update_values_request_body ) - .execute() + .execute ) return response @@ -765,11 +781,11 @@ def _request_regex_match_vr_formatting( split_rules = validation_rules[0].split(" ") if split_rules[0] == "regex" and split_rules[1] == "match": # Set things up: - ## Extract the regular expression we are validating against. + # Extract the regular expression we are validating against. 
regular_expression = split_rules[2] - ## Define text color to update to upon correct user entry + # Define text color to update to upon correct user entry text_color = {"red": 0, "green": 0, "blue": 0} - ## Define google sheets regular expression formula + # Define google sheets regular expression formula gs_formula = [ { "userEnteredValue": '=REGEXMATCH(INDIRECT("RC",FALSE), "{}")'.format( @@ -777,11 +793,11 @@ def _request_regex_match_vr_formatting( ) } ] - ## Set validaiton strictness based on user specifications. + # Set validation strictness based on user specifications. if split_rules[-1].lower() == "strict": strict = True - ## Create error message for users if they enter value with incorrect formatting + # Create error message for users if they enter value with incorrect formatting input_message = ( f"Values in this column are being validated " f"against the following regular expression ({regular_expression}) " @@ -790,7 +806,7 @@ def _request_regex_match_vr_formatting( ) # Create Requests: - ## Change request to change the text color of the column we are validating to red. + # Change request to change the text color of the column we are validating to red. requests_vr_format_body = self._request_update_base_color( i, color={ @@ -800,10 +816,10 @@ def _request_regex_match_vr_formatting( }, ) - ## Create request to for conditionally formatting user input. + # Create request for conditionally formatting user input. requests_vr = self._request_regex_vr(gs_formula, i, text_color) - ## Create request to generate data validator. + # Create request to generate data validator.
requests_data_validation_vr = self._get_column_data_validation_values( spreadsheet_id, valid_values=gs_formula, diff --git a/schematic/utils/google_api_utils.py b/schematic/utils/google_api_utils.py index b705e0419..6f09c0ea7 100644 --- a/schematic/utils/google_api_utils.py +++ b/schematic/utils/google_api_utils.py @@ -2,14 +2,23 @@ # pylint: disable=logging-fstring-interpolation -import os -import logging import json -from typing import Any, Union, no_type_check, TypedDict +import logging +import os +from typing import Any, Callable, TypedDict, Union, no_type_check import pandas as pd -from googleapiclient.discovery import build, Resource # type: ignore from google.oauth2 import service_account # type: ignore +from googleapiclient.discovery import Resource, build # type: ignore +from googleapiclient.errors import HttpError # type: ignore +from tenacity import ( + retry, + retry_if_exception_type, + stop_after_attempt, + wait_chain, + wait_fixed, +) + from schematic.configuration.configuration import CONFIG logger = logging.getLogger(__name__) @@ -86,10 +95,10 @@ def execute_google_api_requests(service, requests_body, **kwargs) -> Any: and kwargs["service_type"] == "batch_update" ): # execute all requests - response = ( + response = google_api_execute_wrapper( service.spreadsheets() .batchUpdate(spreadsheetId=kwargs["spreadsheet_id"], body=requests_body) - .execute() + .execute ) return response @@ -118,10 +127,10 @@ def export_manifest_drive_service( # use google drive # Pylint seems to have trouble with the google api classes, recognizing their methods - data = ( + data = google_api_execute_wrapper( drive_service.files() # pylint: disable=no-member .export(fileId=spreadsheet_id, mimeType=mime_type) - .execute() + .execute ) # open file and write data @@ -145,3 +154,25 @@ def export_manifest_csv(file_path: str, manifest: Union[pd.DataFrame, str]) -> N manifest.to_csv(file_path, index=False) else: export_manifest_drive_service(manifest, file_path, 
mime_type="text/csv") + + +@retry( + stop=stop_after_attempt(5), + wait=wait_chain( + *[wait_fixed(1) for i in range(2)] + + [wait_fixed(2) for i in range(2)] + + [wait_fixed(5)] + ), + retry=retry_if_exception_type(HttpError), + reraise=True, +) +def google_api_execute_wrapper(api_function_to_call: Callable[[], Any]) -> Any: + """Retry wrapper for Google API calls, with a backoff strategy. + + Args: + api_function_to_call (Callable[[], Any]): The function to call + + Returns: + Any: The result of the API call + """ + return api_function_to_call() From c3d45527dbc91374f14b31f710021120f9512cfa Mon Sep 17 00:00:00 2001 From: Jenny Medina Date: Wed, 16 Oct 2024 09:35:42 -0700 Subject: [PATCH 30/34] Consolidated CONTRIBUTION.md and contributing guidelines in the README --- CONTRIBUTION.md | 104 +++++++++++++++++++++++------------------------- README.md | 88 +--------------------------------------- 2 files changed, 52 insertions(+), 140 deletions(-) diff --git a/CONTRIBUTION.md b/CONTRIBUTION.md index a9876d4df..737c6f465 100644 --- a/CONTRIBUTION.md +++ b/CONTRIBUTION.md @@ -4,78 +4,77 @@ When contributing to this repository, please first discuss the change you wish t Please note we have a [code of conduct](CODE_OF_CONDUCT.md), please follow it in all your interactions with the project. -## How to contribute +## How to report bugs or feature requests -### Reporting bugs or feature requests - -You can use [Sage Bionetwork's FAIR Data service desk](https://sagebionetworks.jira.com/servicedesk/customer/portal/5/group/8) to **create bug and feature requests**. Providing enough details to the developers to verify and troubleshoot your issue is paramount: +You can **create bug and feature requests** through [Sage Bionetwork's FAIR Data service desk](https://sagebionetworks.jira.com/servicedesk/customer/portal/5/group/8). 
Providing enough details to the developers to verify and troubleshoot your issue is paramount: - **Provide a clear and descriptive title as well as a concise summary** of the issue to identify the problem. - **Describe the exact steps which reproduce the problem** in as many details as possible. - **Describe the behavior you observed after following the steps** and point out what exactly is the problem with that behavior. - **Explain which behavior you expected to see** instead and why. - **Provide screenshots of the expected or actual behaviour** where applicable. -### General contribution instructions +## How to contribute code -1. Follow the [Github docs](https://help.github.com/articles/fork-a-repo/) to make a copy (a fork) of the repository to your own Github account. -2. [Clone the forked repository](https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/cloning-a-repository-from-github/cloning-a-repository) to your local machine so you can begin making changes. -3. Make sure this repository is set as the [upstream remote repository](https://docs.github.com/en/github/collaborating-with-pull-requests/working-with-forks/configuring-a-remote-for-a-fork) so you are able to fetch the latest commits. -4. Push all your changes to the `develop` branch of the forked repository. +### The development environment setup -*Note*: Make sure you have you have the latest version of the `develop` branch on your local machine. +For setting up your environment, please follow the instructions in the `README.md` under `Installation Guide For: Contributors`. -``` -git checkout develop -git pull upstream develop -``` +### The development workflow -5. Create pull requests to the upstream repository. +For new features, bugs, enhancements: -### The development lifecycle +#### 1. Branch Setup +* Pull the latest code from the develop branch in the upstream repository. +* Checkout a new branch formatted like so: `develop-` from the develop branch -1. 
Pull the latest content from the `develop` branch of this central repository (not your fork). -2. Create a branch off the `develop` branch. Name the branch appropriately, either briefly summarizing the bug (ex., `spatil/add-restapi-layer`) or feature or simply use the issue number in the name (ex., `spatil/issue-414-fix`). -3. After completing work and testing locally, push the code to the appropriate branch on your fork. -4. In Github, create a pull request from the bug/feature branch of your fork to the `develop` branch of the central repository. +#### 2. Development Workflow +* Develop on your new branch. +* Ensure pyproject.toml and poetry.lock files are compatible with your environment. +* Add changed files for tracking and commit changes using [best practices](https://www.perforce.com/blog/vcs/git-best-practices-git-commit) +* Have granular commits: not “too many” file changes, and not hundreds of code lines of changes +* You can choose to create a draft PR if you prefer to develop this way -> A Sage Bionetworks engineer must review and accept your pull request. A code review (which happens with both the contributor and the reviewer present) is required for contributing. +#### 3. Branch Management +* Push code to `develop-` in upstream repo: + ``` + git push develop- + ``` +* Branch off `develop-` if you need to work on multiple features associated with the same code base +* After feature work is complete and before creating a PR to the develop branch in upstream + a. ensure that code runs locally + b. test for logical correctness locally + c. wait for git workflow to complete (e.g. tests are run) on github -### Development environment setup +#### 4. Pull Request and Review +* Create a PR from `develop-` into the develop branch of the upstream repo +* Request a code review on the PR +* Once code is approved merge in the develop branch. We recommend squashing your commits for a cleaner commit history. 
+* Once the actions pass on the main branch, delete the `develop-` branch -1. Install [package dependencies](https://sage-schematic.readthedocs.io/en/develop/README.html#installation-requirements-and-pre-requisites). -2. Clone the `schematic` package repository. +### Updating readthedocs documentation +1. Navigate to the docs directory. +2. Run make html to regenerate the build after changes. +3. Contact the development team to publish the updates. -``` -git clone https://github.com/Sage-Bionetworks/schematic.git -``` +*Helpful resources*: -3. [Create and activate](https://sage-schematic.readthedocs.io/en/develop/README.html#virtual-environment-setup) a virtual environment. -4. Run the following commands to build schematic and install the package along with all of its dependencies: +1. [Getting started with Sphinx](https://haha.readthedocs.io/en/latest/intro/getting-started-with-sphinx.html) +2. [Installing Sphinx](https://haha.readthedocs.io/en/latest/intro/getting-started-with-sphinx.html) -``` -cd schematic # change directory to schematic -git checkout develop # switch to develop branch of schematic -poetry build # build source and wheel archives -pip install dist/schematicpy-x.y.z-py3-none-any.whl # install wheel file -``` - -*Note*: Use the appropriate version number (based on the version of the codebase you are pulling) while installing the wheel file above. - -5. [Obtain](https://sage-schematic.readthedocs.io/en/develop/README.html#obtain-google-credentials-file-s) appropriate Google credentials file(s). -6. [Obtain and Fill in](https://sage-schematic.readthedocs.io/en/develop/README.html#fill-in-configuration-file-s) the `config.yml` file and the `.synapseConfig` file as well as described in the `Fill in Configuration File(s)` part of the documentation. -7. [Run](https://docs.pytest.org/en/stable/usage.html) the test suite. 
+### Update toml file and lock file +If you install external libraries by using `poetry add `, please make sure that you include `pyproject.toml` and `poetry.lock` file in your commit. -*Note*: To ensure that all tests run successfully, contact your DCC liason and request to be added to the `schematic-dev` [team](https://www.synapse.org/#!Team:3419888) on Synapse. +### Code style -8. To test new changes made to any of the modules within `schematic`, do the following: +To ensure consistent code formatting across the project, we use the `pre-commit` hook. You can manually run `pre-commit` across the repository before making a pull request like so: ``` -# make changes to any files or modules -pip uninstall schematicpy # uninstall package -poetry build -pip install dist/schematicpy-x.y.z-py3-none-any.whl # install wheel file +pre-commit run --all-files ``` +Further, please consult the [Google Python style guide](http://google.github.io/styleguide/pyguide.html) prior to contributing code to this project. +Be consistent and follow existing code conventions and spirit. + ## Release process Once the code has been merged into the `develop` branch on this repo, there are two processes that need to be completed to ensure a _release_ is complete. @@ -109,12 +108,13 @@ poetry publish # publish the package to PyPI > You'll need to [register](https://pypi.org/account/register/) for a PyPI account before uploading packages to the package index. Similarly for [Test PyPI](https://test.pypi.org/account/register/) as well. -## Testing +## Testing -All code added to the client must have tests. The Python client uses pytest to run tests. The test code is located in the [tests](https://github.com/Sage-Bionetworks/schematic/tree/develop-docs-update/tests) subdirectory. +* All new code must include tests. -You can run the test suite in the following way: +* Tests are written using pytest and are located in the tests/ subdirectory. 
+* Run tests with: ``` pytest -vs tests/ ``` @@ -128,7 +128,3 @@ pytest -vs tests/ 5. Once the PR is merged, leave the original copies on Synapse to maintain support for feature branches that were forked from `develop` before your update. - If the old copies are problematic and need to be removed immediately (_e.g._ contain sensitive data), proceed with the deletion and alert the other contributors that they need to merge the latest `develop` branch into their feature branches for their tests to work. -## Code style - -* Please consult the [Google Python style guide](http://google.github.io/styleguide/pyguide.html) prior to contributing code to this project. -* Be consistent and follow existing code conventions and spirit. diff --git a/README.md b/README.md index 6bb57662a..876e60e2d 100644 --- a/README.md +++ b/README.md @@ -232,6 +232,8 @@ The instructions below assume you have already installed [python](https://www.py When contributing to this repository, please first discuss the change you wish to make via the [service desk](https://sagebionetworks.jira.com/servicedesk/customer/portal/5/group/8) so that we may track these changes. +Once you have finished setting up your development environment using the instructions below, please follow the guidelines in the `CONTRIBUTING.md` during your development. + Please note we have a [code of conduct](CODE_OF_CONDUCT.md), please follow it in all your interactions with the project. ### 1. Clone the `schematic` package repository @@ -487,92 +489,6 @@ docker run -v %cd%:/schematic \ -c config.yml validate -mp tests/data/mock_manifests/inValid_Test_Manifest.csv -dt MockComponent -js /schematic/data/example.model.jsonld ``` -# Contribution Guidelines -### Development process instruction - -For new features, bugs, enhancements: - -#### 1. Branch Setup -* Pull the latest code from the develop branch in the upstream repository. -* Checkout a new branch formatted like so: `develop-` from the develop branch - -#### 2. 
Development Workflow -* Develop on your new branch. -* Ensure pyproject.toml and poetry.lock files are compatible with your environment. -* Add changed files for tracking and commit changes using [best practices](https://www.perforce.com/blog/vcs/git-best-practices-git-commit) -* Have granular commits: not “too many” file changes, and not hundreds of code lines of changes -* You can choose to create a draft PR if you prefer to develop this way - -#### 3. Branch Management -* Push code to `develop-` in upstream repo: - ``` - git push develop- - ``` -* Branch off `develop-` if you need to work on multiple features associated with the same code base -* After feature work is complete and before creating a PR to the develop branch in upstream - a. ensure that code runs locally - b. test for logical correctness locally - c. wait for git workflow to complete (e.g. tests are run) on github - -#### 4. Pull Request and Review -* Create a PR from `develop-` into the develop branch of the upstream repo -* Request a code review on the PR -* Once code is approved merge in the develop branch. We recommend squashing your commits for a cleaner commit history. -* Once the actions pass on the main branch, delete the `develop-` branch - -## Updating readthedocs documentation -1. Navigate to the docs directory. -2. Run make html to regenerate the build after changes. -3. Contact the development team to publish the updates. - -*Helpful resources*: - -1. [Getting started with Sphinx](https://haha.readthedocs.io/en/latest/intro/getting-started-with-sphinx.html) -2. [Installing Sphinx](https://haha.readthedocs.io/en/latest/intro/getting-started-with-sphinx.html) - -## Update toml file and lock file -If you install external libraries by using `poetry add `, please make sure that you include `pyproject.toml` and `poetry.lock` file in your commit. - -## Testing - -* All new code must include tests. - -* Tests are written using pytest and are located in the tests/ subdirectory. 
- -* Run tests with: -``` -pytest -vs tests/ -``` - -### Updating Synapse test resources - -1. Duplicate the entity being updated (or folder if applicable). -2. Edit the duplicates (_e.g._ annotations, contents, name). -3. Update the test suite in your branch to use these duplicates, including the expected values in the test assertions. -4. Open a PR as per the usual process (see above). -5. Once the PR is merged, leave the original copies on Synapse to maintain support for feature branches that were forked from `develop` before your update. - - If the old copies are problematic and need to be removed immediately (_e.g._ contain sensitive data), proceed with the deletion and alert the other contributors that they need to merge the latest `develop` branch into their feature branches for their tests to work. - -## Code style - -To ensure consistent code formatting across the project, we use the `pre-commit` hook. You can manually run `pre-commit` across the respository before making a pull request like so: - -``` -pre-commit run --all-files -``` - -Further, please consult the [Google Python style guide](http://google.github.io/styleguide/pyguide.html) prior to contributing code to this project. -Be consistent and follow existing code conventions and spirit. - - -# Reporting bugs or feature requests -You can **create bug and feature requests** through [Sage Bionetwork's FAIR Data service desk](https://sagebionetworks.jira.com/servicedesk/customer/portal/5/group/8). Providing enough details to the developers to verify and troubleshoot your issue is paramount: -- **Provide a clear and descriptive title as well as a concise summary** of the issue to identify the problem. -- **Describe the exact steps which reproduce the problem** in as many details as possible. -- **Describe the behavior you observed after following the steps** and point out what exactly is the problem with that behavior. -- **Explain which behavior you expected to see** instead and why. 
-- **Provide screenshots of the expected or actual behaviour** where applicable. - # Contributors Main contributors and developers: From df8565598a4635dd0dc771e2e28d5da7bfbaec92 Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Wed, 16 Oct 2024 09:42:28 -0700 Subject: [PATCH 31/34] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 876e60e2d..12d5b7436 100644 --- a/README.md +++ b/README.md @@ -232,7 +232,7 @@ The instructions below assume you have already installed [python](https://www.py When contributing to this repository, please first discuss the change you wish to make via the [service desk](https://sagebionetworks.jira.com/servicedesk/customer/portal/5/group/8) so that we may track these changes. -Once you have finished setting up your development environment using the instructions below, please follow the guidelines in the `CONTRIBUTING.md` during your development. +Once you have finished setting up your development environment using the instructions below, please follow the guidelines in [CONTRIBUTION.md](https://github.com/Sage-Bionetworks/schematic/blob/develop-fds-2218-update-readme/CONTRIBUTION.md) during your development. Please note we have a [code of conduct](CODE_OF_CONDUCT.md), please follow it in all your interactions with the project. From 48fbd2c8aa83444e4f0ffa1b46a8bfe96471b8d5 Mon Sep 17 00:00:00 2001 From: Jenny Medina Date: Wed, 16 Oct 2024 09:53:37 -0700 Subject: [PATCH 32/34] Fix broken links --- CONTRIBUTION.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/CONTRIBUTION.md b/CONTRIBUTION.md index 737c6f465..956cb13ff 100644 --- a/CONTRIBUTION.md +++ b/CONTRIBUTION.md @@ -58,8 +58,8 @@ For new features, bugs, enhancements: *Helpful resources*: -1. [Getting started with Sphinx](https://haha.readthedocs.io/en/latest/intro/getting-started-with-sphinx.html) -2. 
[Installing Sphinx](https://haha.readthedocs.io/en/latest/intro/getting-started-with-sphinx.html) +1. [Getting started with Sphinx](https://www.sphinx-doc.org/en/master/usage/quickstart.html) +2. [Installing Sphinx](https://www.sphinx-doc.org/en/master/usage/installation.html) ### Update toml file and lock file If you install external libraries by using `poetry add `, please make sure that you include `pyproject.toml` and `poetry.lock` file in your commit. From 46f09988b4d3125a472256d0178c4cfc3050a224 Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Wed, 16 Oct 2024 15:10:19 -0700 Subject: [PATCH 33/34] Gianna comments --- CONTRIBUTION.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/CONTRIBUTION.md b/CONTRIBUTION.md index 956cb13ff..930fbea81 100644 --- a/CONTRIBUTION.md +++ b/CONTRIBUTION.md @@ -43,12 +43,13 @@ For new features, bugs, enhancements: * After feature work is complete and before creating a PR to the develop branch in upstream a. ensure that code runs locally b. test for logical correctness locally + c. run `pre-commit` to style code if the hook is not installed c. wait for git workflow to complete (e.g. tests are run) on github #### 4. Pull Request and Review * Create a PR from `develop-` into the develop branch of the upstream repo * Request a code review on the PR -* Once code is approved merge in the develop branch. We recommend squashing your commits for a cleaner commit history. +* Once code is approved merge in the develop branch. We suggest creating a merge commit for a cleaner commit history on the `develop` branch. * Once the actions pass on the main branch, delete the `develop-` branch ### Updating readthedocs documentation @@ -112,7 +113,7 @@ poetry publish # publish the package to PyPI * All new code must include tests. -* Tests are written using pytest and are located in the tests/ subdirectory. 
+* Tests are written using pytest and are located in the [tests/](https://github.com/Sage-Bionetworks/schematic/tree/develop/tests) subdirectory. * Run tests with: ``` From d0e9ee96b2a43d7a325c4c2a1861544179b9e0fc Mon Sep 17 00:00:00 2001 From: Jenny V Medina Date: Wed, 16 Oct 2024 15:44:30 -0700 Subject: [PATCH 34/34] Update table of contents --- README.md | 8 -------- 1 file changed, 8 deletions(-) diff --git a/README.md b/README.md index 12d5b7436..72a30b70f 100644 --- a/README.md +++ b/README.md @@ -44,14 +44,6 @@ - [Running `schematic` to Validate Manifests](#running-schematic-to-validate-manifests) - [Example for macOS/Linux](#example-for-macoslinux) - [Example for Windows](#example-for-windows) -- [Contribution Guidelines](#contribution-guidelines) - - [Development process instruction](#development-process-instruction) - - [Updating readthedocs documentation](#updating-readthedocs-documentation) - - [Update toml file and lock file](#update-toml-file-and-lock-file) -- [Testing](#testing) - - [Updating Synapse test resources](#updating-synapse-test-resources) -- [Code style](#code-style) -- [Reporting bugs or feature requests](#reporting-bugs-or-feature-requests) - [Contributors](#contributors)
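
Reviewer note on the `google_api_execute_wrapper` helper added to `schematic/utils/google_api_utils.py` earlier in this series: the call sites now pass the bound `.execute` method rather than calling `.execute()`, so that the wrapper can invoke the request again on each retry. The sketch below illustrates that pattern using only the standard library. It is not the code from the patch: `call_with_retries` and `FlakyCall` are hypothetical names, `ConnectionError` stands in for `HttpError`, and the waits are recorded instead of slept. It also assumes that, with `stop_after_attempt(5)`, only the first four waits of the chain (1, 1, 2, 2 seconds) are ever used, which is how I read tenacity's semantics.

```python
import time
from typing import Any, Callable, List, Sequence, Type


def call_with_retries(
    func: Callable[[], Any],
    waits: Sequence[float] = (1, 1, 2, 2),
    retry_on: Type[Exception] = Exception,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call func, retrying on retry_on and waiting waits[i] after the i-th failure."""
    max_attempts = len(waits) + 1  # 4 waits allow 5 attempts in total
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: re-raise the last error
            sleep(waits[attempt - 1])


class FlakyCall:
    """Hypothetical stand-in for a request object with a bound execute method."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def execute(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("transient failure")
        return "ok"


recorded: List[float] = []
flaky = FlakyCall(failures=3)

# Pass the method itself (flaky.execute), not its result (flaky.execute()),
# so every retry issues a fresh call.
result = call_with_retries(
    flaky.execute, retry_on=ConnectionError, sleep=recorded.append
)
print(result, flaky.calls, recorded)  # ok 4 [1, 1, 2]
```

Passing `flaky.execute()` instead would run the call once, outside the retry loop, and hand the wrapper a plain value, so a transient failure would never be retried.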