diff --git a/CONTRIBUTION.md b/CONTRIBUTION.md index a9876d4df..930fbea81 100644 --- a/CONTRIBUTION.md +++ b/CONTRIBUTION.md @@ -4,78 +4,78 @@ When contributing to this repository, please first discuss the change you wish t Please note we have a [code of conduct](CODE_OF_CONDUCT.md), please follow it in all your interactions with the project. -## How to contribute +## How to report bugs or feature requests -### Reporting bugs or feature requests - -You can use [Sage Bionetwork's FAIR Data service desk](https://sagebionetworks.jira.com/servicedesk/customer/portal/5/group/8) to **create bug and feature requests**. Providing enough details to the developers to verify and troubleshoot your issue is paramount: +You can **create bug and feature requests** through [Sage Bionetworks' FAIR Data service desk](https://sagebionetworks.jira.com/servicedesk/customer/portal/5/group/8). Providing enough details to the developers to verify and troubleshoot your issue is paramount: - **Provide a clear and descriptive title as well as a concise summary** of the issue to identify the problem. - **Describe the exact steps which reproduce the problem** in as much detail as possible. - **Describe the behavior you observed after following the steps** and point out what exactly is the problem with that behavior. - **Explain which behavior you expected to see** instead and why. - **Provide screenshots of the expected or actual behavior** where applicable. -### General contribution instructions +## How to contribute code -1. Follow the [Github docs](https://help.github.com/articles/fork-a-repo/) to make a copy (a fork) of the repository to your own Github account. -2. [Clone the forked repository](https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/cloning-a-repository-from-github/cloning-a-repository) to your local machine so you can begin making changes. -3. 
Make sure this repository is set as the [upstream remote repository](https://docs.github.com/en/github/collaborating-with-pull-requests/working-with-forks/configuring-a-remote-for-a-fork) so you are able to fetch the latest commits. -4. Push all your changes to the `develop` branch of the forked repository. +### The development environment setup -*Note*: Make sure you have you have the latest version of the `develop` branch on your local machine. +For setting up your environment, please follow the instructions in the `README.md` under `Installation Guide For: Contributors`. -``` -git checkout develop -git pull upstream develop -``` +### The development workflow -5. Create pull requests to the upstream repository. +For new features, bug fixes, and enhancements: -### The development lifecycle +#### 1. Branch Setup +* Pull the latest code from the `develop` branch in the upstream repository. +* Check out a new branch, named like so: `develop-`, from the `develop` branch -1. Pull the latest content from the `develop` branch of this central repository (not your fork). -2. Create a branch off the `develop` branch. Name the branch appropriately, either briefly summarizing the bug (ex., `spatil/add-restapi-layer`) or feature or simply use the issue number in the name (ex., `spatil/issue-414-fix`). -3. After completing work and testing locally, push the code to the appropriate branch on your fork. -4. In Github, create a pull request from the bug/feature branch of your fork to the `develop` branch of the central repository. +#### 2. Development Workflow +* Develop on your new branch. +* Ensure the `pyproject.toml` and `poetry.lock` files are compatible with your environment. 
+* Add changed files for tracking and commit changes using [best practices](https://www.perforce.com/blog/vcs/git-best-practices-git-commit) +* Keep commits granular: not “too many” changed files, and not hundreds of changed lines of code +* You can choose to create a draft PR if you prefer to develop this way -> A Sage Bionetworks engineer must review and accept your pull request. A code review (which happens with both the contributor and the reviewer present) is required for contributing. +#### 3. Branch Management +* Push your code to the `develop-` branch in the upstream repo: + ``` + git push develop- + ``` +* Branch off `develop-` if you need to work on multiple features associated with the same code base +* After feature work is complete and before creating a PR to the `develop` branch in the upstream repo: + a. ensure that code runs locally + b. test for logical correctness locally + c. run `pre-commit` to style code if the hook is not installed + d. wait for the git workflow to complete (e.g. tests are run) on GitHub -### Development environment setup +#### 4. Pull Request and Review +* Create a PR from `develop-` into the `develop` branch of the upstream repo +* Request a code review on the PR +* Once the code is approved, merge it into the `develop` branch. We suggest creating a merge commit for a cleaner commit history on the `develop` branch. +* Once the actions pass on the `main` branch, delete the `develop-` branch -1. Install [package dependencies](https://sage-schematic.readthedocs.io/en/develop/README.html#installation-requirements-and-pre-requisites). -2. Clone the `schematic` package repository. +### Updating readthedocs documentation +1. Navigate to the `docs` directory. +2. Run `make html` to regenerate the build after changes. +3. Contact the development team to publish the updates. -``` -git clone https://github.com/Sage-Bionetworks/schematic.git -``` +*Helpful resources*: -3. 
[Create and activate](https://sage-schematic.readthedocs.io/en/develop/README.html#virtual-environment-setup) a virtual environment. -4. Run the following commands to build schematic and install the package along with all of its dependencies: +1. [Getting started with Sphinx](https://www.sphinx-doc.org/en/master/usage/quickstart.html) +2. [Installing Sphinx](https://www.sphinx-doc.org/en/master/usage/installation.html) -``` -cd schematic # change directory to schematic -git checkout develop # switch to develop branch of schematic -poetry build # build source and wheel archives -pip install dist/schematicpy-x.y.z-py3-none-any.whl # install wheel file -``` - -*Note*: Use the appropriate version number (based on the version of the codebase you are pulling) while installing the wheel file above. -5. [Obtain](https://sage-schematic.readthedocs.io/en/develop/README.html#obtain-google-credentials-file-s) appropriate Google credentials file(s). -6. [Obtain and Fill in](https://sage-schematic.readthedocs.io/en/develop/README.html#fill-in-configuration-file-s) the `config.yml` file and the `.synapseConfig` file as well as described in the `Fill in Configuration File(s)` part of the documentation. -7. [Run](https://docs.pytest.org/en/stable/usage.html) the test suite. +### Update toml file and lock file +If you install external libraries by using `poetry add `, please make sure that you include the `pyproject.toml` and `poetry.lock` files in your commit. -*Note*: To ensure that all tests run successfully, contact your DCC liason and request to be added to the `schematic-dev` [team](https://www.synapse.org/#!Team:3419888) on Synapse. +### Code style -8. To test new changes made to any of the modules within `schematic`, do the following: +To ensure consistent code formatting across the project, we use the `pre-commit` hook. 
You can manually run `pre-commit` across the repository before making a pull request like so: ``` -# make changes to any files or modules -pip uninstall schematicpy # uninstall package -poetry build -pip install dist/schematicpy-x.y.z-py3-none-any.whl # install wheel file +pre-commit run --all-files ``` +Further, please consult the [Google Python style guide](http://google.github.io/styleguide/pyguide.html) prior to contributing code to this project. +Be consistent and follow existing code conventions and spirit. + ## Release process Once the code has been merged into the `develop` branch on this repo, there are two processes that need to be completed to ensure a _release_ is complete. @@ -109,12 +109,13 @@ poetry publish # publish the package to PyPI > You'll need to [register](https://pypi.org/account/register/) for a PyPI account before uploading packages to the package index. Similarly for [Test PyPI](https://test.pypi.org/account/register/) as well. -## Testing +## Testing -All code added to the client must have tests. The Python client uses pytest to run tests. The test code is located in the [tests](https://github.com/Sage-Bionetworks/schematic/tree/develop-docs-update/tests) subdirectory. +* All new code must include tests. -You can run the test suite in the following way: +* Tests are written using pytest and are located in the [tests/](https://github.com/Sage-Bionetworks/schematic/tree/develop/tests) subdirectory. +* Run tests with: ``` pytest -vs tests/ ``` @@ -128,7 +129,3 @@ pytest -vs tests/ 5. Once the PR is merged, leave the original copies on Synapse to maintain support for feature branches that were forked from `develop` before your update. - If the old copies are problematic and need to be removed immediately (_e.g._ contain sensitive data), proceed with the deletion and alert the other contributors that they need to merge the latest `develop` branch into their feature branches for their tests to work. 
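To illustrate the testing requirement above, here is a minimal pytest-style sketch. The helper and its behavior are hypothetical placeholders, not part of the actual `schematic` code base:

```python
# test_example.py -- minimal pytest-style sketch.
# `normalize_component_name` is a hypothetical stand-in for a unit of
# code under test; it is not a real schematic function.

def normalize_component_name(name: str) -> str:
    """Toy helper: strip surrounding whitespace and remove inner spaces."""
    return name.strip().replace(" ", "")


def test_normalize_component_name():
    assert normalize_component_name(" Bio specimen ") == "Biospecimen"
```

Placed under `tests/`, a module like this is collected automatically by `pytest -vs tests/`.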
-## Code style - -* Please consult the [Google Python style guide](http://google.github.io/styleguide/pyguide.html) prior to contributing code to this project. -* Be consistent and follow existing code conventions and spirit. diff --git a/README.md b/README.md index cf1cd96f6..72a30b70f 100644 --- a/README.md +++ b/README.md @@ -1,65 +1,119 @@ # Schematic [![Build Status](https://img.shields.io/endpoint.svg?url=https%3A%2F%2Factions-badge.atrox.dev%2FSage-Bionetworks%2Fschematic%2Fbadge%3Fref%3Ddevelop&style=flat)](https://actions-badge.atrox.dev/Sage-Bionetworks/schematic/goto?ref=develop) [![Documentation Status](https://readthedocs.org/projects/sage-schematic/badge/?version=develop)](https://sage-schematic.readthedocs.io/en/develop/?badge=develop) [![PyPI version](https://badge.fury.io/py/schematicpy.svg)](https://badge.fury.io/py/schematicpy) -# Table of contents +# TL;DR + +* `schematic` (Schema Engine for Manifest Ingress and Curation) is a Python-based software tool that streamlines the retrieval, validation, and submission of metadata for biomedical datasets hosted on Sage Bionetworks' Synapse platform. +* Users can work with `schematic` in several ways, including through the CLI (see [Command Line Usage](#command-line-usage) for examples), through Docker (see [Docker Usage](#docker-usage) for examples), or with Python. +* `schematic` needs to communicate with Synapse and Google Sheets in order for its processes to work. As such, users will need to set up their credentials for authentication with Synapse and the Google Sheets API. 
+* To get started with `schematic`, follow one of the Installation Guides depending on your use case: + * [Installation Guide For: Schematic CLI users](#installation-guide-for-users) + * [Installation Guide For: Contributors](#installation-guide-for-contributors) + +# Table of Contents - [Schematic](#schematic) -- [Table of contents](#table-of-contents) +- [TL;DR](#tldr) +- [Table of Contents](#table-of-contents) - [Introduction](#introduction) - [Installation](#installation) - [Installation Requirements](#installation-requirements) - - [Installation guide for Schematic CLI users](#installation-guide-for-schematic-cli-users) - - [Installation guide for developers/contributors](#installation-guide-for-developerscontributors) - - [Development environment setup](#development-environment-setup) - - [Development process instruction](#development-process-instruction) - - [Example For REST API ](#example-for-rest-api-) - - [Use file path of `config.yml` to run API endpoints:](#use-file-path-of-configyml-to-run-api-endpoints) - - [Use content of `config.yml` and `schematic_service_account_creds.json`as an environment variable to run API endpoints:](#use-content-of-configyml-and-schematic_service_account_credsjsonas-an-environment-variable-to-run-api-endpoints) - - [Example For Schematic on mac/linux ](#example-for-schematic-on-maclinux-) - - [Example For Schematic on Windows ](#example-for-schematic-on-windows-) -- [Other Contribution Guidelines](#other-contribution-guidelines) - - [Updating readthedocs documentation](#updating-readthedocs-documentation) - - [Update toml file and lock file](#update-toml-file-and-lock-file) - - [Reporting bugs or feature requests](#reporting-bugs-or-feature-requests) + - [Installation Guide For: Users](#installation-guide-for-users) + - [1. Verify your python version](#1-verify-your-python-version) + - [2. Set up your virtual environment](#2-set-up-your-virtual-environment) + - [2a. 
Set up your virtual environment with `venv`](#2a-set-up-your-virtual-environment-with-venv) + - [2b. Set up your virtual environment with `conda`](#2b-set-up-your-virtual-environment-with-conda) + - [3. Install `schematic` dependencies](#3-install-schematic-dependencies) + - [4. Set up configuration files](#4-set-up-configuration-files) + - [5. Get your data model as a `JSON-LD` schema file](#5-get-your-data-model-as-a-json-ld-schema-file) + - [6. Obtain Google credential files](#6-obtain-google-credential-files) + - [7. Verify your setup](#7-verify-your-setup) + - [Installation Guide For: Contributors](#installation-guide-for-contributors) + - [1. Clone the `schematic` package repository](#1-clone-the-schematic-package-repository) + - [2. Install `poetry`](#2-install-poetry) + - [3. Start the virtual environment](#3-start-the-virtual-environment) + - [4. Install `schematic` dependencies](#4-install-schematic-dependencies) + - [5. Set up configuration files](#5-set-up-configuration-files) + - [6. Obtain Google credential files](#6-obtain-google-credential-files) + - [7. Set up pre-commit hooks](#7-set-up-pre-commit-hooks) + - [8. Verify your setup](#8-verify-your-setup) - [Command Line Usage](#command-line-usage) -- [Testing](#testing) - - [Updating Synapse test resources](#updating-synapse-test-resources) -- [Code style](#code-style) +- [Docker Usage](#docker-usage) + - [Running the REST API](#running-the-rest-api) + - [Example 1: Using the `config.yml` path](#example-1-using-the-configyml-path) + - [Example 2: Use environment variables](#example-2-use-environment-variables) + - [Running `schematic` to Validate Manifests](#running-schematic-to-validate-manifests) + - [Example for macOS/Linux](#example-for-macoslinux) + - [Example for Windows](#example-for-windows) - [Contributors](#contributors) + # Introduction SCHEMATIC is an acronym for _Schema Engine for Manifest Ingress and Curation_. 
The Python-based infrastructure provides a _novel_ schema-based, metadata ingress ecosystem that is meant to streamline the process of biomedical dataset annotation, metadata validation and submission to a data repository for various data contributors. # Installation ## Installation Requirements -* Python version 3.9.0≤x<3.11.0 +* Your installed Python version must be 3.9.0 ≤ version < 3.11.0 * You need to be a registered and certified user on [`synapse.org`](https://www.synapse.org/) -Note: Our credential policy for Google credentials in order to create Google sheet files from Schematic, see tutorial ['HERE'](https://scribehow.com/shared/Get_Credentials_for_Google_Drive_and_Google_Sheets_APIs_to_use_with_schematicpy__yqfcJz_rQVeyTcg0KQCINA). If you plan to use `config.yml`, please ensure that the path of `schematic_service_account_creds.json` is indicated there (see `google_sheets > service_account_creds` section) +> [!NOTE] +> To create Google Sheets files from Schematic, please follow our credential policy for Google credentials. You can find a detailed tutorial [here](https://scribehow.com/shared/Get_Credentials_for_Google_Drive_and_Google_Sheets_APIs_to_use_with_schematicpy__yqfcJz_rQVeyTcg0KQCINA). +> If you're using `config.yml`, make sure to specify the path to `schematic_service_account_creds.json` (see the `google_sheets > service_account_creds` section for more information). -## Installation guide for Schematic CLI users -1. **Verifying Python Version Compatibility** +## Installation Guide For: Users -To ensure compatibility with Schematic, please follow these steps: +The instructions below assume you have already installed [Python](https://www.python.org/downloads/), with the release version meeting the constraints set in the [Installation Requirements](#installation-requirements) section, and do not have a Python environment already active. -Check your own Python version: +### 1. 
Verify your python version + +Ensure your python version meets the requirements from the [Installation Requirements](#installation-requirements) section using the following command: ``` python3 --version ``` +If your current Python version is not supported by Schematic, you can switch to the supported version using a tool like [pyenv](https://github.com/pyenv/pyenv?tab=readme-ov-file#switch-between-python-versions). Follow the instructions in the pyenv documentation to install and switch between Python versions easily. + +> [!NOTE] +> You can double-check the currently supported python version by opening the [pyproject.toml](https://github.com/Sage-Bionetworks/schematic/blob/main/pyproject.toml#L39) file in this repository and finding the supported versions of python listed there. + +### 2. Set up your virtual environment -Check the Supported Python Version: Open the pyproject.toml file in the Schematic repository to find the version of Python that is supported. You can view this file directly on GitHub [here](https://github.com/Sage-Bionetworks/schematic/blob/main/pyproject.toml#L39). +Once you are working with a python version supported by `schematic`, you will need to activate a virtual environment within which you can install the package. Below we will show how to create your virtual environment either with `venv` or with `conda`. -Switching Python Versions: If your current Python version is not supported by Schematic, you can switch to the supported version using tools like [pyenv](https://github.com/pyenv/pyenv?tab=readme-ov-file#switch-between-python-versions). Follow the instructions in the pyenv documentation to install and switch between Python versions easily. +#### 2a. Set up your virtual environment with `venv` -2. 
**Setting Up the Virtual Environment** +Python 3 has built-in support for virtual environments with the `venv` module, so you no longer need to install `virtualenv`: -After switching to the version of Python supported by Schematic, please activate a virtual environment within which you can install the package: ``` python3 -m venv .venv source .venv/bin/activate ``` -Note: Python 3 has built-in support for virtual environments with the venv module, so you no longer need to install virtualenv. -3. **Installing Schematic** +#### 2b. Set up your virtual environment with `conda` + +`conda` is a powerful package and environment management tool that allows users to create isolated environments, and it is particularly popular in data science and machine learning workflows. If you would like to manage your environments with `conda`, continue reading: + +1. **Download your preferred `conda` installer**: Begin by [installing `conda`](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html). We personally recommend working with `Miniconda`, which is a lightweight installer for `conda` that includes only `conda` and its dependencies. + +2. **Execute the `conda` installer**: Once you have downloaded your preferred installer, execute it using `bash` or `zsh`, depending on the shell configured for your terminal environment. For example: + + ``` + bash Miniconda3-latest-MacOSX-arm64.sh + ``` + +3. **Verify your `conda` setup**: Follow the prompts to complete your setup. Then verify your setup by running the `conda` command. + +4. **Create your `schematic` environment**: Begin by creating a fresh `conda` environment for `schematic` like so: + + ``` + conda create --name 'schematicpy' python=3.10 + ``` + +5. **Activate the environment**: Once your environment is set up, you can now activate your new environment with `conda`: + + ``` + conda activate schematicpy + ``` + +### 3. 
Install `schematic` dependencies Install the package using [pip](https://pip.pypa.io/en/stable/quickstart/): @@ -67,140 +121,248 @@ Install the package using [pip](https://pip.pypa.io/en/stable/quickstart/): ``` python3 -m pip install schematicpy ``` -If you run into error: Failed building wheel for numpy, the error might be able to resolve by upgrading pip. Please try to upgrade pip by: +If you run into `ERROR: Failed building wheel for numpy`, you may be able to resolve the error by upgrading pip: ``` pip3 install --upgrade pip ``` -## Installation guide for developers/contributors +### 4. Set up configuration files + +The following section will walk through setting up your configuration files with your credentials to allow for communication between `schematic` and the Synapse API. + +There are two main configuration files that need to be created + modified: +- `.synapseConfig` +- `config.yml` + +**Create and modify the `.synapseConfig`** + +The `.synapseConfig` file is what enables communication between `schematic` and the Synapse API using your credentials. +You can automatically generate a `.synapseConfig` file by running the following in your command line and following the prompts. + +>[!TIP] +>You can generate a new authentication token on the Synapse website by going to `Account Settings` > `Personal Access Tokens`. + +``` +synapse config +``` + +After following the prompts, a new `.synapseConfig` file and `.synapseCache` folder will be created in your home directory. You can view these hidden +assets in your home directory with the following command: + +``` +ls -a ~ +``` + +The `.synapseConfig` is used to log into Synapse if you are not using an environment variable (i.e. 
`SYNAPSE_ACCESS_TOKEN`) for authentication, and the `.synapseCache` folder is where your assets are stored if you are not working with the CLI, or if you have specified `.synapseCache` as the location in which to store your manifests in your `config.yml` (more on the `config.yml` below). + +**Create and modify the `config.yml`** + +In this repository there is a `config_example.yml` file with default configurations for various components that are required before running `schematic`, +such as the Synapse ID of the main file view containing all your project assets, the base name of your manifest files, etc. + +Download the `config_example.yml` as a new file called `config.yml` and modify its contents according to your use case. + +For example, if you wanted to change the folder where manifests are downloaded, your config should look like: + +```text +manifest: + manifest_folder: "my_manifest_folder_path" +``` + +> [!IMPORTANT] +> Be sure to update your `config.yml` with the location of your `.synapseConfig` created in the step above, to avoid authentication errors. Paths can be specified relative to the `config.yml` file or as absolute paths. + +> [!NOTE] +> `config.yml` is ignored by git. + +### 5. Get your data model as a `JSON-LD` schema file + +Now you need a schema file, e.g. `model.jsonld`, to have a data model that schematic can work with. While you can download a very basic example data model [here](https://raw.githubusercontent.com/Sage-Bionetworks/schematic/refs/heads/develop/tests/data/example.model.jsonld), you’ll probably be working with a DCC-specific data model. For non-Sage employees/contributors using the CLI, you might care only about the minimum needed artifact, which is the `.jsonld`; locate and download only that from the appropriate repository. 
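For orientation, a `model.jsonld` file is plain JSON-LD, so you can peek inside one with the standard library alone. The snippet below uses a tiny inline stand-in (real DCC models are much larger; the `@graph`/`rdfs:label` keys shown are typical of schematic-style models and used here as an illustrative assumption):

```python
import json

# A tiny stand-in for the contents of a model.jsonld file.
# Real data models contain many more nodes and attributes.
model_text = """
{
  "@context": {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"},
  "@graph": [
    {"@id": "bts:Patient", "rdfs:label": "Patient"},
    {"@id": "bts:Biospecimen", "rdfs:label": "Biospecimen"}
  ]
}
"""

model = json.loads(model_text)
labels = [node["rdfs:label"] for node in model["@graph"]]
print(labels)  # -> ['Patient', 'Biospecimen']
```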
+ +Here are some example repos with schema files: +* https://github.com/ncihtan/data-models/ +* https://github.com/nf-osi/nf-metadata-dictionary/ + +> [!IMPORTANT] +> Your local working directory would typically have `model.jsonld` and `config.yml` side-by-side. The path to your data model should match what is in `config.yml` + +### 6. Obtain Google credential files + +Any function that interacts with a Google Sheet (such as `schematic manifest get`) requires Google Cloud credentials. -When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with the owners of this repository before making a change. +1. **Option 1**: [Here](https://scribehow.com/shared/Get_Credentials_for_Google_Drive_and_Google_Sheets_APIs_to_use_with_schematicpy__yqfcJz_rQVeyTcg0KQCINA?referrer=workspace)’s a step-by-step guide on how to create these credentials in Google Cloud. + * Depending on your institution's policies, your institutional Google account may or may not have the required permissions to complete this. A possible workaround is to use a personal or temporary Google account. + +> [!WARNING] +> At the time of writing, Sage Bionetworks employees do not have the appropriate permissions to create projects with their Sage Bionetworks Google accounts. You would follow instructions using a personal Google account. + +2. **Option 2**: Ask your DCC/development team if they have credentials previously set up with a service account. + +Once you have obtained credentials, be sure that the generated JSON file is named to match the `service_acct_creds` parameter in your `config.yml` file. + +> [!NOTE] +> Running `schematic init` is no longer supported due to security concerns. To obtain `schematic_service_account_creds.json`, please follow the instructions [here](https://scribehow.com/shared/Enable_Google_Drive_and_Google_Sheets_APIs_for_project__yqfcJz_rQVeyTcg0KQCINA). 
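As a quick sanity check on a downloaded key file, you can verify it looks like a service-account key rather than, say, an OAuth client secret. The values below are placeholders; `type`, `project_id`, and `client_email` are standard fields in Google service-account key files:

```python
import json

# Placeholder contents -- a real schematic_service_account_creds.json is
# downloaded from Google Cloud and must never be committed to git.
creds_text = """
{
  "type": "service_account",
  "project_id": "my-project",
  "client_email": "schematic@my-project.iam.gserviceaccount.com"
}
"""

creds = json.loads(creds_text)
# Service-account keys identify themselves via the "type" field.
assert creds["type"] == "service_account"
print(creds["client_email"])
```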
+`schematic` uses Google’s API to generate Google Sheet templates that users fill in to provide (meta)data. +Most Google Sheet functionality can be authenticated with a service account. However, more complex Google Sheet functionality +requires token-based authentication. As browser support for token-based authentication diminishes, we are hoping to deprecate +it and keep only service account authentication in the future. + +> [!NOTE] +> Use the ``schematic_service_account_creds.json`` file for the service +> account mode of authentication (*for Google services/APIs*). Service accounts +> are special Google accounts that can be used by applications to access Google APIs +> programmatically via OAuth2.0, with the advantage being that they do not require +> human authorization. + +### 7. Verify your setup +After running the steps above, your setup is complete, and you can test it on a `python` instance or by running a command based on the examples in the [Command Line Usage](#command-line-usage) section. + +## Installation Guide For: Contributors + +The instructions below assume you have already installed [Python](https://www.python.org/downloads/), with the release version meeting the constraints set in the [Installation Requirements](#installation-requirements) section, and do not have an environment already active (e.g. with `pyenv`). For development, we recommend working with Python versions above 3.9 to avoid issues with `pre-commit`'s default hook configuration. + +When contributing to this repository, please first discuss the change you wish to make via the [service desk](https://sagebionetworks.jira.com/servicedesk/customer/portal/5/group/8) so that we may track these changes. 
+ +Once you have finished setting up your development environment using the instructions below, please follow the guidelines in [CONTRIBUTION.md](https://github.com/Sage-Bionetworks/schematic/blob/develop-fds-2218-update-readme/CONTRIBUTION.md) during your development. Please note we have a [code of conduct](CODE_OF_CONDUCT.md), please follow it in all your interactions with the project. -### Development environment setup -1. Clone the `schematic` package repository. +### 1. Clone the `schematic` package repository + +For development, you will be working with the latest version of `schematic` on the repository to ensure compatibility between its latest state and your changes. Ensure your current working directory is where +you would like to store your local fork before running the following command: + ``` git clone https://github.com/Sage-Bionetworks/schematic.git ``` -2. Install `poetry` (version 1.3.0 or later) using either the [official installer](https://python-poetry.org/docs/#installing-with-the-official-installer) or [pipx](https://python-poetry.org/docs/#installing-with-pipx). If you have an older installation of Poetry, we recommend uninstalling it first. -3. Start the virtual environment by doing: +### 2. Install `poetry` + +Install `poetry` (version 1.3.0 or later) using either the [official installer](https://python-poetry.org/docs/#installing-with-the-official-installer) or `pip`. If you have an older installation of Poetry, we recommend uninstalling it first. + +``` +pip install poetry +``` + +Check to make sure your version of `poetry` is 1.3.0 or later: + +``` +poetry --version +``` + +### 3. Start the virtual environment + +`cd` into your cloned `schematic` repository, and initialize the virtual environment using the following command with `poetry`: + ``` poetry shell ``` -4. 
Install the dependencies by doing: + +To make sure your poetry version and python version are consistent with the versions you expect, you can run the following command: + +``` +poetry debug info +``` + +### 4. Install `schematic` dependencies + +Before you begin, make sure you are on the latest `develop` branch of the repository. + +The following command will install the dependencies based on what we specify in the `poetry.lock` file of this repository. If this step is taking a long time, try to go back to Step 2 and check your version of `poetry`. Alternatively, you can try deleting the lock file and regenerating it by doing `poetry install` (please note this method should be used as a last resort, because it would force other developers to change their development environment) + ``` poetry install --all-extras ``` -This command will install the dependencies based on what we specify in poetry.lock. If this step is taking a long time, try to go back to step 2 and check your version of poetry. Alternatively, you could also try deleting the lock file and regenerate it by doing `poetry install` (Please note this method should be used as a last resort because this would force other developers to change their development environment) +### 5. Set up configuration files -5. Fill in credential files: -*Note*: If you won't interact with Synapse, please ignore this section. +The following section will walk through setting up your configuration files with your credentials to allow for communication between `schematic` and the Synapse API. 
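The `.synapseConfig` file described in the next step is an INI-style file, so you can inspect it with Python's `configparser` if you ever need to debug authentication. The `[authentication]` section below uses placeholder values mirroring the format shown in the synapseclient documentation:

```python
import configparser

# Placeholder .synapseConfig contents; a real file holds your own
# Synapse username and personal access token.
sample = """
[authentication]
username = ABC
authtoken = abc
"""

config = configparser.ConfigParser()
config.read_string(sample)
print(config["authentication"]["username"])  # -> ABC
```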
-There are two main configuration files that need to be edited: -- config.yml -- [synapseConfig](https://raw.githubusercontent.com/Sage-Bionetworks/synapsePythonClient/master/synapseclient/.synapseConfig) +There are two main configuration files that need to be created + modified: +- `.synapseConfig` +- `config.yml` -Configure .synapseConfig File +**Create and modify the `.synapseConfig`** -Download a copy of the ``.synapseConfig`` file, open the file in the editor of your -choice and edit the `username` and `authtoken` attribute under the `authentication` -section. **Note:** You must place the file at the root of the project like -`{project_root}/.synapseConfig` in order for any authenticated tests to work. +The `.synapseConfig` file is what enables communication between `schematic` and the Synapse API using your credentials. +You can automatically generate a `.synapseConfig` file by running the following in your command line and following the prompts. -*Note*: You could also visit [configparser](https://docs.python.org/3/library/configparser.html#module-configparser>) doc to see the format that `.synapseConfig` must have. For instance: ->[authentication]
username = ABC
authtoken = abc +>[!TIP] +>You can generate a new authentication token on the Synapse website by going to `Account Settings` > `Personal Access Tokens`. -Configure config.yml File +``` +synapse config +``` -There are some defaults in schematic that can be configured. These fields are in ``config_example.yml``: +After following the prompts, a new `.synapseConfig` file and `.synapseCache` folder will be created in your home directory. You can view these hidden +assets in your home directory with the following command: -```text +``` +ls -a ~ +``` -# This is an example config for Schematic. -# All listed values are those that are the default if a config is not used. -# Save this as config.yml, this will be gitignored. -# Remove any fields in the config you don't want to change -# Change the values of any fields you do want to change - - -# This describes where assets such as manifests are stored -asset_store: - # This is when assets are stored in a synapse project - synapse: - # Synapse ID of the file view listing all project data assets. 
-    master_fileview_id: "syn23643253"
-    # Path to the synapse config file, either absolute or relative to this file
-    config: ".synapseConfig"
-    # Base name that manifest files will be saved as
-    manifest_basename: "synapse_storage_manifest"
-
-# This describes information about manifests as it relates to generation and validation
-manifest:
-  # Location where manifests will saved to
-  manifest_folder: "manifests"
-  # Title or title prefix given to generated manifest(s)
-  title: "example"
-  # Data types of manifests to be generated or data type (singular) to validate manifest against
-  data_type:
-    - "Biospecimen"
-    - "Patient"
-
-# Describes the location of your schema
-model:
-  # Location of your schema jsonld, it must be a path relative to this file or absolute
-  location: "tests/data/example.model.jsonld"
-
-# This section is for using google sheets with Schematic
-google_sheets:
-  # Path to the synapse config file, either absolute or relative to this file
-  service_acct_creds: "schematic_service_account_creds.json"
-  # When doing google sheet validation (regex match) with the validation rules.
-  # true is alerting the user and not allowing entry of bad values.
-  # false is warning but allowing the entry on to the sheet.
-  strict_validation: true
-```
-
-If you want to change any of these copy ``config_example.yml`` to ``config.yml``, change any fields you want to, and remove any fields you don't.
-
-For example if you wanted to change the folder where manifests are downloaded your config should look like:
+The `.synapseConfig` is used to log into Synapse if you are not using an environment variable (i.e. `SYNAPSE_ACCESS_TOKEN`) for authentication, and the `.synapseCache` is where your assets are stored if you are not working with the CLI and/or you have specified `.synapseCache` as the location in which to store your manifests, in your `config.yml` (more on the `config.yml` below).
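Because `.synapseConfig` is an INI-style file, you can inspect it with Python's built-in `configparser` (the same format the Synapse client reads). A minimal sketch, with placeholder credentials rather than real ones:

```python
import configparser

# A minimal .synapseConfig body in the INI format the Synapse client expects.
# The username and authtoken values below are placeholders, not real credentials.
SAMPLE_SYNAPSE_CONFIG = """\
[authentication]
username = jane.doe@example.org
authtoken = placeholder-personal-access-token
"""

def read_synapse_auth(text: str) -> tuple[str, str]:
    """Return the (username, authtoken) pair from a .synapseConfig body."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    auth = parser["authentication"]
    return auth["username"], auth["authtoken"]

user, token = read_synapse_auth(SAMPLE_SYNAPSE_CONFIG)
print(user)
```

Reading the file this way is handy for double-checking that the `authentication` section was generated correctly by `synapse config`.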
-```text
+> [!IMPORTANT]
+> When developing on `schematic`, keep your `.synapseConfig` in your current working directory to avoid authentication errors.
+
+**Create and modify the `config.yml`**
+
+In this repository there is a `config_example.yml` file with default configurations for various components that are required before running `schematic`,
+such as the Synapse ID of the main file view containing all your project assets, the base name of your manifest files, etc.
+Copy the contents of the `config_example.yml` (located in the base directory of the cloned `schematic` repo) into a new file called `config.yml`:
+
+```
+cp config_example.yml config.yml
+```
+
+Once you've copied the file, modify its contents according to your use case. For example, if you wanted to change the folder where manifests are downloaded, your config should look like:
+
+```text
 manifest:
   manifest_folder: "my_manifest_folder_path"
 ```
-_Note_: `config.yml` is ignored by git.
+> [!IMPORTANT]
+> Be sure to update your `config.yml` with the location of your `.synapseConfig` created in the step above, to avoid authentication errors. Paths can be specified relative to the `config.yml` file or as absolute paths.
-_Note_: Paths can be specified relative to the `config.yml` file or as absolute paths.
+> [!NOTE]
+> `config.yml` is ignored by git.
-6. Login to Synapse by using the command line
-On the CLI in your virtual environment, run the following command:
-```
-synapse login -u -p --rememberMe
-```
+### 6. Obtain Google credential files
-7. Obtain Google credential Files
-Running `schematic init` is no longer supported due to security concerns. To obtain `schematic_service_account_creds.json`, please follow the instructions [here](https://scribehow.com/shared/Enable_Google_Drive_and_Google_Sheets_APIs_for_project__yqfcJz_rQVeyTcg0KQCINA).
+Any function that interacts with a Google Sheet (such as `schematic manifest get`) requires Google Cloud credentials.
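Before wiring a credentials file into `config.yml`, it can help to confirm that the file really is a service-account key rather than some other credential type. The checker below is an illustrative sketch based on the standard fields of Google service-account JSON key files; it is not a `schematic` utility.

```python
import json

# Keys present in every Google service-account key file.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}

def looks_like_service_account(creds_json: str) -> bool:
    """Return True if the JSON text looks like a service-account key
    (as opposed to, e.g., an OAuth client secret, which schematic no
    longer supports)."""
    creds = json.loads(creds_json)
    return creds.get("type") == "service_account" and REQUIRED_KEYS.issubset(creds)

# A fabricated example key body, for illustration only.
sample = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "-----BEGIN PRIVATE KEY-----...",
    "client_email": "schematic@my-project.iam.gserviceaccount.com",
})
print(looks_like_service_account(sample))
```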
-> As v22.12.1 version of schematic, using `token` mode of authentication (in other words, using `token.pickle` and `credentials.json`) is no longer supported due to Google's decision to move away from using OAuth out-of-band (OOB) flow. Click [here](https://developers.google.com/identity/protocols/oauth2/resources/oob-migration) to learn more.
+1. **Option 1**: [Here](https://scribehow.com/shared/Get_Credentials_for_Google_Drive_and_Google_Sheets_APIs_to_use_with_schematicpy__yqfcJz_rQVeyTcg0KQCINA?referrer=workspace)’s a step-by-step guide on how to create these credentials in Google Cloud.
+   * Depending on your institution's policies, your institutional Google account may or may not have the required permissions to complete this. A possible workaround is to use a personal or temporary Google account.
-*Notes*: Use the ``schematic_service_account_creds.json`` file for the service
-account mode of authentication (*for Google services/APIs*). Service accounts
-are special Google accounts that can be used by applications to access Google APIs
-programmatically via OAuth2.0, with the advantage being that they do not require
-human authorization.
+> [!WARNING]
+> At the time of writing, Sage Bionetworks employees do not have the appropriate permissions to create projects with their Sage Bionetworks Google accounts. You will need to follow the instructions using a personal Google account.
-*Background*: schematic uses Google’s API to generate google sheet templates that users fill in to provide (meta)data.
+2. **Option 2**: Ask your DCC/development team if they have credentials previously set up with a service account.
+
+Once you have obtained credentials, be sure that the generated JSON file is named to match the `service_acct_creds` parameter in your `config.yml` file.
+
+> [!IMPORTANT]
+> For testing, make sure there is no environment variable `SCHEMATIC_SERVICE_ACCOUNT_CREDS`. Check the file `.env` to ensure this is not set.
Also, check that config files used for testing, such as `config_example.yml` do not contain service_acct_creds_synapse_id.
+
+> [!NOTE]
+> Running `schematic init` is no longer supported due to security concerns. To obtain `schematic_service_account_creds.json`, please follow the instructions [here](https://scribehow.com/shared/Enable_Google_Drive_and_Google_Sheets_APIs_for_project__yqfcJz_rQVeyTcg0KQCINA).
+
+`schematic` uses Google’s API to generate Google Sheet templates that users fill in to provide (meta)data. Most Google Sheet functionality can be authenticated with a service account; however, more complex functionality requires token-based authentication. As browser support for token-based authentication diminishes, we hope to deprecate it and keep only service-account authentication in the future.
-8. Set up pre-commit hooks
+> [!NOTE]
+> Use the ``schematic_service_account_creds.json`` file for the service
+> account mode of authentication (*for Google services/APIs*). Service accounts
+> are special Google accounts that can be used by applications to access Google APIs
+> programmatically via OAuth2.0, with the advantage being that they do not require
+> human authorization.
+
+### 7. Set up pre-commit hooks
 
 This repository is configured to utilize pre-commit hooks as part of the development process. To enable these hooks, please run the following command and look for the following success message:
 ```
@@ -208,35 +370,55 @@ $ pre-commit install
 pre-commit installed at .git/hooks/pre-commit
 ```
 
-### Development process instruction
+You can run `pre-commit` manually across the entire repository like so:
 
-For new features, bugs, enhancements
+```
+pre-commit run --all-files
+```
 
-1. Pull the latest code from [develop branch in the upstream repo](https://github.com/Sage-Bionetworks/schematic)
-2. Checkout a new branch develop- from the develop branch
-3. Do development on branch develop-
-   a. 
may need to ensure that schematic poetry toml and lock files are compatible with your local environment
-4. Add changed files for tracking and commit changes using [best practices](https://www.perforce.com/blog/vcs/git-best-practices-git-commit)
-5. Have granular commits: not “too many” file changes, and not hundreds of code lines of changes
-6. Commits with work in progress are encouraged:
-   a. add WIP to the beginning of the commit message for “Work In Progress” commits
-7. Keep commit messages descriptive but less than a page long, see best practices
-8. Push code to develop- in upstream repo
-9. Branch out off develop- if needed to work on multiple features associated with the same code base
-10. After feature work is complete and before creating a PR to the develop branch in upstream
-   a. ensure that code runs locally
-   b. test for logical correctness locally
-   c. wait for git workflow to complete (e.g. tests are run) on github
-11. Create a PR from develop- into the develop branch of the upstream repo
-12. Request a code review on the PR
-13. Once code is approved merge in the develop branch
-14. Delete the develop- branch
 
-*Note*: Make sure you have the latest version of the `develop` branch on your local machine.
+### 8. Verify your setup
+After running the steps above, your setup is complete, and you can test it on a `python` instance or by running a command based on the examples in the [Command Line Usage](#command-line-usage) section.
 
-### Example For REST API
+# Command Line Usage
+1. Generate a new manifest as a Google Sheet
+
+```
+schematic manifest -c /path/to/config.yml get -dt -s
+```
+
+2. Grab an existing manifest from Synapse
+
+```
+schematic manifest -c /path/to/config.yml get -dt -d -s
+```
+
+3. Validate a manifest
+
+```
+schematic model -c /path/to/config.yml validate -dt -mp
+```
+
+4. Submit a manifest as a file
+
+```
+schematic model -c /path/to/config.yml submit -mp -d -vc -mrt file_only
+```
+
+Please visit the documentation [here](https://sage-schematic.readthedocs.io/en/stable/cli_reference.html#) for more information.
+
+# Docker Usage
+
+Here we will demonstrate how to run `schematic` with Docker, covering different use cases: running the API endpoints, validating manifests, and
+using `schematic` based on your OS (macOS/Linux).
+
+### Running the REST API
+
+Use the Docker image to run `schematic`'s REST API. You can either use the file path for the `config.yml` created using the installation instructions,
+or set up authentication with environment variables.
+
+#### Example 1: Using the `config.yml` path
 ```
 docker run --rm -p 3001:3001 \
   -v $(pwd):/schematic -w /schematic --name schematic \
@@ -246,7 +428,7 @@ docker run --rm -p 3001:3001 \
   python /usr/src/app/run_api.py
 ```
 
-#### Use content of `config.yml` and `schematic_service_account_creds.json`as an environment variable to run API endpoints:
+#### Example 2: Use environment variables
 
 1. save content of `config.yml` as to environment variable `SCHEMATIC_CONFIG_CONTENT` by doing: `export SCHEMATIC_CONFIG_CONTENT=$(cat /path/to/config.yml)`
 2. 
Similarly, save the content of `schematic_service_account_creds.json` as `SERVICE_ACCOUNT_CREDS` by doing: `export SERVICE_ACCOUNT_CREDS=$(cat /path/to/schematic_service_account_creds.json)` @@ -262,11 +444,18 @@ docker run --rm -p 3001:3001 \ sagebionetworks/schematic \ python /usr/src/app/run_api.py ``` +### Running `schematic` to Validate Manifests +You can also use Docker to run `schematic` commands like validating manifests. Below are examples for different platforms. +#### Example for macOS/Linux -### Example For Schematic on mac/linux
-To run example below, first clone schematic into your home directory `git clone https://github.com/sage-bionetworks/schematic ~/schematic`
-Then update .synapseConfig with your credentials +1. Clone the repository: +``` +git clone https://github.com/sage-bionetworks/schematic ~/schematic +``` +2. Update the `.synapseConfig` with your credentials. See the installation instructions for how to do this. + +3. Run Docker: ``` docker run \ -v ~/schematic:/schematic \ @@ -280,7 +469,9 @@ docker run \ -js /schematic/tests/data/example.model.jsonld ``` -### Example For Schematic on Windows
+#### Example for Windows + +Run the following command to validate manifests: ``` docker run -v %cd%:/schematic \ -w /schematic \ @@ -290,82 +481,6 @@ docker run -v %cd%:/schematic \ -c config.yml validate -mp tests/data/mock_manifests/inValid_Test_Manifest.csv -dt MockComponent -js /schematic/data/example.model.jsonld ``` -# Other Contribution Guidelines -## Updating readthedocs documentation -1. `cd docs` -2. After making relevant changes, you could run the `make html` command to re-generate the `build` folder. -3. Please contact the dev team to publish your updates - -*Other helpful resources*: - -1. [Getting started with Sphinx](https://haha.readthedocs.io/en/latest/intro/getting-started-with-sphinx.html) -2. [Installing Sphinx](https://haha.readthedocs.io/en/latest/intro/getting-started-with-sphinx.html) - -## Update toml file and lock file -If you install external libraries by using `poetry add `, please make sure that you include `pyproject.toml` and `poetry.lock` file in your commit. - -## Reporting bugs or feature requests -You can **create bug and feature requests** through [Sage Bionetwork's FAIR Data service desk](https://sagebionetworks.jira.com/servicedesk/customer/portal/5/group/8). Providing enough details to the developers to verify and troubleshoot your issue is paramount: -- **Provide a clear and descriptive title as well as a concise summary** of the issue to identify the problem. -- **Describe the exact steps which reproduce the problem** in as many details as possible. -- **Describe the behavior you observed after following the steps** and point out what exactly is the problem with that behavior. -- **Explain which behavior you expected to see** instead and why. -- **Provide screenshots of the expected or actual behaviour** where applicable. - -# Command Line Usage -1. Generate a new manifest as a google sheet - -``` -schematic manifest -c /path/to/config.yml get -dt -s -``` - -2. 
Grab an existing manifest from synapse - -``` -schematic manifest -c /path/to/config.yml get -dt -d -s -``` - -3. Validate a manifest - -``` -schematic model -c /path/to/config.yml validate -dt -mp -``` - -4. Submit a manifest as a file - -``` -schematic model -c /path/to/config.yml submit -mp -d -vc -mrt file_only -``` - -Please visit more documentation [here](https://sage-schematic.readthedocs.io/en/develop/cli_reference.html) for more information. - - - -# Testing - -All code added to the client must have tests. The Python client uses pytest to run tests. The test code is located in the [tests](https://github.com/Sage-Bionetworks/schematic/tree/develop-docs-update/tests) subdirectory. - -You can run the test suite in the following way: - -``` -pytest -vs tests/ -``` - -## Updating Synapse test resources - -1. Duplicate the entity being updated (or folder if applicable). -2. Edit the duplicates (_e.g._ annotations, contents, name). -3. Update the test suite in your branch to use these duplicates, including the expected values in the test assertions. -4. Open a PR as per the usual process (see above). -5. Once the PR is merged, leave the original copies on Synapse to maintain support for feature branches that were forked from `develop` before your update. - - If the old copies are problematic and need to be removed immediately (_e.g._ contain sensitive data), proceed with the deletion and alert the other contributors that they need to merge the latest `develop` branch into their feature branches for their tests to work. - -# Code style - -* Please consult the [Google Python style guide](http://google.github.io/styleguide/pyguide.html) prior to contributing code to this project. -* Be consistent and follow existing code conventions and spirit. 
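As the testing notes above state, all contributed code needs pytest coverage. A minimal, self-contained test looks like the following; `to_table_name` is a hypothetical helper shown only to illustrate the arrange/assert shape (its expected output mirrors the `bulkrna-seqassay_synapse_storage_manifest_table` naming used in the integration tests):

```python
# A minimal pytest-style test, runnable with `pytest -vs`.
# `to_table_name` is a hypothetical helper used only for illustration; it
# mimics how a component name might be turned into a manifest-table name.
def to_table_name(component: str) -> str:
    """Lower-case a component name and append the manifest-table suffix."""
    return f"{component.lower()}_synapse_storage_manifest_table"

def test_to_table_name():
    assert to_table_name("BulkRNA-seqAssay") == (
        "bulkrna-seqassay_synapse_storage_manifest_table"
    )
```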
- - # Contributors Main contributors and developers: diff --git a/schematic/manifest/generator.py b/schematic/manifest/generator.py index d954506a5..47acad4b4 100644 --- a/schematic/manifest/generator.py +++ b/schematic/manifest/generator.py @@ -27,6 +27,7 @@ build_service_account_creds, execute_google_api_requests, export_manifest_drive_service, + google_api_execute_wrapper, ) from schematic.utils.schema_utils import ( DisplayLabelType, @@ -190,11 +191,11 @@ def _gdrive_copy_file(self, origin_file_id, copy_title): copied_file = {"name": copy_title} # return new copy sheet ID - return ( + return google_api_execute_wrapper( self.drive_service.files() .copy(fileId=origin_file_id, body=copied_file) - .execute()["id"] - ) + .execute + )["id"] def _create_empty_manifest_spreadsheet(self, title: str) -> str: """ @@ -215,12 +216,11 @@ def _create_empty_manifest_spreadsheet(self, title: str) -> str: else: spreadsheet_body = {"properties": {"title": title}} - spreadsheet_id = ( + spreadsheet_id = google_api_execute_wrapper( self.sheet_service.spreadsheets() .create(body=spreadsheet_body, fields="spreadsheetId") - .execute() - .get("spreadsheetId") - ) + .execute + ).get("spreadsheetId") return spreadsheet_id @@ -265,7 +265,7 @@ def callback(request_id, response, exception): fields="id", ) ) - batch.execute() + google_api_execute_wrapper(batch.execute) def _store_valid_values_as_data_dictionary( self, column_id: int, valid_values: list, spreadsheet_id: str @@ -297,7 +297,7 @@ def _store_valid_values_as_data_dictionary( + str(len(values) + 1) ) valid_values = [{"userEnteredValue": "=" + target_range}] - response = ( + response = google_api_execute_wrapper( self.sheet_service.spreadsheets() .values() .update( @@ -306,7 +306,7 @@ def _store_valid_values_as_data_dictionary( valueInputOption="RAW", body=body, ) - .execute() + .execute ) return valid_values @@ -560,15 +560,31 @@ def _gs_add_and_format_columns(self, required_metadata_fields, spreadsheet_id): range = "Sheet1!A1:" + 
str(end_col_letter) + "1" # adding columns - self.sheet_service.spreadsheets().values().update( - spreadsheetId=spreadsheet_id, range=range, valueInputOption="RAW", body=body - ).execute() + google_api_execute_wrapper( + self.sheet_service.spreadsheets() + .values() + .update( + spreadsheetId=spreadsheet_id, + range=range, + valueInputOption="RAW", + body=body, + ) + .execute + ) # adding columns to 2nd sheet that can be used for storing data validation ranges (this avoids limitations on number of dropdown items in excel and openoffice) range = "Sheet2!A1:" + str(end_col_letter) + "1" - self.sheet_service.spreadsheets().values().update( - spreadsheetId=spreadsheet_id, range=range, valueInputOption="RAW", body=body - ).execute() + google_api_execute_wrapper( + self.sheet_service.spreadsheets() + .values() + .update( + spreadsheetId=spreadsheet_id, + range=range, + valueInputOption="RAW", + body=body, + ) + .execute + ) # format column header row header_format_body = { @@ -612,10 +628,10 @@ def _gs_add_and_format_columns(self, required_metadata_fields, spreadsheet_id): ] } - response = ( + response = google_api_execute_wrapper( self.sheet_service.spreadsheets() .batchUpdate(spreadsheetId=spreadsheet_id, body=header_format_body) - .execute() + .execute ) return response, ordered_metadata_fields @@ -664,13 +680,13 @@ def _gs_add_additional_metadata( "data": data, } - response = ( + response = google_api_execute_wrapper( self.sheet_service.spreadsheets() .values() .batchUpdate( spreadsheetId=spreadsheet_id, body=batch_update_values_request_body ) - .execute() + .execute ) return response @@ -765,11 +781,11 @@ def _request_regex_match_vr_formatting( split_rules = validation_rules[0].split(" ") if split_rules[0] == "regex" and split_rules[1] == "match": # Set things up: - ## Extract the regular expression we are validating against. + # Extract the regular expression we are validating against. 
regular_expression = split_rules[2] - ## Define text color to update to upon correct user entry + # Define text color to update to upon correct user entry text_color = {"red": 0, "green": 0, "blue": 0} - ## Define google sheets regular expression formula + # Define google sheets regular expression formula gs_formula = [ { "userEnteredValue": '=REGEXMATCH(INDIRECT("RC",FALSE), "{}")'.format( @@ -777,11 +793,11 @@ def _request_regex_match_vr_formatting( ) } ] - ## Set validaiton strictness based on user specifications. + # Set validaiton strictness based on user specifications. if split_rules[-1].lower() == "strict": strict = True - ## Create error message for users if they enter value with incorrect formatting + # Create error message for users if they enter value with incorrect formatting input_message = ( f"Values in this column are being validated " f"against the following regular expression ({regular_expression}) " @@ -790,7 +806,7 @@ def _request_regex_match_vr_formatting( ) # Create Requests: - ## Change request to change the text color of the column we are validating to red. + # Change request to change the text color of the column we are validating to red. requests_vr_format_body = self._request_update_base_color( i, color={ @@ -800,10 +816,10 @@ def _request_regex_match_vr_formatting( }, ) - ## Create request to for conditionally formatting user input. + # Create request to for conditionally formatting user input. requests_vr = self._request_regex_vr(gs_formula, i, text_color) - ## Create request to generate data validator. + # Create request to generate data validator. 
requests_data_validation_vr = self._get_column_data_validation_values( spreadsheet_id, valid_values=gs_formula, diff --git a/schematic/store/synapse.py b/schematic/store/synapse.py index 7ccb810cf..861789374 100644 --- a/schematic/store/synapse.py +++ b/schematic/store/synapse.py @@ -23,6 +23,7 @@ from schematic_db.rdb.synapse_database import SynapseDatabase from synapseclient import ( Column, + Entity, EntityViewSchema, EntityViewType, File, @@ -33,6 +34,7 @@ as_table_columns, ) from synapseclient.api import get_entity_id_bundle2 +from synapseclient.core.constants.concrete_types import PROJECT_ENTITY from synapseclient.core.exceptions import ( SynapseAuthenticationError, SynapseHTTPError, @@ -566,6 +568,55 @@ def getFilesInStorageDataset( self.syn, datasetId, includeTypes=["folder", "file"] ) + current_entity_location = self.syn.get(entity=datasetId, downloadFile=False) + + def walk_back_to_project( + current_location: Entity, location_prefix: str, skip_entry: bool + ) -> str: + """ + Recursively walk back up the project structure to get the paths of the + names of each of the directories where we started the walk function. + + Args: + current_location (Entity): The current entity location in the project structure. + location_prefix (str): The prefix to prepend to the path. + skip_entry (bool): Whether to skip the current entry in the path. When + this is True it means we are looking at our starting point. If our + starting point is the project itself we can go ahead and return + back the project as the prefix. + + Returns: + str: The path of the names of each of the directories up to the project root. 
+ """ + if ( + skip_entry + and "concreteType" in current_location + and current_location["concreteType"] == PROJECT_ENTITY + ): + return f"{current_location.name}/{location_prefix}" + + updated_prefix = ( + location_prefix + if skip_entry + else f"{current_location.name}/{location_prefix}" + ) + if ( + "concreteType" in current_location + and current_location["concreteType"] == PROJECT_ENTITY + ): + return updated_prefix + return walk_back_to_project( + current_location=self.syn.get(entity=current_location["parentId"]), + location_prefix=updated_prefix, + skip_entry=False, + ) + + prefix = walk_back_to_project( + current_location=current_entity_location, + location_prefix="", + skip_entry=True, + ) + project = self.getDatasetProject(datasetId) project_name = self.syn.get(project, downloadFile=False).name file_list = [] @@ -585,17 +636,16 @@ def getFilesInStorageDataset( if fullpath: # append directory path to filename if dirpath[0].startswith(f"{project_name}/"): + path_without_project_prefix = ( + dirpath[0] + "/" + ).removeprefix(f"{project_name}/") path_filename = ( - dirpath[0] + "/" + path_filename[0], + prefix + path_without_project_prefix + path_filename[0], path_filename[1], ) else: path_filename = ( - project_name - + "/" - + dirpath[0] - + "/" - + path_filename[0], + prefix + dirpath[0] + "/" + path_filename[0], path_filename[1], ) diff --git a/schematic/utils/google_api_utils.py b/schematic/utils/google_api_utils.py index b705e0419..6f09c0ea7 100644 --- a/schematic/utils/google_api_utils.py +++ b/schematic/utils/google_api_utils.py @@ -2,14 +2,23 @@ # pylint: disable=logging-fstring-interpolation -import os -import logging import json -from typing import Any, Union, no_type_check, TypedDict +import logging +import os +from typing import Any, Callable, TypedDict, Union, no_type_check import pandas as pd -from googleapiclient.discovery import build, Resource # type: ignore from google.oauth2 import service_account # type: ignore +from 
googleapiclient.discovery import Resource, build # type: ignore +from googleapiclient.errors import HttpError # type: ignore +from tenacity import ( + retry, + retry_if_exception_type, + stop_after_attempt, + wait_chain, + wait_fixed, +) + from schematic.configuration.configuration import CONFIG logger = logging.getLogger(__name__) @@ -86,10 +95,10 @@ def execute_google_api_requests(service, requests_body, **kwargs) -> Any: and kwargs["service_type"] == "batch_update" ): # execute all requests - response = ( + response = google_api_execute_wrapper( service.spreadsheets() .batchUpdate(spreadsheetId=kwargs["spreadsheet_id"], body=requests_body) - .execute() + .execute ) return response @@ -118,10 +127,10 @@ def export_manifest_drive_service( # use google drive # Pylint seems to have trouble with the google api classes, recognizing their methods - data = ( + data = google_api_execute_wrapper( drive_service.files() # pylint: disable=no-member .export(fileId=spreadsheet_id, mimeType=mime_type) - .execute() + .execute ) # open file and write data @@ -145,3 +154,25 @@ def export_manifest_csv(file_path: str, manifest: Union[pd.DataFrame, str]) -> N manifest.to_csv(file_path, index=False) else: export_manifest_drive_service(manifest, file_path, mime_type="text/csv") + + +@retry( + stop=stop_after_attempt(5), + wait=wait_chain( + *[wait_fixed(1) for i in range(2)] + + [wait_fixed(2) for i in range(2)] + + [wait_fixed(5)] + ), + retry=retry_if_exception_type(HttpError), + reraise=True, +) +def google_api_execute_wrapper(api_function_to_call: Callable[[], Any]) -> Any: + """Retry wrapper for Google API calls, with a backoff strategy. 
+ + Args: + api_function_to_call (Callable[[], Any]): The function to call + + Returns: + Any: The result of the API call + """ + return api_function_to_call() diff --git a/tests/data/mock_manifests/TestManifestOperation_test_submit_nested_manifest_table_and_file_replace.csv b/tests/data/mock_manifests/TestManifestOperation_test_submit_nested_manifest_table_and_file_replace.csv new file mode 100644 index 000000000..6bcb468c6 --- /dev/null +++ b/tests/data/mock_manifests/TestManifestOperation_test_submit_nested_manifest_table_and_file_replace.csv @@ -0,0 +1,2 @@ +Filename,Sample ID,File Format,Component,Genome Build,Genome FASTA,Year of Birth,author,confidence,date,eTag,IsImportantBool,IsImportantText,impact,entityId,RandomizedAnnotation +schematic - main/TestDatasets/TestDataset-Annotations-nested-submit/Sample_C.txt,some sample id,FASTQ,BulkRNA-seqAssay,,,,,,,0bf00691-a6e4-4487-9cab-851e22416ed2,FALSE,FALSE,,syn63646199, \ No newline at end of file diff --git a/tests/integration/test_submit_manifest.py b/tests/integration/test_submit_manifest.py new file mode 100644 index 000000000..cc14de487 --- /dev/null +++ b/tests/integration/test_submit_manifest.py @@ -0,0 +1,112 @@ +import io +import logging +import uuid +from typing import Dict, Generator + +import flask +import pytest +from flask.testing import FlaskClient + +from schematic.store.synapse import SynapseStorage +from schematic_api.api import create_app +from tests.conftest import Helpers + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + +DATA_MODEL_JSON_LD = "https://raw.githubusercontent.com/Sage-Bionetworks/schematic/develop/tests/data/example.model.jsonld" + + +@pytest.fixture(scope="class") +def app() -> flask.Flask: + app = create_app() + return app + + +@pytest.fixture(scope="class") +def client(app: flask.Flask) -> Generator[FlaskClient, None, None]: + app.config["SCHEMATIC_CONFIG"] = None + + with app.test_client() as client: + yield client + + +@pytest.fixture +def 
request_headers(syn_token: str) -> Dict[str, str]:
+    headers = {"Authorization": "Bearer " + syn_token}
+    return headers
+
+
+@pytest.mark.schematic_api
+class TestManifestSubmission:
+    @pytest.mark.synapse_credentials_needed
+    @pytest.mark.submission
+    def test_submit_nested_manifest_table_and_file_replace(
+        self,
+        client: FlaskClient,
+        request_headers: Dict[str, str],
+        helpers: Helpers,
+        synapse_store: SynapseStorage,
+    ) -> None:
+        # GIVEN the parameters to submit a manifest
+        params = {
+            "schema_url": DATA_MODEL_JSON_LD,
+            "data_type": "BulkRNA-seqAssay",
+            "restrict_rules": False,
+            "manifest_record_type": "table_and_file",
+            "asset_view": "syn63646213",
+            "dataset_id": "syn63646197",
+            "table_manipulation": "replace",
+            "data_model_labels": "class_label",
+            "table_column_names": "display_name",
+        }
+
+        # AND a test manifest with a nested file entity
+        nested_manifest_replace_csv = helpers.get_data_path(
+            "mock_manifests/TestManifestOperation_test_submit_nested_manifest_table_and_file_replace.csv"
+        )
+
+        # AND a randomized annotation we can verify was added
+        df = helpers.get_data_frame(path=nested_manifest_replace_csv)
+        randomized_annotation_content = str(uuid.uuid4())
+        df["RandomizedAnnotation"] = randomized_annotation_content
+        csv_file = io.BytesIO()
+        df.to_csv(csv_file, index=False)
+        csv_file.seek(0)  # Rewind the buffer to the beginning
+
+        # WHEN I submit that manifest
+        response_csv = client.post(
+            "http://localhost:3001/v1/model/submit",
+            query_string=params,
+            data={"file_name": (csv_file, "test.csv")},
+            headers=request_headers,
+        )
+
+        # THEN the submission should be successful
+        assert response_csv.status_code == 200
+
+        # AND the file should be uploaded to Synapse with the new annotation
+        modified_file = synapse_store.syn.get(df["entityId"][0], downloadFile=False)
+        assert modified_file is not None
+        assert modified_file["RandomizedAnnotation"][0] == randomized_annotation_content
+
+        # AND the manifest should exist in the dataset folder
+        manifest_synapse_id = synapse_store.syn.findEntityId(
+            name="synapse_storage_manifest_bulkrna-seqassay.csv", parent="syn63646197"
+        )
+        assert manifest_synapse_id is not None
+        synapse_manifest_entity = synapse_store.syn.get(
+            entity=manifest_synapse_id, downloadFile=False
+        )
+        assert synapse_manifest_entity is not None
+        assert (
+            synapse_manifest_entity["_file_handle"]["fileName"]
+            == "synapse_storage_manifest_bulkrna-seqassay.csv"
+        )
+
+        # AND the manifest table is created
+        expected_table_name = "bulkrna-seqassay_synapse_storage_manifest_table"
+        synapse_id = synapse_store.syn.findEntityId(
+            parent="syn23643250", name=expected_table_name
+        )
+        assert synapse_id is not None
diff --git a/tests/integration/test_validate_attribute.py b/tests/integration/test_validate_attribute.py
index b6d3b74b1..f00de7fde 100644
--- a/tests/integration/test_validate_attribute.py
+++ b/tests/integration/test_validate_attribute.py
@@ -74,15 +74,16 @@ def test_url_validation_invalid_url(self, dmge: DataModelGraphExplorer) -> None:
             [],
         )
 
-    def test__get_target_manifest_dataframes(
-        self, dmge: DataModelGraphExplorer
-    ) -> None:
-        """
-        This test checks that the method successfully returns manifests from Synapse
-
-        """
-        validator = ValidateAttribute(dmge=dmge)
-        manifests = validator._get_target_manifest_dataframes(  # pylint:disable= protected-access
-            "patient", project_scope=["syn54126707"]
-        )
-        assert list(manifests.keys()) == ["syn54126997", "syn54127001"]
+    # See slack discussion, to turn test back on at a later time: https://sagebionetworks.jira.com/browse/FDS-2509
+    # def test__get_target_manifest_dataframes(
+    #     self, dmge: DataModelGraphExplorer
+    # ) -> None:
+    #     """
+    #     This test checks that the method successfully returns manifests from Synapse
+
+    #     """
+    #     validator = ValidateAttribute(dmge=dmge)
+    #     manifests = validator._get_target_manifest_dataframes(  # pylint:disable= protected-access
+    #         "patient", project_scope=["syn54126707"]
+    #     )
+    #     assert list(manifests.keys()) == ["syn54126997", "syn54127001"]
diff --git a/tests/test_api.py b/tests/test_api.py
index 08e0bd4a6..0a27b5c73 100644
--- a/tests/test_api.py
+++ b/tests/test_api.py
@@ -780,9 +780,9 @@ def test_generate_manifest_file_based_annotations(
 
         # make sure Filename, entityId, and component get filled with correct value
         assert google_sheet_df["Filename"].to_list() == [
-            "schematic - main/TestDataset-Annotations-v3/Sample_A.txt",
-            "schematic - main/TestDataset-Annotations-v3/Sample_B.txt",
-            "schematic - main/TestDataset-Annotations-v3/Sample_C.txt",
+            "schematic - main/TestDatasets/TestDataset-Annotations-v3/Sample_A.txt",
+            "schematic - main/TestDatasets/TestDataset-Annotations-v3/Sample_B.txt",
+            "schematic - main/TestDatasets/TestDataset-Annotations-v3/Sample_C.txt",
         ]
         assert google_sheet_df["entityId"].to_list() == [
             "syn25614636",
diff --git a/tests/test_manifest.py b/tests/test_manifest.py
index 06bd7b168..ade80fbe9 100644
--- a/tests/test_manifest.py
+++ b/tests/test_manifest.py
@@ -213,9 +213,9 @@ def test_get_manifest_first_time(self, manifest):
 
         # Confirm contents of Filename column
         assert output["Filename"].tolist() == [
-            "schematic - main/TestDataset-Annotations-v3/Sample_A.txt",
-            "schematic - main/TestDataset-Annotations-v3/Sample_B.txt",
-            "schematic - main/TestDataset-Annotations-v3/Sample_C.txt",
+            "schematic - main/TestDatasets/TestDataset-Annotations-v3/Sample_A.txt",
+            "schematic - main/TestDatasets/TestDataset-Annotations-v3/Sample_B.txt",
+            "schematic - main/TestDatasets/TestDataset-Annotations-v3/Sample_C.txt",
         ]
 
         # Test dimensions of data frame
diff --git a/tests/test_store.py b/tests/test_store.py
index f79761b28..717b4542e 100644
--- a/tests/test_store.py
+++ b/tests/test_store.py
@@ -11,14 +11,14 @@ import uuid
 from contextlib import nullcontext as does_not_raise
 from typing import Any, Callable, Generator
-from unittest.mock import AsyncMock, MagicMock, patch
+from unittest.mock import AsyncMock, patch
 
 import pandas as pd
 import pytest
 from pandas.testing import assert_frame_equal
 from synapseclient import EntityViewSchema, Folder
 from synapseclient.core.exceptions import SynapseHTTPError
-from synapseclient.entity import File
+from synapseclient.entity import File, Project
 from synapseclient.models import Annotations
 from synapseclient.models import Folder as FolderModel
@@ -406,7 +406,7 @@ def test_getDatasetAnnotations(self, dataset_id, synapse_store, force_batch):
         expected_df = pd.DataFrame.from_records(
             [
                 {
-                    "Filename": "schematic - main/TestDataset-Annotations-v3/Sample_A.txt",
+                    "Filename": "schematic - main/TestDatasets/TestDataset-Annotations-v3/Sample_A.txt",
                     "author": "bruno, milen, sujay",
                     "impact": "42.9",
                     "confidence": "high",
@@ -416,13 +416,13 @@
                     "IsImportantText": "TRUE",
                 },
                 {
-                    "Filename": "schematic - main/TestDataset-Annotations-v3/Sample_B.txt",
+                    "Filename": "schematic - main/TestDatasets/TestDataset-Annotations-v3/Sample_B.txt",
                     "confidence": "low",
                     "FileFormat": "csv",
                     "date": "2020-02-01",
                 },
                 {
-                    "Filename": "schematic - main/TestDataset-Annotations-v3/Sample_C.txt",
+                    "Filename": "schematic - main/TestDatasets/TestDataset-Annotations-v3/Sample_C.txt",
                     "FileFormat": "fastq",
                     "IsImportantBool": "False",
                     "IsImportantText": "FALSE",
@@ -490,7 +490,9 @@ def test_getFilesInStorageDataset(self, synapse_store, full_path, expected):
             return_value="syn23643250",
         ) as mock_project_id_patch, patch(
             "synapseclient.entity.Entity.__getattr__", return_value="schematic - main"
-        ) as mock_project_name_patch:
+        ) as mock_project_name_patch, patch.object(
+            synapse_store.syn, "get", return_value=Project(name="schematic - main")
+        ):
             file_list = synapse_store.getFilesInStorageDataset(
                 datasetId="syn_mock", fileNames=None, fullpath=full_path
             )