Merge pull request #1014 from Sage-Bionetworks/develop
Release 22.11.2
linglp authored Nov 17, 2022
2 parents 8a27c0c + b48329c commit 322cf42
Showing 18 changed files with 1,094 additions and 93 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -118,7 +118,7 @@ jobs:
run: >
source .venv/bin/activate;
pytest --cov-report=term --cov-report=html:htmlcov --cov=schematic/
-m "not (google_credentials_needed or rule_combos)"
-m "not (google_credentials_needed or rule_combos or schematic_api)"
- name: Upload pytest test results
uses: actions/upload-artifact@v2
36 changes: 28 additions & 8 deletions README.md
@@ -74,7 +74,7 @@ This command will install the dependencies based on what we specify in poetry.lo
5. Fill in credential files:
*Note*: If you won't interact with Synapse, please ignore this section.

There are two main configuration files that need to be edited :
There are two main configuration files that need to be edited:
[config.yml](https://github.com/Sage-Bionetworks/schematic/blob/develop/config.yml)
and [synapseConfig](https://raw.githubusercontent.com/Sage-Bionetworks/synapsePythonClient/v2.3.0-rc/synapseclient/.synapseConfig)

@@ -88,6 +88,8 @@ editor of your choice and edit the `username` and `authtoken` attribute under th
<strong>Configure config.yml File</strong>

*Note*: Below is only a brief explanation of some attributes in `config.yml`. <strong>Please use the link [here](https://github.com/Sage-Bionetworks/schematic/blob/develop/config.yml) to get the latest version of `config.yml` in `develop` branch</strong>.

Description of `config.yml` attributes

definitions:
@@ -104,20 +106,39 @@ Description of `config.yml` attributes
service_acct_creds: "syn25171627" # synapse ID of service_account_creds.json file

manifest:
title: "Patient Manifest " # title of metadata manifest file
data_type: "Patient" # component or data type from the data model
title: "example" # title of metadata manifest file
# to generate all manifests, enter only 'all manifests'
data_type:
- "Biospecimen"
- "Patient"

model:
input:
location: "data/schema_org_schemas/example.jsonld" # path to JSON-LD data model
file_type: "local" # only type "local" is supported currently
validation_schema: "~/path/to/validation_schema.json" # path to custom JSON Validation Schema JSON file
log_location: "~/path/to/log_folder/validation_schema.json" # auto-generated JSON Validation Schemas can be logged
style: # configuration of google sheet
google_manifest:
req_bg_color:
red: 0.9215
green: 0.9725
blue: 0.9803
opt_bg_color:
red: 1.0
green: 1.0
blue: 0.9019
master_template_id: '1LYS5qE4nV9jzcYw5sXwCza25slDfRA1CIg3cs-hCdpU'
strict_validation: true

*Note*: Paths can be specified relative to the `config.yml` file or as absolute paths.
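
As a rough illustration of the note above, a relative path could be resolved against the directory that contains `config.yml`, while an absolute path is used unchanged. The helper name `resolve_config_path` is hypothetical and is not part of schematic; this is a sketch of the idea under that assumption, not a description of how the library resolves paths internally.

```python
from pathlib import PurePosixPath


def resolve_config_path(config_file: str, value: str) -> PurePosixPath:
    """Resolve a path found in config.yml (hypothetical helper, for illustration).

    Absolute paths are returned unchanged; relative paths are interpreted
    relative to the directory holding the config file.
    """
    candidate = PurePosixPath(value)
    if candidate.is_absolute():
        return candidate
    return PurePosixPath(config_file).parent / candidate


# Example with made-up locations:
print(resolve_config_path("/home/user/schematic/config.yml",
                          "data/schema_org_schemas/example.jsonld"))
```

POSIX-style paths are used here only to keep the example platform independent.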

6. Obtain Google credential Files
6. Log in to Synapse using the command line
On the CLI in your virtual environment, run the following command:
```
synapse login -u <synapse username> -p <synapse password> --rememberMe
```
Please make sure that you run this command before running `schematic init` below.

7. Obtain Google credential files

To obtain ``credentials.json`` and ``token.pickle``, please run:

@@ -152,7 +173,6 @@ requires token-based authentication. As browser support that requires the token-
token-based authentication and keep only service account authentication in the future.



### Development process instruction

For new features, bugs, enhancements
22 changes: 20 additions & 2 deletions api/openapi/api.yaml
@@ -79,11 +79,22 @@ paths:
nullable: true
description: ID of view listing all project data assets. E.g. for Synapse this would be the Synapse ID of the fileview listing all data assets for a given project.(i.e. master_fileview in config.yml)
required: false
- in: query
name: output_format
schema:
type: string
enum: ["excel", "google_sheet", "dataframe (only if getting existing manifests)"]
description: If "excel" is selected, an Excel file is returned and no metadata is sent to the Google Sheets API; if "google_sheet" is selected, a Google Sheet URL is returned. This parameter may override the sheet_url parameter.
required: false
operationId: api.routes.get_manifest_route
responses:
"201":
description: Googlesheet link created
"200":
description: A Google Sheet link, an Excel file, or a pandas dataframe is returned
content:
application/vnd.ms-excel:
schema:
type: string
format: binary
application/json:
schema:
type: string
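
To make the new `output_format` query parameter concrete, the sketch below builds a query string for this route and rejects values outside the enum declared in the hunk above. The function name `build_manifest_query` and the example schema URL are assumptions for illustration; the endpoint path is not shown in this diff, so only the query string is constructed.

```python
from typing import Optional
from urllib.parse import urlencode

# Values copied from the `enum` in the OpenAPI hunk above.
ALLOWED_OUTPUT_FORMATS = (
    "excel",
    "google_sheet",
    "dataframe (only if getting existing manifests)",
)


def build_manifest_query(schema_url: str,
                         output_format: Optional[str] = None,
                         title: Optional[str] = None) -> str:
    """Build the query string for a manifest request (hypothetical helper)."""
    if output_format is not None and output_format not in ALLOWED_OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    params = {"schema_url": schema_url}
    # Optional parameters are only sent when the caller provides them.
    if title is not None:
        params["title"] = title
    if output_format is not None:
        params["output_format"] = output_format
    return urlencode(params)


print(build_manifest_query("https://example.org/model.jsonld", output_format="excel"))
```

A client would append this string to whatever URL the route is served under in its own deployment.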
@@ -381,6 +392,13 @@ paths:
description: Title of Manifest
example: Example
required: false
- in: query
name: return_excel
schema:
type: boolean
nullable: true
description: If true, return an Excel spreadsheet. (This avoids sending metadata to the Google Sheets API.)
required: false
operationId: api.routes.populate_manifest_route
responses:
"200":
77 changes: 59 additions & 18 deletions api/routes.py
@@ -7,7 +7,7 @@

import connexion
from connexion.decorators.uri_parsing import Swagger2URIParser
from flask import current_app as app, request, g, jsonify
from flask import current_app as app
from werkzeug.debug import DebuggedApplication

from schematic import CONFIG
@@ -25,6 +25,7 @@
import json
from schematic.utils.df_utils import load_df
import pickle
from flask import send_from_directory

# def before_request(var1, var2):
# # Do stuff before your route executes
@@ -196,7 +197,20 @@ def get_temp_jsonld(schema_url):
return tmp_file.name

# @before_request
def get_manifest_route(schema_url, title, oauth, use_annotations, dataset_ids=None, asset_view = None):
def get_manifest_route(schema_url: str, oauth: bool, use_annotations: bool, dataset_ids=None, asset_view = None, output_format=None, title=None):
"""Generate manifest(s) for the given data model, or retrieve existing ones.
Args:
schema_url: link to data model in json ld format
title: title of a given manifest.
oauth: if user wants to use OAuth for Google authentication
dataset_ids: Synapse ID(s) of the "dataset" entity on Synapse (for a given center/project).
output_format: one of three options: "excel", "google_sheet", or "dataframe". If set to "excel", return an Excel spreadsheet.
use_annotations: Whether to use existing annotations during manifest generation
asset_view: ID of view listing all project data assets. For example, for Synapse this would be the Synapse ID of the fileview listing all data assets for a given project.
Returns:
Googlesheet URL (if sheet_url is True), or pandas dataframe (if sheet_url is False).
"""

# call config_handler()
config_handler(asset_view = asset_view)

@@ -238,20 +252,32 @@ def get_manifest_route(schema_url, title, oauth, use_annotations, dataset_ids=No
)


def create_single_manifest(data_type, dataset_id=None):
def create_single_manifest(data_type, title, dataset_id=None, output_format=None):
# create object of type ManifestGenerator
manifest_generator = ManifestGenerator(
path_to_json_ld=jsonld,
title=t,
title=title,
root=data_type,
oauth=oauth,
use_annotations=use_annotations,
alphabetize_valid_values = 'ascending',
)

# if returning a dataframe
if output_format:
if "dataframe" in output_format:
output_format = "dataframe"

result = manifest_generator.get_manifest(
dataset_id=dataset_id, sheet_url=True,
dataset_id=dataset_id, sheet_url=True, output_format=output_format
)

# return an excel file if output_format is set to "excel"
if output_format == "excel":
dir_name = os.path.dirname(result)
file_name = os.path.basename(result)
mimetype='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
return send_from_directory(directory=dir_name, filename=file_name, as_attachment=True, mimetype=mimetype, cache_timeout=0)

return result

@@ -262,22 +288,37 @@ def create_single_manifest(data_type, dataset_id=None):
component_digraph = sg.se.get_digraph_by_edge_type('requiresComponent')
components = component_digraph.nodes()
for component in components:
t = f'{title}.{component}.manifest'
result = create_single_manifest(data_type = component)
all_results.append(result)
if title:
t = f'{title}.{component}.manifest'
else:
t = f'Example.{component}.manifest'
if output_format != "excel":
result = create_single_manifest(data_type=component, output_format=output_format, title=t)
all_results.append(result)
else:
app.logger.error('Currently we do not support returning multiple files as Excel format at once. Please choose a different output format. ')
else:
for i, dt in enumerate(data_type):
if len(data_type) > 1:
t = f'{title}.{dt}.manifest'
else:
t = title

if not title:
t = f'Example.{dt}.manifest'
else:
if len(data_type) > 1:
t = f'{title}.{dt}.manifest'
else:
t = title
if dataset_ids:
# if a dataset_id is provided add this to the function call.
result = create_single_manifest(data_type = dt, dataset_id = dataset_ids[i])
result = create_single_manifest(data_type=dt, dataset_id=dataset_ids[i], output_format=output_format, title=t)
else:
result = create_single_manifest(data_type = dt)
all_results.append(result)
result = create_single_manifest(data_type=dt, output_format=output_format, title=t)

# if output is pandas dataframe or google sheet url
if isinstance(result, (str, pd.DataFrame)):
all_results.append(result)
else:
if len(data_type) > 1:
app.logger.warning(f'Currently we do not support returning multiple files as Excel format at once. Only {t} would get returned. ')
return result

return all_results

@@ -341,7 +382,7 @@ def submit_manifest_route(schema_url, asset_view=None, manifest_record_type=None

return manifest_id

def populate_manifest_route(schema_url, title=None, data_type=None):
def populate_manifest_route(schema_url, title=None, data_type=None, return_excel=None):
# call config_handler()
config_handler()

@@ -355,7 +396,7 @@ def populate_manifest_route(schema_url, title=None, data_type=None):
metadata_model = MetadataModel(inputMModelLocation=jsonld, inputMModelLocationType='local')

#Call populateModelManifest class
populated_manifest_link = metadata_model.populateModelManifest(title=title, manifestPath=temp_path, rootNode=data_type)
populated_manifest_link = metadata_model.populateModelManifest(title=title, manifestPath=temp_path, rootNode=data_type, return_excel=return_excel)

return populated_manifest_link
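
The title and output-format handling added to `get_manifest_route` in this file can be summarized in a small standalone sketch. The function names `manifest_title` and `normalize_output_format` are hypothetical and exist only for illustration. The title logic mirrors the branch that iterates over an explicit list of data types; in the "all manifests" branch the diff always appends the component name.

```python
from typing import Optional


def manifest_title(title: Optional[str], data_type: str, multiple: bool) -> str:
    """Pick a manifest title the way the explicit data-type branch does."""
    if not title:
        # No title supplied: fall back to an "Example" prefix.
        return f"Example.{data_type}.manifest"
    if multiple:
        # Several data types requested: disambiguate by data type.
        return f"{title}.{data_type}.manifest"
    return title


def normalize_output_format(output_format: Optional[str]) -> Optional[str]:
    """Collapse the verbose enum value to plain "dataframe"."""
    if output_format and "dataframe" in output_format:
        return "dataframe"
    return output_format


print(manifest_title(None, "Patient", multiple=False))
print(normalize_output_format("dataframe (only if getting existing manifests)"))
```

Keeping these two decisions in small pure functions would also make them straightforward to unit test without a running API.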

6 changes: 5 additions & 1 deletion pyproject.toml
@@ -112,6 +112,10 @@ filterwarnings = [
markers = [
"""\
google_credentials_needed: marks tests requiring \
Google credentials (skipped on GitHub CI)\
Google credentials (skipped on GitHub CI) \
""",
"""\
schematic_api: marks tests requiring \
running API locally (skipped on GitHub CI)
"""
]
34 changes: 28 additions & 6 deletions schematic/manifest/commands.py
@@ -1,6 +1,6 @@
import os
import logging

from pathlib import Path
import click
import click_log
import logging
@@ -12,7 +12,7 @@
from schematic.help import manifest_commands
from schematic import CONFIG
from schematic.schemas.generator import SchemaGenerator
from schematic.utils.google_api_utils import export_manifest_csv, export_manifest_excel
from schematic.utils.google_api_utils import export_manifest_csv, export_manifest_excel, export_manifest_drive_service
from schematic.store.synapse import SynapseStorage

logger = logging.getLogger(__name__)
@@ -147,8 +147,26 @@ def create_single_manifest(data_type, output_csv=None, output_xlsx=None):
)

# call get_manifest() on manifest_generator
# if output_xlsx gets specified, output_format = "excel"
if output_xlsx:
output_format = "excel"

# if file name is in the path, and that file does not exist
if not os.path.exists(output_xlsx):
if ".xlsx" in output_xlsx or ".xls" in output_xlsx:
path = Path(output_xlsx)
output_path = path.parent.absolute()
else:
logger.error(f"{output_xlsx} does not exist. Please try a valid file path.")

else:
output_path = output_xlsx
else:
output_format = None
output_path = None

result = manifest_generator.get_manifest(
dataset_id=dataset_id, sheet_url=sheet_url, json_schema=json_schema,
dataset_id=dataset_id, sheet_url=sheet_url, json_schema=json_schema, output_format = output_format, output_path = output_path
)

if sheet_url:
@@ -160,13 +178,13 @@ def create_single_manifest(data_type, output_csv=None, output_xlsx=None):
if prefix_ext == ".model":
prefix = prefix_root
output_csv = f"{prefix}.{data_type}.manifest.csv"

elif output_xlsx:
export_manifest_excel(output_excel=output_xlsx, manifest=result)
logger.info(
f"Find the manifest template using this Excel file path: {output_xlsx}"
)
return result
export_manifest_csv(file_name=output_csv, manifest=result)
export_manifest_csv(file_path=output_csv, manifest=result)
logger.info(
f"Find the manifest template using this CSV file path: {output_csv}"
)
@@ -184,8 +202,12 @@ def create_single_manifest(data_type, output_csv=None, output_xlsx=None):
result = create_single_manifest(data_type = component)
else:
for dt in data_type:
if len(data_type) > 1:
if len(data_type) > 1 and not output_xlsx:
t = f'{title}.{dt}.manifest'
elif output_xlsx:
if ".xlsx" in output_xlsx or ".xls" in output_xlsx:
title_with_extension = os.path.basename(output_xlsx)
t = title_with_extension.split('.')[0]
else:
t = title
result = create_single_manifest(data_type = dt, output_csv=output_csv, output_xlsx=output_xlsx)
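
The hunks above derive both an output directory and a manifest title from the `output_xlsx` argument. A minimal sketch of that derivation follows, assuming the argument is a file path ending in `.xlsx` or `.xls`. The helper name `split_excel_output` is hypothetical, and unlike the diff it returns the parent directory as given rather than calling `.absolute()`, so the result does not depend on the current working directory.

```python
import os
from pathlib import Path
from typing import Tuple

EXCEL_SUFFIXES = (".xlsx", ".xls")


def split_excel_output(output_xlsx: str) -> Tuple[str, str]:
    """Return (output directory, manifest title) for an Excel output path."""
    if not output_xlsx.endswith(EXCEL_SUFFIXES):
        raise ValueError(f"{output_xlsx} is not an Excel file path")
    directory = str(Path(output_xlsx).parent)
    # As in the diff, the title is everything before the first dot
    # of the file name, so "a.b.xlsx" yields "a".
    title = os.path.basename(output_xlsx).split(".")[0]
    return directory, title


print(split_excel_output("out/patient.manifest.xlsx"))
```

Splitting on the first dot discards any dotted suffix in the file name, which may or may not be the desired title; `Path(...).stem` would keep it.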