
[SCHEMATIC-30, SCHEMATIC-200] Add version to click cli / use pathlib.Path module for checking cache size #1542

Merged · 31 commits · Nov 23, 2024

Commits
740d520
Add version to click cli
thomasyu888 Nov 9, 2024
99d10ef
Add version
thomasyu888 Nov 9, 2024
827521a
Merge branch 'develop' into schematic-30-add-version-to-cli
thomasyu888 Nov 13, 2024
8c8f108
Run black
thomasyu888 Nov 13, 2024
e8268b9
Reformat
thomasyu888 Nov 13, 2024
89fe93d
Fix
thomasyu888 Nov 13, 2024
ec410ed
Update schematic/schemas/data_model_parser.py
thomasyu888 Nov 13, 2024
67df20a
Add test for check_synapse_cache_size
thomasyu888 Nov 13, 2024
4ed0221
Merge branch 'schematic-30-add-version-to-cli' of github.com:Sage-Bio…
thomasyu888 Nov 13, 2024
619082c
Reformat
thomasyu888 Nov 13, 2024
fbd6821
Fix tests
thomasyu888 Nov 13, 2024
d90eaef
Remove unused parameter
thomasyu888 Nov 13, 2024
89b7a35
Merge branch 'develop' into schematic-30-add-version-to-cli
thomasyu888 Nov 13, 2024
d0c79ad
Install all-extras for now
thomasyu888 Nov 13, 2024
ac8b0fc
Make otel flash non-optional
thomasyu888 Nov 13, 2024
d0f63c9
Merge branch 'add-otel-flask-as-non-extra-dep' into schematic-30-add-…
thomasyu888 Nov 13, 2024
c712933
Update dockerfile
thomasyu888 Nov 13, 2024
e8e5d85
Add dependencies as non-optional
thomasyu888 Nov 13, 2024
d5a957c
Merge branch 'add-otel-flask-as-non-extra-dep' into schematic-30-add-…
thomasyu888 Nov 13, 2024
7e5704d
Update pyproject toml
thomasyu888 Nov 13, 2024
0f34eff
Fix trivy issue
thomasyu888 Nov 14, 2024
86b7c35
Add service version
thomasyu888 Nov 14, 2024
8bfeab9
Run black
thomasyu888 Nov 14, 2024
b809c6e
Merge branch 'develop' into schematic-30-add-version-to-cli
thomasyu888 Nov 14, 2024
ffa9498
Move all utils.general tests into separate folder
thomasyu888 Nov 19, 2024
f60f86e
Fix merge conflicts
thomasyu888 Nov 19, 2024
a368f5d
Merge branch 'develop' into schematic-30-add-version-to-cli
thomasyu888 Nov 19, 2024
93dff95
Use pre-commit
thomasyu888 Nov 20, 2024
872217e
Add updates to contribution doc
thomasyu888 Nov 20, 2024
9f499b4
Fix
thomasyu888 Nov 20, 2024
84d03f9
Add service version to log provider
BryanFauble Nov 20, 2024
3 changes: 3 additions & 0 deletions .github/workflows/scan_repo.yml
@@ -12,6 +12,9 @@ jobs:
   trivy:
     name: Trivy
     runs-on: ubuntu-latest
+    env:
+      TRIVY_DB_REPOSITORY: public.ecr.aws/aquasecurity/trivy-db:2
+      TRIVY_JAVA_DB_REPOSITORY: public.ecr.aws/aquasecurity/trivy-java-db:1
     steps:
       - name: Checkout code
         uses: actions/checkout@v4
14 changes: 7 additions & 7 deletions CONTRIBUTION.md
@@ -6,7 +6,7 @@ Please note we have a [code of conduct](CODE_OF_CONDUCT.md), please follow it in

## How to report bugs or feature requests

-You can **create bug and feature requests** through [Sage Bionetwork's FAIR Data service desk](https://sagebionetworks.jira.com/servicedesk/customer/portal/5/group/8). Providing enough details to the developers to verify and troubleshoot your issue is paramount:
+You can **create bug and feature requests** through [Sage Bionetwork's DPE schematic support](https://sagebionetworks.jira.com/servicedesk/customer/portal/5/group/7/create/225). Providing enough details to the developers to verify and troubleshoot your issue is paramount:
- **Provide a clear and descriptive title as well as a concise summary** of the issue to identify the problem.
- **Describe the exact steps which reproduce the problem** in as many details as possible.
- **Describe the behavior you observed after following the steps** and point out what exactly is the problem with that behavior.
@@ -25,7 +25,7 @@ For new features, bugs, enhancements:

#### 1. Branch Setup
* Pull the latest code from the develop branch in the upstream repository.
-* Checkout a new branch formatted like so: `develop-<feature/fix-name>` from the develop branch
+* Checkout a new branch formatted like so: `<JIRA-ID>-<feature/fix-name>` from the develop branch

#### 2. Development Workflow
* Develop on your new branch.
@@ -35,22 +35,22 @@ For new features, bugs, enhancements:
* You can choose to create a draft PR if you prefer to develop this way

#### 3. Branch Management
-* Push code to `develop-<feature/fix-name>` in upstream repo:
+* Push code to `<JIRA-ID>-<feature/fix-name>` in upstream repo:
 ```
-git push <upstream> develop-<feature/fix-name>
+git push <upstream> <JIRA-ID>-<feature/fix-name>
 ```
-* Branch off `develop-<feature/fix-name>` if you need to work on multiple features associated with the same code base
+* Branch off `<JIRA-ID>-<feature/fix-name>` if you need to work on multiple features associated with the same code base
* After feature work is complete and before creating a PR to the develop branch in upstream
a. ensure that code runs locally
b. test for logical correctness locally
c. run `pre-commit` to style code if the hook is not installed
d. wait for git workflow to complete (e.g. tests are run) on github

#### 4. Pull Request and Review
-* Create a PR from `develop-<feature/fix-name>` into the develop branch of the upstream repo
+* Create a PR from `<JIRA-ID>-<feature/fix-name>` into the develop branch of the upstream repo
* Request a code review on the PR
* Once code is approved merge in the develop branch. The **"Squash and merge"** strategy should be used for a cleaner commit history on the `develop` branch. The description of the squash commit should include enough information to understand the context of the changes that were made.
-* Once the actions pass on the main branch, delete the `develop-<feature/fix-name>` branch
+* Once the actions pass on the main branch, delete the `<JIRA-ID>-<feature/fix-name>` branch

### Updating readthedocs documentation
1. Navigate to the docs directory.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -29,4 +29,4 @@ RUN poetry install --no-interaction --no-ansi --no-root

 COPY . ./

-RUN poetry install --only-root
+RUN poetry install --only-root
15 changes: 9 additions & 6 deletions schematic/__init__.py
@@ -10,7 +10,12 @@
 from opentelemetry.instrumentation.flask import FlaskInstrumentor
 from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
 from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
-from opentelemetry.sdk.resources import DEPLOYMENT_ENVIRONMENT, SERVICE_NAME, Resource
+from opentelemetry.sdk.resources import (
+    DEPLOYMENT_ENVIRONMENT,
+    SERVICE_NAME,
+    SERVICE_VERSION,
+    Resource,
+)
 from opentelemetry.sdk.trace import TracerProvider
 from opentelemetry.sdk.trace.export import BatchSpanProcessor, Span
 from opentelemetry.sdk.trace.sampling import ALWAYS_OFF
@@ -20,6 +25,7 @@

 from schematic.configuration.configuration import CONFIG
 from schematic.loader import LOADER
+from schematic.version import __version__
 from schematic_api.api.security_controller import info_from_bearer_auth

 Synapse.allow_client_caching(False)
@@ -96,11 +102,7 @@ def set_up_tracing(session: requests.Session) -> None:
         resource=Resource(
             attributes={
                 SERVICE_NAME: tracing_service_name,
-                # TODO: Revisit this portion later on. As of 11/12/2024 when
-                # deploying this to ECS or running within a docker container,
-                # the package version errors out with the following error:
-                # importlib.metadata.PackageNotFoundError: No package metadata was found for schematicpy
-                # SERVICE_VERSION: package_version,
+                SERVICE_VERSION: __version__,
                 DEPLOYMENT_ENVIRONMENT: deployment_environment,
             }
         )
@@ -127,6 +129,7 @@ def set_up_logging(session: requests.Session) -> None:
             {
                 SERVICE_NAME: logging_service_name,
                 DEPLOYMENT_ENVIRONMENT: deployment_environment,
+                SERVICE_VERSION: __version__,
             }
         )

2 changes: 2 additions & 0 deletions schematic/__main__.py
@@ -13,6 +13,7 @@
 from schematic.visualization.commands import (
     viz as viz_cli,
 )  # viz generation commands
+from schematic import __version__

 logger = logging.getLogger()
 click_log.basic_config(logger)
@@ -24,6 +25,7 @@
 # invoke_without_command=True -> forces the application not to show aids before losing them with a --h
 @click.group(context_settings=CONTEXT_SETTINGS, invoke_without_command=True)
 @click_log.simple_verbosity_option(logger)
+@click.version_option(version=__version__, prog_name="schematic")
 def main():
     """
     Command line interface to the `schematic` backend services.
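The `@click.version_option` decorator added above is what makes `schematic --version` work. Below is a minimal, self-contained sketch of the same pattern; the `mytool` program name and hard-coded `"0.1.0"` version are illustrative stand-ins, not schematic's real values (schematic resolves its version from package metadata):

```python
import click
from click.testing import CliRunner

# Hypothetical version string; schematic reads its own via importlib.metadata.
__version__ = "0.1.0"


@click.group(invoke_without_command=True)
@click.version_option(version=__version__, prog_name="mytool")
def main():
    """Example CLI group exposing a --version flag."""


if __name__ == "__main__":
    runner = CliRunner()
    result = runner.invoke(main, ["--version"])
    print(result.output.strip())  # mytool, version 0.1.0
```

Because `version_option` registers an eager option, click prints the version and exits before the group body or any subcommand runs.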
35 changes: 7 additions & 28 deletions schematic/utils/general.py
@@ -5,7 +5,7 @@
 import logging
 import os
 import pstats
-import subprocess
+from pathlib import Path
 import tempfile
 from cProfile import Profile
 from datetime import datetime, timedelta
@@ -129,40 +129,19 @@ def calculate_datetime(
return date_time_result


-def check_synapse_cache_size(
-    directory: str = "/root/.synapseCache",
-) -> float:
-    """use du --sh command to calculate size of .synapseCache.
+def check_synapse_cache_size(directory: str = "/root/.synapseCache") -> float:
+    """Calculate size of .synapseCache directory in bytes using pathlib.

     Args:
         directory (str, optional): .synapseCache directory. Defaults to '/root/.synapseCache'

     Returns:
-        float: returns size of .synapsecache directory in bytes
+        float: size of .synapsecache directory in bytes
     """
-    # Note: this command might fail on windows user.
-    # But since this command is primarily for running on AWS, it is fine.
-    command = ["du", "-sh", directory]
-    output = subprocess.run(command, capture_output=True, check=False).stdout.decode(
-        "utf-8"
+    total_size = sum(
+        f.stat().st_size for f in Path(directory).rglob("*") if f.is_file()
     )
-
-    # Parsing the output to extract the directory size
-    size = output.split("\t")[0]
-    if "K" in size:
-        size_in_kb = float(size.rstrip("K"))
-        byte_size = size_in_kb * 1000
-    elif "M" in size:
-        size_in_mb = float(size.rstrip("M"))
-        byte_size = size_in_mb * 1000000
-    elif "G" in size:
-        size_in_gb = float(size.rstrip("G"))
-        byte_size = size_in_gb * (1024**3)
-    elif "B" in size:
-        byte_size = float(size.rstrip("B"))
-    else:
-        logger.error("Cannot recognize the file size unit")
-    return byte_size
+    return total_size


def clear_synapse_cache(synapse_cache: cache.Cache, minutes: int) -> int:
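The rewritten `check_synapse_cache_size` above swaps shelling out to `du -sh` for a pure-Python directory walk. Besides working on Windows, this returns exact byte counts, whereas `du` reports human-rounded K/M/G units (and allocated blocks rather than file bytes), which is what the old string-parsing branches were only approximating. A standalone sketch of the same technique (`directory_size_bytes` is an illustrative name, not schematic's API):

```python
import tempfile
from pathlib import Path


def directory_size_bytes(directory: str) -> int:
    """Sum the sizes of all regular files under `directory`, recursively."""
    return sum(f.stat().st_size for f in Path(directory).rglob("*") if f.is_file())


if __name__ == "__main__":
    # Mimic the shape of a .synapseCache tree: nested numeric folders plus a file.
    with tempfile.TemporaryDirectory() as tmp:
        sub = Path(tmp) / "123" / "456"
        sub.mkdir(parents=True)
        (sub / "query.csv").write_bytes(b"\0" * 1000)
        print(directory_size_bytes(tmp))  # 1000
```

The `if f.is_file()` filter skips directories, so only file contents are counted, matching the "size in bytes" contract of the new implementation.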
2 changes: 1 addition & 1 deletion schematic/version.py
@@ -1,4 +1,4 @@
 """Sets the version of the package"""
 import importlib.metadata

-__version__ = importlib.metadata.version("schematic")
+__version__ = importlib.metadata.version("schematicpy")
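This one-word fix matters because `importlib.metadata.version()` takes the *distribution* name (`schematicpy` on PyPI), not the import package name (`schematic`); when the two differ, looking up the import name raises `PackageNotFoundError`. A sketch of a defensive lookup for cases where the distribution might not be installed (the `safe_version` helper and its `"0.0.0"` fallback are illustrations, not what schematic ships):

```python
import importlib.metadata


def safe_version(dist_name: str, default: str = "0.0.0") -> str:
    """Return the installed version of a distribution, or `default` if absent."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return default


if __name__ == "__main__":
    # A distribution name that is almost certainly not installed falls back cleanly.
    print(safe_version("definitely-not-a-real-distribution"))  # 0.0.0
```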
181 changes: 1 addition & 180 deletions tests/test_utils.py
@@ -2,19 +2,11 @@
import json
import logging
import os
import tempfile
import time
from datetime import datetime
from pathlib import Path
from typing import Generator, Union

import numpy as np
import pandas as pd
import pytest
import synapseclient.core.cache as cache
from _pytest.fixtures import FixtureRequest
from pandas.testing import assert_frame_equal
from synapseclient.core.exceptions import SynapseHTTPError

from schematic.models.metadata import MetadataModel
from schematic.models.validate_manifest import ValidateManifest
@@ -26,14 +18,8 @@
convert_graph_to_jsonld,
)
from schematic.schemas.data_model_parser import DataModelParser
-from schematic.utils import cli_utils, df_utils, general, io_utils, validate_utils
+from schematic.utils import cli_utils, df_utils, io_utils, validate_utils
 from schematic.utils.df_utils import load_df
-from schematic.utils.general import (
-    calculate_datetime,
-    check_synapse_cache_size,
-    clear_synapse_cache,
-    entity_type_mapping,
-)
from schematic.utils.schema_utils import (
check_for_duplicate_components,
check_if_display_name_is_valid_label,
@@ -168,13 +154,6 @@

DATA_MODEL_DICT = {"example.model.csv": "CSV", "example.model.jsonld": "JSONLD"}

-test_disk_storage = [
-    (2, 4000, 16000),
-    (1000, 4000, 16000),
-    (2000000, 1900000, 2000000),
-    (1073741825, 1073741824, 1181116006.4),
-]


def get_metadataModel(helpers, model_name: str):
metadataModel = MetadataModel(
@@ -185,164 +164,6 @@ def get_metadataModel(helpers, model_name: str):
return metadataModel


-# create temporary files with various size based on request
-@pytest.fixture()
-def create_temp_query_file(
-    tmp_path: Path, request: FixtureRequest
-) -> Generator[tuple[Path, Path, Path], None, None]:
-    """create temporary files of various size based on request parameter.
-
-    Args:
-        tmp_path (Path): temporary file path
-        request (any): a request for a fixture from a test
-
-    Yields:
-        Generator[Tuple[Path, Path, Path]]: return path of mock synapse cache directory, mock table query folder and csv
-    """
-    # define location of mock synapse cache
-    mock_synapse_cache_dir = tmp_path / ".synapseCache/"
-    mock_synapse_cache_dir.mkdir()
-    mock_sub_folder = mock_synapse_cache_dir / "123"
-    mock_sub_folder.mkdir()
-    mock_table_query_folder = mock_sub_folder / "456"
-    mock_table_query_folder.mkdir()
-
-    # create mock table query csv
-    mock_synapse_table_query_csv = (
-        mock_table_query_folder / "mock_synapse_table_query.csv"
-    )
-    with open(mock_synapse_table_query_csv, "wb") as f:
-        f.write(b"\0" * request.param)
-    yield mock_synapse_cache_dir, mock_table_query_folder, mock_synapse_table_query_csv
-
-
-class TestGeneral:
-    @pytest.mark.parametrize("create_temp_query_file", [3, 1000], indirect=True)
-    def test_clear_synapse_cache(self, create_temp_query_file) -> None:
-        # define location of mock synapse cache
-        (
-            mock_synapse_cache_dir,
-            mock_table_query_folder,
-            mock_synapse_table_query_csv,
-        ) = create_temp_query_file
-        # create a mock cache map
-        mock_cache_map = mock_table_query_folder / ".cacheMap"
-        mock_cache_map.write_text(
-            f"{mock_synapse_table_query_csv}: '2022-06-13T19:24:27.000Z'"
-        )
-
-        assert os.path.exists(mock_synapse_table_query_csv)
-
-        # since synapse python client would compare last modified date and before date
-        # we have to create a little time gap here
-        time.sleep(1)
-
-        # clear cache
-        my_cache = cache.Cache(cache_root_dir=mock_synapse_cache_dir)
-        clear_synapse_cache(my_cache, minutes=0.0001)
-        # make sure that cache files are now gone
-        assert os.path.exists(mock_synapse_table_query_csv) == False
-        assert os.path.exists(mock_cache_map) == False
-
-    def test_calculate_datetime_before_minutes(self):
-        input_date = datetime.strptime("07/20/23 17:36:34", "%m/%d/%y %H:%M:%S")
-        minutes_before = calculate_datetime(
-            input_date=input_date, minutes=10, before_or_after="before"
-        )
-        expected_result_date_before = datetime.strptime(
-            "07/20/23 17:26:34", "%m/%d/%y %H:%M:%S"
-        )
-        assert minutes_before == expected_result_date_before
-
-    def test_calculate_datetime_after_minutes(self):
-        input_date = datetime.strptime("07/20/23 17:36:34", "%m/%d/%y %H:%M:%S")
-        minutes_after = calculate_datetime(
-            input_date=input_date, minutes=10, before_or_after="after"
-        )
-        expected_result_date_after = datetime.strptime(
-            "07/20/23 17:46:34", "%m/%d/%y %H:%M:%S"
-        )
-        assert minutes_after == expected_result_date_after
-
-    def test_calculate_datetime_raise_error(self):
-        with pytest.raises(ValueError):
-            input_date = datetime.strptime("07/20/23 17:36:34", "%m/%d/%y %H:%M:%S")
-            minutes = calculate_datetime(
-                input_date=input_date, minutes=10, before_or_after="error"
-            )
-
-    # this test might fail for windows machine
-    @pytest.mark.not_windows
-    @pytest.mark.parametrize(
-        "create_temp_query_file,local_disk_size,gh_disk_size",
-        test_disk_storage,
-        indirect=["create_temp_query_file"],
-    )
-    def test_check_synapse_cache_size(
-        self,
-        create_temp_query_file,
-        local_disk_size: int,
-        gh_disk_size: Union[int, float],
-    ) -> None:
-        mock_synapse_cache_dir, _, _ = create_temp_query_file
-        disk_size = check_synapse_cache_size(mock_synapse_cache_dir)
-
-        # For some reasons, when running in github action, the size of file changes.
-        if IN_GITHUB_ACTIONS:
-            assert disk_size == gh_disk_size
-        else:
-            assert disk_size == local_disk_size
-
-    def test_find_duplicates(self):
-        mock_list = ["foo", "bar", "foo"]
-        mock_dups = {"foo"}
-
-        test_dups = general.find_duplicates(mock_list)
-        assert test_dups == mock_dups
-
-    def test_dict2list_with_dict(self):
-        mock_dict = {"foo": "bar"}
-        mock_list = [{"foo": "bar"}]
-
-        test_list = general.dict2list(mock_dict)
-        assert test_list == mock_list
-
-    def test_dict2list_with_list(self):
-        # mock_dict = {'foo': 'bar'}
-        mock_list = [{"foo": "bar"}]
-
-        test_list = general.dict2list(mock_list)
-        assert test_list == mock_list
-
-    @pytest.mark.parametrize(
-        "entity_id,expected_type",
-        [
-            ("syn27600053", "folder"),
-            ("syn29862078", "file"),
-            ("syn23643253", "asset view"),
-            ("syn30988314", "folder"),
-            ("syn51182432", "org.sagebionetworks.repo.model.table.TableEntity"),
-        ],
-    )
-    def test_entity_type_mapping(self, synapse_store, entity_id, expected_type):
-        syn = synapse_store.syn
-
-        entity_type = entity_type_mapping(syn, entity_id)
-        assert entity_type == expected_type
-
-    def test_entity_type_mapping_invalid_entity_id(self, synapse_store):
-        syn = synapse_store.syn
-
-        # test with an invalid entity id
-        with pytest.raises(SynapseHTTPError) as exception_info:
-            entity_type_mapping(syn, "syn123456")
-
-    def test_download_manifest_to_temp_folder(self):
-        with tempfile.TemporaryDirectory() as tmpdir:
-            path_dir = general.create_temp_folder(tmpdir)
-            assert os.path.exists(path_dir)
-
-
 class TestCliUtils:
     def test_query_dict(self):
         mock_dict = {"k1": {"k2": {"k3": "foobar"}}}