
Get DS file structure with serviceX tool #4

Open
wants to merge 29 commits into
base: main
29 commits
afccc4b
servicex token ignore
Mar 6, 2025
ae69520
adding get_structure utility function and dependencies
Mar 6, 2025
b8d737f
adding support for multiple files
Mar 12, 2025
d5e37ca
adding cli option, save_to_text and print flags
Mar 12, 2025
04bed65
CI tests for helper functions + .txt file
Mar 18, 2025
fe6c414
Split get_structure into more helpers
Mar 18, 2025
139708e
adding more tests for deliver spec builder
Mar 18, 2025
e71a01e
Comments on helpers
Mar 21, 2025
7d87a47
docstrings & error msg improvements
Mar 21, 2025
2c88a67
helper test for raw decoding into array - should be in file_peeking.p…
Mar 21, 2025
e1954db
remove unused raw arg
Mar 21, 2025
054e4a6
Removing decode-raw and implementing it in file_peeking
Mar 24, 2025
d987b41
Add tests for str_to_array
Mar 24, 2025
fb4914e
return type instead of array
Mar 25, 2025
a827352
json-based file structure encoding and decoding
Apr 9, 2025
61c7114
simplify servicex import
Apr 9, 2025
95803a6
miniopy dependency to fix CI ImportError
Apr 9, 2025
30e0955
Removing --save-to-txt arg on CLI
Apr 10, 2025
01c49a2
logging and better exception handling
Apr 10, 2025
9bfc039
adding handle function for dataset CLI argument
Apr 10, 2025
ae4d039
Adding flake8 in CI - Issue 5
Apr 11, 2025
f21e92e
Pipe line fail if flake8 fail - issue 5
Apr 11, 2025
20361c3
run black instead of flake8
Apr 14, 2025
3289625
Fix black format name
Apr 14, 2025
b1b6bb1
CLI Typer replacement of arg parser, improved sample name display
Apr 14, 2025
c5464d6
Update __init__.py for versioning - resolved merge conflict
ArturU043 Apr 14, 2025
58ba6ae
black-format __init__.py
Apr 14, 2025
ef67554
Merge branch 'main' into fille_peek_dev
ArturU043 Apr 14, 2025
9b55e7f
format after conflict resolve
Apr 14, 2025
19 changes: 19 additions & 0 deletions .github/workflows/CI.yml
@@ -6,7 +6,26 @@ on:
  workflow_dispatch:

jobs:

  black-format:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.x"

      - name: Run Black
        run: |
          pipx run black --check .

  test:
    needs:
      - black-format

    runs-on: ubuntu-latest

    steps:
7 changes: 7 additions & 0 deletions .gitignore
@@ -7,3 +7,10 @@ servicex.yaml

#Distribution
dist/

#ServiceX
servicex.yaml

#Testing
samples_structure.txt

8 changes: 7 additions & 1 deletion pyproject.toml
@@ -11,6 +11,7 @@ readme = "README.md"
 license = { text = "BSD-3-Clause" }
 requires-python = ">=3.9"
 dependencies = [
+    "servicex",
     "uproot>=5.0",
     "awkward>=2.6",
     "dask-awkward>=2024.12.2",
@@ -32,8 +33,13 @@ test = [
     "pytest>=7.2.0",
     "numpy>=1.21",
     "pyarrow>=8.0.0",
-    "pandas"
+    "pandas",
+    "miniopy-async==1.21.1"
 ]

+[project.scripts]
+servicex-get-structure = "servicex_analysis_utils.cli:app"
+
+
 [tool.hatch.build.targets.wheel]
 packages = ["servicex_analysis_utils"]
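
Note on the `[project.scripts]` entry: it maps the command name `servicex-get-structure` to a `module:object` reference, which the installed launcher imports and calls. The following is a minimal sketch of how such a reference is resolved. The helper name `resolve_entry_point` is hypothetical, and the standard-library reference `json:dumps` stands in for `servicex_analysis_utils.cli:app` so the sketch runs without the package installed.

```python
import importlib


def resolve_entry_point(reference: str):
    """Resolve a 'module:attribute' reference, as console-script launchers do."""
    module_name, _, attr_path = reference.partition(":")
    obj = importlib.import_module(module_name)
    # The attribute part may be dotted, e.g. "pkg.mod:Class.method".
    if attr_path:
        for attr in attr_path.split("."):
            obj = getattr(obj, attr)
    return obj


# Stand-in reference from the standard library (an assumption for illustration).
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```

With the package installed, the same lookup applied to `servicex_analysis_utils.cli:app` would presumably return the Typer application object defined in `cli.py`.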
5 changes: 3 additions & 2 deletions servicex_analysis_utils/__init__.py
@@ -25,7 +25,8 @@
 # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-from .materialization import to_awk
+from .materialization import to_awk
+from .file_peeking import get_structure

 __version__ = "1.0.0"
-__all__ = ['to_awk']
+__all__ = ["to_awk", "get_structure"]
73 changes: 73 additions & 0 deletions servicex_analysis_utils/cli.py
@@ -0,0 +1,73 @@
import sys
import json
import os
import logging
from .file_peeking import get_structure
import typer
from typing import List

app = typer.Typer()


def make_dataset_list(dataset_arg):
    """
    Helper to handle the user input dataset argument.
    Loads to dict if input is .json else returns default input
    Output is given to get_structure()

    Parameters:
        dataset_arg ([str]): One or more DS identifiers, or a single path/to/.json containing identifiers and sample names.

    Returns:
        dataset ([str], dict): dictionary loaded from the JSON file, or the input list unchanged
    """
    if len(dataset_arg) == 1 and dataset_arg[0].endswith(".json"):
        dataset_file = dataset_arg[0]

        if not os.path.isfile(dataset_file):
            logging.error(f"Error: JSON file '{dataset_file}' not found.")
            sys.exit(1)

        try:
            with open(dataset_file, "r") as f:
                dataset = json.load(f)

            if not isinstance(dataset, dict):
                logging.error("Error: The JSON file must contain a dictionary.")
                sys.exit(1)

        except json.JSONDecodeError:
            logging.error(
                f"Error: '{dataset_file}' is not a valid JSON file.", exc_info=True
            )
            sys.exit(1)

    else:
        # If DS is provided in CLI instead of json, use it as a list (default)
        dataset = dataset_arg

    return dataset


@app.command()
def run_from_command(
    dataset: List[str] = typer.Argument(
        ...,
        help="Input datasets (Rucio DID) or path to JSON file containing datasets in a dict.",
    ),
    filter_branch: str = typer.Option(
        "", "--filter-branch", help="Only display branches containing this string."
    ),
):
    """
    Calls the get_structure function and sends results to stdout.
    To run on command line: servicex-get-structure DATASET... [--filter-branch TEXT]
    """
    ds_format = make_dataset_list(dataset)
    result = get_structure(ds_format, filter_branch=filter_branch, do_print=False)

    print(result)


if __name__ == "__main__":
    app()
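
Note on the dataset argument: `make_dataset_list` accepts either a list of identifiers or a single path to a JSON file holding a dictionary. The sketch below illustrates both input forms under stated assumptions. `load_dataset_argument` is a hypothetical stand-in that raises instead of calling `sys.exit`, the JSON layout (sample name mapped to identifier) is inferred from the docstring, and the identifiers are placeholders rather than real Rucio DIDs.

```python
import json
import os
import tempfile


def load_dataset_argument(dataset_arg):
    """Hypothetical stand-in for make_dataset_list that raises instead of exiting."""
    if len(dataset_arg) == 1 and dataset_arg[0].endswith(".json"):
        with open(dataset_arg[0], "r") as f:
            dataset = json.load(f)
        if not isinstance(dataset, dict):
            raise ValueError("The JSON file must contain a dictionary.")
        return dataset
    # Anything else is passed through unchanged as a list of identifiers.
    return dataset_arg


# Placeholder sample names and identifiers (assumptions for illustration).
samples = {
    "signal": "scope:signal.dataset",
    "background": "scope:background.dataset",
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "datasets.json")
    with open(path, "w") as f:
        json.dump(samples, f)

    # Form 1: a single path ending in .json is loaded into a dict.
    from_json = load_dataset_argument([path])

# Form 2: identifiers given directly stay a list.
from_cli = load_dataset_argument(["scope:signal.dataset", "scope:background.dataset"])

print(from_json)
print(from_cli)
```

Raising an exception rather than exiting is a choice made for this sketch only, since it makes the helper easier to exercise in tests; the function in the diff logs the error and exits with status 1.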