file_indices: adapt to the new schema of metadata for the file indices #147

Open
wants to merge 1 commit into
base: master
65 changes: 24 additions & 41 deletions cernopendata_client/searcher.py
@@ -196,51 +196,34 @@ def get_files_list(
if server != SERVER_HTTP_URI and searcher_protocol != "xrootd":
searcher_protocol = server.split(":")[0]
files_list = []

new_server = SERVER_ROOT_URI
if searcher_protocol == "http":
new_server = server
elif searcher_protocol == "https":
new_server = SERVER_HTTPS_URI

for file_ in record_json["metadata"]["files"]:
files_list.append((file_["uri"], file_["size"], file_["checksum"]))
if expand:
# let's unwind file indexes
files_list_expanded = []
for file_ in files_list:
if file_[0].endswith("_file_index.json"):
try:
url_file = "{}/record/{}/files/{}".format(
server, str(record_json["id"]), file_[0].split("/")[-1]
for file_ in record_json["metadata"]["_file_indices"]:
if expand:
# let's unwind file indexes
Member commented:

Note that the changes do not pass unit tests, e.g. see the CI report for Python 3.12:

================== 22 failed, 50 passed, 8 skipped in 45.12s ===================

Member commented:

After the CERN Open Data portal service update, I'm still getting failing tests locally:

$ tox -e py312
...
FAILED tests/test_cli_download_files.py::test_download_files_http_requests - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_https_requests - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_download_engine - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_with_verify - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_name - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_name_multiple_values - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_regexp_single_file - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_regexp_multiple_files - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_range - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_range_multiple_values - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_single_range_single_regexp - assert 1 == 0
FAILED tests/test_cli_download_files.py::test_download_files_filter_multiple_range_single_regexp - assert 1 == 0
FAILED tests/test_cli_get_file_locations.py::test_get_file_locations_from_recid_without_files - AssertionError: assert 1 == 0
FAILED tests/test_cli_verify_files.py::test_verify_files - assert 1 == 0
FAILED tests/test_cli_verify_files.py::test_verify_files_https_server - assert 1 == 0
FAILED tests/test_metadater.py::test_get_metadata_from_filter_metadata_two - assert 1 == 0
FAILED tests/test_verifier.py::test_get_file_info_local_good_input - assert 1 == 0
FAILED tests/test_verifier.py::test_get_file_info_local_good_input_wrong_count - assert 1 == 0
FAILED tests/test_verifier.py::test_get_file_info_local_good_input_wrong_checksum - assert 1 == 0
FAILED tests/test_verifier.py::test_get_file_info_local_good_input_wrong_size - assert 1 == 0

For example, this command works:

$ cernopendata-client download-files --recid 1 --no-expand
==> Downloading file 1 of 6
  -> File: ./1/CMS_Run2010B_BTau_AOD_Apr21ReReco-v1_0000_file_index.json
  -> Progress: 322/322 KiB (100%)
^C

$ cernopendata-client download-files --recid 1
==> Downloading file 1 of 2916
  -> File 00E16FBB-9071-E011-83D3-003048673F12.root is incomplete. Resuming download.
  -> File: ./1/00E16FBB-9071-E011-83D3-003048673F12.root
^C-> Progress: 124229/596996 KiB (20%)
Aborted!

Whereas the simplest use case, a record with directly attached files, does not work:

$ cernopendata-client download-files --recid 5500
==> Downloading file 1 of 11
==> ERROR: Download error occured. Please try again.
Traceback (most recent call last):
  File "/home/tibor/.virtualenvs/cernopendata-client/bin/cernopendata-client", line 8, in <module>
    sys.exit(cernopendata_client())
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/cernopendata_client/cli.py", line 377, in download_files
    download_single_file(
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/cernopendata_client/downloader.py", line 340, in download_single_file
    downloader.file_downloader()
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/cernopendata_client/downloader.py", line 80, in file_downloader
    response = requests.get(self.file_location, headers=headers, stream=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/requests/sessions.py", line 697, in send
    adapter = self.get_adapter(url=request.url)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tibor/.virtualenvs/cernopendata-client/lib/python3.12/site-packages/requests/sessions.py", line 792, in get_adapter
    raise InvalidSchema(f"No connection adapters were found for {url!r}")
requests.exceptions.InvalidSchema: No connection adapters were found for 'root://eospublic.cern.ch//eos/opendata/cms/software/HiggsExample20112012/BuildFile.xml'
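
The traceback suggests that the file URI reached `requests.get` with its `root://` scheme intact, which `requests` has no connection adapter for. Below is a minimal sketch of the kind of rewrite the HTTP download path seems to need before the request is made. The helper name `to_http_uri` and the two constant values are assumptions made for illustration; they are not taken from this PR.

```python
from urllib.parse import urlsplit

# Assumed values for illustration; the real constants live in the client's config.
SERVER_ROOT_URI = "root://eospublic.cern.ch/"
SERVER_HTTP_URI = "http://opendata.cern.ch"


def to_http_uri(uri, http_server=SERVER_HTTP_URI):
    """Return a URI that an HTTP client can fetch (hypothetical helper).

    HTTP(S) URIs pass through unchanged, XRootD URIs under SERVER_ROOT_URI
    get their prefix swapped for the HTTP server, anything else is rejected.
    """
    if urlsplit(uri).scheme in ("http", "https"):
        return uri
    if uri.startswith(SERVER_ROOT_URI):
        # Replace only the leading server prefix, keep the /eos/... path.
        return uri.replace(SERVER_ROOT_URI, http_server, 1)
    raise ValueError("Cannot map {!r} to an HTTP location".format(uri))


failing = (
    "root://eospublic.cern.ch//eos/opendata/cms/software/"
    "HiggsExample20112012/BuildFile.xml"
)
print(to_http_uri(failing))
```

With these assumed constants, the URI from the traceback would map to a path under `http://opendata.cern.ch/eos/opendata/...`, which `requests` can handle.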

for inner_file in file_["files"]:
files_list.append(
(
inner_file["uri"].replace(SERVER_ROOT_URI, new_server),
inner_file["size"],
inner_file["checksum"],
)
json_files = requests.get(url_file).json()
except Exception:
display_message(
msg_type="error",
msg="Error occured while fetching file info. Please try again.",
)
sys.exit(1)
for file_ in json_files:
files_list_expanded.append(
(
file_["uri"],
file_["size"],
file_["checksum"],
)
)
elif file_[0].endswith("_file_index.txt"):
pass
else:
files_list_expanded.append(file_)
files_list = files_list_expanded
if searcher_protocol == "http":
files_list = [
(file_[0].replace(SERVER_ROOT_URI, server), file_[1], file_[2])
for file_ in files_list
]
elif searcher_protocol == "https":
files_list = [
(
file_[0].replace(SERVER_ROOT_URI, SERVER_HTTPS_URI),
file_[1],
file_[2],
)
else:
files_list.append(
(
f"{SERVER_HTTPS_URI}/record/{record_json['metadata']['recid']}/file_index/{file_['key']}",
file_["size"],
"",
)
)
for file_ in files_list
]
return files_list
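
Because the page interleaves added and removed lines, the added lines are easier to follow when read together. The sketch below is a reconstruction under assumptions, not the PR's exact code: the function name `list_files_sketch`, the constant values, and the sample record are hypothetical. Applying the same server rewrite to the directly attached `metadata.files` entries is also an assumption about what the `--recid 5500` failure above calls for, not something the diff is confirmed to do.

```python
# Assumed values for illustration; the real constants live in the client's config.
SERVER_ROOT_URI = "root://eospublic.cern.ch/"
SERVER_HTTPS_URI = "https://opendata.cern.ch"


def list_files_sketch(record_json, server, protocol, expand=True):
    """Collect (uri, size, checksum) tuples for a record (hypothetical sketch)."""
    # Pick the server prefix once, instead of rewriting the list afterwards.
    new_server = SERVER_ROOT_URI
    if protocol == "http":
        new_server = server
    elif protocol == "https":
        new_server = SERVER_HTTPS_URI

    metadata = record_json["metadata"]
    files_list = []

    # Directly attached files. Rewriting them here is this sketch's assumption.
    for file_ in metadata.get("files", []):
        uri = file_["uri"].replace(SERVER_ROOT_URI, new_server)
        files_list.append((uri, file_["size"], file_["checksum"]))

    # File indices as described by the new `_file_indices` metadata field.
    for index in metadata.get("_file_indices", []):
        if expand:
            # Unwind the index into the files it lists.
            for inner in index["files"]:
                uri = inner["uri"].replace(SERVER_ROOT_URI, new_server)
                files_list.append((uri, inner["size"], inner["checksum"]))
        else:
            # Point at the index file itself; no checksum is available for it.
            uri = "{}/record/{}/file_index/{}".format(
                SERVER_HTTPS_URI, metadata["recid"], index["key"]
            )
            files_list.append((uri, index["size"], ""))
    return files_list


# Made-up record used only to exercise the sketch.
record = {
    "metadata": {
        "recid": 1,
        "files": [
            {"uri": "root://eospublic.cern.ch//eos/opendata/a.xml",
             "size": 10, "checksum": "adler32:00000001"},
        ],
        "_file_indices": [
            {"key": "idx_file_index.json", "size": 322,
             "files": [
                 {"uri": "root://eospublic.cern.ch//eos/opendata/b.root",
                  "size": 20, "checksum": "adler32:00000002"},
             ]},
        ],
    }
}

expanded = list_files_sketch(record, "http://opendata.cern.ch", "http")
collapsed = list_files_sketch(record, "http://opendata.cern.ch", "xrootd", expand=False)
```

Choosing `new_server` up front keeps the protocol handling in one place, so every URI appended to the list goes through the same substitution regardless of which metadata field it came from.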

