Airflow-dags: Scoap3 #331

Closed
15 of 26 tasks
ErnestaP opened this issue May 27, 2024 · 1 comment
ErnestaP commented May 27, 2024

  • APS_PULL_API: OK

  • IOP_PULL_SFTP: OK

  • IOP_PROCESS_FILE: File "/opt/airflow/dags/repo/dags/scoap3/common/utils.py", line 318, in upload_json_to_s3 current_date = datetime.now().date() AttributeError: module 'datetime' has no attribute 'now'

  • APS_PROCESS_FILE: save_to_s3: File "/opt/airflow/dags/repo/dags/scoap3/common/utils.py", line 318, in upload_json_to_s3 current_date = datetime.now().date(); create_or_update: requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8000): Max retries exceeded with URL: /api/article-workflow-import/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f502b88ddc0>: Failed to establish a new connection: [Errno 111] Connection refused')). It looks like the Django URL is not exported.

  • ELSEVIER_PULL_SFTP: migrate_from_ftp: File "/home/airflow/.local/lib/python3.8/site-packages/paramiko/client.py", line 409, in connect raise NoValidConnectionsError(errors) paramiko.ssh_exception.NoValidConnectionsError: [Errno None] Unable to connect to port 2222 on 127.0.0.1 or ::1

  • HINDAWI_PULL_API: requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://www.hindawi.com/oai-pmh/oai.aspx?from=2024-04-11&until=2024-05-27&verb=listrecords&set=HINDAWI.AHEP&metadataprefix=marc21

  • SPRINGER_PULL_SFTP: pull sftp [2024-05-27, 12:28:11 UTC] {transport.py:1909} INFO - Auth banner: b'EFT Login - %DATE% %TIME% - Please enter valid credentials to continue'

  • OUP_PULL_FTP: migrate_from_ftp: File "/usr/local/lib/python3.8/ftplib.py", line 250, in getresp raise error_perm(resp) ftplib.error_perm: 500 OOPS: cannot change directory:/data/GAB/Editorial
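
The `AttributeError` reported for IOP_PROCESS_FILE and APS_PROCESS_FILE is what Python raises when `now()` is called on the `datetime` module instead of the `datetime.datetime` class. A minimal sketch of the likely cause and one possible fix, assuming `utils.py` does `import datetime` at module level (the actual import is not shown in this issue; both function names below are hypothetical):

```python
import datetime


def current_date_broken():
    # Mirrors the failing line: now() is looked up on the module itself.
    return datetime.now().date()


def current_date_fixed():
    # now() is a classmethod of the datetime class inside the datetime module.
    return datetime.datetime.now().date()


try:
    current_date_broken()
except AttributeError as exc:
    error_message = str(exc)

today = current_date_fixed()
```

An alternative with the same effect would be to switch the import to `from datetime import datetime` and leave the call site unchanged; which one fits depends on how the rest of `utils.py` uses the name.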


APS

  • All the tasks run successfully
  • Files are in S3
  • Files are in django backend

Elsevier:

  • All the tasks run successfully
  • Files are in S3
  • Files are in django backend

IOP

  • All the tasks run successfully
  • Files are in S3
  • Files are in django backend

OUP:

  • All the tasks run successfully (ftplib.error_perm: 500 OOPS: cannot change directory:/data/GAB/Editorial)
  • Files are in S3
  • Files are in django backend

SPRINGER:

  • All the tasks run successfully
  • Files are in S3
  • Files are in django backend

HINDAWI:


Springer: sometimes the process DAG cannot be found in the DagBag:
[Screenshot, 2024-05-28 15:43:26]
Sometimes the record cannot be pushed to the Django API:

scoap3-springer-process-file-create-or-update-y80pwdz6
*** Found local files:
***   * /opt/airflow/logs/dag_id=scoap3_springer_process_file/run_id=springer__2024-05-28T13:46:04.067318+0000/task_id=create_or_update/attempt=1.log
[2024-05-28, 13:50:16 UTC] {taskinstance.py:1979} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: scoap3_springer_process_file.create_or_update springer__2024-05-28T13:46:04.067318+0000 [queued]>
[2024-05-28, 13:50:16 UTC] {taskinstance.py:1979} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: scoap3_springer_process_file.create_or_update springer__2024-05-28T13:46:04.067318+0000 [queued]>
[2024-05-28, 13:50:16 UTC] {taskinstance.py:2193} INFO - Starting attempt 1 of 1
[2024-05-28, 13:50:16 UTC] {taskinstance.py:2217} INFO - Executing <Task(_PythonDecoratedOperator): create_or_update> on 2024-05-28 13:46:04.116264+00:00
[2024-05-28, 13:50:16 UTC] {standard_task_runner.py:60} INFO - Started process 22 to run task
[2024-05-28, 13:50:16 UTC] {standard_task_runner.py:87} INFO - Running: ['airflow', 'tasks', 'run', 'scoap3_springer_process_file', 'create_or_update', 'springer__2024-05-28T13:46:04.067318+0000', '--job-id', '1618', '--raw', '--subdir', 'DAGS_FOLDER/scoap3/springer/springer_process_file.py', '--cfg-path', '/tmp/tmpb6rsqv9x']
[2024-05-28, 13:50:16 UTC] {standard_task_runner.py:88} INFO - Job 1618: Subtask create_or_update
[2024-05-28, 13:50:16 UTC] {task_command.py:423} INFO - Running <TaskInstance: scoap3_springer_process_file.create_or_update springer__2024-05-28T13:46:04.067318+0000 [running]> on host scoap3-springer-process-file-create-or-update-y80pwdz6
[2024-05-28, 13:50:16 UTC] {logging_mixin.py:188} WARNING - /home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/template_rendering.py:46 AirflowProviderDeprecationWarning: This function is deprecated. Please use `create_unique_id`.
[2024-05-28, 13:50:16 UTC] {logging_mixin.py:188} WARNING - /home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/kubernetes_helper_functions.py:145 AirflowProviderDeprecationWarning: This function is deprecated. Please use `add_unique_suffix`.
[2024-05-28, 13:50:16 UTC] {pod_generator.py:555} WARNING - Model file /opt/airflow/pod_templates/pod_template_file.yaml does not exist
[2024-05-28, 13:50:17 UTC] {taskinstance.py:2513} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='scoap3_springer_process_file' AIRFLOW_CTX_TASK_ID='create_or_update' AIRFLOW_CTX_EXECUTION_DATE='2024-05-28T13:46:04.116264+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='springer__2024-05-28T13:46:04.067318+0000'
[2024-05-28, 13:50:17 UTC] {logging_mixin.py:188} INFO - 2024-05-28 13:50:17 [info     ] Sending data to the backend    data={'dois': [{'value': '10.1140/epjc/s10052-024-12798-3'}], 'arxiv_eprints': [{'value': '2401.04587', 'categories': ['hep-ph']}], 'page_nr': [8], 'authors': [{'surname': 'Lin', 'given_names': 'Jia-Xin', 'affiliations': [{'value': 'School of Physics, Southeast University, Nanjing, 210094, China', 'organization': 'Southeast University', 'country': 'China'}], 'full_name': 'Lin, Jia-Xin'}, {'surname': 'Chen', 'given_names': 'Hua-Xing', 'email': '[email protected]', 'affiliations': [{'value': 'School of Physics, Southeast University, Nanjing, 210094, China', 'organization': 'Southeast University', 'country': 'China'}], 'full_name': 'Chen, Hua-Xing'}, {'surname': 'Liang', 'given_names': 'Wei-Hong', 'email': '[email protected]', 'affiliations': [{'value': 'Department of Physics, Guangxi Normal University, Guilin, 541004, China', 'organization': 'Guangxi Normal University', 'country': 'China'}, {'value': 'Guangxi Key Laboratory of Nuclear Physics and Technology, Guangxi Normal University, Guilin, 541004, China', 'organization': 'Guangxi Normal University', 'country': 'China'}], 'full_name': 'Liang, Wei-Hong'}, {'surname': 'Xiao', 'given_names': 'Chu-Wen', 'email': '[email protected]', 'affiliations': [{'value': 'Department of Physics, Guangxi Normal University, Guilin, 541004, China', 'organization': 'Guangxi Normal University', 'country': 'China'}, {'value': 'Guangxi Key Laboratory of Nuclear Physics and Technology, Guangxi Normal University, Guilin, 541004, China', 'organization': 'Guangxi Normal University', 'country': 'China'}], 'full_name': 'Xiao, Chu-Wen'}, {'surname': 'Oset', 'given_names': 'Eulogio', 'email': '[email protected]', 'affiliations': [{'value': 'Department of Physics, Guangxi Normal University, Guilin, 541004, China', 'organization': 'Guangxi Normal University', 'country': 'China'}, {'value': 'Departamento de Física Teórica and IFIC, 
Centro Mixto Universidad de Valencia-CSIC Institutos de Investigación de Paterna, Aptdo. 22085, Valencia, 46071, Spain', 'organization': 'Centro Mixto Universidad de Valencia-CSIC Institutos de Investigación de Paterna', 'country': 'Spain'}], 'full_name': 'Oset, Eulogio'}], 'license': [{'url': 'https://creativecommons.org/licenses/by/4.0', 'license': 'CC-BY-4.0'}], 'collections': [{'primary': 'European Physical Journal C'}], 'files': {'pdfa': 'scoap3-dev-backend/media/harvested_files/10.1140/epjc/s10052-024-12798-3/10052_2024_Article_12798.pdf', 'xml': 'scoap3-dev-backend/media/harvested_files/10.1140/epjc/s10052-024-12798-3/10052_2024_Article_12798.xml.Meta.xml'}, 'publication_info': [{'journal_title': 'European Physical Journal C', 'journal_volume': '84', 'year': 2024, 'journal_issue': '4', 'artid': 's10052-024-12798-3', 'page_start': '1', 'page_end': '8', 'material': 'article'}], 'abstracts': [{'value': 'Starting from the molecular picture for the $$D_{s1}(2460)$$ and $$D_{s1}(2536)$$ resonances, which are dynamically generated by the interaction of coupled channels, the most important of which are the $$D^*K$$ for the $$D_{s1}(2460)$$ and $$DK^*$$ for the $$D_{s1}(2536)$$ , we evaluate the ratio of decay widths for the $$\\bar{B}_s^0 \\rightarrow D_{s1}(2460)^+ K^-$$ and $$\\bar{B}_s^0 \\rightarrow D_{s1}(2536)^+ K^-$$ decays, the latter of which has been recently investigated by the LHCb collaboration, and we obtain a ratio of the order of unity. 
The present results should provide an incentive for the related decay into the $$D_{s1}(2460)$$ resonance to be performed, which would provide valuable information on the nature of these two resonances.', 'source': 'Springer'}], 'acquisition_source': {'source': 'Springer', 'method': 'Springer', 'date': '2024-05-28T13:48:02.955902'}, 'copyright': [{'holder': 'The Author(s)', 'year': 2024}], 'imprints': [{'date': '2024-04-29', 'publisher': 'Springer'}], 'record_creation_date': '2024-05-28T13:48:02.955902', 'titles': [{'source': 'Springer'}]}
[2024-05-28, 13:50:17 UTC] {logging_mixin.py:188} INFO - 2024-05-28 13:50:17 [error    ] b'{"message":"null value in column \\"title\\" of relation \\"articles_article\\" violates not-null constraint\\nDETAIL:  Failing row contains (8283, null, null, 2024-04-29, null, null, , Starting from the molecular picture for the $$D_{s1}(2460)$$ and..., 2024-05-28 13:50:17.132762+00, 2024-05-28 13:50:17.132791+00).\\n"}'
[2024-05-28, 13:50:17 UTC] {taskinstance.py:2731} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 444, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 414, in _execute_callable
    return execute_callable(context=context, **execute_callable_kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/decorators/base.py", line 241, in execute
    return_value = super().execute(context)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 200, in execute
    return_value = self.execute_callable()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 217, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/opt/airflow/dags/repo/dags/scoap3/springer/springer_process_file.py", line 89, in create_or_update
    create_or_update_article(enriched_file)
  File "/home/airflow/.local/lib/python3.8/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "/opt/airflow/dags/repo/dags/scoap3/common/utils.py", line 278, in create_or_update_article
    response.raise_for_status()
  File "/home/airflow/.local/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://backend.dev.scoap3.org/api/article-workflow-import/
[2024-05-28, 13:50:17 UTC] {taskinstance.py:1149} INFO - Marking task as FAILED. dag_id=scoap3_springer_process_file, task_id=create_or_update, execution_date=20240528T134604, start_date=20240528T135016, end_date=20240528T135017
[2024-05-28, 13:50:17 UTC] {standard_task_runner.py:107} ERROR - Failed to execute job 1618 for task create_or_update (400 Client Error: Bad Request for url: https://backend.dev.scoap3.org/api/article-workflow-import/; 22)
[2024-05-28, 13:50:17 UTC] {local_task_job_runner.py:234} INFO - Task exited with return code 1
[2024-05-28, 13:50:17 UTC] {taskinstance.py:3312} INFO - 0 downstream tasks scheduled from follow-on schedule check
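
In the log above, the payload ends with `'titles': [{'source': 'Springer'}]`, so no title text is sent, and the backend rejects the row with a not-null violation on `title`. A rough sketch of a pre-flight check that could catch this before the POST, assuming the title text is expected under a `title` key inside each `titles` entry (that key name and the `has_title` helper are assumptions, not taken from the repository):

```python
def has_title(record):
    """Return True if at least one entry in record['titles'] carries non-empty title text."""
    return any(
        isinstance(entry.get("title"), str) and bool(entry["title"].strip())
        for entry in record.get("titles", [])
    )


# Shape taken from the failing payload in the log: only 'source' is present.
payload_from_log = {"titles": [{"source": "Springer"}]}

# Hypothetical well-formed payload for comparison.
payload_ok = {"titles": [{"title": "Example title", "source": "Springer"}]}

print(has_title(payload_from_log))
print(has_title(payload_ok))
```

Failing the task early with a message naming the missing field would make the cause visible in the Airflow log without having to read the backend's 400 response body.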

@ErnestaP ErnestaP self-assigned this May 28, 2024
ErnestaP commented Jun 5, 2024

Airflow was migrated back to a separate instance for each project.

@ErnestaP ErnestaP closed this as completed Jun 5, 2024