
Parallel pyiron table #1050

Closed
jan-janssen wants to merge 21 commits
Conversation

jan-janssen
Member

Currently the initialisation of the individual subprocesses is too expensive, so it does not make a lot of sense to focus on a parallel pyiron table implementation yet. Still, it may be a good starting point. Example code:

from pyiron_atomistics import Project

pr = Project("test")
pr.remove_jobs(silently=True, recursive=True)

# Reference data: a LAMMPS Murnaghan job, which creates one child job per volume
job = pr.create.job.Lammps(job_name="lmp")
job.structure = pr.create.structure.ase.bulk("Al", cubic=True)
murn = job.create_job(pr.job_type.Murnaghan, "murn")
murn.run()

# Restrict the table to the LAMMPS child jobs
def filter_function(job):
    return job.__name__ == "Lammps"

table = pr.create_table()
table.filter_function = filter_function
table.add.get_energy_tot  # attribute access registers the built-in function
table.add.get_volume
table.server.cores = 4  # analyse the jobs in four parallel subprocesses
table.run(delete_existing_job=True)

table.get_dataframe()

@jan-janssen jan-janssen marked this pull request as draft February 27, 2023 05:39
@jan-janssen jan-janssen added the format_black label Feb 27, 2023
@@ -432,7 +420,7 @@ def _collect_job_update_lst(self, job_status_list, job_stored_ids=None):
                 and job.status in job_status_list
                 and self.filter_function(job)
             ):
-                job_update_lst.append(job)
+                job_update_lst.append(job_id)
Member Author

This is the tricky part: to apply the filter_function, the job is already loaded in inspect mode, but since the job object cannot be communicated to the subprocess, it has to be loaded again inside the subprocess.
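
For illustration, a minimal sketch of that id-based pattern with a plain multiprocessing.Pool; the helper name _analyse_job is hypothetical, not the PR's actual code. Only the project path and the job id are pickled, and the job is loaded again in inspect mode inside the worker:

from multiprocessing import Pool

from pyiron_base import Project


def _analyse_job(task):
    # Runs inside the worker: load the job again by id instead of pickling it.
    project_path, job_id = task
    job = Project(project_path).inspect(job_id)
    return {"job_id": job_id, "status": str(job.status)}


if __name__ == "__main__":
    pr = Project("test")
    tasks = [("test", int(job_id)) for job_id in pr.job_table().id]
    with Pool(4) as pool:
        print(pool.map(_analyse_job, tasks))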

@jan-janssen
Member Author

Traceback (most recent call last):
  File "/home/runner/work/pyiron_base/pyiron_base/tests/table/test_datamining.py", line 27, in setUp
    self.table.run()
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/utils/deprecate.py", line 171, in decorated
    return function(*args, **kwargs)
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/job/generic.py", line 693, in run
    self._run_if_new(debug=debug)
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/job/generic.py", line 1217, in _run_if_new
    run_job_with_status_initialized(job=self, debug=debug)
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/job/runfunction.py", line 76, in run_job_with_status_initialized
    job.run()
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/utils/deprecate.py", line 171, in decorated
    return function(*args, **kwargs)
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/job/generic.py", line 695, in run
    self._run_if_created()
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/job/generic.py", line 1228, in _run_if_created
    return run_job_with_status_created(job=self)
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/job/runfunction.py", line 99, in run_job_with_status_created
    job.run_static()
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/datamining.py", line 744, in run_static
    self.update_table()
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/utils/deprecate.py", line 171, in decorated
    return function(*args, **kwargs)
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/datamining.py", line 764, in update_table
    self._pyiron_table.create_table(
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/datamining.py", line 282, in create_table
    df_new_ids = self._iterate_over_job_lst(
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/datamining.py", line 366, in _iterate_over_job_lst
    diff_dict_lst = list(
  File "/usr/share/miniconda3/envs/my-env/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/usr/share/miniconda3/envs/my-env/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
  File "/usr/share/miniconda3/envs/my-env/lib/python3.8/multiprocessing/pool.py", line 537, in _handle_tasks
    put(task)
  File "/usr/share/miniconda3/envs/my-env/lib/python3.8/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/share/miniconda3/envs/my-env/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'FunctionContainer.__setitem__.<locals>.<lambda>'
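
The failure is reproducible without pyiron: multiprocessing serialises tasks with the standard pickle module, which refuses functions defined inside a method. A minimal reproduction with illustrative class names, plus the usual fix via a functools.partial of a module-level function:

import pickle
from functools import partial


def get_energy_tot(job):  # module-level function: picklable by reference
    return job["output/generic/energy_tot"][-1]


def _call(function, job):  # module-level wrapper, also picklable
    return function(job)


class BrokenContainer:
    def __setitem__(self, key, function):
        # The lambda's qualname is 'BrokenContainer.__setitem__.<locals>.<lambda>',
        # so pickling any instance that stores it fails as in the log above.
        self._fn = lambda job: function(job)


class FixedContainer:
    def __setitem__(self, key, function):
        # A partial of a module-level function pickles fine.
        self._fn = partial(_call, function)


broken, fixed = BrokenContainer(), FixedContainer()
broken["energy"] = get_energy_tot
fixed["energy"] = get_energy_tot

pickle.dumps(fixed)  # works
try:
    pickle.dumps(broken)
except AttributeError as err:
    print(err)  # Can't pickle local object 'BrokenContainer.__setitem__.<locals>.<lambda>'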

@stale

stale bot commented Mar 18, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Mar 18, 2023
@stale stale bot closed this Apr 2, 2023
@jan-janssen jan-janssen reopened this Jul 17, 2023
@stale stale bot removed the stale label Jul 17, 2023
@jan-janssen
Member Author

======================================================================
ERROR: test_analysis_project (table.test_datamining.TestProjectData)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/pyiron_base/pyiron_base/tests/table/test_datamining.py", line 30, in setUp
    self.table.run()
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/utils/deprecate.py", line 171, in decorated
    return function(*args, **kwargs)
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/job/generic.py", line 727, in run
    self._run_if_new(debug=debug)
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/job/generic.py", line 1228, in _run_if_new
    run_job_with_status_initialized(job=self, debug=debug)
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/job/runfunction.py", line 91, in run_job_with_status_initialized
    job.run()
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/utils/deprecate.py", line 171, in decorated
    return function(*args, **kwargs)
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/job/generic.py", line 729, in run
    self._run_if_created()
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/job/generic.py", line 1239, in _run_if_created
    return run_job_with_status_created(job=self)
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/job/runfunction.py", line 114, in run_job_with_status_created
    job.run_static()
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/datamining.py", line 749, in run_static
    self.update_table()
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/utils/deprecate.py", line 171, in decorated
    return function(*args, **kwargs)
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/datamining.py", line 769, in update_table
    self._pyiron_table.create_table(
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/datamining.py", line 282, in create_table
    df_new_ids = self._iterate_over_job_lst(
  File "/home/runner/work/pyiron_base/pyiron_base/pyiron_base/jobs/datamining.py", line 368, in _iterate_over_job_lst
    p.imap(_apply_list_of_functions_on_job, job_to_analyse_lst),
AttributeError: 'Pool' object has no attribute 'imap'
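
For context, the standard-library multiprocessing.pool.Pool does provide imap, so this error points at a pool replacement that only implements map. A defensive sketch (the helper name is hypothetical) that works with either interface:

from multiprocessing import Pool


def map_over_pool(pool, function, task_lst):
    # Prefer the lazy imap() iterator when the pool provides it (useful for
    # progress bars); otherwise fall back to the eager map() every pool offers.
    if hasattr(pool, "imap"):
        return list(pool.imap(function, task_lst))
    return list(pool.map(function, task_lst))


if __name__ == "__main__":
    with Pool(2) as p:
        print(map_over_pool(p, abs, [-1, -2, 3]))  # [1, 2, 3]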

@jan-janssen jan-janssen marked this pull request as ready for review July 24, 2023 16:39
@jan-janssen
Member Author

@ligerzero-ai Does this meet your needs for a parallel pyiron table version?

@jan-janssen jan-janssen requested a review from ligerzero-ai July 24, 2023 17:02
@ligerzero-ai
Contributor

This is great - I've got a separate contribution opening up as a draft in contrib later tonight. That one is completely standalone, outside of the pyiron ecosystem. I am hoping that it will allow users to create dataframes for ML potentials easily (à la TrainingContainer, with a little bit of fiddling). I will ping you there.

@stale

stale bot commented Aug 12, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 12, 2023
@jan-janssen jan-janssen added format_black and removed stale, format_black labels Aug 18, 2023
@stale

stale bot commented Sep 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 17, 2023
@jan-janssen jan-janssen removed the stale label Dec 17, 2023
@jan-janssen
Member Author

Based on the performance analysis from @ligerzero-ai in https://github.com/orgs/pyiron/discussions/211#discussioncomment-8034046 and the general implementation of executor support in #1155, I removed the direct dependence on pympipool and instead allow the user to specify the executor manually. This lets users choose whichever executor they prefer.
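
In user code this could look roughly like the following sketch; attaching the executor via table.server.executor is an assumption based on the executor interface from #1155, not something confirmed in this thread:

from concurrent.futures import ProcessPoolExecutor

from pyiron_atomistics import Project

pr = Project("test")
table = pr.create_table()
table.add.get_energy_tot
table.add.get_volume
with ProcessPoolExecutor(max_workers=4) as exe:
    table.server.executor = exe  # attribute name assumed from the #1155 interface
    table.run(delete_existing_job=True)
print(table.get_dataframe())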

@pmrv
Contributor

pmrv commented Jan 10, 2024

On the one hand I like that one can customize the executor. On the other hand, having to pass the instance in is a bit clunky and won't work anymore once I submit a table to the queue (or does it somehow?). I think it's ok to leave the option to pass the instance in, but it would be nice if the table just created a ProcessPoolExecutor on its own if none was passed but server.cores > 1.
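
A sketch of that fallback, with a hypothetical helper name:

from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Optional


def resolve_executor(executor: Optional[Executor], cores: int) -> Optional[Executor]:
    # Hypothetical helper: honour a user-supplied executor, otherwise create
    # a ProcessPoolExecutor when more than one core was requested.
    if executor is not None:
        return executor
    if cores > 1:
        return ProcessPoolExecutor(max_workers=cores)
    return None  # serial execution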

@jan-janssen jan-janssen added format_black and removed format_black labels Jan 25, 2024
@jan-janssen jan-janssen added format_black and removed format_black labels Jan 31, 2024
@jan-janssen jan-janssen marked this pull request as draft January 31, 2024 12:54
@jan-janssen
Member Author

The issue here is similar to what we discuss in #1296. Basically, we can have an executor attached to a single job object; in that case the job object is submitted to the executor and executed on one of its workers. But for jobs that contain multiple job objects, like GenericMaster jobs, or alternatively for pyiron tables that contain multiple function calls, we want to assign a single executor which is then used to execute the individual tasks within that job.
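
To illustrate the distinction with plain concurrent.futures (the stand-in functions are hypothetical, nothing here is pyiron API):

from concurrent.futures import ProcessPoolExecutor


def run_whole_job(job_id):
    # Stand-in for executing one complete job object on a worker.
    return f"ran job {job_id}"


def run_internal_task(task_id):
    # Stand-in for one task inside a master job, e.g. analysing one child job.
    return f"analysed child {task_id}"


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as exe:
        # Pattern 1: the executor runs a single job object as a whole.
        print(exe.submit(run_whole_job, 1).result())

        # Pattern 2 (GenericMaster / pyiron table case): the job keeps the
        # executor and fans its internal tasks out to the workers.
        print(list(exe.map(run_internal_task, [2, 3, 4])))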

@jan-janssen jan-janssen deleted the parallel_datamining branch February 14, 2024 12:53
@jan-janssen jan-janssen restored the parallel_datamining branch February 14, 2024 12:58
@jan-janssen jan-janssen reopened this Feb 14, 2024
@jan-janssen jan-janssen deleted the parallel_datamining branch February 14, 2024 13:55