Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New parallelization #108

Merged
merged 39 commits into from
Jan 20, 2025
Merged

New parallelization #108

merged 39 commits into from
Jan 20, 2025

Conversation

lmseidler
Copy link
Member

  • added a parallelization mechanic where the task of generating a molecule is assigned to a group of CPUs
  • the initial step for generating a molecule is now always a xtb single-point calculation, if that succeeds we go ahead as before
  • refinement and postprocessing now use 4 cores instead of 1, the xtb single-points still use 1
  • printing is basically disabled as soon as parallelization is taking place over multiple groups of CPUs (new system required)

@marcelmbn
Copy link
Member

There are still some simple errors observed by the linter.

In your local environment, you could also do

pre-commit install
pre-commit run --all-files

to check for such errors beforehand.

@marcelmbn marcelmbn added enhancement New feature or request algorithm Related to the back-end algorithm for generating the mindless molecules. labels Jan 16, 2025
@marcelmbn
Copy link
Member

marcelmbn commented Jan 17, 2025

Some questions/feedback

Parallelization over num_molecules

Is this parallelization currently active already? If I use mindlessgen with this config:

General configuration:
                     verbosity:   1
                    max_cycles:   200
                  print_config:   False
                      parallel:   6
                 num_molecules:   12
                   postprocess:   False
                     write_xyz:   True

I get the following output (cut-out):

================================================================================
====================== Generating molecule 2    of 12   ========================
================================================================================
Cycle 2:
Cycle 3:
Cycle 4:
Cycle 5:
Cycle 1:
Cycle 6:
Refinement failed for cycle 5.
Refinement failed for cycle 1.
Refinement failed for cycle 3.
Refinement failed for cycle 4.
Optimized mindless molecule found in 1 cycles.
Molecule: C2H1N3O1B1F1Mo1_f8cca6
# atoms: 10
total charge: 0
# unpaired electrons: 0
atomic numbers: [1 0 0 0 1 2 3 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
sum formula: C2H1N3O1B1F1Mo1
atomic coordinates:
[[ 2.07560281  0.9394159  -2.8190336 ]
 [ 1.99580601  0.23456065 -0.75900094]
 [ 1.85035243  0.79980126  0.59708992]
 [-1.4080863   0.32853373  0.30295643]
 [ 0.79170033  0.87904563  1.24876862]
 [ 1.66806589  0.44344177 -2.03298779]
 [-0.13604181  0.03129995  0.66926263]
 [-1.65081553 -0.6993044  -0.53229386]
 [-0.26992539 -1.28684253 -2.78462747]
 [-0.02974515 -0.40335067 -1.23333865]]
atomic number per index: [ 0  4  5  5  6  6  6  7  8 41]

================================================================================
====================== Generating molecule 3    of 12   ========================
================================================================================

...meaning one molecule is generated after the other.

EDIT: My fault. With 6 cores, we have only one block and this block takes all 6 cores, right?

Error when executing mindlessgen with -P 1

╰─(mindlessgen) ○ mindlessgen
Reading configuration from file: '/Users/marcelmueller/projects/mindlessgen/testparallelization/mindlessgen.toml'
╔══════════════════════════════════════════════════════════════════════════════════════════════════╗
║                                                                                                  ║
║   ███╗   ███╗██╗███╗   ██╗██████╗ ██╗     ███████╗███████╗███████╗ ██████╗ ███████╗███╗   ██╗    ║
║   ████╗ ████║██║████╗  ██║██╔══██╗██║     ██╔════╝██╔════╝██╔════╝██╔════╝ ██╔════╝████╗  ██║    ║
║   ██╔████╔██║██║██╔██╗ ██║██║  ██║██║     █████╗  ███████╗███████╗██║  ███╗█████╗  ██╔██╗ ██║    ║
║   ██║╚██╔╝██║██║██║╚██╗██║██║  ██║██║     ██╔══╝  ╚════██║╚════██║██║   ██║██╔══╝  ██║╚██╗██║    ║
║   ██║ ╚═╝ ██║██║██║ ╚████║██████╔╝███████╗███████╗███████║███████║╚██████╔╝███████╗██║ ╚████║    ║
║   ╚═╝     ╚═╝╚═╝╚═╝  ╚═══╝╚═════╝ ╚══════╝╚══════╝╚══════╝╚══════╝ ╚═════╝ ╚══════╝╚═╝  ╚═══╝    ║
║                                                                                                  ║
║                                       MindlessGen v0.5.2                                         ║
║                                 Semi-Automated Molecule Generator                                ║
║                                                                                                  ║
║                          Licensed under the Apache License, Version 2.0                          ║
║                           (http://www.apache.org/licenses/LICENSE-2.0)                           ║
╚══════════════════════════════════════════════════════════════════════════════════════════════════╝
General configuration:
                     verbosity:   1
                    max_cycles:   200
                  print_config:   False
                      parallel:   1
                 num_molecules:   1
                   postprocess:   False
                     write_xyz:   True

Generate configuration:
                 min_num_atoms:   5
                 max_num_atoms:   10
            init_coord_scaling:   3.0
       increase_scaling_factor:   1.1
           element_composition:   {5: (2, 3), 0: (1, 2), 7: (1, 2), 6: (1, None)}
            forbidden_elements:   [56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102]
      scale_fragment_detection:   1.25
        scale_minimal_distance:   0.8
               contract_coords:   True
              molecular_charge:   None
             fixed_composition:   False

Gxtb configuration:
                     gxtb_path:   gxtb
                    scf_cycles:   100

Orca configuration:
                     orca_path:   /path/to/orca
                    functional:   PBE
                         basis:   def2-SVP
                      gridsize:   1
                    scf_cycles:   100

Postprocess configuration:
                        engine:   orca
                    opt_cycles:   5
                      optimize:   True
                         debug:   False

Refine configuration:
               max_frag_cycles:   10
                        engine:   xtb
                         hlgap:   0.5
                         debug:   False

Xtb configuration:
                      xtb_path:   /path/to/xtb
                         level:   2


Running with 1 cores.
Traceback (most recent call last):
  File "/Users/marcelmueller/mambaforge/envs/mindlessgen/bin/mindlessgen", line 8, in <module>
    sys.exit(console_entry_point())
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marcelmueller/source/MindlessGen/src/mindlessgen/cli/entrypoint.py", line 39, in console_entry_point
    molecules, exitcode = generator(config)
                          ^^^^^^^^^^^^^^^^^
  File "/Users/marcelmueller/source/MindlessGen/src/mindlessgen/generator/main.py", line 111, in generator
    with setup_managers(num_cores // MINCORES_PLACEHOLDER, num_cores) as (
  File "/Users/marcelmueller/mambaforge/envs/mindlessgen/lib/python3.12/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/Users/marcelmueller/source/MindlessGen/src/mindlessgen/prog/parallel.py", line 9, in setup_managers
    executor: ProcessPoolExecutor = ProcessPoolExecutor(max_workers=max_workers)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/marcelmueller/mambaforge/envs/mindlessgen/lib/python3.12/concurrent/futures/process.py", line 685, in __init__
    raise ValueError("max_workers must be greater than 0")
ValueError: max_workers must be greater than 0

.pre-commit-config.yaml Outdated Show resolved Hide resolved
src/mindlessgen/prog/config.py Outdated Show resolved Hide resolved
src/mindlessgen/qm/orca.py Outdated Show resolved Hide resolved
@lmseidler
Copy link
Member Author

somehow in my last commit a lot of changes went out of the window, I have to revert some commits to get everything back to how it should be first

@marcelmbn
Copy link
Member

And a general comment: Have you already checked if simple tqdm-based progress bar could work for the num_molecules parallelization?

@lmseidler
Copy link
Member Author

i'm gonna look into this now to replace the whole printer thread thing

@marcelmbn
Copy link
Member

This entry types-tqdm has to be added to

[project.optional-dependencies]
dev = [
    "ruff==0.5.7",
    "mypy",
    "covdefaults",
    "coverage",
    "pre-commit",
    "pytest",
    "tox",
    "setuptools_scm>=8",
    "types-toml",
]

in pyproject.toml.

@marcelmbn
Copy link
Member

Can you adjust the "default" mindlessgen.toml? And maybe add a more detailed description of what doesn't work anymore (debug, print statements, ...) to CHANGELOG.md`?

Have you seen that one?

@marcelmbn
Copy link
Member

Looks way better now already! 🎉

Currently, the output stops with

/Users/marcelmueller/source/MindlessGen/src/mindlessgen/prog/config.py:187: UserWarning: Parallelization will disable verbosity during iterative search. Set '--verbosity 0' or '-P 1' to avoid this warning, or simply ignore it.
  warnings.warn(
Running with 12 cores.

--- Appending to existing file 'mindless.molecules'. ---
Generating Molecules ...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:09<00:00,  3.13s/it]

Do you think there's a possibility to print the names of the molecules that were found in the very end?

@marcelmbn
Copy link
Member

Quick note on the following comment in l. 236 of main.py:

    elif optimized_molecule is None:
        # TODO: will this conflict with progress bar?
        warnings.warn(
            "Molecule generation including optimization (and postprocessing) "
            + f"failed for all cycles for molecule {molcount + 1}."
        )

Yes, it does kinda interact with the num_molecules parallelization progress bar, but I don't think that's a problem. Interrupting the progress bar for this specific reason is actually desired, I'd say:

/Users/marcelmueller/source/MindlessGen/src/mindlessgen/prog/config.py:187: UserWarning: Parallelization will disable verbosity during iterative search. Set '--verbosity 0' or '-P 1' to avoid this warning, or simply ignore it.
  warnings.warn(
Running with 12 cores.

--- Appending to existing file 'mindless.molecules'. ---
Generating Molecules ...:   0%|                                                                                                                                    | 0/3 [00:00<?, ?it/s]/Users/marcelmueller/source/MindlessGen/src/mindlessgen/generator/main.py:237: UserWarning: Molecule generation including optimization (and postprocessing) failed for all cycles for molecule 2.
  warnings.warn(
Generating Molecules ...:  33%|█████████████████████████████████████████▎                                                                                  | 1/3 [00:54<01:49, 54.91s/it]/Users/marcelmueller/source/MindlessGen/src/mindlessgen/generator/main.py:237: UserWarning: Molecule generation including optimization (and postprocessing) failed for all cycles for molecule 3.
  warnings.warn(
Generating Molecules ...:  67%|██████████████████████████████████████████████████████████████████████████████████▋                                         | 2/3 [00:59<00:25, 25.23s/it]/Users/marcelmueller/source/MindlessGen/src/mindlessgen/generator/main.py:237: UserWarning: Molecule generation including optimization (and postprocessing) failed for all cycles for molecule 1.
  warnings.warn(
Generating Molecules ...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [01:03<00:00, 21.06s/it]

Successfully generated 0 molecules:

Ran MindlessGen in 00:01:03 (HH:MM:SS)
/Users/marcelmueller/source/MindlessGen/src/mindlessgen/cli/entrypoint.py:53: UserWarning: CAUTION: Generation failed for all molecules. No files written.
  warnings.warn("CAUTION: Generation failed for all molecules. No files written.")

@marcelmbn
Copy link
Member

marcelmbn commented Jan 19, 2025

I've just added some minor changes and adjustments. From my side, the PR is ready to merge and I will approve the changes now.

Before merging:

Please go through the changes I've introduced in addition to your commits. If you agree on them, just merge the PR. 🙂

marcelmbn
marcelmbn previously approved these changes Jan 19, 2025
Copy link
Member

@marcelmbn marcelmbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thank you for making mindlessgen a lot better!

@marcelmbn
Copy link
Member

@jonathan-schoeps Just noticed that you've introduced the TM support entry in CHANGELOG.md at a wrong position. Hope it's fine that we correct it just here, @lmseidler, with d8b3e46.

marcelmbn
marcelmbn previously approved these changes Jan 19, 2025
Signed-off-by: Marcel Müller <[email protected]>
@lmseidler lmseidler merged commit 7364159 into grimme-lab:main Jan 20, 2025
11 checks passed
lmseidler added a commit that referenced this pull request Jan 20, 2025
* initial work

* new parallelization preliminary done

* small fix in tests

* commented unnecessary code

* updated construction sites

* hopefully fixed parallelization

* small test fix

* initial work

* new parallelization preliminary done

* small fix in tests

* commented unnecessary code

* updated construction sites

* hopefully fixed parallelization

* small test fix

* only print when verbosity > 0, otherwise nothing is printed (bad)

* pre-commit fixes

* added ncores to config + implemented setting ncores for external programs

* fixed tests

* added check for number of cores available vs needed

* pre-commit

* Added support for Turbomole (#101)

* Interface for the qm program turbomole

Signed-off-by: Jonathan Schöps <[email protected]>

* support for turbomole

Signed-off-by: Jonathan Schöps <[email protected]>

* corrected some typos

Signed-off-by: Jonathan Schöps <[email protected]>

* changed the PARNODES command

Signed-off-by: Jonathan Schöps <[email protected]>

* coord handling

Signed-off-by: Jonathan Schöps <[email protected]>

* requestet changes are build in

Signed-off-by: Jonathan Schöps <[email protected]>

* Implementation of the requested changes and the test of the read_mol_from_coord function

Signed-off-by: Jonathan Schöps <[email protected]>

* Implementation of two pathes for turbomole

Signed-off-by: Jonathan Schöps <[email protected]>

* Ignore HOMO-LUMO gap not implemented as refine engine  for Orca and Trubomole

Signed-off-by: Jonathan Schöps <[email protected]>

* improve TURBOMOLE configuration

Signed-off-by: Marcel Müller <[email protected]>

* correct linting errors

Signed-off-by: Marcel Mueller <[email protected]>

* exclude more recent interfaces from coverage check

Signed-off-by: Marcel Müller <[email protected]>

* added a test for the 'write_coord_to_file' function

Signed-off-by: Jonathan Schöps <[email protected]>

* sync the lokal branch with the remote branch

Signed-off-by: Jonathan Schöps <[email protected]>

---------

Signed-off-by: Jonathan Schöps <[email protected]>
Signed-off-by: Marcel Müller <[email protected]>
Signed-off-by: Marcel Mueller <[email protected]>
Co-authored-by: Marcel Müller <[email protected]>

* test fix

* fixed tm implementation

* updated main.py

* tqdm progress bar

* updated dependencies

* mypy types import

* moved warnings in correct bracket

* updated default config toml

* added final output of molecules and timing

* fixed time

* better time info

* print formatting

* shift block setup to parallel.py

Signed-off-by: Marcel Müller <[email protected]>

* avoid UnboundLocalError

Signed-off-by: Marcel Müller <[email protected]>

* some code formatting and printout adjustments

Signed-off-by: Marcel Müller <[email protected]>

* shifted CHANGELOG entry to correct position

Signed-off-by: Marcel Müller <[email protected]>

* update CODEOWNERS file

Signed-off-by: Marcel Müller <[email protected]>

* New parallelization (#108)

* initial work

* new parallelization preliminary done

* small fix in tests

* commented unnecessary code

* updated construction sites

* hopefully fixed parallelization

* small test fix

* initial work

* new parallelization preliminary done

* small fix in tests

* commented unnecessary code

* updated construction sites

* hopefully fixed parallelization

* small test fix

* only print when verbosity > 0, otherwise nothing is printed (bad)

* pre-commit fixes

* added ncores to config + implemented setting ncores for external programs

* fixed tests

* added check for number of cores available vs needed

* pre-commit

* test fix

* fixed tm implementation

* updated main.py

* tqdm progress bar

* updated dependencies

* mypy types import

* moved warnings in correct bracket

* updated default config toml

* added final output of molecules and timing

* fixed time

* better time info

* print formatting

* shift block setup to parallel.py

Signed-off-by: Marcel Müller <[email protected]>

* avoid UnboundLocalError

Signed-off-by: Marcel Müller <[email protected]>

* some code formatting and printout adjustments

Signed-off-by: Marcel Müller <[email protected]>

* shifted CHANGELOG entry to correct position

Signed-off-by: Marcel Müller <[email protected]>

* update CODEOWNERS file

Signed-off-by: Marcel Müller <[email protected]>

---------

Signed-off-by: Marcel Müller <[email protected]>
Co-authored-by: Marcel Müller <[email protected]>

* add IP/EA check with g-xTB for special purpose dev application only

Signed-off-by: Marcel Müller <[email protected]>

* rename gp3 to gxtb consistently

Signed-off-by: Marcel Müller <[email protected]>

* add simple SCF cycles check for molecule

Signed-off-by: Marcel Mueller <[email protected]>

* align variable names with g-xTB

Signed-off-by: Marcel Mueller <[email protected]>

* config is now loaded in dev mode; avoid redundant calculations

Signed-off-by: Marcel Mueller <[email protected]>

* hand over correct config and remove redundant config

Signed-off-by: Marcel Mueller <[email protected]>

* remove last rememnant of config for SCF cycles with g-xTB

Signed-off-by: Marcel Mueller <[email protected]>

* added ncores to gxtb calls

* remove redundant config entry

Signed-off-by: Marcel Mueller <[email protected]>

---------

Signed-off-by: Jonathan Schöps <[email protected]>
Signed-off-by: Marcel Müller <[email protected]>
Signed-off-by: Marcel Mueller <[email protected]>
Co-authored-by: Jonathan Schöps <[email protected]>
Co-authored-by: Marcel Müller <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
algorithm Related to the back-end algorithm for generating the mindless molecules. enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

-- lower priority -- Parallelization over number of molecules instead of trials per molecule
2 participants