Proceedings 2019 ESPResSo meetings
Jean-Noël Grad edited this page Jul 20, 2021
·
1 revision
- remove espresso LBCPU in the PR integrating waLBerla LBCPU
- waLBerla doesn't currently have LBGPU implementation that can be integrated in espresso out-of-the-box
- EKCPU can be implemented using stencils
- EKGPU in waLBerla is unclear
- Lees-Edwards depends on LBCPU only
- find out which LB systems are sensitive to single-precision on GPU
- quantify speed-up of espresso LBGPU vs waLBerla LBCPU
- factor out globals
- refactor the analysis functions
- CIP pool only available on Fridays
- Particle collision: have a technical meeting on particle creation (Flo, Rudolf, Ingo, Christoph, Philip)
- MMM2D: check for feature in Scafacos
- ELC: schedule meeting
- GitHub Actions: could replace GitLab-CI when more features become available, defer for a few months
- Thread: #3093
- CMake (#3090)
- We currently officially require 3.4. Doesn't actually work in all environments
- FetchContent module is of importance and is available in 3.11
- Newer versions (currently 3.13) can be installed with `pip install --user cmake`
- Move to 3.10 for now (Ubuntu 18.04)
- Boost
- We currently require 1.55
- Move to 1.65 (Ubuntu 18.04)
- 1.65 would let us remove most custom handling for compiler/boost version combos
- Some 32-bit relevant bug fixes came with 1.67
- Boost.qvm which we might use came with 1.62
- Boost has to be built manually due to boost-mpi
- Compile boost manually on Ubuntu 16.04 (for CUDA 9.0 Docker image)
- CUDA
- Our cluster currently runs 9.1, which is the only reason to still support it
- 9.1 requires an Ubuntu 16.04 Docker image
- Python version
- Python 3.6 on CentOS 7, Python 3.7 on Debian10
- Python 3.5 on Ubuntu 16.04 (for CUDA 9.0 Docker image)
- Cython
- We require 0.23 but have had to work around quite a few issues already
- More current Cython can easily be installed from pip
- Move to 0.26 (Ubuntu 18.04)
- ELC
- tests are currently being written (#3331)
- ELC bugfix for the potential difference will go in 4.1.2
- ELC does not work with non-neutral systems
- Prepare meeting with Christian, Florian, Alex
- MMM2D
- MMM2D cannot be used for validation of ELC due to code overlap
- MMM2D interferes with further refactoring of the cell system
- factor MMM2D out, except for the N-squared cellsystem
- Aims:
- Avoid issues discovered after the release during packaging
- Testing Fedora builds, including odd architectures, on their infrastructure: Copr (#3312)
- Run these tests manually only for releases
- Shorten preparation time for bugfix releases
- Limit grammar and spell checks to minor releases, and to release notes for bugfix releases
- If we go for the Fedora build service Copr, remove QEMU emulated builds on odd architectures
- keep submodules mechanism for future external contributed features
- alternatives: subtrees, manual download
- sticking to the surface of shapes, probability of breaking based on energy, force or distance, but not history
- bond creation between particles of specific types, in the future could use virtual sites as reactive sites
- currently bond creation and breaking is done via python every few time steps
- could use exception mechanism to pause the integration loop and handle bond status in python (using a runtime error queue)
- active site becomes inactive after collision, change type upon collision
- keep three-particle collision (creates angle bonds)
- make sure bond creation does not happen at the same time as bond breaking to avoid reforming the same bond
- particle types cannot change during the integration loop
- find a volunteer to implement it, Rudolf can help but not full time
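
The exception-based idea discussed above can be sketched in plain Python. This is only a toy illustration under stated assumptions: `BondEvent`, `integrate` and `run` are hypothetical names, not part of the ESPResSo API, and the collision schedule is made up. It shows an integration loop that pauses by raising an exception, so that bond bookkeeping happens on the Python side before the loop resumes.

```python
class BondEvent(Exception):
    """Hypothetical signal raised when particle pairs need a bond update."""

    def __init__(self, step, pairs):
        super().__init__(f"bond update requested at step {step}")
        self.step = step
        self.pairs = pairs


def integrate(start, n_steps, collisions):
    """Toy integration loop: advance from `start` for up to `n_steps` steps.

    `collisions` maps a step index to the particle pairs colliding there.
    The loop pauses by raising BondEvent so the caller can edit bonds.
    """
    for step in range(start, start + n_steps):
        # ... force calculation and propagation would happen here ...
        if step in collisions:
            raise BondEvent(step, collisions.pop(step))
    return start + n_steps


def run(n_steps, collisions):
    """Drive the loop from Python and handle bond events between steps."""
    bonds = set()
    step = 0
    while step < n_steps:
        try:
            step = integrate(step, n_steps - step, collisions)
        except BondEvent as event:
            # Bond status is handled here, outside the integration loop.
            bonds.update(event.pairs)
            step = event.step + 1  # resume after the step that paused
    return bonds


bonds = run(10, {3: [(0, 1)], 7: [(1, 2)]})
print(sorted(bonds))
```

Because particle and bond state is only modified while the loop is paused, nothing changes during the integration itself, which matches the constraint noted above that particle types cannot change during the integration loop.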
- bors often pushes twice to the staging branch, auto-canceling the right pipeline half the time
- risk of timeout if not manually checked
- randomly pushes to the python branch, triggering a useless CI pipeline
- report issue
- lbmpy: will be open source once the paper is out, in the meantime can ship ES with generated code
- still thermalization issues, should be fixed soon
- disagreement between ES and waLBerla for MPI node assignment when 4+ threads are used
- release checklist, milestone
- thermostats and integrators checkpointing silently broke in 4.1.0 (#3245)
- constant pH tutorial: create a PR without unit conversion from #3184 for 4.1.1 and work on a unit conversion with pint for 4.2.0: Jonas
- removal of the old RE tutorial (#3211)
- fix NpT interface (#3253)
- fix broken build system (#3228)
- Matheval (#1644): a few use cases for which it can be better than tabulated interaction, could be used in virtual sites, will increase the maintenance effort of espresso if included, should be included as a library via a git subtree or submodule: Rudolf + maybe JN
- Stokesian dynamics (#3241): Michael + a HiWi (maybe Alex, Jan or a new one)
- Brownian dynamics (#1842): requires a quick refactor of Velocity Verlet
- waLBerla, Lees-Edwards (#2976): issue with thermalization
- new autopep8 version introduced in 4.1
- automatic sorting of import statements
- pylint (#3194)
- prevent the introduction of dangerous code (wildcard imports, function overloading, mutable optional value arguments)
- don't include rules for trivial style changes in CI
- developers should comment on the PR which rules should be included in CI
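
As an illustration of the "mutable optional value arguments" item above, here is a minimal sketch of the pitfall and the usual fix. The function names are made up for this example; pylint reports the first pattern as `dangerous-default-value` (W0102).

```python
def append_bad(item, bucket=[]):  # flagged by pylint as dangerous-default-value
    """The default list is created once and shared across all calls."""
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    """Create a fresh list on each call unless the caller supplies one."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_bad(1)
second = append_bad(2)  # same list object as `first`
print(first is second, second)

print(append_good(1), append_good(2))
```

The second call to `append_bad` silently sees the item left behind by the first call, which is the kind of hidden state such a rule is meant to keep out of the code base.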
- shellcheck (#3242)
- look for replacing bash scripts by Python scripts
- "newstyle" classes in Python3 and simplified inheritance syntax (#3026):

    class A(object):
        def __init__(self):
            pass
    class B(object):
        def __init__(self):
            super(B, self).__init__()

becomes

    class A:
        def __init__(self):
            pass
    class B:
        def __init__(self):
            super().__init__()

- tutorials now in continuous delivery (#3024)
- writing tests: wiki:Testing/tutorials, ex: 04-LB-part-4
- deploy tutorial: wiki:Documentation/tutorials, ex: 04-LB-part-4
- tests (in Python) and deployment (in CMake) share the same syntax
- Vector3d-based operations on vectors, force/energy kernels refactor (target: 4.1, #3032, #3039)
- espresso developers shouldn't manually write vector cross products, dot products, Hadamard products or scalar products anymore
- parts of the code already converted to the Vector3d syntax: shapes, constraints, force functions, energy functions, electrostatics, magnetostatics
- if any merge conflict occurs in PRs, @jngrad can help
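
To illustrate why named vector operations are preferred over hand-written component arithmetic, here is a small Python analogy. The actual `Vector3d` type lives in the C++ core; the `Vec3` class and its method names below are assumptions made for this sketch only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec3:
    """Minimal 3D vector used only for this illustration."""
    x: float
    y: float
    z: float

    def dot(self, other):
        """Dot (scalar) product."""
        return self.x * other.x + self.y * other.y + self.z * other.z

    def cross(self, other):
        """Cross product, written once instead of in every kernel."""
        return Vec3(self.y * other.z - self.z * other.y,
                    self.z * other.x - self.x * other.z,
                    self.x * other.y - self.y * other.x)

    def hadamard(self, other):
        """Component-wise (Hadamard) product."""
        return Vec3(self.x * other.x, self.y * other.y, self.z * other.z)


r = Vec3(1.0, 0.0, 0.0)
f = Vec3(0.0, 2.0, 0.0)

# Hand-written version: easy to get an index or a sign wrong.
torque_manual = (r.y * f.z - r.z * f.y,
                 r.z * f.x - r.x * f.z,
                 r.x * f.y - r.y * f.x)

# Named operation: the intent is visible at the call site.
torque = r.cross(f)
print(torque_manual, torque)
```

Both variants compute the same torque, but only the second one states what is being computed.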
- planned refactoring of bonded IA structures and force/energy kernel signatures (target: 4.1 if possible)
- see project Interaction Kernel Refactoring
- list of HIP-related issues created after the last meeting: #2973
- last one cleared a lot of tickets
- Rudolf will organize the next one
- MMM2D and ELC (#2725)
- consider closing the PR and only keep the cherry-picked improvements in mmm2d.cpp (python...reinaual:singleCharge2D, maybe conflicts with #3022)
- ELC still requires a lot of maintenance (#2685, #3001, #3003)
- prepare meeting with Alex, Kai, Rudolf, JN
- MMM2D is slow and only used for reference, while ELC is used for production. MMM2D currently requires direct access to the cells (because it uses them as layers); it is the only method that has an anisotropic cutoff and the only method that needs the layered cell system. The direct cell access blocks separating the interaction calculation from the cell system implementation; this is not maintainable in the long term. Since MMM2D can also be run as a pure pair interaction using the nsquare cell system (at the cost of worse performance), it can still be used as a reference method without that direct access to the layered cell system.
- The MMM2D dielectric support also has very poor code quality (it was apparently added after the original implementation).
- Lees-Edwards/LB CPU incompatibility for 4.1 #2976
- Issue with the particle coupling scheme. For LE the ghost shifts have been removed. But LB also couples to ghost particles, which don't get forces from the coupling but contribute to the force density field of the LB. In doing this, it makes implicit assumptions about the ghost particles, which are an implementation detail of the cell system, including assumptions about the ghost shifts; these implicit assumptions are broken by removing the ghost shifts, which breaks the coupling. The same issue will probably be present in waLBerla.
- Possible solutions:
- fweik: only couple to the local particles of each node, and then reduce the halo regions of the LB force density across nodes. With this, the LB coupling only needs the bounding box of the local particles as input. It can then choose a halo which extends its local grid volume so that it covers this bounding box, and otherwise work independently of the cell system. Code for this already exists in Espresso, because this is exactly how the charge density for P3M is collected. This code could be reused. I think this needs to be properly addressed before the ghost shifts can be removed. More generally speaking, the current implementation tries to be clever and avoid one communication by introducing additional coupling between otherwise independent components of the code, which hampers the extensibility of the code.
- Rudolf: remove LB CPU after 4.1, then add waLBerla and LE, then fix the particle coupling
- waLBerla has thermalization mostly fixed
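
The "couple locally, then reduce the halos" proposal can be sketched with a 1D toy model. Everything here is an assumption made for illustration: the grid size, the two-point linear weights, and the function names are not taken from the ESPResSo LB or P3M code, and the reduction is done in a single process rather than via MPI neighbour exchange.

```python
N = 8      # global number of grid nodes (periodic, unit spacing)
HALO = 1   # halo width on each side of a node's local grid


def deposit_serial(particles):
    """Reference: spread each force onto its two nearest global grid nodes."""
    grid = [0.0] * N
    for pos, force in particles:
        left = int(pos)
        w = pos - left
        grid[left % N] += (1.0 - w) * force
        grid[(left + 1) % N] += w * force
    return grid


def deposit_local(particles, lo, hi):
    """Spread only particles owned by [lo, hi) onto a local grid with halo."""
    local = [0.0] * (hi - lo + 2 * HALO)
    for pos, force in particles:
        if not lo <= pos < hi:
            continue  # ghost particles are never coupled
        left = int(pos)
        w = pos - left
        local[left - lo + HALO] += (1.0 - w) * force
        local[left + 1 - lo + HALO] += w * force  # may land in the halo
    return local


def reduce_halos(local_grids, bounds):
    """Add owned and halo contributions back onto the global grid."""
    grid = [0.0] * N
    for local, (lo, _hi) in zip(local_grids, bounds):
        for i, value in enumerate(local):
            grid[(lo - HALO + i) % N] += value
    return grid


particles = [(0.5, 1.0), (3.75, 2.0), (7.25, -1.0)]  # (position, force)
bounds = [(0, 4), (4, 8)]                            # two "MPI ranks"
local_grids = [deposit_local(particles, lo, hi) for lo, hi in bounds]
parallel = reduce_halos(local_grids, bounds)
serial = deposit_serial(particles)
print(parallel)
print(serial)
```

Each rank only needs its own particles and a halo wide enough to cover their interpolation stencil; no information about ghost particles or ghost shifts enters the coupling.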
- Brownian Dynamics #1842
- delay this PR
- fweik: don't add anything to the integration/propagation before it has been refactored to a state where one can reason about it.
- ENGINE on CPU LB
- test relies on hardcoded data, which is different for LB GPU/CPU
- forces need extra tests
- check tutorial correctness 3 weeks before the school starts
- add feature to 4.1 release
- no LB support for now, maybe consider adding it after waLBerla integration
- add feature to 4.1 release
- issue with ghost communication of the velocity field for more than 1 node
- issue with EK boundaries
- volume changes cause interactions with other features
- write tests for more complex systems
- check anisotropic boxes with fixed dimensions
- check terminology in the docs
- discuss in #2939
- #2894, update Milestone 4.1 and Project 4.1
- release candidate or beta release 1 month before summer school (Oct 7-11 2019)
- fix in #2937
- memory management on the GPU is the main issue
- HIP status:
- three compilers: hcc, HIP and a new Clang feature
- HIP support made ES more CUDA-compliant during HIP integration
- HIP support of CUDA code lags behind CUDA releases
- HIP adds a new layer of complexity
- track logfiles of failed HIP jobs in a dedicated issue
- postpone removal for 2 ES meetings
- kaniko
- docker-in-docker is not secure, anyone opening a PR can run malicious code as root
- kaniko is simpler than docker-in-docker, secure, runs everywhere, but caching is broken
- staging branch for docker CI? already done by the deploy stage
- intel compiler
- requires a license server
- espressomd/espresso: only activate it on release branches
- QEMU
- created for the Fedora packages
- adds a layer of complexity, triggers too many random failures
- for the slow emulated containers:
- espressomd/espresso: only activate them on release branches
- espressomd/docker: build becomes manual
- @RudolfWeeber: I understood the outcome of the discussion as follows: both the Intel and the emulated containers are only built manually, and the CI jobs in espressomd are triggered manually before the release is made.
- convert remaining Python2 containers to Python3 -> JN for Linux, Michael for MacOS
- update install files
- setting minimum version in CMake
- check requirements
- communication to users
- updates:
- autopep from Ubuntu 18
- jupyter available in Ubuntu 18?
- GitLab, Runners, connection between GitHub and GitLab
- CUDA update on bee
- will take a few days to install espresso again
- some technical issues now resolved (timing of OS upgrade and nightly build, configuration issues in CI with GPU-tagged jobs, etc.)
- timeouts still happen
- BW cloud works out-of-the-box
- assertion for Doxygen version in Doxygen warnings parser
- enabled in 9.5, treat them as errors
- 2019: reach larger audience
- 2020: consider extending the program with all-atom simulation, waLBerla? CECAM deadline: July 16th 2019 -> JN + Rudolf
- Rudolf: failure rate too high to rely on
- Michael: a failure rate of 5% of jobs is usual in many open-source projects, which equals one failed job in each of our builds
- Rudolf: too many Mac runners, too few Linux runners. Michael: slots are used interchangeably, GitLab statistics do not reflect that. Florian: sometimes only 6 Linux jobs running. Michael: let me know next time it happens, may be a config issue.
- Frank: 8 builders with 4 cores, 1 with 6 cores => 19 slots available
- Florian: slowest job is sanitizer
- Rudolf: build takes about half of the total job time
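
The failure-rate remark above can be made concrete with a short calculation. The figure of 20 jobs per build is an assumption inferred from "5% of jobs" corresponding to "one failed job per build", and the jobs are assumed to fail independently, which may not hold when a runner goes down.

```python
def expected_failures(p_fail, n_jobs):
    """Mean number of failed jobs in a build of n_jobs jobs."""
    return p_fail * n_jobs


def prob_any_failure(p_fail, n_jobs):
    """Probability that at least one job fails, assuming independence."""
    return 1.0 - (1.0 - p_fail) ** n_jobs


P_FAIL = 0.05   # per-job failure rate quoted in the discussion
N_JOBS = 20     # assumed number of jobs per build

print(expected_failures(P_FAIL, N_JOBS))
print(round(prob_any_failure(P_FAIL, N_JOBS), 2))
```

Under these assumptions the expected number of failed jobs is one per build, and roughly two out of three builds (about 64%) contain at least one random failure.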
- categories of issues:
- timeouts: Rudolf seems to have fixed this by preventing GPU oversubscription
- GitLab/Docker bugs: we can't fix them, but they don't happen that often (#2742)
- nightly builds vs. automatic updates - fixed already by moving the nightly builds to midnight while the automatic updates are at 05:00
- GitLab updates and registry maintenance: Michael can move that to a better time at night so it doesn't collide with the nightly build
- s390x emulation: library linkage issue, can hopefully be fixed by Michael (#2766)
- Jean-Noël: some tests get stuck and time out. Michael: attach a debugger next time and check where it is stuck.
- dead build machines, currently restarted manually by Frank, maybe he can set up monitoring
- hardware:
- can't add more tower PCs because we don't have enough space
- some money is available, but also needed for new storage system
- buy AMD desktops instead of servers, they are relatively cheap per core
- cloud?
- expensive too
- Amazon Spot instances $0.003 per core hour, perfect for burst usage
- GPUs disproportionately expensive
- is it worth it if we still need to run some on-premises runners?
- buy two more AMD Vega 56, one for debugging on a desktop and one for a runner
- drop Python 2 on the master branch, three fewer CI jobs
- append "-python3" to all container names so they don't conflict with the containers used by the 4.0 branch
- bors bot:
- merge queue: combine multiple PRs
- checks the merge on the `staging` branch before merging on `python`
- bors always gets the PR up-to-date before merging
- maintainer: trigger the merge by posting the comment `bors r+`
- branch protection: can't merge until CI passes
- PR must be approved by 1 person
- consider applying formatting automatically during the merge
- GPU LB boundaries: fixing the code is time consuming
- keep GPU LB in 4.1 release
- waLBerla: thermalization is still missing
- Lees-Edwards progress
- LB boundary force
- results very sensitive to input parameters #2624
- disable that feature in 4.0.2
- check LB tutorial
- LB stress #2054
- 3pt coupling disabled for now
- document the implementation of commonly used features directly in Doxygen:
- observables
- shapes
- interactions
- parallel algorithms accessing particles
- document in Sphinx where to find these classes in the core
- multiple sources of failures:
- frequent timeouts
- random errors (failed code coverage files upload, etc.)
- can be hardware-dependent
- down runners
- hard to reproduce some issues
- form a group to improve CI reliability (Frank, Jean-Noël, Rudolf, Kai)
- master branch is often failing CI after a merge
- master has more thorough tests
- plans:
- notify when master fails
- use a staging branch to merge PRs, then merge into master if thorough CI passes
- Jean-Noël should look into it
- configure GitHub accordingly
- branch protection on master
- look into GitLab Enterprise to mirror the GitHub repository
- 4.0.2rc #2585
- MPI PR waiting for review #2593
- GPU LB checkpoint now works #2511
- philox RNG on thermalized bonds, dpd, etc.
- struct Particle refactor #2296: wait for ghost and communication refactors (#2400, #2394, #2478)
- Coulomb and dipole refactoring waiting for feedback #2512
- LB stress
- waLBerla progress
- thermalization
- don't release before summer school
- consider reproducing literature results
- electrostatics tutorial: currently, only salt crystal
- find relevant literature results to implement: Rudolf + Christian
- review and merge tutorials CI #2452
- next week (see Doodle)
- back-communication of ghost forces: still need to do thermalized bonds and DPD
- angle forces: re-derive formula
- polymer placement code: in progress
- soft-sphere: done
- LB checkpoint: #2555
- waLBerla integration: in progress, todo list
- containers migration to Python3: not done yet
- Espresso 4.0.2 bugfix release:
- time-step change causes velocity change
- wrong sign in tabulated potentials
- milestone
- mailing list: communicate on platforms removed from testing
- consider binary releases (Flatpak)
- consider moving the developer's guide from Sphinx to the GitHub wiki (Development)
coverxygen:
- https://github.com/psycofdj/coverxygen
- extended version available at /work/jgrad/coverxygen
- the plain text summary now reports:
- tallies for function parameters and template parameters
- coverage diff between two commits
- the website now reports:
- a new column for undocumented functions (as a subset of the undocumented lines column)
- lists of undocumented functions in dedicated pages
- a detailed explanation of what is missing (when hovering the mouse over a red line, a tooltip appears):
- missing `@file` block
- missing Doxygen block for variables and enum/union/struct/class members
- for functions, provides a list of missing `@param` and `@tparam` blocks
- undocumented enum values no longer mark the entire enum as undocumented
- https://github.com/AnthonyCalandra/modern-cpp-features/blob/master/CPP14.md
- Generic lambda expressions
- Lambda capture initializers
- Return type deduction
- notify when the master branch doesn't pass CI
- consider using a staging branch in the future
- link (poster, short talk: deadline Feb 28th)
- poster (JN):
- focus more on project structure & community than on applications
- history: from Tcl to Python, from LB CPU/GPU to waLBerla
- attendees: JN, Rudolf
- communicate next ES meeting date on the users mailing lists (Rudolf)
- 32-bit specific issue with floating-point arithmetic during particle resorting, fixed in #2454, porting in #2456
- C++14 is required for new parts of the core
- use up-to-date gcc version
- remove:
- Ubuntu 14 (LTS ends in April 2019, extended security maintenance after that),
- CentOS 7 (unless an up-to-date compiler is available)
- Intel 15 (maybe)
- check which Boost version is installed
- make a tutorial for running ES in a Docker container (but needs root), or use Flatpak
- Python2 won't get new updates after Jan 2020
- Python2 won't be part of standard distributions by default (Ubuntu 20)
- Need to communicate that change to the user base
- Invert CI:
- use Python3 on all distributions
- use only one container with Python2 (maxset)
- Start using Jupyter instead of IPython
- Start using Python3 syntax in 4.1 development
- Features not used and not documented, need to be removed:
- GHMC (interacts negatively with the core)
- NEMD
- MEMD
- Some may be implemented in Scafacos
- CPU improvements ready for merge, but 30% slow-down on 1 node, need to measure if this slow-down is significant vs. communication slow-down in multi-node simulations
- GPU improvements are in progress
- list of particles, indexed by id:
- previously implemented as an array, needed to be re-updated every time a particle moved from cell to cell
- now implemented as an unordered map, but slower
- #2394
- simplify reallocation and communication
- improve CPU/GPU work balance
- #2452
- add tests for numerical results of tutorials
- create a new CI container testing just the tutorials
- removing code duplicates
- do maintenance on Espresso
- poll to decide on the date