Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start of transfer of functionality from hack-the-tree to maser #933

Merged
merged 43 commits into from
Jan 25, 2024

Conversation

gtribello
Copy link
Member

Description

As discussed a couple of weeks ago I am starting the process of transferring functionality from the hack-the-tree branch into the master branch of PLUMED. This commit includes code that generalises the mechanisms for passing from/to the calling code to/from PLUMED. I have written some notes explaining the rationale for these changes here:

https://plumed-school.github.io/lessons/23/001/data/MDInterfaceI.html

As I continue this process of trying to merge hack-the-tree into master I will write further posts here:

https://plumed-school.github.io/lessons/23/001/data/NAVIGATION.html

Currently, the new mechanism is used to pass:

  • ExtraCVs from the calling code to PLUMED
  • data from PLUMED to the calling code (i.e. replacing DataFetching object)
  • the viral
  • kBT and the timestep

I plan to see what you folks see of this stuff and then work on trying to also pass the positions in the way this is done in hack-the-tree.

Target release

It would be nice if this could appear in v2.10

Type of contribution
  • changes to code or doc authored by PLUMED developers, or additions of code in the core or within the default modules
  • changes to a module not authored by you
  • new module contribution or edit of a module authored by you
Copyright
  • I agree to transfer the copyright of the code I have written to the PLUMED developers or to the author of the code I am modifying.
  • the module I added or modified contains a COPYRIGHT file with the correct license information. Code should be released under an open source license. I also used the command cd src && ./header.sh mymodulename in order to make sure the headers of the module are correct.
Tests
  • I added a new regtest or modified an existing regtest to validate my changes.
  • I verified that all regtests are passed successfully on GitHub Actions.

Gareth Aneurin Tribello added 9 commits April 30, 2023 19:37
In hack-the-tree you can pass vectors/matrices and grids through Values.
This necessitates some changes to the Value class.  In particular inputForce
becomes a vector and similarly the double value becomes a vector called data.  I avoid
having a confusing vector called derivatives (which is not always the derivatives).
If I have a scalar I store the value and the derivatives in the data vector.  data
thus has a length of 1 + number of derivatives.  The first element is the value of the scalar.

Notice that I also have functionality to set the value to be a constant that is calculated once and
only once during the calculation
I added a few lines into the script here to test the code that allows you
to get values from PLUMED and pass them back to the calling program. This
means you can test this functionality even when you don't have python installed
…-tree

This functionality introduces two new actions into plumed:

PUT = an ActionWithValue that transfers some data from a pointer that is passed from the MD code to a PLMD::Value.
GET = an ActionWithArgument that transfers data from a PLMD::Value to a pointer than is passed from the MD code

Notice that you can also pass a pointer to the forces on a object that is passed to PUT.  The forces from value
are then added to the MD code's force pointer when you call the apply method.

To demonstrate how this functionality works I have replaced all the stuff for passing extraCVs to PLUMED with PUT
commands

I have amm also using GET in place of the old DataFetchingObject class, which has now been deleted
This commit modifies the code so that cell vectors are passed through a
PUT action.  The virial is then passed as the forces that any forces on the
input value are added on.  To deal with the fact that the virial and cell should only
be passed from one of the MD codes processes there is a special new class PbcAction
that deals with passing the cell vectors.  Methods from this class are also invoked
when the box parameters are sought elswhere in the code.

Notice that I have also added the DomainDecomposition class from hack-the-tree here.
Much of the functionality in this class is not yet used.  It also needs to be modified
before it can be used here.  I felt as we merge hack-the-tree it is easier to start out
with a code that resembles the one in hack-the-tree.
The timestep is now passed to PLUMED using a PUT action.  The timestep is
thus stored in a value called timestep that can be accessed using the label
timestep.  Notice that changes to the object that collects the data from the MD
code were required here because the timestep is passed directly (and not via a
pointer).  To resolve this there is a variable in DataPassingObject that can hold
a double that is passed from the MD code.
kBT is now passed from the MD code into a PLMD::Value by passing the value
to a PUT action.  Whenever an action needs kBT you can call a method in action
called getkBT.  This reads in the keyword TEMP (if it is registered).  If TEMP
is present the value read for this flag is multiplied by Boltzmann's constant.  If
TEMP is not present then the kBT value that was passed from the MD code to PLUMED is
returned.
This should resolve the problem that is cauased when you call
setNatoms multiple times and it tries to create multiple
DOMAIN_DECOMPOSITION objects
Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

@codecov-commenter
Copy link

Codecov Report

Patch coverage: 70.83% and project coverage change: -0.28 ⚠️

Comparison is base (97cf096) 85.81% compared to head (7beb7b5) 85.54%.

❗ Current head 7beb7b5 differs from pull request most recent head 8a7fd75. Consider uploading reports for the commit 8a7fd75 to get more accurate results

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #933      +/-   ##
==========================================
- Coverage   85.81%   85.54%   -0.28%     
==========================================
  Files         600      610      +10     
  Lines       54717    55210     +493     
==========================================
+ Hits        46958    47229     +271     
- Misses       7759     7981     +222     
Impacted Files Coverage Δ
src/cltools/Driver.cpp 90.06% <0.00%> (ø)
src/colvar/EEFSolv.cpp 94.53% <0.00%> (ø)
src/core/Action.h 94.28% <ø> (ø)
src/core/ActionAtomistic.h 100.00% <ø> (ø)
src/core/ActionWithArguments.h 100.00% <ø> (ø)
src/core/Atoms.cpp 87.00% <ø> (-7.21%) ⬇️
src/core/PlumedMain.h 87.50% <ø> (ø)
src/isdb/Rescale.cpp 11.73% <0.00%> (+0.12%) ⬆️
src/sasa/sasa_HASEL.cpp 81.50% <0.00%> (ø)
src/sasa/sasa_LCPO.cpp 82.25% <0.00%> (ø)
... and 55 more

... and 26 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@GiovanniBussi GiovanniBussi added the wip Do not merge label May 10, 2023
@gtribello
Copy link
Member Author

@GiovanniBussi I had another look at what I did here. I had to make changes to some of the cltools as you have to call setNoVirial before you call setNatoms. I checked though and I have also put some tests in to ensure that the code crashes if these two calls are in the wrong order. I think in regards to what we discussed in the meeting yesterday about the future of the interfaces we are thus fine.

@GiovanniBussi
Copy link
Member

I guess setNoVirial is "rare" (I hope). Is the order of "setVirial" also fixed? Is there an easy to remember logic for this (e.g. box and virial both go before positions and forces)?

@gtribello
Copy link
Member Author

@GiovanniBussi setVirial has to go after setNatoms but that is always going to happen as setVirial is not called during initialisation whereas setNatoms is called during initialisation.

@GiovanniBussi
Copy link
Member

This makes totally sense

The atomic positions, masses and charges are now passed from the MD code into PLMD::Values.
Five PLMD::Value objects are created to hold these quantities.  These PLMD::Value objects are
all vectors with natoms elements.  The ActionWithValue that owns each of thse PLMD::Values is
a PUT object.  These PUT objects are created by a new action called DOMAIN_DECOMPOSITION.  This
action is created when you call cmd("setNatoms").  However, this action could be created from the
MD code directly by doing cmd("readInputLine mdcode: DOMAIN_DECOMPOSITION").  This would then
give the flexibility to pass quantities other than the the positions, masses and charges to PLUMED.

N.B. The passing of energy between the MD code and PLUMED is broken by this commit.  I will fix this
on the next commit.
@gtribello
Copy link
Member Author

@GiovanniBussi

I have a question about the Atoms class and the behaviour of ExtraCV. When positions, the cell and the energy are passed into PLUMED they are treated as if they are const. In other words, it is assumed that PLUMED will not change the data in the objects there are being passed to PLUMED.

Meanwhile, forces and the viral are treated as non-const as PLUMED does have to change these objects.

The one exception is ExtraCV, where the value of the CV that is passed to PLUMED is stored as a non-const object. This suggests there is some case where PLUMED changes the value of the ExtraCV that is passed into the code. Is this correct? What is the reason? And what would be a sensible way of testing this functionality as I don't think there is anything in the current tests where the value of the CV that is passed from the MD code to PLUMED is changed by PLUMED?

@GiovanniBussi
Copy link
Member

GiovanniBussi commented May 14, 2023 via email

@GiovanniBussi
Copy link
Member

GiovanniBussi commented May 14, 2023 via email

Gareth Aneurin Tribello and others added 3 commits May 15, 2023 08:22
The energy is now passed through a specialised PUT action.  When apply is called
for this PUT action the rescaleForces method of the position and forces PUT actions
are called.  This allows one to adjust the forces on the atoms in the correct way due
to a force that acts on the value of the energy
If, when interpretting an atom list, ActionAtomistic encounters a group it now calls a method
of the Group action with that label to get the relevant list of atoms.  This means you no longer
need to store all the groups in a map in the Atoms class.
@gtribello
Copy link
Member Author

@GiovanniBussi

I'm still not totally clear but I don't think my question was very clear. I'll give it another go.

So when we pass the energy to PLUMED atoms does this:

void Atoms::setEnergy(const TypesafePtr & p) {
  plumed_massert( dataCanBeSet,"setEnergy must be called after setStep in MD code interface");
  MD2double(p,md_energy);
  md_energy*=MDUnits.getEnergy()/units.getEnergy();
  energyHasBeenSet=true;
} 

Energy is thus saved in a variable that is local to PLUMED. The MD code's energy cannot be modified as we don't even save a pointer to the variable in the MD code that stores the energy.

When we pass an ExtraCV, PLUMED does this:

std::map<std::string,TypesafePtr> extraCV;

void Atoms::setExtraCV(const std::string &name,const TypesafePtr & p) {
  mdatoms->setExtraCV(name,p);
} 

void setExtraCV(const std::string &name,const TypesafePtr & p) override {
    p.get<T>(); // just check type and discard pointer
    extraCV[name]=p.copy();
  }

It strikes me that, in theory, we could do this instead:

std::map<std::string,double> extraCV;

void Atoms::setExtraCV(const std::string &name,const TypesafePtr & p) {
   double extracv; MD2double(p,extracv); 
   extraCV[name] = extracv; 
} 

This would be more like what was done for energy. Furthermore, as with energy, if done this way you can pass the value of the extraCV directly instead of the pointer.

A reason for not doing it this way is if you want PLUMED to be able to change the value of the extracv in the MD code but you don't want to do that so... why the difference from the way energy is set?

@GiovanniBussi
Copy link
Member

GiovanniBussi commented May 15, 2023 via email

@GiovanniBussi
Copy link
Member

GiovanniBussi commented May 15, 2023 via email

@gtribello
Copy link
Member Author

That is all fine. I have implemented ExtraCV so that it is consistent with the way it used to be done. I just was confused as to why the mechanism for passing ExtraCV was different from the energy as it is just a scalar. I found a way to reproduce both behaviours though that I think also makes it clear that the two things are done differently.

Gareth Aneurin Tribello and others added 3 commits May 15, 2023 22:45
To get Boltzmann's constant, the units or a bool telling you if PLUMED is using
natural units inside an action you used to have to include a call something like the following

plumed.getAtoms().getUnits()

As this is used in a lot of places I put a method in Action so you can now get the units using:

getUnits()

instead.  I have made similar changes from:

plumed.getAtoms().getKBoltzman()

to

getKBoltzmann()

and

plumed.getAtos().usingNaturalUnits()

to

usingNaturalUnits()
…tion and Colvar to ActionWithValue

The way that you get the forces from the Values is the same in Function and in Colvar.  I thus created a
method called getForcesFromValues in ActionWithValue and used this method in Function and Colvar.  I also
rewrote the interface for the methods setForcesOnAtoms and addForcesToArguments.  You now passed a reference
to an unsigned which keeps track of where you are in the forcesForApply array that is passed to this functions.
ActionWithVirtualAtom now creates 5 values: the x, y and z position of the atom
and the mass and charge of the atom.  This replaces the mechanism whereby additional
atoms were created in the Atoms class.  ActionAtomistic creates vectors containing
pointers to the PLMD::Values that are created by the PUT actions that pass positions from
the MD code and the Virtual Atom actions.  If virtual atoms are needed a dependency on the
virtual atoms action is created in requestAtoms.
@gtribello
Copy link
Member Author

Happy new year all!

I had another look at this over the last couple of days. I wrote a CLTool that generates a trajectory and then calls plumed so that you can run plumed without a load of IO. I then made a few small changes to optimise some things with apply.

The timings that I get currently are as indicated in this graph.

timings

All these tests are run in serial. Tests 1-5 are Giovanni's one. 6-9 are ones that involve apply that were created using Carlo's modifications to Giovanni's tests. I am pretty sure the code is doing better on the apply side than when Carlo was trying it, which is progress.

I am not sure if the changes that speed up CENTER were moved to master yet so not totally confident about that graph. @GiovanniBussi can you clear up what the status with transferring those optimisations into master?

The next step I am going to take is to look into WHOLEMOLECULES and to ensure that is not slowing things down as well. Beyond that I am not sure what to do. I am not sure how to squeeze any more performance out of the functions for apply in the htt branch and based on the graph above that is what is slowing down the new branch.

@gtribello
Copy link
Member Author

Hello all

I had a bit of a brainwave. Here are the new timings for the code.

timings

Tests 10 and 11 are WholeMolecules and WrapAround.

@gtribello
Copy link
Member Author

Hello @GiovanniBussi and @Iximiel

I have fixed the regtest/basic/rt16 in the master-center-optimised branch as we discussed in the meeting. I guess you can now merge master-center-optimised into master and do the comparison between master and this branch to discuss the next steps for this branch.

At some stage in the next few days I will try to resolve the issues with the checks that are appearing here. I haven't worried about that too much while we have been trying to make the code faster but now that we are close to merging it would be good to not have these tests failing.

Here are the set of plumed input files that I was using for benchmarking.

# test 1, single action, rarely using all atoms
c0: CENTER NOPBC ATOMS=1-1000
c1: CENTER NOPBC ATOMS=1-100000
DUMPATOMS ATOMS=c0 FILE=testout
DUMPATOMS ATOMS=c1 FILE=testout1 STRIDE=20

# test 2, multiple action, rarely using all atoms
c0: CENTER NOPBC ATOMS=1-1000
c1: CENTER NOPBC ATOMS=1000-1:-1
c2: CENTER NOPBC ATOMS=10000-11000
c3: CENTER NOPBC ATOMS=10
c4: CENTER NOPBC ATOMS=10
c5: CENTER NOPBC ATOMS=1-100000
DUMPATOMS ATOMS=c0,c1,c2,c3,c4 FILE=testout
DUMPATOMS ATOMS=c5 FILE=testout1 STRIDE=20

# test 3, use all atoms once 
c0: CENTER NOPBC ATOMS=1-100000
DUMPATOMS ATOMS=c0 FILE=testout

# test 4, use all atoms multiple times
c0: CENTER NOPBC ATOMS=1-100000
c1: CENTER NOPBC ATOMS=1-100000
c2: CENTER NOPBC ATOMS=1-100000
c3: CENTER NOPBC ATOMS=1-100000
c4: CENTER NOPBC ATOMS=1-100000
DUMPATOMS ATOMS=c0,c1,c2,c3,c4 FILE=testout

# test5, mixed strides
c0: CENTER NOPBC ATOMS=1-100000
c1: CENTER NOPBC ATOMS=1-1000
c2: CENTER NOPBC ATOMS=10000-11000
c3: CENTER NOPBC ATOMS=20000-21000
c4: CENTER NOPBC ATOMS=30000-31000
DUMPATOMS ATOMS=c1 FILE=testout1 STRIDE=1
DUMPATOMS ATOMS=c2 FILE=testout2 STRIDE=2
DUMPATOMS ATOMS=c3 FILE=testout3 STRIDE=4
DUMPATOMS ATOMS=c4 FILE=testout4 STRIDE=5
DUMPATOMS ATOMS=c0 FILE=testout5 STRIDE=20

# test 6, apply with one distance
c0: CENTER ATOMS=1-1000 NOPBC
c1: CENTER ATOMS=1-100000 NOPBC
d: DISTANCE ATOMS=c0,c1
BIASVALUE ARG=d

# test 7, apply with two distances
c0: CENTER ATOMS=1-1000 NOPBC
c1: CENTER ATOMS=1000-1:-1 NOPBC
c2: CENTER ATOMS=10000-11000 NOPBC
c3: CENTER ATOMS=10 NOPBC
c4: CENTER ATOMS=10 NOPBC
c5: CENTER ATOMS=1-100000 NOPBC
d: DISTANCE ATOMS=c0,c2
d2: DISTANCE ATOMS=c1,c3
BIASVALUE ARG=d,d2

# test 8, apply with 3 distances
c0: CENTER ATOMS=1-100000 NOPBC
c1: CENTER ATOMS=1-100000 NOPBC
c2: CENTER ATOMS=1-100000 NOPBC
c3: CENTER ATOMS=1-100000 NOPBC
c4: CENTER ATOMS=1-100000 NOPBC
d: DISTANCE ATOMS=c0,c1
d2: DISTANCE ATOMS=c2,c3
d3: DISTANCE ATOMS=c3,c4
BIASVALUE ARG=d,d2,d3

# test 9, apply with four distances
c0: CENTER ATOMS=1-100000 NOPBC
c1: CENTER ATOMS=1-1000 NOPBC
c2: CENTER ATOMS=10000-11000 NOPBC
c3: CENTER ATOMS=20000-21000 NOPBC
c4: CENTER ATOMS=30000-31000 NOPBC
d: DISTANCE ATOMS=c0,c1
d2: DISTANCE ATOMS=c1,c2
d3: DISTANCE ATOMS=c2,c3
d4: DISTANCE ATOMS=c3,c4
BIASVALUE ARG=d,d2,d3,d4

# test 10, wholemolecules
WHOLEMOLECULES ENTITY0=1-10000

# test 11, wraparound
f: FIXEDATOM AT=5,5,5
WRAPAROUND ATOMS=1-10000 AROUND=f

@Iximiel
Copy link
Member

Iximiel commented Jan 11, 2024

I did some performance tests with this, that is my take on a Giovanni's gist.

I wanted to use perf to generate the flamegraphs, so I used CXXFLAGS="-O3 -gdwarf -fno-omit-frame-pointer" for compiling the C++ files.

I only confronted using a similar configuration of your "#test 1", with centers calculated on 1000 and 100000 atoms CONTIGUOS in memory (it may be interesting to see how a CENTER NOPBC ATOMS=1-100000:100 may behave.

I confronted three runs:

  • The current master with the htt's center copied (and ActionAtomistic.h patched with
  public:
  bool                  chargesWereSet;
  private:
  • The commit 2b80592 of this PR
  • An older, slightly modified commit from this PR, updated to the 19/12/2024

If I confront the commit 2b80592 to the december one: there is more or less an halving in time of the backward loop 👍, that now is only 10% slower than the one in master. But if you observe better, is actually faster for some of the heavier actions, look at c1 and c0.

However center runs slower on both the htt versions, and by looking at the flamegraphs my interpretation is that now the compiler (at least gcc 9.4.0 on my workstation) finds more difficult to understand what we actually want to do with the code: because on the master some avx instructions pop out during center::calculate. I did not studied this further, because the major slowdown in December was in the backward loop, I'll look at this in these days.

Data

Master (8d8e526)

                                              Cycles        Total      Average      Minimum      Maximum
                                                   1    72.063395    72.063395    72.063395    72.063395
1 Prepare dependencies                         20000     0.013781     0.000001     0.000000     0.000006
2 Sharing data                                 20000     4.965375     0.000248     0.000211     0.000675
3 Waiting for data                             20000     0.002900     0.000000     0.000000     0.000004
4 Calculating (forward loop)                   20000    45.953675     0.002298     0.002090     0.004807
4A 0 @0                                        20000     0.007282     0.000000     0.000000     0.000005
4A 1 c0                                        20000     0.292821     0.000015     0.000012     0.000034
4A 2 c1                                        20000    45.318497     0.002266     0.002060     0.004749
4A 3 all                                       20000     0.038748     0.000002     0.000001     0.000009
4A 4 pos                                       20000     0.100665     0.000005     0.000003     0.000014
4A 5 @5                                        20000     0.035607     0.000002     0.000001     0.000012
5 Applying (backward loop)                     20000    20.940866     0.001047     0.000952     0.001605
5A 0 @5                                        20000     0.010735     0.000001     0.000000     0.000006
5A 1 pos                                       20000     0.043516     0.000002     0.000001     0.000010
5A 2 all                                       20000     0.001754     0.000000     0.000000     0.000006
5A 3 c1                                        20000    16.747918     0.000837     0.000730     0.001258
5A 4 c0                                        20000     0.154233     0.000008     0.000006     0.000019
5A 5 @0                                        20000     0.009469     0.000000     0.000000     0.000005
5B Update forces                               20000     3.883760     0.000194     0.000175     0.000349
6 Update                                       20000     0.018630     0.000001     0.000001     0.000031

masterplumed-11-01-24

HTT (2b80592)

                                              Cycles        Total      Average      Minimum      Maximum
                                                   1    90.326783    90.326783    90.326783    90.326783
1 Prepare dependencies                         20000     0.025413     0.000001     0.000001     0.000010
2 Sharing data                                 20000     6.851354     0.000343     0.000219     0.000767
3 Waiting for data                             20000     0.020197     0.000001     0.000001     0.000009
4 Calculating (forward loop)                   20000    61.171791     0.003059     0.002785     0.005315
4A  0 posx                                     20000     0.651638     0.000033     0.000001     0.000071
4A  1 posy                                     20000     0.701234     0.000035     0.000000     0.000083
4A  2 posz                                     20000     0.692024     0.000035     0.000000     0.000077
4A  3 Masses                                   20000     0.005104     0.000000     0.000000     0.000004
4A  4 Charges                                  20000     0.004506     0.000000     0.000000     0.000005
4A  5 Box                                      20000     0.008633     0.000000     0.000000     0.000006
4A  6 mdcode                                   20000     0.009571     0.000000     0.000000     0.000006
4A  7 @0                                       20000     0.009010     0.000000     0.000000     0.000007
4A  8 c0                                       20000     0.434830     0.000022     0.000016     0.000047
4A  9 c1                                       20000    58.293987     0.002915     0.002686     0.005037
4A 10 all                                      20000     0.082224     0.000004     0.000002     0.000024
4A 11 pos                                      20000     0.049190     0.000002     0.000002     0.000011
4A 12 @5                                       20000     0.036909     0.000002     0.000001     0.000013
5 Applying (backward loop)                     20000    22.040723     0.001102     0.000831     0.002014
5A  0 @5                                       20000     0.006944     0.000000     0.000000     0.000005
5A  1 pos                                      20000     0.041318     0.000002     0.000001     0.000014
5A  2 all                                      20000     0.007441     0.000000     0.000000     0.000007
5A  3 c1                                       20000    12.463555     0.000623     0.000490     0.001337
5A  4 c0                                       20000     0.118684     0.000006     0.000004     0.000020
5A  5 @0                                       20000     0.004383     0.000000     0.000000     0.000006
5A  6 mdcode                                   20000     9.189841     0.000459     0.000306     0.000929
5A  7 Box                                      20000     0.017287     0.000001     0.000000     0.000010
5A  8 Charges                                  20000     0.001681     0.000000     0.000000     0.000004
5A  9 Masses                                   20000     0.000504     0.000000     0.000000     0.000000
5A 10 posz                                     20000     0.001221     0.000000     0.000000     0.000005
5A 11 posy                                     20000     0.001072     0.000000     0.000000     0.000006
5A 12 posx                                     20000     0.001865     0.000000     0.000000     0.000005
5B Update forces                               20000     0.000658     0.000000     0.000000     0.000001
6 Update                                       20000     0.025604     0.000001     0.000001     0.000023

httplumed-24git

HTT (December)

                                              Cycles        Total      Average      Minimum      Maximum
                                                   1   112.757700   112.757700   112.757700   112.757700
1 Prepare dependencies                         20000     0.027335     0.000001     0.000001     0.000014
2 Sharing data                                 20000     5.937261     0.000297     0.000240     0.000670
3 Waiting for data                             20000     0.024902     0.000001     0.000001     0.000010
4 Calculating (forward loop)                   20000    58.192825     0.002910     0.002732     0.005604
41Calculating setAtomsDerivatives              60000    13.553777     0.000226     0.000000     0.001529
4A  0 posx                                     20000     0.495073     0.000025     0.000001     0.000096
4A  1 posy                                     20000     0.447953     0.000022     0.000000     0.000099
4A  2 posz                                     20000     0.473833     0.000024     0.000000     0.000103
4A  3 Masses                                   20000     0.005126     0.000000     0.000000     0.000008
4A  4 Charges                                  20000     0.004537     0.000000     0.000000     0.000010
4A  5 Box                                      20000     0.009452     0.000000     0.000000     0.000009
4A  6 mdcode                                   20000     0.007897     0.000000     0.000000     0.000006
4A  7 @0                                       20000     0.010699     0.000001     0.000000     0.000009
4A  8 c0                                       20000     0.430716     0.000022     0.000018     0.000071
4A  9 c1                                       20000    55.857497     0.002793     0.002619     0.005342
4A 10 all                                      20000     0.116907     0.000006     0.000004     0.000023
4A 11 pos                                      20000     0.050194     0.000003     0.000002     0.000015
4A 12 @5                                       20000     0.037685     0.000002     0.000001     0.000014
5 Applying (backward loop)                     20000    48.305212     0.002415     0.002245     0.004507
5A  0 @5                                       20000     0.006999     0.000000     0.000000     0.000007
5A  1 pos                                      20000     0.040619     0.000002     0.000001     0.000015
5A  2 all                                      20000     0.026088     0.000001     0.000001     0.000014
5A  3 c1                                       20000    38.294177     0.001915     0.001762     0.003873
5A  4 c0                                       20000     0.551851     0.000028     0.000024     0.000056
5A  5 @0                                       20000     0.010319     0.000001     0.000000     0.000008
5A  6 mdcode                                   20000     9.082718     0.000454     0.000377     0.001087
5A  7 Box                                      20000     0.016295     0.000001     0.000000     0.000007
5A  8 Charges                                  20000     0.002658     0.000000     0.000000     0.000004
5A  9 Masses                                   20000     0.000544     0.000000     0.000000     0.000004
5A 10 posz                                     20000     0.002834     0.000000     0.000000     0.000005
5A 11 posy                                     20000     0.000654     0.000000     0.000000     0.000001
5A 12 posx                                     20000     0.002234     0.000000     0.000000     0.000004
5B Update forces                               20000     0.000483     0.000000     0.000000     0.000004
6 Update                                       20000     0.030970     0.000002     0.000001     0.000024

httplumed19_12NoPbc-stopwatch

How to read a flamegraph

The more height the more depth is in that stack call, the more is the rectangle larger, the more time has passed in the stack, function are in alphabetical order, largeness of rectangle=%time passed in the stack and the colors distiguish the functions, the palette is set up to use the same color for the same named function in different graphs.
In particular i used a different palette for the new functions in htt, so you can see immediately which functions are not present in the master branch

ato=matmul(rotation,ato);
unsigned nat = getTotAtoms();
for(unsigned i=0; i<nat; i++) {
std::pair<std::size_t,std::size_t> a = getValueIndices( AtomNumber::index(i));

Check notice

Code scanning / CodeQL

Declaration hides variable Note

Variable a hides another variable of the same name (on
line 142
).
@gtribello
Copy link
Member Author

@GiovanniBussi do you want to push the button on this or are you happy for me to do it?

@GiovanniBussi
Copy link
Member

Please go ahead! 😀

@gtribello gtribello merged commit 9e2d115 into master Jan 25, 2024
32 checks passed
@gtribello gtribello deleted the htt-new-interface-only branch January 25, 2024 07:59
@carlocamilloni
Copy link
Member

@gtribello @GiovanniBussi nice this is merged!

I have noticed that now gromacs tells
PLUMED: KbT has not been set by the MD engine
PLUMED: It should be set by hand where needed

but this is not true because METAD knows the value of KbT, so there is a small bug somewhere

Furthermore,
I have just re-run the aladp test and the branch is improved with respect to before, the new performances over 1,000,000 steps of ALADP metad are
9,430 ns/day vs 9,300 ns/day (previous test) vs 10,700 (master before merge)
The gain is all in the applying phase that is now even faster than the master before merge, calculate is still slower.
I think it may be worth to speed up the parsing of the keywords as we discussed at some point.

@GiovanniBussi
Copy link
Member

@carlocamilloni can you provide a screenshot showing that parsing keywords takes significant time? I tried to see that with Instruments but without success...

@gtribello
Copy link
Member Author

@carlocamilloni thanks for checking this.

I had a look at the issue with the dumping of KbT. There was a typo in the place where it was output to the log. It should be fixed now.

@carlocamilloni
Copy link
Member

@carlocamilloni can you provide a screenshot showing that parsing keywords takes significant time? I tried to see that with Instruments but without success...

Sure, I will try to repeat the analysis, I also noticed that instruments does not always manage to show all functions in a relative unpredictable way

@carlocamilloni
Copy link
Member

@GiovanniBussi @gtribello this is the report from instruments on the aladp run:
Screenshot 2024-01-29 at 13 52 24
you can see that getWords uses the 0.9% of the time, that I would say is quite significant and it was 0 in the past
you can download the trace here:

https://www.dropbox.com/t/EkyXYZVOM8XINaK0

@Iximiel
Copy link
Member

Iximiel commented Jan 29, 2024

you can see that getWords uses the 0.9% of the time, that I would say is quite significant and it was 0 in the past you can download the trace here:

https://www.dropbox.com/t/EkyXYZVOM8XINaK0

On linux I have some troubles in opening the trace with this format, can you export a .csv version?

@GiovanniBussi
Copy link
Member

Thanks @carlocamilloni I see, I can reproduce this with a dummy test, just using a very low number of atoms (e.g. 20) as you have in alanine dipeptide in vacuum.

I think it can (should) be optimized, I will take care of it

@carlocamilloni
Copy link
Member

@Iximiel unfortunately instruments does not have any other option to save files. It instead allow to copy everything, so I pasted in a text file
trace.txt

@Iximiel
Copy link
Member

Iximiel commented Jan 30, 2024

@Iximiel unfortunately instruments does not have any other option to save files. It instead allow to copy everything, so I pasted in a text file trace.txt

@carlocamilloni I tried to visualize it with the flamegraph, but its instruments interface works only with a csv...
The most recent post that I found about this "export as csv" command in instrument is from 2017, that may be a dropped feature?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wip Do not merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants