-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lvardt cell CvMembList.ml.size() should be the number of contiguous regions of the fixed step thread Memb_list #3299
base: master
Are you sure you want to change the base?
Conversation
✔️ d73896e -> Azure artifacts URL |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #3299 +/- ##
=======================================
Coverage 67.07% 67.07%
=======================================
Files 571 571
Lines 111055 111074 +19
=======================================
+ Hits 74488 74508 +20
+ Misses 36567 36566 -1 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a high level perspective, do you happen to have some performance numbers related to this change, or do we also need #3300 to be able to see a (substantial) difference? Also, are the improvements consistent on both Linux and MacOS?
Regarding the CI failure on MacOS, I don't really see how it's related to this change (something about fmt? But the PR doesn't touch those parts at all), so we can probably ignore it.
Ironically, this is performant equivalent to #3300 for a single cell model. (For a single cell model global vardt and local vardt are conceptually equivalent since in both cases there is only a single instance of the cvode integrator). I say "ironically" because the my first sight of the issue was for the single cell modeldb model 267666 discrepancy between master local and global cv:advance_tn time and instructions retired:
The reason this PR is not an endpoint solution to the problem is that default node ordering is globally section depth first which interleaves sections of different cells. Here is a ringtest sim time comparison between master, this pr, and #3300 using the intel compiler
I have not checked the mac performance.
I believe that was localized to the new NMODL and I didn't look closely about where the specific failing CI's were getting that version. I do notice that locally my PR and the master are both getting 49fdfe6c from https://github.com/BlueBrain/nmodl |
✔️ ac1c6a9 -> Azure artifacts URL |
✔️ 8e1e0e5 -> Azure artifacts URL |
Quality Gate passedIssues Measures |
✔️ 32921b2 -> Azure artifacts URL |
There are serious performance problems with the local variable time step method in comparison with 8.2 This PR is a prerequisite for fixing one of them.
The 9.0 implementation for local variable time step (each cell is integrated by its own cvode instance) demanded that a cell's
CvMembList.ml.size()
could be either 1 orMemb_list .nodecount
.Thus, if there were multiple cells in a thread, the
ml.size() == nodecount
, and eachMemb_list
instance was limited to a single Node*.A subsequent PR (#3300) will default sort the fixed step nodes so that Nodes of each cell will be as contiguous as possible. Presently, the default is that only nodes in a Section are contiguous.