Introduce OrderedSets #834
Conversation
Okay, even with this there is still quite a big performance hit when using the SubDofHandler.

```julia
using Ferrite, SparseArrays, BenchmarkTools

grid = generate_grid(Quadrilateral, (1000, 1000));

ip = Lagrange{RefQuadrilateral, 1}()
qr = QuadratureRule{RefQuadrilateral}(2)
cellvalues = CellValues(qr, ip);

dh = DofHandler(grid)
add!(dh, :u, ip)
close!(dh);

K = create_sparsity_pattern(dh)

ch = ConstraintHandler(dh);
∂Ω = union(
    getfaceset(grid, "left"),
    getfaceset(grid, "right"),
    getfaceset(grid, "top"),
    getfaceset(grid, "bottom"),
);
dbc = Dirichlet(:u, ∂Ω, (x, t) -> 0)
add!(ch, dbc);
close!(ch)

function assemble_element!(Ke::Matrix, fe::Vector, cellvalues::CellValues)
    n_basefuncs = getnbasefunctions(cellvalues)
    ## Reset to 0
    fill!(Ke, 0)
    fill!(fe, 0)
    ## Loop over quadrature points
    for q_point in 1:getnquadpoints(cellvalues)
        ## Get the quadrature weight
        dΩ = getdetJdV(cellvalues, q_point)
        ## Loop over test shape functions
        for i in 1:n_basefuncs
            δu = shape_value(cellvalues, q_point, i)
            ∇δu = shape_gradient(cellvalues, q_point, i)
            ## Add contribution to fe
            fe[i] += δu * dΩ
            ## Loop over trial shape functions
            for j in 1:n_basefuncs
                ∇u = shape_gradient(cellvalues, q_point, j)
                ## Add contribution to Ke
                Ke[i, j] += (∇δu ⋅ ∇u) * dΩ
            end
        end
    end
    return Ke, fe
end

function assemble_global(cellvalues::CellValues, K::SparseMatrixCSC, dh::DofHandler)
    ## Allocate the element stiffness matrix and element force vector
    n_basefuncs = getnbasefunctions(cellvalues)
    Ke = zeros(n_basefuncs, n_basefuncs)
    fe = zeros(n_basefuncs)
    ## Allocate global force vector f
    f = zeros(ndofs(dh))
    ## Create an assembler
    assembler = start_assemble(K, f)
    ## Loop over all cells
    for cell in CellIterator(dh)
        ## Reinitialize cellvalues for this cell
        reinit!(cellvalues, cell)
        ## Compute element contribution
        assemble_element!(Ke, fe, cellvalues)
        ## Assemble Ke and fe into K and f
        assemble!(assembler, celldofs(cell), Ke, fe)
    end
    return nothing
end

function assemble_global2(cellvalues::CellValues, K::SparseMatrixCSC, dh::DofHandler)
    ## Allocate the element stiffness matrix and element force vector
    n_basefuncs = getnbasefunctions(cellvalues)
    Ke = zeros(n_basefuncs, n_basefuncs)
    fe = zeros(n_basefuncs)
    ## Allocate global force vector f
    f = zeros(ndofs(dh))
    ## Create an assembler
    assembler = start_assemble(K, f)
    ## Loop over all cells in the first (and only) SubDofHandler
    for cell in CellIterator(dh.subdofhandlers[1])
        ## Reinitialize cellvalues for this cell
        reinit!(cellvalues, cell)
        ## Compute element contribution
        assemble_element!(Ke, fe, cellvalues)
        ## Assemble Ke and fe into K and f
        assemble!(assembler, celldofs(cell), Ke, fe)
    end
    return nothing
end

@btime assemble_global(cellvalues, K, dh);  # 494.943 ms (12 allocations: 7.65 MiB)
@btime assemble_global2(cellvalues, K, dh); # 655.668 ms (15 allocations: 7.65 MiB)
```

cc @kimauth
Did a quick benchmark, and I think this check (Ferrite.jl/src/Dofs/DofHandler.jl, line 192 at fe44e7f) is the reason for the worse performance. At least it would make sense to provide a fast unsafe path IMO (or a quick solution for …).
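For illustration, here is a minimal stand-alone sketch (not Ferrite's actual iterator code; all names below are made up) of the effect being described: performing a set-membership check for every cell inside the hot loop, versus a "fast path" that trusts the caller and skips it.

```julia
# Hypothetical micro-benchmark, not Ferrite internals: the cost of a per-cell
# membership check versus skipping it entirely.
using OrderedCollections, BenchmarkTools

const NCELLS = 1_000_000
cellids = collect(1:NCELLS)
cellset = OrderedSet(cellids)  # every cell belongs to the (sub)domain

function fake_assemble(cellids, cellset; check::Bool = true)
    s = 0.0
    for cellid in cellids
        if check
            # the kind of safety check discussed above
            cellid in cellset || throw(ArgumentError("cell $cellid not in the cellset"))
        end
        s += sqrt(cellid)  # stand-in for the actual element routine
    end
    return s
end

@btime fake_assemble($cellids, $cellset; check = true);   # safe path with the check
@btime fake_assemble($cellids, $cellset; check = false);  # "unsafe" fast path, skips the check
```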
Thanks for the pointer. If the performance issue is nothing conceptual, then I would leave the PR as it is (± updating the deps) and leave fixing it for another PR.
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##           master     #834      +/-   ##
==========================================
+ Coverage   93.84%   93.86%   +0.01%
==========================================
  Files          36       36
  Lines        5293     5310      +17
==========================================
+ Hits         4967     4984      +17
  Misses        326      326
```
Confirmed locally that dropping this check on this branch brings the two timings much closer:

```julia
julia> @btime assemble_global(cellvalues, K, dh);  # 494.943 ms (12 allocations: 7.65 MiB)
  478.872 ms (12 allocations: 7.65 MiB)

julia> @btime assemble_global2(cellvalues, K, dh); # 655.668 ms (15 allocations: 7.65 MiB)
  499.271 ms (15 allocations: 7.65 MiB)
```

where the comment on each line is the reference timing with the check included. Note that the timings on my machine are not super stable. Some of the remaining time difference can be attributed to the iteration of the OrderedSet, which increases the memory pressure. Hence I think it is still worth investigating the use of ranges in the iterators. Maybe we can add a boolean to the SubDofHandler carrying the information of whether its cells form a contiguous range or not? Or we can take another look at #625.
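A rough sketch of that "contiguous range" idea (the helper name and return convention are hypothetical, not Ferrite API): check once whether a cellset forms a contiguous, sorted block and, if it does, let the iterator walk a plain UnitRange instead of the OrderedSet.

```julia
# Hypothetical helper, not part of Ferrite: if the cells of a subdomain form a
# contiguous, increasing block, iterate them as a UnitRange instead of an OrderedSet.
using OrderedCollections

function as_unitrange(cellset::OrderedSet{Int})
    cells = collect(cellset)                 # insertion order
    isempty(cells) && return nothing
    first_cell, last_cell = first(cells), last(cells)
    if last_cell - first_cell + 1 == length(cells) && issorted(cells)
        return first_cell:last_cell          # contiguous: cheap, allocation-free iteration
    end
    return nothing                           # fall back to iterating the set itself
end

as_unitrange(OrderedSet([1, 2, 3, 4]))  # 1:4
as_unitrange(OrderedSet([1, 3, 4]))     # nothing
```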
If that is the case, then we should do it and make the CellIterator iterate over the SubDofHandler when there is only one subdofhandler.
Also, this supersedes https://github.com/Ferrite-FEM/Ferrite.jl/pull/654/files
Fixed up after #914.
Would be nice if we could set this up such that changing the internal representation in the future won't be breaking, but I guess that is hard.
I'm just stuck with a feeling that this isn't the optimal solution, but I also don't have a better alternative right now.
Would probably be nice to automatically sort the sets when adding a regular set to the grid, but that can be done later.
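A hedged sketch of what "automatically sort the sets" could look like (a hypothetical helper, not the actual addcellset! implementation): normalize a user-supplied collection of cell ids into a sorted OrderedSet so that later iteration follows the grid's cell numbering.

```julia
# Hypothetical helper, not Ferrite code: store user-provided cell ids sorted,
# so iteration over the set follows the grid's cell numbering.
using OrderedCollections

sorted_cellset(cells) = OrderedSet(sort!(collect(Int, cells)))

sorted_cellset(Set([7, 2, 5]))  # iterates as 2, 5, 7
```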
@KnutAM something isn't right here with the porous media example (probably something with FerriteMeshParser?), can you have a look?
FWIW, there have been quite a few improvements to the Base …
I totally agree here on both points. I have tried different designs and none of them were really satisfactory to me.
Why? Practically this only makes sense if the underlying grid has some ordering, as in our generated ones, and if your subdomain also has to follow this order. In generated meshes, e.g. the ones from Gmsh, the optimal ordering is almost always not the ordering imposed by the numbering of the elements, and you can mess up the performance if you sort them.
What's wrong with this? We can relax the docs to say that …
See the comments in the code.
Hopefully 14d8800 fixes it. I also tested locally that it works without the overload with Ferrite-FEM/FerriteMeshParser.jl#43. That is a breaking release for FerriteMeshParser.jl though, so we need to validate that no other breaking changes are required to support Ferrite v1 before merging and releasing. Once that's done, I'll open a PR removing that hack, and do the same for FerriteGmsh.
I think it would make sense to say that it returns an ordered collection, since I think this PR shows that we want to keep that guarantee.
Right, that makes sense, nvm!
I thought …

```julia
struct Index
    t::NTuple{2,Int}
end

cells = rand(1:10000, 1000);
facets = rand(1:4, 1000);
inds = [Index((c, f)) for (c, f) in zip(cells, facets)];

julia> Base.summarysize.((inds, Set(inds), OrderedSet(inds)))
(16040, 35008, 32320)
```
If you need a vector, we can expose the internal vector of the set.
This patch changes the use of `Set` to `OrderedSet` in the grid. This means that, for example, loops over these sets follow the originally specified order. This is important for performance since it can reduce cache misses. Fixes #631. Closes #654. Co-authored-by: Fredrik Ekre <[email protected]> Co-authored-by: Dennis Ogiermann <[email protected]>
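As a small illustration of the guarantee this commit relies on (generic Julia, not Ferrite-specific): OrderedSet preserves insertion order, while Base.Set makes no ordering promise.

```julia
using OrderedCollections

cells = [4, 1, 3, 2]
collect(Set(cells))         # some hash-determined order, not necessarily [4, 1, 3, 2]
collect(OrderedSet(cells))  # [4, 1, 3, 2] -- exactly the insertion order
```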
IIRC one open issue here is that we cannot directly parallelize over the ordered set. We want to add a how-to for this.
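One possible pattern for such a how-to (a sketch only, not an agreed-upon recipe): since an OrderedSet is not indexable, collect it into a vector once and thread over the indices of that vector.

```julia
# Sketch: parallelizing work over an OrderedSet via an indexable snapshot.
using OrderedCollections

cellset = OrderedSet(1:10_000)
cells = collect(cellset)        # preserves the set's iteration order

results = zeros(length(cells))
Threads.@threads for i in eachindex(cells)
    cellid = cells[i]
    results[i] = sqrt(cellid)   # stand-in for the per-cell work
end
```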
Title should say everything. This should increase cache locality for some problems (e.g. when using our generated grids) and we now have the guarantee that the iteration order for subdofhandlers and faces can be precisely controlled.
Closes #631.