Introduce OrderedSets #834
Conversation
Okay, even with this there is still quite a big performance hit when using the SubDofHandler.

```julia
using Ferrite, SparseArrays, BenchmarkTools

grid = generate_grid(Quadrilateral, (1000, 1000));

ip = Lagrange{RefQuadrilateral, 1}()
qr = QuadratureRule{RefQuadrilateral}(2)
cellvalues = CellValues(qr, ip);

dh = DofHandler(grid)
add!(dh, :u, ip)
close!(dh);

K = create_sparsity_pattern(dh)

ch = ConstraintHandler(dh);
∂Ω = union(
    getfaceset(grid, "left"),
    getfaceset(grid, "right"),
    getfaceset(grid, "top"),
    getfaceset(grid, "bottom"),
);
dbc = Dirichlet(:u, ∂Ω, (x, t) -> 0)
add!(ch, dbc);
close!(ch)

function assemble_element!(Ke::Matrix, fe::Vector, cellvalues::CellValues)
    n_basefuncs = getnbasefunctions(cellvalues)
    ## Reset to 0
    fill!(Ke, 0)
    fill!(fe, 0)
    ## Loop over quadrature points
    for q_point in 1:getnquadpoints(cellvalues)
        ## Get the quadrature weight
        dΩ = getdetJdV(cellvalues, q_point)
        ## Loop over test shape functions
        for i in 1:n_basefuncs
            δu = shape_value(cellvalues, q_point, i)
            ∇δu = shape_gradient(cellvalues, q_point, i)
            ## Add contribution to fe
            fe[i] += δu * dΩ
            ## Loop over trial shape functions
            for j in 1:n_basefuncs
                ∇u = shape_gradient(cellvalues, q_point, j)
                ## Add contribution to Ke
                Ke[i, j] += (∇δu ⋅ ∇u) * dΩ
            end
        end
    end
    return Ke, fe
end

function assemble_global(cellvalues::CellValues, K::SparseMatrixCSC, dh::DofHandler)
    ## Allocate the element stiffness matrix and element force vector
    n_basefuncs = getnbasefunctions(cellvalues)
    Ke = zeros(n_basefuncs, n_basefuncs)
    fe = zeros(n_basefuncs)
    ## Allocate global force vector f
    f = zeros(ndofs(dh))
    ## Create an assembler
    assembler = start_assemble(K, f)
    ## Loop over all cells
    for cell in CellIterator(dh)
        ## Reinitialize cellvalues for this cell
        reinit!(cellvalues, cell)
        ## Compute element contribution
        assemble_element!(Ke, fe, cellvalues)
        ## Assemble Ke and fe into K and f
        assemble!(assembler, celldofs(cell), Ke, fe)
    end
    return nothing
end

function assemble_global2(cellvalues::CellValues, K::SparseMatrixCSC, dh::DofHandler)
    ## Allocate the element stiffness matrix and element force vector
    n_basefuncs = getnbasefunctions(cellvalues)
    Ke = zeros(n_basefuncs, n_basefuncs)
    fe = zeros(n_basefuncs)
    ## Allocate global force vector f
    f = zeros(ndofs(dh))
    ## Create an assembler
    assembler = start_assemble(K, f)
    ## Loop over all cells in the first (and only) SubDofHandler
    for cell in CellIterator(dh.subdofhandlers[1])
        ## Reinitialize cellvalues for this cell
        reinit!(cellvalues, cell)
        ## Compute element contribution
        assemble_element!(Ke, fe, cellvalues)
        ## Assemble Ke and fe into K and f
        assemble!(assembler, celldofs(cell), Ke, fe)
    end
    return nothing
end

@btime assemble_global(cellvalues, K, dh);  # 494.943 ms (12 allocations: 7.65 MiB)
@btime assemble_global2(cellvalues, K, dh); # 655.668 ms (15 allocations: 7.65 MiB)
```

cc @kimauth
Did a quick benchmark, and I think this check (Ferrite.jl/src/Dofs/DofHandler.jl, line 192 at fe44e7f) is the reason for the worse performance. At least it would make sense to provide a fast unsafe path IMO (or a quick solution for …).
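For illustration, here is a minimal stand-alone sketch (not Ferrite's actual iterator code; all names below are made up) of the effect being described: performing a set-membership check for every cell inside the hot loop, versus a "fast path" that trusts the caller and skips it.

```julia
# Hypothetical micro-benchmark, not Ferrite internals: the cost of a per-cell
# membership check versus skipping it entirely.
using OrderedCollections, BenchmarkTools

const NCELLS = 1_000_000
cellids = collect(1:NCELLS)
cellset = OrderedSet(cellids)  # every cell belongs to the (sub)domain

function fake_assemble(cellids, cellset; check::Bool = true)
    s = 0.0
    for cellid in cellids
        if check
            # the kind of safety check discussed above
            cellid in cellset || throw(ArgumentError("cell $cellid not in the cellset"))
        end
        s += sqrt(cellid)  # stand-in for the actual element routine
    end
    return s
end

@btime fake_assemble($cellids, $cellset; check = true);   # safe path with the check
@btime fake_assemble($cellids, $cellset; check = false);  # "unsafe" fast path, skips the check
```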
Thanks for the pointer. If the performance issue is nothing conceptual, then I would leave the PR as it is (± updating the deps) and leave fixing it for another PR.
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##           master     #834      +/-   ##
==========================================
+ Coverage   93.84%   93.86%   +0.01%
==========================================
  Files          36       36
  Lines        5293     5310      +17
==========================================
+ Hits         4967     4984      +17
  Misses        326      326
```
Confirmed locally that dropping this check on this branch brings the two timings much closer:

```julia
julia> @btime assemble_global(cellvalues, K, dh);  # 494.943 ms (12 allocations: 7.65 MiB)
  478.872 ms (12 allocations: 7.65 MiB)

julia> @btime assemble_global2(cellvalues, K, dh); # 655.668 ms (15 allocations: 7.65 MiB)
  499.271 ms (15 allocations: 7.65 MiB)
```

where the comment on each line is the reference timing with the check included. Note that the timings on my machine are not super stable. Some of the remaining time difference can be attributed to the iteration of the OrderedSet, which increases the memory pressure. Hence I think it is still worth investigating the use of ranges in the iterators. Maybe we can add a boolean to the SubDofHandler carrying the information of whether its cells form a contiguous range or not? Or we can take another look at #625.
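A rough sketch of that "contiguous range" idea (the helper name and return convention are hypothetical, not Ferrite API): check once whether a cellset forms a contiguous, sorted block and, if it does, let the iterator walk a plain UnitRange instead of the OrderedSet.

```julia
# Hypothetical helper, not part of Ferrite: if the cells of a subdomain form a
# contiguous, increasing block, iterate them as a UnitRange instead of an OrderedSet.
using OrderedCollections

function as_unitrange(cellset::OrderedSet{Int})
    cells = collect(cellset)                 # insertion order
    isempty(cells) && return nothing
    first_cell, last_cell = first(cells), last(cells)
    if last_cell - first_cell + 1 == length(cells) && issorted(cells)
        return first_cell:last_cell          # contiguous: cheap, allocation-free iteration
    end
    return nothing                           # fall back to iterating the set itself
end

as_unitrange(OrderedSet([1, 2, 3, 4]))  # 1:4
as_unitrange(OrderedSet([1, 3, 4]))     # nothing
```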
If that is the case, then we should do it and make the CellIterator iterate over the SubDofHandler when there is only one subdofhandler.
Also, this supersedes https://github.com/Ferrite-FEM/Ferrite.jl/pull/654/files
Fixed up after #914.
Would be nice if we could set this up such that changing the internal representation in the future won't be breaking, but I guess that is hard.
I'm just stuck with a feeling that this isn't the optimal solution, but I also don't have a better alternative right now.
Would probably be nice to automatically sort the sets when adding a regular set to the grid, but that can be done later.
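A hedged sketch of what "automatically sort the sets" could look like (a hypothetical helper, not the actual addcellset! implementation): normalize a user-supplied collection of cell ids into a sorted OrderedSet so that later iteration follows the grid's cell numbering.

```julia
# Hypothetical helper, not Ferrite code: store user-provided cell ids sorted,
# so iteration over the set follows the grid's cell numbering.
using OrderedCollections

sorted_cellset(cells) = OrderedSet(sort!(collect(Int, cells)))

sorted_cellset(Set([7, 2, 5]))  # iterates as 2, 5, 7
```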
@KnutAM something isn't right here with the porous media example (probably something with FerriteMeshParser?), can you have a look?
FWIW, there have been quite a few improvements to the Base …
I totally agree here on both points. I have tried different designs and none of them were really satisfactory to me.
Why? Practically this only makes sense if the underlying grid has some ordering, as in our generated ones, and if your subdomain also has to follow this order. In generated meshes, e.g. the ones from Gmsh, the optimal ordering is almost always not the ordering imposed by the numbering of the elements, and you can mess up the performance if you sort them.
What's wrong with this? We can relax the docs to say that …
See the comments in the code.
Hopefully 14d8800 fixes it. I also tested locally that it works without the overload with Ferrite-FEM/FerriteMeshParser.jl#43. That is a breaking release for FerriteMeshParser.jl though, so we need to validate that no other breaking changes are required to support Ferrite v1 before merging and releasing. Once that's done, I'll open a PR removing that hack, and do the same for FerriteGmsh.
I think it would make sense to say that it returns an ordered collection, since I think this PR shows that we want to keep that guarantee.
Right, that makes sense, nvm!
I thought …

```julia
struct Index
    t::NTuple{2,Int}
end

cells = rand(1:10000, 1000);
facets = rand(1:4, 1000);
inds = [Index((c, f)) for (c, f) in zip(cells, facets)];

julia> Base.summarysize.((inds, Set(inds), OrderedSet(inds)))
(16040, 35008, 32320)
```
If you need a vector, we can expose the internal vector of the set.
This patch changes the use of `Set` to `OrderedSet` in the grid. This means that, for example, loops over these sets follow the originally specified order. This is important for performance since it can reduce cache misses. Fixes #631. Closes #654. Co-authored-by: Fredrik Ekre <[email protected]> Co-authored-by: Dennis Ogiermann <[email protected]>
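As a small illustration of the guarantee this commit relies on (generic Julia, not Ferrite-specific): OrderedSet preserves insertion order, while Base.Set makes no ordering promise.

```julia
using OrderedCollections

cells = [4, 1, 3, 2]
collect(Set(cells))         # some hash-determined order, not necessarily [4, 1, 3, 2]
collect(OrderedSet(cells))  # [4, 1, 3, 2] -- exactly the insertion order
```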
IIRC one open issue here is that we cannot directly parallelize over the ordered set. We want to add a how-to for this.
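One possible pattern for such a how-to (a sketch only, not an agreed-upon recipe): since an OrderedSet is not indexable, collect it into a vector once and thread over the indices of that vector.

```julia
# Sketch: parallelizing work over an OrderedSet via an indexable snapshot.
using OrderedCollections

cellset = OrderedSet(1:10_000)
cells = collect(cellset)        # preserves the set's iteration order

results = zeros(length(cells))
Threads.@threads for i in eachindex(cells)
    cellid = cells[i]
    results[i] = sqrt(cellid)   # stand-in for the per-cell work
end
```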
Title should say everything. This should increase cache locality for some problems (e.g. when using our generated grids) and we now have the guarantee that the iteration order for subdofhandlers and faces can be precisely controlled.
Closes #631.