Reduce memory allocation #410

Open
wants to merge 6 commits into
base: main
from

Conversation

tizianoGuadagnino
Collaborator

@tizianoGuadagnino tizianoGuadagnino commented Dec 4, 2024

Motivation

After doing some heaptrack analysis in PRBonn/kinematic-icp/pull/22, it turns out that our beloved system performs an insane amount of unnecessary allocations.

This PR

This change can be summarized as "don't let that vector reallocate for each point".

std::vector<Voxel> GetAdjacentVoxels(const Voxel &voxel, int adjacent_voxels = 1) {
    std::vector<Voxel> voxel_neighborhood; // <--- NO RESERVE HAS BEEN CALLED
    for (int i = voxel.x() - adjacent_voxels; i < voxel.x() + adjacent_voxels + 1; ++i) {
        for (int j = voxel.y() - adjacent_voxels; j < voxel.y() + adjacent_voxels + 1; ++j) {
            for (int k = voxel.z() - adjacent_voxels; k < voxel.z() + adjacent_voxels + 1; ++k) {
                voxel_neighborhood.emplace_back(i, j, k);
            }
        }
    }
    return voxel_neighborhood;
}

This code does not preallocate memory and, even worse, the reallocation happens for every point in the scan for which we compute associations. The funny thing is that we know exactly what these voxel offsets look like, so I just precompute them. The reduction in the number of allocations is incredible.
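To make the reallocation cost concrete, here is a minimal sketch that counts how often a `std::vector` changes capacity while it is filled with the 27 offsets of a 3x3x3 neighborhood. The `Voxel` struct and the `CountCapacityChanges` helper are hypothetical stand-ins written for this illustration, not code from the PR, and the exact count without `reserve` depends on the standard library's growth policy.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the project's Voxel type (an assumption of this sketch).
struct Voxel {
    int x, y, z;
};

// Fills a vector with the 27 offsets of a 3x3x3 neighborhood and counts how
// many times its capacity changes during the fill loop. Each capacity change
// means the vector allocated a new buffer and moved its elements over.
std::size_t CountCapacityChanges(bool reserve_first) {
    std::vector<Voxel> neighborhood;
    if (reserve_first) neighborhood.reserve(27);  // one allocation up front
    std::size_t changes = 0;
    std::size_t last_capacity = neighborhood.capacity();
    for (int i = -1; i <= 1; ++i) {
        for (int j = -1; j <= 1; ++j) {
            for (int k = -1; k <= 1; ++k) {
                neighborhood.push_back(Voxel{i, j, k});
                if (neighborhood.capacity() != last_capacity) {
                    ++changes;
                    last_capacity = neighborhood.capacity();
                }
            }
        }
    }
    return changes;
}
```

Calling `CountCapacityChanges(false)` reports at least one capacity change, while `CountCapacityChanges(true)` reports none inside the loop. Note that `reserve` still costs one heap allocation per call, so for a function invoked once per point a precomputed table of offsets avoids even that.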

Results

Memory allocations

pr_image1

}
return voxel_neighborhood;
}
static const std::array<Voxel, 27> shifts{
Collaborator


Something I tried moons ago, and also worked fine (mainly style):

const std::array<Voxel, 27> &GetAdjacentVoxels() {
    static const auto ADJACENT_VOXELS = [&]() -> std::array<Voxel, 27> {
        std::array<Voxel, 27> output;
        // clang-format off
        size_t idx = 0;
        for (int i = -1; i <= 1 ; ++i) {
        for (int j = -1; j <= 1 ; ++j) {
        for (int k = -1; k <= 1 ; ++k) {
            output[idx++] = Voxel{i,j,k};
        }}}
        // clang-format on
        return output;
    }();
    return ADJACENT_VOXELS;
}

And down the line:

std::array<Voxel, 27> GetVoxelNeighborhood(const Voxel &voxel) {
    auto voxel_neighborhood = GetAdjacentVoxels();
    for (auto &adjacent_voxel : voxel_neighborhood) adjacent_voxel += voxel;
    return voxel_neighborhood;
}

This is for "more readability", since later on we will "search in the neighboring voxels" instead of computing shift + voxel.

Collaborator Author


To me, it looks a bit too complicated for what it should be. I will compile the decision using -Wpedantic @benemer ;)

Member


I don't have a strong opinion on this. I see one advantage with Nacho's solution: it is easier to consider more than one neighboring voxel (in case you need to have a small voxel size but still find nearest neighbors).

How about a compromise?

static const std::array<Voxel, 27> voxel_shifts = []() {
    std::array<Voxel, 27> output;
    size_t idx = 0;
    for (int i = -1; i <= 1; ++i) {
        for (int j = -1; j <= 1; ++j) {
            for (int k = -1; k <= 1; ++k) {
                output[idx++] = Voxel{i, j, k};
            }
        }
    }
    return output;
}();

This keeps the simplicity but allows better readability and simpler extension to more neighboring voxels.
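As a rough illustration of the "simpler extension to more neighboring voxels" point, the lambda-initialized table could be parameterized on the number of adjacent voxels. This is only a sketch: the `Voxel` struct, `NeighborhoodSize`, and `VoxelShifts` are hypothetical names introduced here and are not part of the PR.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the project's Voxel type (an assumption of this sketch).
struct Voxel {
    int x, y, z;
};

// Number of voxels in a cube that extends N voxels in each direction:
// the edge length is 2N + 1, so N = 1 gives 27 and N = 2 gives 125.
template <int N>
constexpr std::size_t NeighborhoodSize() {
    return static_cast<std::size_t>((2 * N + 1) * (2 * N + 1) * (2 * N + 1));
}

// Returns a reference to a table of offsets that is built once per N on
// first use; later calls perform no heap allocation.
template <int N = 1>
const std::array<Voxel, NeighborhoodSize<N>()> &VoxelShifts() {
    static const auto shifts = [] {
        std::array<Voxel, NeighborhoodSize<N>()> output{};
        std::size_t idx = 0;
        for (int i = -N; i <= N; ++i) {
            for (int j = -N; j <= N; ++j) {
                for (int k = -N; k <= N; ++k) {
                    output[idx++] = Voxel{i, j, k};
                }
            }
        }
        return output;
    }();
    return shifts;
}
```

With this shape, `VoxelShifts<1>()` yields the 27-entry table discussed above and `VoxelShifts<2>()` a 125-entry one, at the cost of making the neighborhood size a compile-time choice.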

@tizianoGuadagnino changed the title from "Reduce allocations -> Improve runtime" to "Improve runtime by reshaping TBB Data Association" on Dec 4, 2024
@tizianoGuadagnino changed the title from "Improve runtime by reshaping TBB Data Association" to "Reduce memory allocation" on Dec 4, 2024
3 participants