ClearEntireCostmap returns before all costmap layer include valid data #4734

schupf2 · 2024-10-29T09:50:47Z

Bug report

Required Info:

Operating System:
- Ubuntu 22.04.5 LTS
ROS2 Version:
- Humble binaries
Version or commit hash:
- 11.1.13-0jammy
DDS implementation:
- Cyclonedds

Steps to reproduce issue

When working with a costmap with more then one different layers (in our case, we use ["static_layer", "obstacle_layer", "inflation_layer"]), we face a problem after clearing the costmap: The the nav2_msgs::srv::ClearEntireCostmap service returns after the costmap is cleared. However, at this point no scanner measurements are considered in the obstacle layer. Thus, the space which is not considered in the static layer is marked as free for a short time until the next scanner measurement is included in the obstacle layer.

Expected behavior

There should be an option to either enforce waiting on information in all layers being present before returning from the ClearEntireCostmap service call, or some option to verify the presence of information on all layers.

Actual behavior

After the ClearEntireCostmap service call the costmap only shows values form that static layer and no scanner measurements are yet considered in the obstacle layer. Therefore, obstacles in front of the robot are ignored for a brief time.

The text was updated successfully, but these errors were encountered:

SteveMacenski · 2024-10-29T12:25:46Z

That is why there is a ‘isCurrent’ API that is checked before planning or control is allowed to make sure there is new data before the cost map is allowed to be used. Why the cost map might be in a state without valid data after a clear, it is not used until the data is again valid for all layers. Does that make sense?

schupf2 · 2024-10-30T14:52:45Z

Thank you for the quick answer - that definitely makes sense :) .
However, if one uses the the collision checker directly (and not the whole controller) - the check is not performed and one needs to get a hold of the LayeredCostmap or the Costmap2DROS in order to be able to perform the check. So maybe a service (similar to e.g., ClearEntireCostmap) that waits for isCurrent to evaluate to true, might be a nice feature.

SteveMacenski · 2024-10-31T09:51:01Z

A collision checker though would be used inside of a controller/planner plugin, so if the controller / planner is not being utilized until data is valid yet, then there is no issue :-) There shouldn't be any calls to costmap data structures by the task server's algorithms until isCurrent is true

SteveMacenski · 2024-11-05T18:37:48Z

Are you saying to use the costmap topic collision detector object? That is a good point - I think though we could adjust how the costmap publishing is done so that we don't publish a new costmap until its current again. I don't think blocking the clear costmap service is a good idea since the costmap update rate is user-defined so it could be very slow. We could not update the published costmap topics as well as blocking the algorithm's use of the costmap until isCurrent() == true.

Is that what you mean / work for you?

schupf2 · 2024-11-06T09:11:01Z

Thank your for your quick replies.

Yes exactly - I was talking about the CostmapTopicCollisionChecker.

Yes that would be amazing and definitely solve our problem. I'm not sure if it could cause any problems for other existing users though. Maybe we could also introduce a parameter to determine if the use of the costmap should be blocked until isCurrent = true. However, in the long run I think the behavior you suggested would definitely be desirable.

SteveMacenski · 2024-11-06T17:48:02Z

@schupf2 I looked over the details and it should be as easy as adding isCurrent() to the condition here: https://github.com/ros-navigation/navigation2/blob/main/nav2_costmap_2d/src/costmap_2d_ros.cpp#L542-L544

As long as we keep the updateBounds calls in place even when not current, then once it becomes current then we should update the lot of it.

Would you be open to testing that out to make sure that resolves your concern and then submitting the PR? :-)

schupf2 · 2024-11-08T08:46:45Z

Amazing thank you.
Definitely, I'll do that on Monday :)

schupf2 · 2024-11-11T16:05:18Z

I implemented the discussed change today and tested it. Unfortunately, it still doesn't solve the problem.

I discovered that the isCurrent() of the obstacleLayer briefly returns true after resetting the costmap. This happens due to the following two parts in the code:

In the ObstacleLayer::resetBuffersLastUpdated(), the last_updated_ of the individual observation buffers is reset to the current time. Thus, right after resetting the individual buffers will return true on isCurrent(), even if no data has been received.
Furthermore, in the ObstacleLayer::updateCosts (precisely here: https://github.com/ros-navigation/navigation2/blob/fbd1d3e7964dc220c0861d5e8bdcfcc7b9ec2812/nav2_costmap_2d/plugins/obstacle_layer.cpp#L542-544) current_ is also set to true in the first update after the reset. However, this does not at all verify that data is present / that the obstacle buffers are current.

SteveMacenski · 2024-11-11T21:51:45Z

Ah - I didn't catch that in my initial analysis.

I believe we set that to true after clearing since at that point, we've called updateMap for the updateBounds and updateCosts to have been triggered. Whether the buffers contain new data or not, from the main costmap's perspective, the layers have performed a complete update (just without promise that new sensor data has come in in the meantime). When using the sensor timeout parameter on the observation source, if an update hasn't come in in a sufficiently long time based on the sensor's update frequency, then we set current = false since there's something wrong with a sensor. Thus, for just 1 update without new sensor data, that is seemingly a valid and complete update to consider current, in general.

So from that perspective, this feels correct - we did successfully update the costmap layer, it just didn't contain a promise that data was processed yet. Though, if using a static layer, we know a map and inflation are now present on a single update. However the nature of "current" seems insufficient for the case where we (1) want to know if we performed a valid update and (2) the costmap layer actually had data that iteration as well. If you had a 5 hz publishing sensor and a 10 hz updating costmap, not each costmap update would have a new piece of sensor data and that is an expected behavior. Or a situation with multiple observations in a layer and only some subset of them are updated in the first iteration.

That doesn't address our problem here, though. I'd be open to suggestions. A couple of initial thoughts:

We have more than just isCurrent to know if all layers were actually updated with new data -- though a little problematic to decipher the outputs since some layers like the StaticLayer technically have updates, but are not frequently updated that you wouldn't want to block for -- but ObstacleLayer may be. Adding an isUpdated() API and how we process the output of that seems tricky to get right if its blocking planning/controlling.
Have some delay when a clearing costmap operation happens before the costmap collision checker object uses the costmap -- though that seems very fragile / hacky and would result in less responsive behaviors that use the costmap collision checker object. Or perhaps have an isUpdated API that blocks publication of a new costmap until all layers have a complete update. This would work, however it would just block any behavior relying on the costmap collision checker object for some time after a clearing. But, I suppose that is kind of the point, isn't it?
Clearing costmap service block until data comes in, but I think that mimics the same issues in my first thought since its hard to tell which layers "should" have updates & blocking the service call could create other data congestion issues

It seems like we'd need to make some promise about the fact that each observation source in each data processing costmap layer has not only been called to update, but that update contained new current data. And, do so in a general purpose way that accounts for the fact that some data sources come in faster / slower, regular or irregular. And also if that's really necessary in the way that we're thinking about it right now.

Any thoughts?

schupf2 · 2024-11-12T10:54:29Z

Thanks for your detailed answer :).

Yes, I agree that it does make sense for a costmap to be valid in general after it has undergone a complete update procedure. However, I think this makes the isCurrent somewhat inconsistent. After clearing the costmap it basically tells us whether the costmap has been fully updated (not taking into account if sensor data has been received as expected), while during all other times isCurrent evaluates to false when sensor data is not received as expected (even if the map has been updated correctly).

So from my perspective, also taking into account the presence of sensor data (also after a reset), seems like a valid solution. If I'm not missing something here, it might also be less likely to introduce further problems then the points you mentioned (especially since you already mentioned the problems ^^).

The changes I would suggest:

I don't see the necessity to reset the last_updated_ in the observation buffer after a reset, since the buffer itself is not reset (as far as I can see). So basically, when we reset the map the buffer itself is not affected. Therefore, I would suggest to NOT call ObservationBuffer::resetLastUpdated() if the map is cleared. That way, the isCurrent() of the observation buffer would always correctly take into consideration if sensor data has been received within the expected_update_rate. If the observation_keep_time > expected_update_rate it would then be safe to assume that there is sensor data in the buffer if isCurrent() evaluates to true.
You said that "I believe we set that to true after clearing since at that point, we've called updateMap for the updateBounds and updateCosts to have been triggered". However, the current_ is already set in updateBounds anyway - if there is no irregularity with the sensor, it will also be set to true right after the costmap is cleared. Thus, we would NOT need to set current_ to true after a reset in updateCosts, since it will be set in updateBounds anyway.

However, I noticed another thing here: If the map was cleared, the obstacle layer reports being current already after the call to updateBounds (assuming sensor data is present) and before the call to updateCosts - i.e. before the map is actually updated. Maybe we should take this into consideration as well and check for was_reset_ in the isCurrent() of the ObstacleLayer.

So to sum it up. I think if we remove line 760 and 544 in the ObstacleLayer, we would obtain consistent behavior no matter if the costmap was reset. For the normal case (when sensor data is received as expected), after a reset the costmap is current again as soon as the map was updated. While if we didn't receive sensor data in the expected update rate, the costmap will not be current regardless of whether it has been cleared or not.

Combined with the change of only publishing the costmap if it's current, this would also solve my problem (at least from what I can see in a preliminary analysis) :).

SteveMacenski · 2024-11-12T20:42:53Z

while during all other times isCurrent evaluates to false when sensor data is not received as expected (even if the map has been updated correctly).

I think it is consistent: it takes time before the sensor timeout can trigger. If you have a 10 hz updating sensor and you set your timeout to 0.2 seconds to give it some leeway, then when we reset() or clear, it could be up to 0.2 seconds before we should evaluate to false. The same would happen during normal non-reset times, its just that when we reset, the timeout itself is reset at that very moment, rather than being on a rolling basis. I think these are exactly the same.

also taking into account the presence of sensor data (also after a reset), seems like a valid solution.

I don't entirely disagree with you, though I'm cautious because I know costmaps can be implemented with inputs that are either not regularly updating sensors (i.e. not a regular frequency) or based on slow updating inputs (1hz AI output or even on event basis). If we do this, then we would (1) have to wait until all costmap layers and their constituent observation sources have some new data and (2) block publication and algorithms until such a time -- which could be arguably good. We could make a carve out for special cases like the Static Layer that are not updated regularly, but that now means its another API that has to be implemented and managed. Doable, but starts to feel like patches on top of patches and is further complicated if the inputs are not simple lasers, depth cameras, etc that update regularly and frequently.

I don't see the necessity to reset the last_updated_ in the observation buffer after a reset, since the buffer itself is not reset (as far as I can see).

I tried to find the origins of this, but its been done since ROS 1 Navigation, so its hard for me to pinpoint. Its at least 11 years old, but it seems to me to have been in the stack since the very first commit on costmap 2d, or so close to it that even GitHub itself doesn't have the stored Blame information for me to backtrack further.

I believe this could be due to a reset operation taking arbitrarily long, or reset is meant to reset all state and that's an important element of the layer's state. It used to unsubscribe to the topic, reset the map, then resubscribe to the topic, and finally reset the last updated time (which makes sense if a subscription is new). There was an activate and deactivate function that in reset are called relatively quickly, but could in concept be called independently with an arbitrary amount of time between them. We don't do the same activate/deactivate for the reset anymore.

So, I'm a bit cautious about a change to that.

However, the current_ is already set in updateBounds anyway

Update bounds is called in the updateMap loop in costmap 2D, its not called as part of the clearing or reset operation. So, if we updateMap and we have valid (although potentially empty) observation buffers, then it would always be current_ = true;. Perhaps this can be removed. I believe the thinking is that if we reset the costmap, we don't want to block the next planning/control iteration that'll be immediately triggered & the #1999 that this PR was in response to was w.r.t. the Map not being present after a reset, so my choices on the data processing layers may be flawed.

I can really see an argument for removing that current_ & was_reset_ logic so that current_ is not forced true, allowing it to be false if data doesn't come in within the buffer timeout without an immediate override. I'd love a 3rd sanity check on that, but I'm on board. I think that also addresses your next text on "However, I noticed another thing here:"?

check for was_reset_ in the isCurrent() of the ObstacleLayer.

current_ = false; is set in the reset along with was_reset_ = true;. So, isCurrent() is already going to return false until updateMap runs to trigger updateBounds / updateCosts. I think that's already baked in at the moment.

So to sum it up.

I think if we remove line 760

@mikeferguson can you provide some historical context and/or let me know your thoughts?

and 544 in the ObstacleLayer,

Tentatively agree. @mikeferguson do you concur?

schupf2 · 2024-11-14T09:55:50Z

Doable, but starts to feel like patches on top of patches ...

You're definitely right there. I also think we should keep it as simple as is and only consider the expected_update_rate instead of the actual presence of data.

I believe this could be due to a reset operation taking arbitrarily long, or reset is meant to reset all state and that's an important element of the layer's state ...

Yes, if we used to unsubscribe and resubscribe to the sensor topic the resetBuffersLastUpdated definitely seems to have been necessary, but as this is not done anymore I think the resetBuffersLastUpdated should not be needed.

In terms of "a reset operation taking arbitrarily long" - this should not matter right? For the ObservationBuffer we store new point clouds in the buffer when received. So if we don't unsubscribe while clearing the map, we will also obtain updates during that time - so the actual update rate should not be affected.

So, I'm a bit cautious about a change to that.

Definitely understand that though :-).

So, if we updateMap and we have valid (although potentially empty) observation buffers, then it would always be current_ = true

As far as I can see it, we should not obtain a observation buffer that is empty and valid IF we set the observation_persistence and expected_update_rate accordingly (since the observation buffer is not cleared during a reset). That is totally fine IMO, since it's basically up to the user if they are willing to accept potentially empty observation buffers or not.

I'd love a 3rd sanity check on that, but I'm on board.

:D

current_ = false; is set in the reset along with was_reset_ = true;. So, isCurrent() is already going to return false until updateMap runs to trigger updateBounds / updateCosts.

Yes exactly, but in the LayeredCostmap::updateMap the updateBounds is of course always called before the updateCosts. Thus, (in theory) after a reset isCurrent could be set to true before the the map is actually updated in updateCosts (at least for the obstacle layer), leading to a costmap with isCurrent being true before the obstacle layer is included in the map.

However, I didn't notice any problems with that so far, so I guess this should not cause problems since the costmap will be updated right after.

Looking forward to @mikeferguson thoughts :).

schupf2 · 2024-11-28T13:06:57Z

@mikeferguson: Still looking forward to your opinion on this :)

We also suggest the same changes for the Spatio Temporal Voxel Layer , as we obtained the same problems here. We already have working implementations for both repositories which are currently undergoing some thorough testing.

Furthermore, we observed a problem with removing stale observations in the stvl, which we mentioned in this issue.

SteveMacenski · 2024-12-02T22:40:35Z

Sure thing, STVL / NPVL would probably be good to both be updated by this (though NPVL is so simple, it probably isn't required)

mikeferguson · 2024-12-10T04:06:18Z

I think, for costmap historical stuff, @DLu might have more knowledge than me (I read this entire ticket but actually don't have anything of value to add)

DLu · 2024-12-13T02:08:47Z

Greetings -

For historical perspective, the isCurrent structure was put in place specifically so that obstacle layers that haven't gotten updates recently would cause the planning to stop. This bit of ObservationBuffer::isCurrent is the only bit that could cause Costmap2DROS::isCurrent to return false (as far as I recall) in ROS 1.

As far as I've found, the current status was not checked before running recovery behaviors such as clearing the costmap. I also don't believe that the recovery itself checked or set the current status.

I think if we remove line 760 and 544

Line 760 was introduced in 2019 by @crdelsey in #1412. Line 544 was @SteveMacenski in 2020 with #2090.

Ultimately, I think that there are three distinct states intertwined with this bool, which I'm going to assign arbitrary names.

FRESH - The costmap is newly initialized OR has been reset recently and has yet to receive new data.
CURRENT - The costmap has gotten some data recently within the expected update period.
STALE - The costmap has not gotten any data within the expected update period.

isCurrent returning false was (in ROS 1) only an indication of the STALE observation data. The expansion to cover both STALE and FRESH came from #2090, by setting current_ to false initially and when resets happen.

I haven't yet traced through all the changes to current_ in Nav2, but removing line 544 makes sense to me, although then we might not even need was_reset. I'm not sure about removing line 760, which makes sure that the costmap is not STALE, which might make it so that a freshly reset costmap that hasn't recieved data recently might still be marked as CURRENT

schupf2 · 2024-12-13T11:38:01Z

Thank you @DLu for your detailed response and providing historical insight :)

Why do you think removing line 760 prevents the costmap from being STALE ?
I would see it rather the other way around. The resetBuffersLastUpdated() which is called in line 760 is only setting the last_updated_ of each observation buffer to the current time (not clearing any data from the buffer). Thus, if we don't receive any data, the costmap will still be current after a reset, since the last_updated_ is recent enough (even if we didn't receive any data).

schupf2 · 2024-12-17T15:19:52Z

Since I already have the implementation for all my suggestions, I created a pull request (#4800) so some changes might be more visible / easier to discuss.

Looking forward to your comments on this.
In the meantime: Wish you all a merry christmas and happy new year :)

sjusner mentioned this issue Nov 28, 2024

Inconsistent behavior for update rate between stvl and navigation2/nav2_costmap_2d SteveMacenski/spatio_temporal_voxel_layer#307

Closed

sjusner mentioned this issue Dec 4, 2024

Update in observation time comparison SteveMacenski/spatio_temporal_voxel_layer#308

Merged

schupf2 mentioned this issue Dec 17, 2024

Fix isCurrent in obstacle layer and assure CostmapTopicCollisionChecker only uses current costmaps #4800

Draft

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ClearEntireCostmap returns before all costmap layer include valid data #4734

ClearEntireCostmap returns before all costmap layer include valid data #4734

schupf2 commented Oct 29, 2024

SteveMacenski commented Oct 29, 2024

schupf2 commented Oct 30, 2024

SteveMacenski commented Oct 31, 2024

SteveMacenski commented Nov 5, 2024

schupf2 commented Nov 6, 2024 •

edited

Loading

SteveMacenski commented Nov 6, 2024 •

edited

Loading

schupf2 commented Nov 8, 2024

schupf2 commented Nov 11, 2024

SteveMacenski commented Nov 11, 2024 •

edited

Loading

schupf2 commented Nov 12, 2024

SteveMacenski commented Nov 12, 2024

schupf2 commented Nov 14, 2024

schupf2 commented Nov 28, 2024

SteveMacenski commented Dec 2, 2024

mikeferguson commented Dec 10, 2024

DLu commented Dec 13, 2024

schupf2 commented Dec 13, 2024

schupf2 commented Dec 17, 2024

ClearEntireCostmap returns before all costmap layer include valid data #4734

ClearEntireCostmap returns before all costmap layer include valid data #4734

Comments

schupf2 commented Oct 29, 2024

Bug report

Steps to reproduce issue

Expected behavior

Actual behavior

SteveMacenski commented Oct 29, 2024

schupf2 commented Oct 30, 2024

SteveMacenski commented Oct 31, 2024

SteveMacenski commented Nov 5, 2024

schupf2 commented Nov 6, 2024 • edited Loading

SteveMacenski commented Nov 6, 2024 • edited Loading

schupf2 commented Nov 8, 2024

schupf2 commented Nov 11, 2024

SteveMacenski commented Nov 11, 2024 • edited Loading

schupf2 commented Nov 12, 2024

SteveMacenski commented Nov 12, 2024

schupf2 commented Nov 14, 2024

schupf2 commented Nov 28, 2024

SteveMacenski commented Dec 2, 2024

mikeferguson commented Dec 10, 2024

DLu commented Dec 13, 2024

schupf2 commented Dec 13, 2024

schupf2 commented Dec 17, 2024

schupf2 commented Nov 6, 2024 •

edited

Loading

SteveMacenski commented Nov 6, 2024 •

edited

Loading

SteveMacenski commented Nov 11, 2024 •

edited

Loading