-
Notifications
You must be signed in to change notification settings - Fork 260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't reopen file for every seek if we don't have to. Search directionally for the correct file #1117
Conversation
@emersonknapp I've just did analysis for this problem yesterday when trying to figure out why we have strange warning in player.
I came to conclusion that in
We need to add API to be able to open another file in current storage instead of recreating storage. Recreating storage is expensive. It is reload plugin in memory. And we call load_current_file() for each seek(timestamp) call.
Note: I've had a chance to review your changes yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes look good to me. I agree on removing the misleading warning - best not to confuse users by warning about normal operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emersonknapp Thank you for your contribution and sorry for my late response.
This is certainly improvements comparable to how it was before.
Although I would like to give some "love" to the logic in implementation to make more clear.
I have some concern in regards to the API changes in particular to the has_next_file(bool reverse)
and load_next_file(bool reverse)
.
After some consideration I think it will be better to create separate API for has_previous_file()
and load_next_file()
You may argue that current implementation is better align with current implementation since we already have reverse
flag on which we rely up on.
From one hand it look like easy and more straight forward in implementation in one place
rosbag2/rosbag2_cpp/src/rosbag2_cpp/readers/sequential_reader.cpp
Lines 146 to 147 in e948e95
if (!current_storage_has_next && has_next_file(read_order_.reverse)) { | |
load_next_file(read_order_.reverse); |
However from another hand it's not very clear in another place where logic more sophisticated
rosbag2/rosbag2_cpp/src/rosbag2_cpp/readers/sequential_reader.cpp
Lines 213 to 224 in e948e95
if (timestamp < start_time && has_next_file(true)) { | |
// Check back a file if the timestamp is before the beginning of the current file | |
load_next_file(true); | |
return seek(timestamp); | |
} else if (timestamp > end_time && has_next_file(false)) { | |
// Check forward a file if the timestamp is after the end of the current file | |
load_next_file(false); | |
return seek(timestamp); | |
} else { | |
// The timestamp lies in the range of this file, or there are no files left to go to | |
storage_->seek(timestamp); | |
} |
IMO it will be much cleaner to understand what is going on in logic if API will be with self described names and granuraly for instance:
if (timestamp < start_time && has_prev_file()) {
// Check back a file if the timestamp is before the beginning of the current file
load_prev_file(true);
return seek(timestamp);
} else if (timestamp > end_time && has_next_file()) {
// Check forward a file if the timestamp is after the end of the current file
load_next_file();
return seek(timestamp);
} else {
// The timestamp lies in the range of this file, or there are no files left to go to
storage_->seek(timestamp);
}
When someone will try to comprehend this logic and will see construction like has_next_file(true)
it's not really obvious that this method call really checking for existence of the previous file rather than next. And it will cause a lot of confusion and will have to keep in mind that next file is not really the next but previous even if name it as next. It's more error prone approach.
Also you can argue that it's made for the sake of the reducing code duplication in "next_/previous_" methods. I was trying to consider this argument as well and it looks like this not really the case. It will be a duplication of a few lines of code in load_next_file()
and load_previous_file()
rosbag2/rosbag2_cpp/src/rosbag2_cpp/readers/sequential_reader.cpp
Lines 259 to 269 in e948e95
assert(current_file_iterator_ != file_paths_.end()); | |
auto info = std::make_shared<bag_events::BagSplitInfo>(); | |
info->closed_file = get_current_file(); | |
if (reverse) { | |
current_file_iterator_--; | |
} else { | |
current_file_iterator_++; | |
} | |
info->opened_file = get_current_file(); | |
load_current_file(); | |
callback_manager_.execute_callbacks(bag_events::BagEvent::READ_SPLIT, info); |
and logic there pretty much straight forward and will became eve more simple i.e. without any branching like if-else if we will duplicate it in relevant methods.
I also was thinking about performance impact for calling get_metadata
from storage for each seek operation.
auto metadata = storage_->get_metadata(); |
In current implementation at least in sqlite3
storage plugin this is pretty much expensive call.
We do request to the database for pulling out information about all topics and counting all messages in essence internally in db engine it will translates to iterating over each message give results from such SQL request.
I think we can optimize this place if we will readout and calculate values for metadata only once when we are opening storage and keep metadata in private member variable ready to be returned when needed by request from get_metadata()
method.
Although there are one caveat in such approach. We need to take in to account situation when get metadata could be called for storage which was opened with read_write
or append
io_flag
.
We need to readout metadata only if io_flag == READ_ONLY
.
We can built in somewhere here
rosbag2/rosbag2_storage_sqlite3/src/rosbag2_storage_sqlite3/sqlite_storage.cpp
Lines 195 to 197 in e948e95
if (is_read_write(io_flag)) { | |
initialize(); | |
} |
and will need to keep this
io_flag
as private member variable to be able to check it in get_metadata()
to be able to decide if we need to do a fresh metadata readout or we can return just cached value saved in private member variable during storage opening.
I'm fine with |
…nally for the correct file Signed-off-by: Emerson Knapp <[email protected]>
Signed-off-by: Emerson Knapp <[email protected]>
Signed-off-by: Emerson Knapp <[email protected]>
e948e95
to
f597525
Compare
@MichaelOrlov updated based on review - re-review requested (sorry for slow turnaround, didn't work during japan roscon trip) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emersonknapp Looks good enough to be merged with green CI.
Although it could be even better if address performance impact for calling get_metadata
from storage for each seek operation.
auto metadata = storage_->get_metadata(); |
In current implementation at least in sqlite3
storage plugin this is pretty much expensive call.
We do request to the database for pulling out information about all topics and counting all messages in essence internally in db engine it will translates to iterating over each message give results from such SQL request.
I think we can optimize this place if we will readout and calculate values for metadata only once when we are opening storage and keep metadata in private member variable ready to be returned when needed by request from get_metadata()
method.
Although there are one caveat in such approach. We need to take in to account situation when get metadata could be called for storage which was opened with read_write
or append
io_flag
.
We need to readout metadata only if io_flag == READ_ONLY
.
We can built in somewhere here
rosbag2/rosbag2_storage_sqlite3/src/rosbag2_storage_sqlite3/sqlite_storage.cpp
Lines 195 to 197 in e948e95
if (is_read_write(io_flag)) { | |
initialize(); | |
} |
and will need to keep this
io_flag
as private member variable to be able to check it in get_metadata()
to be able to decide if we need to do a fresh metadata readout or we can return just cached value saved in private member variable during storage opening.
@emersonknapp Thoughts? May be implement in follow up issue?
Gist: https://gist.githubusercontent.com/MichaelOrlov/310884637693cd81b243857191413684/raw/cb6b5965073a9c3c99085a1d4f69258479aa7998/ros2.repos |
@emersonknapp Ups I occasionally merged this PR without running CI.
|
As regards my suggestions about improvements with caching metadata in |
All fine. I am deleting brunch and closing #1151 |
Thank you for the follow through! I meant to say that yes, I think other changes should be followups to this one, keep PRs small |
@mergify backport humble |
…nally for the correct file (#1117) * Don't reopen file for every seek if we don't have to. Search directionally for the correct file Signed-off-by: Emerson Knapp <[email protected]> * Fix use of load_next_file in test Signed-off-by: Emerson Knapp <[email protected]> * has_prev_file and load_prev_file API instead of has_next(bool) Signed-off-by: Emerson Knapp <[email protected]> Signed-off-by: Emerson Knapp <[email protected]> (cherry picked from commit 0c7c352) # Conflicts: # rosbag2_cpp/src/rosbag2_cpp/readers/sequential_reader.cpp
✅ Backports have been created
|
Use case:
rqt_bag
uses frequent seek (for every message). This might not be the best way to use rosbag, but regardless it highlights a big performance issue withseek
- that we reopen the storage file every single time. Change this to directionally look for a file that contains the requested timestamp, then seek within that bag. This makes it so that repeated seeks within the same file keep that file open.Discovered in testing for ros-visualization/rqt_bag#126