Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inform storage plugin about message-level compression #1144

Closed
james-rms opened this issue Nov 1, 2022 · 12 comments
Closed

inform storage plugin about message-level compression #1144

james-rms opened this issue Nov 1, 2022 · 12 comments
Labels
enhancement New feature or request

Comments

@james-rms
Copy link
Contributor

james-rms commented Nov 1, 2022

Description

Right now the message-level compression mode being used is not passed to the storage plugin. This means that without the metadata.yaml file, we have no way of knowing if a db3 or mcap's messages are CDR encoded, or CDR encoded and compressed. This information should be stored inside the bag somehow. Specific plugins can choose their own way of storing this and dealing with that information, but for now, this is an issue tracking the storage plugin API's ability to pass that information down at all.

Completion Criteria

  • storage plugin must know how the messages its storing are encoded, including compression.

Implementation Notes / Suggestions

  • could add the message compression type as a field on TopicMetadata.
  • Could implement some way of passing BagMetadata to the storage plugin. This doesn't seem like a good fit for BagMetadata, which contains lots of other fields which are derived from intrinsic properties of the file, like message count and length.

Testing Notes / Suggestions

** All features in this project need tests. Please give some input on cases that will need to be tested - and how the testing might be implemented. **

@james-rms james-rms added the enhancement New feature or request label Nov 1, 2022
@MichaelOrlov
Copy link
Contributor

As far as I recall we have discussed per message compression and per message encryption during original mcap design discussions. And as far as I remember we agreed to have string field which will describe serialization format, encryption and compression and we agreed that each layer will be spitted by some predefined delimiter and as far as I remember suggestions was to use colon : or double colon :: as delimiter.
For instance if we have CDR serialized message which was encrypted with AES128 and compressed with zstd. The appropriate message format or encoding field would have a value CDR::AES128::ZSTD . And the most appropriate field for that is TopicMetada::serialization_format

.
Adding new special field for indicating compression and encryption will cause a big headache for backward compatibility and doesn't look reasonable IMO.

@MichaelOrlov
Copy link
Contributor

For completeness we have already passing TopicMetada data structure as parameter for the BaseWriteInterface::create_topic(const TopicMetadata & topic) API.

@james-rms
Copy link
Contributor Author

james-rms commented Nov 3, 2022

I think compression type and message serialisation format are separate concepts that any consumer will need to handle separately. How they are encoded in the TopicMetadata is an implementation detail.

Therefore, the choice comes down to one of:

  1. putting serialization format and compression type together in the same field with a :: delimiter
  2. adding a new field to TopicMetadata for the compression type.

I'm in favor of (2), assuming it's technically feasible to do so without breaking ABI compatibility.

The only reason not to choose (2) would be to avoid the pain of dealing with struct compatibility concerns across ABI boundaries. Before choosing I'd like to understand concretely what these concerns are and what we'd have to do to address them, if we chose to go down the (2) route.

If we chose (1), I would want to document a clear definition of the scheme of the serialization_format field that is compatible with existing consumers (since compatibility is the whole point), including a plan for future extensibility with encryption information. This should include answers for cases where a message is encrypted but not compressed and vice-versa.

@amacneil
Copy link
Contributor

amacneil commented Nov 4, 2022

I would like to make a case that compression options are entirely a storage plugin concern, not a global setting.

Different storage plugins will support different compression options. For example:

  • Sqlite storage plugin supports file level or message level compression
  • Mcap storage plugin supports chunk compression (zstd with configurable speed, or lz4)
  • Other plugins may not support compression at all

I think file and message level compression are both strictly worse than chunk compression, and having message encryption as a global setting means that you can currently do silly things like enable both message compression (at the rosbag2 level) and chunk compression (at the mcap storage plugin level).

In addition, storage plugins cannot simply treat the messages as binary blobs. MCAP is specifically designed to be self-contained and allow decoding by non-ROS third party tools, so it is not acceptable for the upstream rosbag package to transform the data without the plugin being informed.

Concrete proposal: move the current compression options (per file, per message) into the sqlite plugin, and let each plugin handle compression individually.

@MichaelOrlov
Copy link
Contributor

@amacneil Your statement about that chunk compression is always better would be true if not considering real world data and in particular camera images and LIDAR data.

Please consider a case when for instance we have stream of data from multiple cameras and we want to do per message compression with JPEG and do it HW accelerated on FPGA. And we want to be a valid case if we are not able to compress in realtime all messages we will write some of them uncompressed.

This use case doesn't align very well in your paradigm that all compression should be moved to the plugin level.

@amacneil
Copy link
Contributor

amacneil commented Nov 4, 2022

JPEG compression is not a responsibility of the recording process. That would require modifying the content of the messages, making them incompatible with their own schema.

The logic you describe (compressing a subset of messages with JPEG) belongs at the application layer using the existing ROS concepts (nodes that subscribe and transform messages).

@james-rms
Copy link
Contributor Author

Copying discussion to issue - below is from @MichaelOrlov .

I've dig deeper in rosbag2 "derbies" and came to the conclusion that we can't provide BagMetadata in the open() calls, alongside rosbag2_storage::StorageOptions& as you would like it. Not even part of it related to the compression.
The reasoning for that is the multiple abstraction layers and the fact that BagMetadata object initializes after calling open(..) method of the storage plugin. And uses some data from storage object for initialization.
For details: - here is the place

instance->open(storage_options, flag);
where from we are calling storage open() method.
And here is the place where we are initializing BagMetadata after creating and opening the storage
storage_ = storage_factory_->open_read_write(storage_options_);
if (!storage_) {
throw std::runtime_error("No storage could be initialized. Abort");
}
if (storage_options_.max_bagfile_size != 0 &&
storage_options_.max_bagfile_size < storage_->get_minimum_split_file_size())
{
std::stringstream error;
error << "Invalid bag splitting size given. Please provide a value greater than " <<
storage_->get_minimum_split_file_size() << ". Specified value of " <<
storage_options.max_bagfile_size;
throw std::runtime_error{error.str()};
}
use_cache_ = storage_options.max_cache_size > 0u;
if (storage_options.snapshot_mode && !use_cache_) {
throw std::runtime_error(
"Max cache size must be greater than 0 when snapshot mode is enabled");
}
if (use_cache_) {
if (storage_options.snapshot_mode) {
message_cache_ = std::make_shared<rosbag2_cpp::cache::CircularMessageCache>(
storage_options.max_cache_size);
} else {
message_cache_ = std::make_shared<rosbag2_cpp::cache::MessageCache>(
storage_options.max_cache_size);
}
cache_consumer_ = std::make_unique<rosbag2_cpp::cache::CacheConsumer>(
message_cache_,
std::bind(&SequentialWriter::write_messages, this, std::placeholders::_1));
}
init_metadata();

init_metadata() defined here https://github.com/ros2/rosbag2/blob/rolling/rosbag2_cpp/src/rosbag2_cpp/writers/sequential_writer.cpp#L65-L79
The things even worsen since init_metadata() overloaded in SequentialCompressionWriter https://github.com/ros2/rosbag2/blob/rolling/rosbag2_compression/src/rosbag2_compression/sequential_compression_writer.cpp#L111-L118and calling from base SequentialWriter class.
I am considering to create separate method update_metadata(BagMetadata &) for storage writer interface to call it after init_metadata() .

@james-rms
Copy link
Contributor Author

james-rms commented Nov 6, 2022

Let me also register the standing goals agreed on at the working group here on this issue:

  • rosbag2 will updated API to pass the metadata information down to the storage plugin indicating if message has extra layer of modifications such as compression or encryption so the storage plugin can record it as necessary
  • mcap storage plugin will make a best effort to determine if it can use the ros2 profile when writing an mcap file. If it determines (based on the metadata or any missing schemas) that it cannot ensure a ros2 profile it will leave the profile name in the header section blank but continue to make a best effort to record the data and schema names.
  • Agreed to not change the current API which is passing TopicMetada rather to alter the format of the current existing TopicMetadat::serialization_format field which is defined as a string and usually has value cdr for rosbag2 recording with DDS rmw.
    • It was suggested to rename it to the message_encoding to confirm with corresponding field from mcap channel information https://github.com/foxglove/mcap/tree/main/docs/specification#channel-op0x04
    • Also it was suggested to allow adding a series of keywords of the applied modifications to the message separated by the delimiter, double colon “::”. For instance if we would have per message compression with JPEG and then encryption with AES. The TopicMetadat::message_encoding field would have a value CDR::JPEG::AES128. Therefore it will be possible to detect what sequence of transformations should be done on the playback stage to restore the original message data type which was transferred over the wires for a particular topic. Perhaps we will need to update mcap spec and rosbag2 documentation about these changes.
  • Created follow up issue with request for changes in mcap specification RFC: Change the "ros2" profile to allow CDR-encoded messages with no preceding Schema record

@james-rms
Copy link
Contributor Author

The reasoning for that is the multiple abstraction layers and the fact that BagMetadata object initializes after calling open(..) method of the storage plugin. And uses some data from storage object for initialization.

The proposal of dealing with this by adding an update_metadata method seems hacky to me.

I think we should change tactic and not try to pass all of BagMetadata to the storage plugin.

At a minimum the storage plugin needs only these data:

  • Compression format
  • Compression mode

My alternate proposal is:

  1. come up with a new struct RecorderMetadata which is defined as "metadata about the recorder's state that would be required by a bag reader when decoding the bag contents". This would include:
  • Compression mode
  • Compression format
    Any future pluggable component of rosbag2 (such as encryption, for example) would likely need to add something to this structure.
  1. Rev the version of BagMetadata to 7, and replace the compression mode and format fields with a RecorderMetadata instance.
  2. Add an open() overload with a recorder_metadata argument to SequentialCompressionWriter, SequentialWriter, and to the storage plugin API.

@MichaelOrlov
Copy link
Contributor

MichaelOrlov commented Nov 8, 2022

@james-rms I've tried to spend some more time trying to consider your proposal with transmitting compression_mode in some form as you mentioned to be as part of newly created RecorderMetadata or even better naming as RecorderInfo or WriterOptions data structure to the storage::open(..) method.
I am sorry, but I can't come to the any descent solution with current rosbag2 design.

The problem is that SequentialCompressionWriter inherited from SequentialWriter

SequentialCompressionWriter::SequentialCompressionWriter(
const rosbag2_compression::CompressionOptions & compression_options)
: SequentialWriter(),
compression_factory_{std::make_unique<rosbag2_compression::CompressionFactory>()},
compression_options_{compression_options}
{}

and there are no descent way to pass such parameters aka WriterOptions with rosbag2_compression::CompressionOptions to the base SequentialWriter class to be able to pass this parameter further on to the
storage_ = storage_factory_->open_read_write(storage_options_);

To be more clear. Of course we can pass some WriterOptions with rosbag2_compression::CompressionOptions to the SequentialWriter constructor but it will be incorrect from design point of view if we will pass it from SequentialCompressionWriter. Because SequentialWriter shall not be created with WriterOptions::CompressionOptions::compression_mode == message.

If you can come up with less "hucky" solution with concrete implementation in code which align with current rosbag2 design - I am open to consider it.
Unfortunately, currently I don't see a better design solution rather then use newly created update_metadata API from #1149.
Btw we will call update_metadata right after storage open(..) and guarantee that it will be before any write(message) calls.

@james-rms
Copy link
Contributor Author

james-rms commented Nov 18, 2022

Actions from sync:

  • make ros2 bag record —-compression-mode message print some warning indicating that a future version will only support this compression mode with sqlite3, please add -s sqlite3
  • Update the MCAP storage plugin to print a message and exit when it detects an incompatible compression mode
  • need to make community aware of this in advance. Do some announcement to say that we’re going to do this, ensure we’re not making things worse for community. Do this as part of the MCAP announcement.
  • At some point before the Iron release: add some functionality for storage plugins to register their own CLI args
    • Roman: note that users currently configure storage plugins via yaml files
  • For older distros: print some warning if users select MCAP + message compression

@ros-discourse
Copy link

This issue has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/psa-default-ros-2-bag-storage-format-is-changing-to-mcap-in-iron/28489/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants