Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extract.cpp: Fixed handling multi-output files, added some debug #92

Open
wants to merge 2 commits into
base: wasm-main
Choose a base branch
from

Conversation

Michal-Ston-Mobica
Copy link
Collaborator

Changed logic in handling multi-output files to write them in sequence, not chunk-by-chunk.
This should fix some errors in writing output ZIPs and general file handling for some installers.

@Michal-Ston-Mobica Michal-Ston-Mobica self-assigned this Dec 20, 2023
@@ -20,6 +20,8 @@

#include "wasm/fdzipstream/fdzipstream.h"

#define wasm_debug(X) do { if (extractor::get().get_settings().debug_messages_enabled) log_info << X; } while(0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need that construction here?
do { ... } while(0)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a popular way to create a macro wrapper that doesn't produce any errors, no matter how it's used in the code.
wasm_info is just wrapped here with an additional conditional statement, making this output the log only if the debug output is enabled before starting the extraction.

json ret;
ret["status"] = "Aborted by user";
return ret.dump();
wasm_debug("pre-copy chunk=" << chunk.first.first_slice << " " << chunk.first.last_slice
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather consider using different format of logging function like:
wasm_log << "pre-copy chunk=" ....
or
conditional_log(extractor::get().get_settings().debug_messages_enabled) << "pre-copy chunk=" ....

Would be more readable in my opinion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function (or more specifically macro) was meant to be used only for debug and it's only used in wasm-related code, so I don't really see the point of making this more general or rewriting to be used like a stream.
It might be a good idea to make it more readable, but I assumed it's good enough for now and hoped that it could be addressed later, as important issues should resolved first and without much delay.

@@ -20,6 +20,8 @@

#include "wasm/fdzipstream/fdzipstream.h"

#define wasm_debug(X) do { if (extractor::get().get_settings().debug_messages_enabled) log_info << X; } while(0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you use inside log_info I would recommend to change this macro name to: wasm_info

@@ -64,8 +70,12 @@ bool file_output::write(char* data, size_t n) {

position_ += n;
total_written_ += n;

if (is_complete()) {
if (extractor::get().get_settings().debug_messages_enabled) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to unify logging handling, looks like you could use the "wasm_debug" here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was done on purpose to potentially save some execution time, as there are two consecutive log writes handled by a single if statement, which would make two ifs when used with a wasm_debug.
It's not obvious whether a simple if like this is optimized when compiling to WASM, and the main goal of this commit was to expand on actual extract functionality while keeping some of the added debug as an extra and not to overhaul the whole debug system.
The existing debug calls in extract.cpp weren't even operational before introducing wasm_debug, for an unknown reason.

throw std::runtime_error("Error writing file \"" + output->path().string() + '"');
}
bool success = output->write(buffer, extracted_n, file.size);
if (!success) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you don't use that "success" variable anywhere below, I would recommend to simplify that code to:
if (!output->write(buffer, extracted_n, file.size)) {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid point, although you can probably see this part was already present in the code, probably to make this part a bit more readable.
I agree that it can and probably should be simplified at one point or another, although a general refactor wasn't my goal when making this PR.

Copy link
Collaborator

@mrcebryk mrcebryk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting changes, I would consider improving the format of introduced wasm_debug macro and to add additional logs (or exceptions?) in unexpected cases

total_written_, file_->entry().size, size2, n);
std::cout << "write: is_complete():" << std::endl;
}
if (file_->entry().size != 0 && is_complete() || size2 == total_written_) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will have different combinations of situations when we don't call:
zs_entryend(zip_, zip_entry_, 0);

I assume some of them might be expected, others not, I would recommend to add proper logs, especially warnings/errors in unexpected cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original is_complete() only checks whether file_->entry().size == total_written_, while this parameter is not always set for an unknown, probably installer-related reason. Thus an additional parameter of size2 with the actual file size that comes from another file descriptor was added to fix this issue and ensure the file is written properly.
This might've not been an issue with the original code, as it was only writing directly to the filesystem, and closing the files was always done at least by the OS when exiting the process, or somehow done via C++ interface.

To sum up: this whole if statement performs a single check - to see if the file was written completely. Maybe it could be rewritten to be a bit more clear, but it's not simply a matter of rewriting is_complete(), as it's not aware of the size2, and it's not a part of the object it's being called on.

Michal-Ston-Mobica and others added 2 commits February 5, 2024 11:27
The intention is to introduce them in dedicated commit while following
the style of already existing logging functions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants