Skip to content
This repository was archived by the owner on May 9, 2024. It is now read-only.

[Join] Zero-copy storage column check. #623

Merged
merged 1 commit into from
Aug 23, 2023

Conversation

Devjiu
Copy link
Contributor

@Devjiu Devjiu commented Aug 8, 2023

This commit adds check for valid pointer to column buffer to skip copying. Most effective with enabled enable-non-lazy-data-import option. Checks number of chunks in storage.

Partially resolves: #574

@Devjiu Devjiu requested a review from ienkovich August 8, 2023 12:43
@@ -187,6 +187,53 @@ std::unique_ptr<AbstractDataToken> ArrowStorage::getZeroCopyBufferMemory(
return nullptr;
}

std::unique_ptr<AbstractDataToken> ArrowStorage::getZeroCopyColumnData(
const ColumnRef& col_ref) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost copy-paste of getZeroCopyBufferMemory. Should we unite this methods, or add private utility method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking it as temporary solution, should be removed in further prs.

@@ -436,6 +436,12 @@ ResultSetRegistry::getZeroCopyBufferMemory(const ChunkKey& key, size_t num_bytes
: nullptr;
}

std::unique_ptr<AbstractDataToken> ResultSetRegistry::getZeroCopyColumnData(
const ColumnRef& col_ref) {
// UNREACHABLE();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure how to handle this case correctly, should we support it?

@Devjiu Devjiu force-pushed the dmitriim/join_column_data_zero_copy branch 2 times, most recently from a39fffe to 35046cb Compare August 8, 2023 13:57
@Devjiu Devjiu force-pushed the dmitriim/join_column_data_zero_copy branch 2 times, most recently from f9f7000 to b42d496 Compare August 9, 2023 16:10
@Devjiu Devjiu marked this pull request as ready for review August 10, 2023 11:09
, direct_columnar_conversion_(false)
, thread_idx_(thread_idx) {
auto timer = DEBUG_TIMER(__func__);
const bool is_varlen = target_type->isArray() || target_type->isString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect, we have fixed-length arrays. Proper check is target_type->isVarLen().

size_t arrow_elem_size = fixed_type->bit_width() / 8;
size_t elems = elem_size / arrow_elem_size;
CHECK_GT(elems, (size_t)0);
auto data_to_fetch = table.col_data[col_idx];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto data_to_fetch = table.col_data[col_idx];
auto &data_to_fetch = table.col_data[col_idx];

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed PTAL

@@ -203,7 +248,7 @@ void ArrowStorage::fetchFixedLenData(const TableData& table,
// to get a proper slice.
size_t elems = elem_size / arrow_elem_size;
CHECK_GT(elems, (size_t)0);
auto data_to_fetch =
const auto& data_to_fetch =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slice creates a new shared_ptr, therefore you cannot use reference here and all other similar places.

@Devjiu Devjiu force-pushed the dmitriim/join_column_data_zero_copy branch from ec29e93 to a3cf4a0 Compare August 21, 2023 21:11
This commit adds check for valid pointer to column buffer to skip
copying. Most effective with enabled `enable-non-lazy-data-import`
option. Checks number of chunks in storage.

Partially resolves: #574

Signed-off-by: Dmitrii Makarenko <[email protected]>
@Devjiu Devjiu force-pushed the dmitriim/join_column_data_zero_copy branch from a3cf4a0 to 0b46841 Compare August 21, 2023 21:54
@ienkovich ienkovich merged commit 3ac48fe into main Aug 23, 2023
@ienkovich ienkovich deleted the dmitriim/join_column_data_zero_copy branch August 23, 2023 18:50
@Devjiu Devjiu mentioned this pull request Sep 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Perf][Bench] Join is slow on big tables.
2 participants