-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Variable length key support and major template refactoring to support key as a template parameter #661
Variable length key support and major template refactoring to support key as a template parameter #661
Conversation
Note Reviews pausedUse the following commands to manage reviews:
WalkthroughThis pull request introduces a comprehensive refactoring of the UnoDB library, focusing on enhancing type flexibility and standardizing key handling across the codebase. The primary change is the introduction of template-based key handling, moving from a fixed Changes
Sequence DiagramsequenceDiagram
participant DB as Database<Key>
participant Encoder as Key Encoder
participant Decoder as Key Decoder
DB->>Encoder: Encode key
Encoder-->>DB: Encoded binary key
DB->>Decoder: Decode key
Decoder-->>DB: Original key
Possibly related PRs
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 38
🔭 Outside diff range comments (3)
art.cpp (1)
Line range hint
34-53
: Well-structured debug routine with potential clarity improvementsThis templated debug method provides good insights into the iterator's internal state by copying the stack before printing. Since this is analytics/diagnostics code, its performance impact is minimal and mostly acceptable. Consider:
- Refactor for clarity: This function has a long loop and multiple output statements. Extracting small helper functions (e.g., a function to print stack entries) could improve readability.
- Consider concurrency states: If the underlying data structure can be mutated by other threads, double-check that copying the stack is sufficient to keep the debug output consistent.
benchmark/micro_benchmark_key_prefix.cpp (1)
Line range hint
170-174
: Consider adding range-based validation for keys.The
insert_keys
function could benefit from input validation to ensure keys are within expected ranges.void insert_keys(Db &test_db, const std::vector<std::uint64_t> &keys) { + for (const auto k : keys) { + if (k > std::numeric_limits<std::uint64_t>::max()) { + throw std::out_of_range("Key value exceeds maximum allowed value"); + } unodb::benchmark::insert_key(test_db, k, unodb::value_view{unodb::benchmark::value100}); } }test/test_art_iter.cpp (1)
Line range hint
539-683
: Consider using test fixtures for common tree structures.While the tests are comprehensive, consider:
- Creating fixtures for common tree structures to reduce setup code duplication
- Using constants for test values instead of magic numbers
- Moving debug output to a test utility
+ class ThreeLeavesTreeFixture : public ::testing::Test { + protected: + static constexpr std::uint64_t k0 = 0xaa10; + static constexpr std::uint64_t k1 = 0xaa20; + static constexpr std::uint64_t k2 = 0xaa30; + + void SetUp() override { + verifier.insert(k0, unodb::test::test_values[0]); + verifier.insert(k1, unodb::test::test_values[1]); + verifier.insert(k2, unodb::test::test_values[2]); + } + + unodb::test::tree_verifier<TypeParam> verifier; + };
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (40)
.gitignore
(1 hunks)CMakeLists.txt
(3 hunks)art.cpp
(2 hunks)art.hpp
(19 hunks)art_common.cpp
(2 hunks)art_common.hpp
(1 hunks)art_internal.cpp
(2 hunks)art_internal.hpp
(3 hunks)art_internal_impl.hpp
(45 hunks)benchmark/micro_benchmark.cpp
(13 hunks)benchmark/micro_benchmark_concurrency.hpp
(6 hunks)benchmark/micro_benchmark_key_prefix.cpp
(8 hunks)benchmark/micro_benchmark_mutex.cpp
(2 hunks)benchmark/micro_benchmark_n16.cpp
(2 hunks)benchmark/micro_benchmark_n256.cpp
(2 hunks)benchmark/micro_benchmark_n4.cpp
(5 hunks)benchmark/micro_benchmark_n48.cpp
(2 hunks)benchmark/micro_benchmark_node_utils.hpp
(18 hunks)benchmark/micro_benchmark_olc.cpp
(2 hunks)benchmark/micro_benchmark_utils.cpp
(2 hunks)benchmark/micro_benchmark_utils.hpp
(7 hunks)examples/example_art.cpp
(3 hunks)examples/example_art_stats.cpp
(1 hunks)examples/example_olc_art.cpp
(1 hunks)mutex_art.cpp
(1 hunks)mutex_art.hpp
(8 hunks)olc_art.cpp
(2 hunks)olc_art.hpp
(20 hunks)portability_builtins.hpp
(1 hunks)test/CMakeLists.txt
(3 hunks)test/db_test_utils.cpp
(2 hunks)test/db_test_utils.hpp
(12 hunks)test/test_art.cpp
(1 hunks)test/test_art_concurrency.cpp
(8 hunks)test/test_art_iter.cpp
(21 hunks)test/test_art_key.cpp
(1 hunks)test/test_art_oom.cpp
(4 hunks)test/test_art_scan.cpp
(37 hunks)test/test_art_span.cpp
(1 hunks)test/test_key_encode_decode.cpp
(1 hunks)
🧰 Additional context used
🪛 GitHub Check: CodeFactor
mutex_art.cpp
[notice] 1-1: mutex_art.cpp#L1
mutex_art.cpp should include its header file mutex_art.HPP. (build/include)
test/db_test_utils.cpp
[notice] 1-1: test/db_test_utils.cpp#L1
test/db_test_utils.cpp should include its header file test/db_test_utils.hpp. (build/include)
test/db_test_utils.hpp
[notice] 501-501: test/db_test_utils.hpp#L501
Should have a space between // and comment. (whitespace/comments)
[notice] 494-494: test/db_test_utils.hpp#L494
Line ends in whitespace. Consider deleting these extra spaces. (whitespace/end_of_line)
[notice] 505-505: test/db_test_utils.hpp#L505
Line ends in whitespace. Consider deleting these extra spaces. (whitespace/end_of_line)
art_internal.hpp
[notice] 229-229: art_internal.hpp#L229
Missing space after ,. (whitespace/comma)
[notice] 252-252: art_internal.hpp#L252
Line ends in whitespace. Consider deleting these extra spaces. (whitespace/end_of_line)
[notice] 535-535: art_internal.hpp#L535
Redundant blank line at the end of a code block should be deleted. (whitespace/blank_line)
mutex_art.hpp
[notice] 94-94: mutex_art.hpp#L94
Line ends in whitespace. Consider deleting these extra spaces. (whitespace/end_of_line)
olc_art.hpp
[notice] 208-208: olc_art.hpp#L208
Line ends in whitespace. Consider deleting these extra spaces. (whitespace/end_of_line)
[notice] 2055-2055: olc_art.hpp#L2055
Should have a space between // and comment. (whitespace/comments)
[notice] 1834-1982: olc_art.hpp#L1834-L1982
Complex Method
[notice] 2335-2591: olc_art.hpp#L2335-L2591
Complex Method
test/test_art_span.cpp
[notice] 47-47: test/test_art_span.cpp#L47
At least two spaces is best between code and comments. (whitespace/comments)
[notice] 158-158: test/test_art_span.cpp#L158
Line ends in whitespace. Consider deleting these extra spaces. (whitespace/end_of_line)
[notice] 91-91: test/test_art_span.cpp#L91
Missing space after ,. (whitespace/comma)
art.hpp
[notice] 94-94: mutex_art.hpp#L94
Line ends in whitespace. Consider deleting these extra spaces. (whitespace/end_of_line)
[notice] 158-158: art.hpp#L158
Line ends in whitespace. Consider deleting these extra spaces. (whitespace/end_of_line)
[notice] 164-164: art.hpp#L164
Line ends in whitespace. Consider deleting these extra spaces. (whitespace/end_of_line)
[notice] 208-208: olc_art.hpp#L208
Line ends in whitespace. Consider deleting these extra spaces. (whitespace/end_of_line)
[notice] 987-987: art.hpp#L987
Redundant blank line at the start of a code block should be deleted. (whitespace/blank_line)
[notice] 1007-1007: art.hpp#L1007
Should have a space between // and comment. (whitespace/comments)
[notice] 945-945: art.hpp#L945
Should have a space between // and comment. (whitespace/comments)
art_internal_impl.hpp
[notice] 129-129: art_internal_impl.hpp#L129
Redundant blank line at the start of a code block should be deleted. (whitespace/blank_line)
[notice] 621-621: art_internal_impl.hpp#L621
Do not leave a blank line after "public:". (whitespace/blank_line)
[notice] 1819-1819: art_internal_impl.hpp#L1819
Lines should be <= 80 characters long. (whitespace/line_length)
[notice] 2742-2742: art_internal_impl.hpp#L2742
Lines should be <= 80 characters long. (whitespace/line_length)
🪛 Cppcheck (2.10-2)
test/test_key_encode_decode.cpp
[error] 85-85: There is an unknown macro here somewhere. Configuration is required. If UNODB_DETAIL_DISABLE_CLANG_WARNING is a macro then please configure it.
(unknownMacro)
test/test_art_key.cpp
[error] 85-85: There is an unknown macro here somewhere. Configuration is required. If UNODB_DETAIL_DISABLE_CLANG_WARNING is a macro then please configure it.
(unknownMacro)
art_internal.cpp
[style] 31-31: The function 'dump_key' is never used.
(unusedFunction)
[style] 44-44: The function 'dump_val' is never used.
(unusedFunction)
test/test_art_iter.cpp
[error] 85-85: There is an unknown macro here somewhere. Configuration is required. If UNODB_DETAIL_DISABLE_CLANG_WARNING is a macro then please configure it.
(unknownMacro)
benchmark/micro_benchmark_utils.cpp
[style] 26-26: The function 'destroy_tree' is never used.
(unusedFunction)
test/test_art_span.cpp
[error] 85-85: There is an unknown macro here somewhere. Configuration is required. If UNODB_DETAIL_DISABLE_CLANG_WARNING is a macro then please configure it.
(unknownMacro)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (109)
.gitignore (2)
53-54
: Consider version controlling the shell scripts.The shell scripts
cf_off.sh
,cf_on.sh
, andcompile1.sh
appear to be build or configuration related. If these scripts are part of the build process or developer workflow, they should be version controlled to ensure consistency across the team.Please clarify if these are:
- Developer-specific workflow scripts (in which case they should stay in
.gitignore
)- Part of the build process (in which case they should be version controlled)
Also applies to: 57-57
57-58
: Verify the intention behind ignoring the docs directory.Documentation is typically version controlled to maintain a history of changes and ensure all developers have access to the latest documentation. If
/docs/
contains generated documentation, consider:
- Version controlling the source documentation
- Creating a specific ignore pattern for generated docs (e.g.,
/docs/generated/
)✅ Verification successful
Ignoring the docs directory is correct for Doxygen-generated documentation
The presence of a Doxyfile confirms this is a C++ project using Doxygen for documentation generation. It's standard practice to ignore the generated documentation directory while keeping the source documentation (code comments) and Doxfile under version control. The documentation can be regenerated at any time from these versioned sources.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check if docs directory exists and what type of files it contains # to determine if these are source docs or generated docs # Check if docs directory exists if [ -d "docs" ]; then echo "Contents of docs directory:" find docs -type f -exec file {} \; fiLength of output: 17
Script:
#!/bin/bash # Search for documentation tool configurations and references echo "=== Searching for documentation tool configurations ===" fd -e yaml -e yml -e json -e js -e toml | xargs rg -l "docs?|documentation" || true echo -e "\n=== Searching for documentation generation patterns ===" rg -i "generate.*docs?|docs?.*generate" || true echo -e "\n=== Checking recent git history for docs directory ===" git log --diff-filter=D --summary | rg "delete.*docs/" || trueLength of output: 3071
art.cpp (2)
60-61
: Convenience overload looks fineThis simple overload that redirects to
std::cerr
is straightforward and clear. No further changes needed.
68-70
: Template instantiations appear consistentThe explicit instantiations for
std::uint64_t
andunodb::key_view
are aligned with the new variable-length key support. This ensures the templated class is available to the linker for these key types.olc_art.cpp (3)
Line range hint
38-59
: Robust iterator dump, but TODO comment needs actionThe debug logic here is similar to
art.cpp
. It prints valuable iterator state and versioning. The TODO reminds us to dump actual variable-length key bytes. Consider addressing it soon to maintain consistency with the refactor objective of supporting variable-length keys.Would you like assistance in dumping those prefix bytes to help debug variable-length keys?
68-70
: No issues with the convenience methodRedirecting the dump to standard error is consistent with the approach in
art.cpp
. No issues noted.
78-79
: Template instantiations for OLC DBBy explicitly instantiating
olc_db
for bothstd::uint64_t
andunodb::key_view
, the codebase aligns with the new variable-length key support. This is presumably needed for linking.art_internal.cpp (4)
16-16
: Header inclusion is justifiedIncluding
"heap.hpp"
is appropriate, as the code now callsallocate_aligned
andfree_aligned
. No issues found.
44-49
: Value dump logic is consistent
dump_val
outputs the value in hex, mirroringdump_key
. Though static analysis flags it as unused, it can be retained for on-demand debugging. No immediate issues found.🧰 Tools
🪛 Cppcheck (2.10-2)
[style] 44-44: The function 'dump_val' is never used.
(unusedFunction)
51-60
: Specializedbasic_art_key::dump
usage is acceptableThese explicit specializations ensure that each key type can be dumped via
std::cerr
. This is a good approach for debugging. No further action needed.
61-75
: Capacity expansion function is correct but deserves bounds checkThe logic for computing next power-of-two capacity is properly encapsulated. Confirm that
off
cannot exceedcap
to avoid over-reads instd::memcpy
. If guaranteed by higher-level logic, consider a debug assertion here for safety.mutex_art.hpp (12)
16-18
: Includes look good
No issues with adding<type_traits>
and<cassert>
. They are aligned with the code's needs for type inspection and assertions.
21-21
: Assert header inclusion
Including"assert.hpp"
is consistent with the rest of the codebase. No concerns here.
26-37
: Template introduction and type aliases
Definingmutex_db
as a template class parameterized byKey
, alongsideusing key_type = Key;
and updatedget_result
usingtypename db<Key>::get_result
, appears correct and enhances type safety.
39-45
: Internal get0 method
Good use of locking withstd::unique_lock
to protect the ART. Returning a valid lock only when a key is found helps release resources early if not needed.
53-57
: insert0 method concurrency
Lock acquisition viastd::lock_guard
is correct for internal insertion. This method’s naming clarifies it’s an internal routine.
60-63
: remove0 method concurrency
Likewise, removing an entry under astd::lock_guard
ensures thread safety. The naming convention matches insert0.
65-84
: Constructor, get, and empty
- The default constructor is marked
noexcept
, which is beneficial for exception-safety.get(Key search_key)
properly wraps the underlyingget0
call with correct conversion toart_key_type
.empty()
uses a lock guard internally but is short enough to not raise concurrency concerns.
85-100
: Public insert method
Insert logic is now aggregated ininsert0
, which is good. The pure attribute onbool insert(...)
is a nice optimization instructing that the function has no side effects beyond returning a value, from the compiler’s standpoint. Everything looks well-structured.🧰 Tools
🪛 GitHub Check: CodeFactor
[notice] 94-94: mutex_art.hpp#L94
Line ends in whitespace. Consider deleting these extra spaces. (whitespace/end_of_line)
101-109
: Public remove method
Similar approach to insert. The code is clear, usesart_key_type
, and callsremove0
. Good usage of locks.
112-112
: Iterator and scan methods
All methods properly lock the mutex during the scan and delegate to the underlyingdb
’s scanning logic. This is consistent with the concurrency model.Also applies to: 122-122, 153-155, 177-179
200-200
: Stats methods
Calls todb_.template get_node_count<NodeType>()
,get_growing_inode_count<NodeType>()
, andget_shrinking_inode_count<NodeType>()
preserve template correctness and read well.Also applies to: 211-211, 222-222
258-258
: Underlying db instance
Storingdb<Key> db_
is consistent with the templated approach. This ensures data is parameterized byKey
.art_common.hpp (9)
21-27
: Extended includes
Including<cstring>
and<span>
is appropriate given the need to work with raw memory in key encoders/decoders.
30-35
: Forward declarations
Declarations oftemplate <typename Key> class db;
andolc_db;
are correct for referencing these classes without pulling in full definitions.
36-45
: Key handling doc
The added documentation clarifies that values and keys are managed as non-owning spans. This is crucial for correctly understanding memory usage and lifetimes.
46-87
:visitor
class template
Designingvisitor
with a protected constructor and friend declarations fordb<key_type>
andolc_db<key_type>
is well structured. This ensures that only intended classes can instantiate or manipulate the visitor’s internals.
91-105
: Initial buffer capacity
DefiningINITIAL_BUFFER_CAPACITY
as 256 is reasonable, balancing typical scenarios for variable length keys. The disclaimers and TODO notes are clear.
106-120
: Utility methods
dump_key
,dump_val
, andensure_capacity
are essential for debugging and buffer expansion. The approach is consistent with the rest of the code.
128-142
:key_encoder
class
Excellent approach for building binary-comparable keys. The layering and usage of an internal buffer with fallback to dynamic allocation is well justified.
180-210
: Integer encoding
Encoding both signed and unsigned integers by offsetting their MSB has clarity and ensures lexicographical ordering. This is a robust approach for key comparisons.
280-381
:key_decoder
class
Decoding logic correctly reverses the process done inkey_encoder
. The symmetrical offset approach for signed and reversed endianness handling for unsigned matches the encoding strategy.test/db_test_utils.hpp (21)
32-33
: Switch to std::map
Replacing#include <unordered_map>
with#include <map>
implies a shift to an ordered map. This is consistent with the new custom comparator-based approach.
51-54
: Template instantiations
extern template class unodb::db<std::uint64_t>;
etc. helps reduce compile overhead for repeated instantiations across translation units.
57-60
: Thread type alias
Conditionally usingstd::thread
vs.unodb::qsbr_thread
for OLC DB ensures consistent concurrency under different DB implementations.
75-81
: Test values
Storing various test bytes in arrays is a straightforward approach. The newly introduced array-basedvalue_view
s are consistent with variable-length values.
91-91
: assert_value_eq for mutex_db
Verifying ownership of the lock is a good check. Ensures you actually locked the resource before verifying the data.
Line range hint
106-135
: Result assertion utility
do_assert_result_eq
andassert_result_eq
are well-structured. Checkingkey_found(result)
before verifying the value is robust.
128-135
: OLC-specific variant
With OLC DB, you wrap calls inquiescent_state_on_scope_exit
. Good concurrency management detail.
140-141
: ASSERT_VALUE_FOR_KEY macro
detail::assert_result_eq
approach is consistent with Google Test patterns, simplifying repeated checks.
145-160
: tree_verifier do_insert
Splitting insertion logic with a partial template specialization for OLC DB is sensible. The concurrency approach is correct for each DB variant.
Line range hint
166-194
: do_remove with stats checks
Verifies the old node counts, memory usage, and inodes prior to performing removal. If removal fails, logs the failure and performs a thorough check. Good coverage of corner cases.
223-232
: Removing missing keys
The specialized approach (do_try_remove_missing_key
) for OLC DB is consistent with the main logic. No issues found.
Line range hint
253-299
: insert
Inserting keys and verifying memory usage changes if stats are enabled is thorough. The fallback logic to a tree-based map extends the test coverage.
313-320
: Range insertion
Mapping fromstart_key
tostart_key + count
is a straightforward approach for test coverage.
Line range hint
325-331
: preinsert_key_range_to_verifier_only
Allows populating the verifier’s map without inserting into the DB, useful for advanced test scenarios.
336-336
: Insert preinserted key range
Helper method to finalize insertion for a set of keys, continuing to keep code DRY.
342-353
: Removing keys with concurrency
Similar approach as insert, with specialized logic for OLC DB.
355-366
: try_remove
Non-verifying removal method that simply discards the result. Useful in tests that only validate concurrency or performance.
370-370
: attempt_remove_missing_keys method
Systematically checkserase
fromvalues
and callsdo_try_remove_missing_key
on the DB, ensuring absent keys remain absent in both data structures.
411-412
: check_absent_keys
This helps confirm that no leftover state remains for the specified keys, improving negative test coverage.
483-493
: Custom comparator
The fallback tocompare(lhs, rhs)
only iflhs
/rhs
are spans is correct. Otherwise, a standard<
suffices for integral keys.
509-515
: Type aliases & external template instantiations
using u64_db = unodb::db<std::uint64_t>;
and the external instantiations further unify the test environment with the new template-based database classes.art_internal.hpp (3)
42-43
: Use of non-inline function is correct
The function signature and placement look consistent with the rest of the file. No issues found.
45-55
: Three-way compare function is well-implemented
Utilizingstd::memcmp
followed by length checks is a standard approach for lexicographic comparison.
127-135
: Comparison logic for fixed-width vs. variable-length keys is handled properly
The function cleanly distinguishes betweenstd::is_same_v<KeyType, key_view>
and fixed-width types. No logical flaws are evident.test/test_art_span.cpp (2)
600-610
: Correctness check
Node4 deletion logic and verification steps look well-structured. The test ensures presence and absence of expected keys.
107-107
: Well-handled negative test
Verifying that removing an already deleted key returns false is a good practice to check code robustness.benchmark/micro_benchmark_node_utils.hpp (4)
53-53
: Efficient next_key calculation
The bitwise approach for skipping zero bits is a neat solution. No issues found.
237-239
: Concise key generation
The functiongenerate_keys_to_limit
is straightforward and returns a vector of ascending keys until a limit is reached.
445-447
: Growing nodes assertion
Ensuring the correct number of newly grown inodes helps maintain tree integrity. The checks appear consistent with the logic.
672-673
: Bulk deletion utility is sound
Loop-based deletion over a vector of keys is simple and effective, aligning with the rest of the benchmark utilities.art.hpp (6)
36-45
: Forward declarations look fine.The template forward declarations for
inode
,inode_4
,inode_16
,inode_48
, andinode_256
appear appropriate. They help organize the code and reference each class later in this file without causing circular dependencies.
59-68
: Appropriate usage of template aliases.The
inode_defs
andart_policy
definitions effectively parametrize the adaptive radix tree for different key types and database classes. This design is clear and consistent.
70-80
: Consider verifying the minimal required methods.All relevant typedefs (
inode_base
,leaf_type
, etc.) appear consistent with the broader architecture. As these template aliases are integral to the ART node structure, ensure that all the methods required by the downstream code (e.g., destructor, copy, and move) are adequately tested.
88-120
: High-level structure of thedb
class is solid.The top-level
db
class design demonstrates a thorough approach to encapsulating ART-based operations (get, insert, remove). No immediate red flags spotted. Verify concurrency requirements if multiple threads are expected to interact with this class, though it’s not explicitly stated.🧰 Tools
🪛 GitHub Check: CodeFactor
[notice] 94-94: mutex_art.hpp#L94
Line ends in whitespace. Consider deleting these extra spaces. (whitespace/end_of_line)
133-142
: Ensure robust testing ofget(Key)
.The logic to convert an external key type to
art_key_type
is straightforward. Confirm that test coverage includes edge cases (e.g., extremely large or zero-length keys, if feasible) to ensure the function doesn't break with corner inputs.
160-163
: Insertion and removal wrappers align with the underlying logic.The
insert(Key, value_view)
andremove(Key)
methods effectively wrap the internal_0
variants. The approach is consistent: both rely on the same path compression logic. No issues found.Also applies to: 174-177
art_internal_impl.hpp (5)
80-99
: Leaf class enhancements successfully enable variable-length key support.The
basic_leaf
updates (usingart_key_type
and storingvalue_size
) are clear and well-integrated. The runtime checks within the constructor help prevent overflow, and usage ofstd::memcpy
for data storage is appropriate.
Line range hint
311-327
: Deleter implementation is clean.The
basic_db_leaf_deleter
andbasic_db_inode_deleter
do a proper job freeing aligned memory. The stats handling ensures consistent memory usage tracking.
Line range hint
2252-2374
: N48 node logic is coherent.The approach to indirect via
child_indexes
is typical for an ART N48 node. The code carefully ensures that each index references the correctchildren.pointer_array
. No obvious logical errors spotted.
Line range hint
2695-2731
: N256 node transition from N48 node is well-structured.Initialization merges children via a straightforward loop. Verifying no concurrency races or invalid references is recommended if multiple writers can exist. Otherwise, the approach is consistent with the rest of the ART design.
Line range hint
2885-2917
:delete_subtree
anddump
methods provide valuable introspection.The top-down traversal for each child leverages the
for_each_child
helper, promoting code reuse and clarity. This design is good for debugging large ART structures and ensures consistent node cleanup.olc_art.hpp (1)
Line range hint
34-114
: Great use of templating for variable-length key support
This newly introduced templating approach (e.g.,template <typename Key> class olc_db
,olc_inode_*
, andolc_art_policy
) is neatly aligned with the PR objective of supporting variable-length keys. The refactoring to template-based classes enables broader key type flexibility. The code structure looks clean and follows modern C++ best practices by separating policy and data.test/db_test_utils.cpp (2)
19-20
: TODO regarding variable length keys
The mention of adding variable length keys tomutex_art.cpp
is consistent with the PR's overall objective. The code still compiles and runs, so this placeholder is fine for now.
24-26
: Template class instantiations look good
Introducingtree_verifier<u64_db>
,tree_verifier<u64_mutex_db>
, andtree_verifier<u64_olc_db>
aligns nicely with the shift to 64-bit keys. No issues spotted with this approach.mutex_art.cpp (2)
1-1
: File includes its header
Although CodeFactor suggests including the header with the same name/path as the file (mutex_art.hpp
), line 13 already does so. If the tooling insists on an additional override (like “mutex_art.HPP”), you can likely ignore it or conform to your local coding convention.🧰 Tools
🪛 GitHub Check: CodeFactor
[notice] 1-1: mutex_art.cpp#L1
mutex_art.cpp should include its header file mutex_art.HPP. (build/include)
27-29
: Template instantiations for mutex_db
The explicit instantiations ofunodb::mutex_db<std::uint64_t>
andunodb::mutex_db<unodb::key_view>
broaden coverage for different key types, which dovetails with the PR’s variable-length key support direction.benchmark/micro_benchmark_utils.cpp (2)
26-28
: LGTM! Parameter rename improves clarity.The parameter rename from
db
toinstance
makes the code more readable and better reflects its purpose.🧰 Tools
🪛 Cppcheck (2.10-2)
[style] 26-26: The function 'destroy_tree' is never used.
(unusedFunction)
33-38
: Verify template instantiation changes across the codebase.The template instantiations have been updated to use
std::uint64_t
explicitly. While the changes look correct, we should verify the impact across the codebase.✅ Verification successful
Template instantiation changes are consistent with the codebase
The explicit
std::uint64_t
template instantiations inbenchmark/micro_benchmark_utils.cpp
align with the usage patterns across the codebase, particularly in benchmark-related code and core implementations. No inconsistencies were found.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for any remaining unparameterized db usages and verify consistency # of template parameter usage across the codebase echo "Checking for unparameterized db usages..." rg -l "unodb::(db|mutex_db|olc_db)[^<]" --type cpp echo "Verifying consistent uint64_t template parameter usage..." rg -l "unodb::(db|mutex_db|olc_db)<(?!std::uint64_t)" --type cppLength of output: 877
Script:
#!/bin/bash # Search for all db template instantiations and their parameters echo "Checking db template instantiations..." rg "unodb::(db|mutex_db|olc_db)<" -A 1 --type cpp # Use ast-grep to find template specializations echo -e "\nChecking template specializations..." ast-grep --pattern 'template $_ unodb::$_<$_>' # Look for any Key type definitions or aliases echo -e "\nChecking Key type definitions..." rg "using.*Key" --type cpp rg "typedef.*Key" --type cppLength of output: 13259
benchmark/micro_benchmark_mutex.cpp (1)
26-27
: LGTM! Namespace qualification is consistent with template refactoring.The change to use
unodb::benchmark::mutex_db
aligns with the broader template refactoring effort and maintains consistency across the codebase.examples/example_art_stats.cpp (1)
31-31
: LGTM! Template parameter addition is consistent.The change to use
unodb::db<std::uint64_t>
aligns with the PR's objective of supporting variable length keys and template parameters.examples/example_art.cpp (2)
34-34
: LGTM! Explicit template parameter improves type safety.The change from
unodb::db
tounodb::db<std::uint64_t>
makes the key type explicit, which is a good practice.
90-90
: LGTM! Template parameter consistency maintained.The change to
unodb::db<std::uint64_t>::key_found
maintains consistency with the templated type.test/test_art_key.cpp (2)
63-77
: LGTM! Comprehensive byte-wise key testing.The test thoroughly verifies both the encoding/decoding process and byte-wise access to the key.
32-37
: Consider prioritizing variable length key support.The TODO comments indicate important pending work for variable length keys. This should be prioritized as it's a key objective of the PR.
✅ Verification successful
Variable length key support is indeed a critical pending task
The codebase contains extensive planning for variable length key support across multiple components, with architectural changes needed in:
- Key storage and comparison mechanisms
- Leaf node structure and key prefix handling
- Iterator and scanning implementations
- Test coverage expansion
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check for any existing variable length key implementations or tests rg -l "variable.*length.*key|var.*len.*key" .Length of output: 303
Script:
#!/bin/bash # Check the context of variable length key mentions rg "variable.*length.*key|var.*len.*key" -C 2Length of output: 10704
benchmark/micro_benchmark_olc.cpp (1)
25-26
: LGTM! Namespace organization improves code structure.The change to use
unodb::benchmark::olc_db
improves code organization by properly namespacing benchmark-specific components.examples/example_olc_art.cpp (1)
37-37
: LGTM! Consistent template usage across examples.The change to
unodb::olc_db<std::uint64_t>
maintains consistency with other example files and the overall template refactoring.benchmark/micro_benchmark_concurrency.hpp (3)
75-81
: LGTM! Type change and initialization look good.The change from
unodb::key
tostd::uint64_t
is consistent with the PR's objective of standardizing key handling.
170-185
: 🧹 Nitpick (assertive)Consider batching operations for better performance.
The worker functions could potentially benefit from batching operations to reduce synchronization overhead.
Consider applying this pattern:
static void parallel_get_worker(const Db &test_db, std::uint64_t start, std::uint64_t length) { + constexpr std::size_t batch_size = 64; for (std::uint64_t i = start; i < start + length; ++i) { - get_existing_key(test_db, i); + if (i % batch_size == 0) { + // Process batch + for (std::uint64_t j = i; j < std::min(i + batch_size, start + length); ++j) { + get_existing_key(test_db, j); + } + } } }Likely invalid or redundant comment.
142-157
: Consider atomic operations for thread safety.While the parameter rename from
db
toinstance
improves clarity, the use ofstd::uint64_t
for thread-related calculations might need atomic operations to ensure thread safety.Run this script to check for potential thread safety issues:
benchmark/micro_benchmark_n256.cpp (1)
79-175
: LGTM! Namespace changes improve code organization.The consistent update of namespaces from
unodb
tounodb::benchmark
across all benchmark templates improves code organization without affecting functionality.benchmark/micro_benchmark_n48.cpp (1)
89-205
: LGTM! Namespace changes are consistent with other files.The changes mirror those in
micro_benchmark_n256.cpp
, maintaining consistency across the benchmark suite.benchmark/micro_benchmark_n16.cpp (1)
89-205
: LGTM! Namespace changes maintain consistency.The changes follow the same pattern as other benchmark files, ensuring consistency across the codebase.
benchmark/micro_benchmark_utils.hpp (4)
47-50
: Well-structured type aliases improve code clarity.The introduction of type aliases using
std::uint64_t
improves type safety and makes the code's intent clearer. The forward declarations and subsequent type aliases are well-organized.Also applies to: 58-60
88-89
: Consistent parameter naming improves readability.The change from
db
toinstance
in parameter names is more descriptive and consistent with the parameter's role.Also applies to: 99-100, 149-150, 159-160, 208-209, 218-219
268-273
: Consistent template instantiation declarations.The extern template declarations are properly updated to use
std::uint64_t
, maintaining consistency with the type aliases defined earlier.
1-1
: 🧹 Nitpick (assertive)Update copyright notice to include year range.
The copyright notice should include the full year range to accurately reflect the file's history.
-// Copyright 2019-2025 UnoDB contributors +// Copyright 2019-2025 UnoDB contributorsLikely invalid or redundant comment.
test/test_art.cpp (1)
44-46
: LGTM: Type alias update aligns with variable length key support.The update to
ARTTypes
to use explicit uint64 variants (u64_db
,u64_mutex_db
,u64_olc_db
) is consistent with the PR's objective of supporting variable length keys and making key types more flexible.test/test_art_scan.cpp (3)
41-41
: TODO comment aligns with PR objectives.The TODO comment correctly identifies the need for unit tests for
std::span<std::byte>
as part of the variable length key support initiative.
49-55
: LGTM: Clean decode function implementation.The
decode
function provides a clear and concise way to convertkey_view
touint64_t
, which is essential for the test cases. The implementation is straightforward and follows good practices.
86-87
: Type updates align with the new key handling approach.The function signature and vector type updates to use
std::uint64_t
are consistent with the codebase's transition to more explicit key type handling.Also applies to: 96-96
test/CMakeLists.txt (1)
67-70
: LGTM: Comprehensive test target integration.The new test targets (
test_art_key
,test_key_encode_decode
,test_art_span
) are properly integrated across:
- Regular test targets
- Coverage testing
- Valgrind testing
This ensures thorough testing of the new variable length key functionality.
Also applies to: 89-90, 99-106
CMakeLists.txt (2)
708-709
: LGTM! Test targets align with PR objectives.The new test targets (
test_art_key
,test_key_encode_decode
,test_art_span
) appropriately cover the variable length key support and template refactoring changes.
95-95
: Consider keeping the "-Wmissing-declarations" warning.Removing this warning might allow undeclared global functions to slip through, potentially leading to ODR violations and maintenance issues. If there are specific cases where this warning is problematic, consider using targeted suppressions instead.
✅ Verification successful
The "-Wmissing-declarations" warning can be safely removed.
The codebase follows good declaration practices with proper class encapsulation and scoping. All standalone functions are either template/inline functions in headers, test utilities, or necessary system overrides. No undeclared global functions were found that could lead to ODR violations.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for any global functions that might be missing declarations # Look for function definitions that might be missing declarations rg -A 2 "^[^#\s].*\b\w+\([^)]*\)\s*\{" --type cppLength of output: 39532
test/test_art_oom.cpp (2)
64-65
: LGTM! Parameter type changes align with the PR objective.The change from
unodb::key
tostd::uint64_t
for thek
parameter in both functions is consistent with the broader refactoring to support variable length keys.Also applies to: 79-80
100-102
: LGTM! Type alias update supports template-based key handling.The update to use
u64_db
,u64_mutex_db
, andu64_olc_db
types aligns with the template refactoring to support key as a template parameter.test/test_key_encode_decode.cpp (1)
48-57
: LGTM! Well-structured test helper class.The
my_key_encoder
class follows testing best practices by:
- Cleanly exposing protected methods for testing
- Maintaining encapsulation while providing necessary test access
benchmark/micro_benchmark.cpp (1)
369-461
: LGTM! Comprehensive benchmark coverage with appropriate configurations.The benchmark templates:
- Use the correct database types (
unodb::benchmark::db
,mutex_db
,olc_db
)- Have appropriate range parameters
- Use suitable time units
5897162
to
d8a047f
Compare
@coderabbitai pause |
@thompsonbry I'll pause the automatic reviews and resolve all existing review comments. However, I don't have the ability to ignore future comments. ✅ Actions performedComments resolved. Reviews paused. |
@coderabbitai pause @coderabbitai resolve @coderabbitai ignore
@coderabbitai pause @coderabbitai resolve @coderabbitai ignore
@coderabbitai pause @coderabbitai resolve @coderabbitai ignore
@coderabbitai pause |
Tip For best results, initiate chat on the files or code changes. @thompsonbry I've already paused the automatic reviews and resolved all existing review comments in response to your previous request. As mentioned before, I don't have the ability to ignore future comments. Let me know if you need anything else! |
Surveys all open TODO(thompsonbry) and all FIXMEs. Resolves those which can be resolved at this time. @coderabbitai pause @coderabbitai resolve @coderabbitai ignore
Minor cleanups. @coderabbitai pause @coderabbitai resolve @coderabbitai ignore
@coderabbitai pause @coderabbitai resolve @coderabbitai ignore
24efd72
to
98ee116
Compare
@laurynas-biveinis Follow up items for you: Review CI
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@laurynas-biveinis I've done a full pass over the diff and called out a few things for you to take a look at.
@@ -147,7 +156,7 @@ void dense_iter_full_fwd_scan(benchmark::State &state) { | |||
for (auto i = 0; i < full_scan_multiplier; ++i) { | |||
std::uint64_t sum = 0; | |||
auto fn = [&sum](const unodb::visitor<typename Db::iterator> &v) { | |||
sum += v.get_key(); | |||
sum += decode(v.get_key()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@laurynas-biveinis it looks like decode() was added to keep the read of the internal key and update the sum.
The key_decoder initialization is pretty light weight, but maybe we should change the sum to a sum of hashes of the encoded key? I suspect the decode is cheaper, but we can't decode all keys once we get into variable length keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense
const std::string_view s(reinterpret_cast<const char*>(val.data()), | ||
auto fn = [](const unodb::visitor<unodb::db<std::uint64_t>::iterator>& vis) { | ||
const auto& key{vis.get_key()}; | ||
unodb::key_decoder dec{key}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A similar comment here. In the example, we should be thinking about modeling what we want people to be doing.
Calling decode here makes sense if you have a simple primitive key type. But it does not always make sense for complex key types using the variable length key mechanism.
Just something to think about. Maybe even a different example for the use case of secondary indices, except that we do not support a fixed width type for the Value and to me that would be part of the ideal secondary index pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right and I think we should already split this example. Or just here the decodable key example
if constexpr (std::is_same_v<Key, key_view>) { | ||
return art_key_type{get_key_view()}; | ||
} else { | ||
// Use memcpy since alignment is not guaranteed because the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@laurynas-biveinis Take a note of this. We are loosing the 8-byte alignment because the key is no longer explicitly present, instead we have the key length, the value length, and the data starts after that. That might be worth making explicit in the leaf data record, e.g, align(8) for the data[].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's align, and use something like std::max_align_t
for that (16 is its common value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's align, and use something like std::max_align_t
for that (16 is its common value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per the somewhat extensive dm thread, I think we should just leave this as it is at the instant. The only benchmark regression is in the scan. That code is going to continue to evolve as the variable length key support is brought in and I expect to resolve that regression before #661 is closed out (e.g., we will compare fewer bytes per leaf visited).
The other thing that we need to do to cleanly address the leaf data layout is template the tree for the Value as well as the key. When doing that, we should create a union of the leaf type which is optimized for at least the case where both Key and Value are fixed length.
I do think it is worth optimizing the leaf for the case where just the key or just the value are fixed length. The latter supports the common case of variable length composite keys to record identifiers for secondary indices. The former case (int => bytes) is the case originally supported by the library. I am not sure how much maximum efficiency matters there, but ensuring that we do not regress this matters.
Created #673 to track this
// NOLINTNEXTLINE(modernize-avoid-c-arrays) | ||
std::byte value_start[1]; | ||
std::byte data[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@laurynas-biveinis This is where we would need to align(8) if we wanted to avoid the memcpy() call above for the uint64_t case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, and add asserts that the data start in the buffer is also aligned
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #661 +/- ##
==========================================
- Coverage 98.46% 97.98% -0.48%
==========================================
Files 33 32 -1
Lines 6651 7061 +410
==========================================
+ Hits 6549 6919 +370
- Misses 102 142 +40
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
This fixes ARTCorrectnessTest.TooLongValue
Microbenchmarks for varkeys-template e9c3776
|
Benchmarking master at 36a6520 (Jan 18th, 2025)
|
@laurynas-biveinis The set of benchmarks that are by "benchmarks/micro_benchmarks" is substantially different between those two versions. I was not expecting that. Also, the benchmark names have subtle changes (::benchmark was added) which might make it impossible to use tooling to make these comparisons. However, spot checks do not show performance regressions where I can find the same db benchmark in both cases.
|
* Please take up the Deep State errors. I can't reproduce them.
Look OK now?
* There are some CodeFactor complaints about build/include. It
wants the first line of the source file to be the .hpp file. But
we can't do that because of global.hpp. Maybe there is a way to
turn these messages off? Else I will just ignore them.
Let's ignore them now.
|
Can we push a |
|
@laurynas-biveinis A more complete survey of the benchmark comparison vs master. ::benchmark:: means varkeys branch. Results are master then varkeys. The only clear regressions are in the iterator scans. I think that we will pick up that performance once we finish landing the variable length key support because we will stop comparing full keys in the leaves in the scan and switch to tracking the key in the iterator buffer and only comparing the trailing suffix of the key against the leaf. Based on these data, I think that we do not have a blocker to merge based on performance.
|
Closing and replacing with #675 due to a local repository SNAFU. |
See #600
See #634
See #613
See #610
Note: test_art_span.cpp is disabled. This has the tests for variable length keys. It will remain disabled until we merge and then branch again. Right now the focus is to get the key template refactor back to mainline.
@coderabbitai pause
@coderabbitai resolve
@coderabbitai ignore
Summary by CodeRabbit
Release Notes
New Features
Performance Improvements
Bug Fixes
Testing
Refactoring
std::uint64_t
Documentation