From f346e7407c423013a4bdd118715cc8b80ac79ded Mon Sep 17 00:00:00 2001
From: Trevor Steil
Date: Mon, 15 Apr 2024 19:18:20 -0500
Subject: [PATCH] Preparing v0.6 Release (#200)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Increment YGM version number in CMakeLists.txt
* Sends comm::cerr() to std::cerr instead of std::cout (#78)
* feature/msg_tweak (#77)
* Increment YGM version number in CMakeLists.txt
* changed messages to use instead of hard-coded value.
* Cleaned up cmake build, and moved the library target description to one place.
Co-authored-by: Trevor Steil
* Defaults CMAKE_BUILD_TYPE to Release (#76)
* Initial ygm::io::multi_output functionality (#75)
* Initial ygm::io::multi_output functionality
* Removes unnecessary closing of ofstreams in destructor
* Checks prefix path provided to ygm::io::multi_output does not exist as a regular file, and changes filename variable name to subpath
* Adds tests for ygm::io::multi_output
* Adds ygm::io::daily_output and simple tests
* Adds test to check correct files are created by ygm::io::multi_output
* Fixing clean-up of test files written
* Forces filename_prefix given to ygm::io::multi_output constructor to be a directory
* Feature/ci boost 1.78 (#80)
Adds Boost 1.78 to CI.
* Initial ygm::array implementation (#79)
* Initial ygm::array implementation
* Adds safety checks of array sizes
* Adds barrier after resizing completes
* Changes name of async_put to async_set
* Adds array::async_unary_op_update_value and helpers for commonly used binary and unary operators
* Adds functions for getting YGM pointers to arrays and array sizes
* Adding container maptrix and SpMV. (#68)
* Adding maptrix APIs and impls.
* Adding async_visit API+impl.
* Some more details.
* Maptrix API changes.
* Adding new maptrix design.
* Adding SpMV - first take.
* Adding Structure
* Porting to develop.
* Adding SpMV as a standalone function.
* Adding pagerank in examples and other restructuring changes.
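The SpMV that PR #68's maptrix work builds up can be sketched sequentially, with a std::map-of-maps standing in for the distributed maptrix. Everything below (the type aliases, the free spmv function) is illustrative only, not YGM's actual API:

```cpp
#include <functional>
#include <map>
#include <string>

// Row-major sparse structure: row id -> (col id -> value); a sequential
// stand-in for the distributed maptrix container.
using vec    = std::map<std::string, double>;
using matrix = std::map<std::string, vec>;

// y = A * x with user-supplied "plus" and "times", mirroring how the
// examples call spmv with std::plus() and std::multiplies().
template <typename Plus, typename Times>
vec spmv(const matrix &A, const vec &x, Plus plus, Times times) {
  vec y;
  for (const auto &[row, cols] : A) {
    for (const auto &[col, value] : cols) {
      auto xit = x.find(col);
      if (xit == x.end()) continue;  // missing entries act as zeros
      double prod           = times(value, xit->second);
      auto [yit, inserted]  = y.try_emplace(row, prod);
      if (!inserted) yit->second = plus(yit->second, prod);
    }
  }
  return y;
}
```

With std::plus and std::multiplies this is the ordinary matrix-vector product; the pagerank example additionally scales A's entries by vertex degree before iterating.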
* Maptrix Impl.
* Adding for_all over row_id and col_id.
* Moving to experimental directory
* Delete assoc_vector, replaced with ygm map.
* Delete assoc_vector_impl, replaced with ygm map_impl.
* Delete maptrix.hpp, in experimental mode.
* Delete adj_impl.hpp, in experimental.
* Delete csc_impl.hpp, in experimental.
* Delete csr_impl.hpp, in experimental.
* Delete maptrix_impl.hpp, in experimental.
* Delete spmv.hpp, in experimental.
* Moved as a part of alg_spmv.hpp.
* Moved as a part of alg_spmv.hpp.
* Add nicer examples or tests.
* Brainstorming spmv row.
* Moved within containers.
* Moved within containers.
* Adding new changes.
* Adding timing details.
* .
* Changing API to insert_if_missing_else_visit.
* Fix.
* Fix.
* Adding norm check.
* Added OpPlus OpTimes.
* Adding OpTimes.
* Adding fixes from pull-request.
* Adding fixes.
* Delete spmv_row, not supporting for now.
* Adding references.
* Fixing references.
* Changing line parsing.
* Changing to 32bit.
* Cleaning up webgraph example to use as general SpMV example
* Removing extra headers
* Switching to webgraph_spmv.cpp as new alg_spmv.cpp
* Removing maptrix_visit
* Cleaning up alg_pagerank.cpp
* Removing unused 2D hasher
* Removes unneeded header and changes a function name in adj_impl.hpp
* Removing unused code in SpMV
* Removing extra headers in column_view_impl.hpp
* Removes unused headers in maptrix_impl and passes default value to row_view and column_view
* Removes unused headers in maptrix_impl and row_view_impl
* clang-format on maptrix.hpp
Co-authored-by: Ancy Sarah Sarah Tom
Co-authored-by: Trevor Steil
* Feature/buffer multi output (#81)
* Manually buffers ygm::io::multi_output writing
* Adds buffering to ygm::io::daily_output
* Updates ygm::io output tests to use buffering
* Adds ability for csv_parser to read fields as unsigned integers (#83)
* ygm::comm::Layout class (#82)
* initial pass on implementing the Layout class. ygm::comm::layout() returns a const reference.
* Removed comments and cleaned up ygm::Layout internals.
* Reordered ygm::comm::Layout members for consistency.
* Added tests for ygm::Layout::is_strided() and ygm::Layout::is_local(), and added functions to get const refs to all local and strided ranks.
* Refactored ygm::Layout -> ygm::detail::layout.
* Adds missing functions to counting set (#84)
* Adds missing function to get a YGM pointer to a counting set
* Adds comm() function to counting_set
* Fixes typo in counting_set.comm()
* Adds test for counting_set's YGM pointer
* Adds topk to counting set
* Adds barrier to beginning of map's topk
* Adds example of counting_set topk
* Consolidated CI jobs into a single job with matrixed dispatch on gcc version and mpi types. Also caching boost to avoid downloading/untarring with every job. (#86)
* updated CI triggers such that PRs to main and develop and pushes to feature/** and hotfix/** branches trigger jobs. (#88)
* Adds possibility to send during processing of receive queue (#91)
* Adds additional opportunities to flush buffers while processing received messages
* Tweaked contributing guidelines to reflect CI changes and fixed some … (#89)
* Tweaked contributing guidelines to reflect CI changes and fixed some typos.
* Update CONTRIBUTING.md
Co-authored-by: Trevor Steil
* Fixes bug in ygm::io::line_parser based on the split between files occurring on a newline. (#93)
* Added get_ygm_ptr() to disjoint sets (#92)
Co-authored-by: Sudharshan Srinivasan
* Switches ygm::container::array to block partitioning (#95)
* Switches ygm::container::array to block partitioning
* Gives last rank fewer elements during resize to avoid issues in for_all
* Removes ability to resize ygm::container::array. No longer sensible with block partitioning
* Properly sets local block size of last rank when block is full-sized
* gather data to single std::vector added (#94)
* gather data to single std::vector added
* cleaned up collective operation, fixed test case, added to all support
* Update bag_impl.hpp
m_res -> p_res
removes unnecessary `mailbox` from `gatherer` lambda
passes `outer_data` to `gatherer` by reference
uses m_comm to build ygm_ptr (preferred method)
* Update bag_impl.hpp
Makes vector arguments to lambdas in gather_to_vector const
Co-authored-by: Dozier
Co-authored-by: Trevor Steil
* Hotfix/remove local receive (#96)
* Adds ygm_tag to comm::impl
* Removes local_receive function
* Feature/interrupt mask (#97)
* Adds initial ygm::detail::interrupt_mask implementation
* Adds test for ygm::detail::mask_interrupt
* Adds detail::interrupt_mask to map_impl to prevent iterator invalidation during map visits
* Adds missing include statement for interrupt_mask in map_impl
* Fixes namespacing for interrupt_mask use in map_impl
* Gets m_comm from pmap in map_impl::async_visit_group
* Adds missing mpi_typeof function for floats (#99)
* Adds comm() function to set and multiset (#100)
* Updates version number given in example CMake snippet (#101)
* Added small fixes to clear gcc12 compiler warnings. (#102)
* Feature/single threaded overhaul (#103)
Significant overhaul removing listener thread and MPI_THREAD_MULTIPLE.
* Removal of the listener thread. Removes requirement for MPI_THREAD_MULTIPLE. Improves performance substantially for the triangle counting benchmark.
* NR & NLNR routing are supported.
* Added the ability to capture primitive values in the async lambda to avoid Cereal overhead. This improves performance a bit, but mostly it makes user code easier to read/debug. Caution: there are no safety rails to prevent capturing a local pointer (but it does prevent local references).
* Added a full environment setting system, and it now controls the major settings in the runtime.
* comm::welcome() added a welcome banner that prints out the current configuration and MPI settings.
* Minor API change: removed the optional parameter to comm::comm(…, int buffer_capacity) that controlled the buffer size. Now it must be controlled by the environment.
* Added new mechanism to indicate remote dispatch functions via static initialization.
* Removes x86intrin.h inclusion in comm_impl (#104)
* Inserted a std::stringstream into ygm::comm::cout()-style functions so that multi-rank print operations are less likely to get garbled. (#107)
* Feature/ci update (#108)
* Adds Github action to run GCC-8 tests on Ubuntu 20.04 as it is unavailable on Ubuntu 22.04
* Updating version of checkout action to get rid of javascript warning in Github Actions
* Updating version of cache action to get rid of javascript warning in Github Actions
* Develop arrow (#98)
* arrow parquet file reader
* arrow parquet file reader test
* Update arrow_parquet_stream_reader.cpp
Fixes typo in schema_to_string name
* Update arrow_parquet_parser.hpp
Fixes typo in schema_to_string name
* arrow parquet file reader - updated how files in a directory are read
Co-authored-by: Trevor Steil
* Small updates to Parquet parser (#109)
* Updates Parquet reader tests and examples
* Removes commented include
* Updates arrow_parquet_parser to only check files from rank 0 during construction
* Arrow/Parquet CI (#110)
* arrow parquet file reader
* arrow parquet file reader test
* Adds installation of Apache Arrow to Github CI
* Using Arrow version 9.0
* Downloading Arrow 9.0.0 for CI
* Trying Arrow 10 for CI
* Downloading newest Apache Arrow and installing 9.0.0 in CI
* Trying to download Apache Arrow 9.0.0 source instead
* Trying version range for Arrow
* Changes CI runner to install Arrow 10.0 and explicitly check for Arrow 8, 9, and 10 in CMake to work around issues with Arrow version compatibility.
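The std::stringstream change in #107 is an instance of a standard pattern: compose the entire line in a private buffer, then hand it to the shared stream in one insertion so that concurrent ranks' output stays line-atomic. A minimal standalone sketch (the rank-prefix format and function name here are made up, not what comm::cout() actually prints):

```cpp
#include <sstream>
#include <string>

// Build the whole message in a local stringstream first; the caller then
// emits it with a single operator<< (e.g. std::cout << compose_line(...)),
// so fragments from different ranks cannot interleave mid-line.
std::string compose_line(int rank, const std::string &msg) {
  std::stringstream ss;
  ss << "[rank " << rank << "] " << msg << '\n';
  return ss.str();
}
```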
* Investigating output of test_arrow_parquet_stream_reader to find cause of failure in CI
* Adds guards around finding different versions of Arrow
Co-authored-by: tahsinreza
* Feature/nlnr bcast (#105)
* Adds NLNR broadcast
* Removes debug statements
* Removes old bcast code
* Reorganizes examples (#112)
* Bumps version number to 0.5 (#111)
* Release Prep
* Adds dummy header to bcast messages when routing is used to process messages properly in handle_next_receive (#120)
* Loops through test_comm once for each routing scheme (#119)
* Added support for (key, value) lambdas while still supporting (kv_pair) lambdas for ygm::container::map (#123)
* Changed local lambda signature of ygm::container::map to expect separate (key, value) pairs.
* Made ygm::container::map's expected lambda signature backwards compatible.
* introduced constexpr compile-time guard to check that remote lambdas for ygm::container::map adhere to legacy signature expectations.
* ygm::container::map can now accept remote lambdas whose signatures expect either pairs or separate key, value arguments. Pair visitors with optional map pointer arguments and no visitor arguments MUST specify that the second argument is a pair to compile correctly.
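The backwards-compatible signature handling described for #123 can be reproduced with std::is_invocable; the following is a minimal sketch of the dispatch idea, independent of YGM's actual trait machinery (local_visit is a hypothetical name):

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Dispatch to a visitor that takes either (key, value) or a key-value
// pair, mirroring the constexpr guards added for ygm::container::map.
template <typename Key, typename Value, typename Visitor>
void local_visit(std::pair<const Key, Value> &kv, Visitor &&v) {
  if constexpr (std::is_invocable_v<Visitor, const Key &, Value &>) {
    std::forward<Visitor>(v)(kv.first, kv.second);  // (key, value) form
  } else if constexpr (std::is_invocable_v<Visitor,
                                           std::pair<const Key, Value> &>) {
    std::forward<Visitor>(v)(kv);                   // legacy pair form
  } else {
    static_assert(std::is_invocable_v<Visitor, const Key &, Value &>,
                  "visitor must accept (key, value) or a key-value pair");
  }
}
```

Because both branches are `if constexpr`, only the matching call is ever instantiated, and a visitor fitting neither signature fails with the static_assert message instead of a template-error cascade.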
* Update map_visit_optional_arguments_legacy.cpp
* Update map_impl.hpp
---------
Co-authored-by: Trevor Steil
* adding inline definition for release_assert_fail in order to support multi-object targets; fix for Issue #126 (#127)
* Feature/routing consistency (#125)
* Initial comm_router implementation
* Removes comments, adds user access to a comm's router, and spells out assumptions on routing in comm_router.hpp
* Changes bcast to use same remote channels as new NLNR
* wip/reducing adapter (#129)
* Adds ygm::container::map::async_reduce operation
* Fixes bug in test_map
* Initial reducing_adapter for ygm::container::map without reduction tree
* Adds ygm::container::array::key_type as an alias to Index for use in ygm::container::reducing_adapters
* Adds container_traits for inspecting YGM container types and moves always_false from map_impl to ygm/detail/ygm_traits.hpp for use in other contexts
* Adds ygm::container::reducing_adapter for use with ygm::container::array using functionality from ygm/container/detail/container_traits.hpp to handle both types
* Adds is_counting_set and provides tests for container_traits
* Stores reduction operation in ygm::container::reducing_adapter
* modified ygm::container::bag::for_all() to support separated (first, second) lambdas (#128)
* created ygm::container::detail::bag_impl::for_all_pairs() to be used with pair bags.
* created make_similar() functions for map and array containers that return empty containers with the same comm and default value (and size for arrays)
* Added missing cereal include for std::pair
* moved template metaprogramming boilerplate into its own header.
* Added special functionality to bag::for_all() so that it can accept split (first, second) signatures.
* removed vestigial header
* Added compiler guard to bag::for_all() and added more helpful compiler error messages.
* removed the make_similar function in favor of something more disciplined in the future.
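Conceptually, the reducing_adapter introduced in #129 combines pending updates to the same key locally before they reach the underlying container (#139 later adds a reduction tree between ranks). A sequential sketch of that local cache, with hypothetical names and none of YGM's messaging:

```cpp
#include <functional>
#include <map>
#include <string>

// Local reduction cache: async_reduce(key, v) folds v into any pending
// value for key; flush() delivers the combined values downstream.
template <typename Key, typename Value, typename ReductionOp>
class reducing_cache {
 public:
  explicit reducing_cache(ReductionOp op) : m_op(op) {}

  void async_reduce(const Key &k, const Value &v) {
    auto [it, inserted] = m_cache.try_emplace(k, v);
    if (!inserted) it->second = m_op(it->second, v);  // combine in place
  }

  // In YGM the flush targets a distributed container; here it just
  // hands the reduced values to a callback.
  template <typename Fn>
  void flush(Fn &&deliver) {
    for (auto &[k, v] : m_cache) deliver(k, v);
    m_cache.clear();  // cache slots must be empty after flushing
  }

 private:
  ReductionOp          m_op;
  std::map<Key, Value> m_cache;
};
```

The payoff is that n updates to a hot key cost one message after local combining instead of n, at the price of holding the partial reduction until the next flush.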
* removed make_similar from multimap
* remove legacy pair signature support for ygm::container::map (#132)
* removed support for pair arguments in local and remote map lambdas.
* Added more informative compiler error messages for map lambda signature checks
* Removed outdated comment text [skip ci]
* minor cleanup of compiler error message [skip ci]
* Added support for local lambdas with (value_type&) signatures for arr… (#133)
* Added support for local lambdas with (value_type&) signatures for arrays with constexpr compiler guards.
* Opens ygm::container::array::owner() call to public API and adds ygm::container::array::is_mine() operation to match other containers (#136)
* Reducing Adapter Reduction Tree (#139)
* Adds basic caching layer to reducing_adapter without multi-hop caching
* Adds reduction tree to reducing_adapter
* Adds sanity check that reducing_adapter cache slot is empty after flushing
* Fixes bug where reducing_adapter cache was declared non-empty when a value was placed in the underlying container
* Adds missing check of pthis in reducing_adapter
* Moves reducing_adapter class to ygm::container::detail
* Added compile-time guards to all local and remote container lambdas (#138)
* placed compiler guards on remote array lambdas
* Added compiler guards to local set lambdas
* changed local lambda signatures of disjoint_set to separated [](const value_type &, const value_type &) format to match map. Also added compiler guards to disjoint_set local and remote lambdas.
* Feature reduce by key (#144)
* Added reduce_by_key and started new traits features.
* Adds communicator collectives. (#143)
* Adds communicator collectives.
* fixed MPI_Comm.
* bcast & is_same
* Added rank-aware RNG wrapper. It can be modified with different rank/… (#140)
* Added rank-aware RNG wrapper. It can be modified with different rank/seed strategies.
* updated random namespace to mimic STL
* fixed namespace device -> engine
* removed shared_random_device. Will reintroduce when we add a post barrier callback concept
* moved most random machinery into ygm::detail and made std::mt19937 the default random engine
* Hotfix/recursive double receive (#147)
* Adds .git to Git repos fetched in CMakeLists.txt (#150)
* Added new set methods. (#151)
* Update comm_impl.hpp (#152)
Removed old debug asserts.
* static_assert no longer triggers when using a lambda with disjoint_set::async_union_and_execute that requires a pointer to the disjoint set (#158)
* Feature/bag shuffle (#130)
* Added local and global shuffles to bag container
* Added test case for shuffles
* Cleaning up a little code
* Updated global bag shuffle to accept RNG as well
* Updated bag shuffles to utilize ygm::default_random_engine
* Removed some comments
* Adding template for RandomEngine and functions where no rng argument is passed
* Finished templating shuffle functions properly
* Bugfix/disjoint set logic (#159)
* Overcomplicated disjoint set with path splitting storing parents and their ranks
* Simplifies disjoint_set
* Adds missing const in async_union_and_execute
* Avoid running up tree when other_item is my_parent
* Finishes initial path-splitting union-by-rank disjoint set implementation
* Adds missing barrier at beginning of all_compress()
* Bugfix/disjoint set optional pointer (#160)
Stops static_assert from triggering when disjoint_set::async_union_and_execute lambda requires a pointer to the disjoint set
* Feature/bag balancing (#146)
* Rough draft for rebalancing bag function
* WIP: Second iteration of bag rebalance code
* Added second iteration of rebalance method.
* Appended rebalance tests to test_bag.cpp
* Updated rebalancing algorithm to reduce space complexity. Added local_pop(int n) function to pop multiple values at once.
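Sequentially, the path-splitting, union-by-rank scheme that #159 converges on looks like the following; the distributed disjoint_set issues the same steps as async messages between the ranks owning each item:

```cpp
#include <utility>
#include <vector>

// Find with path splitting: every node visited on the way to the root is
// re-pointed at its grandparent, roughly halving future path lengths.
int find(std::vector<int> &parent, int x) {
  while (parent[x] != x) {
    int next  = parent[x];
    parent[x] = parent[next];  // split: point x at its grandparent
    x         = next;
  }
  return x;
}

// Union by rank: attach the root of the shorter tree under the taller one,
// so tree heights stay logarithmic.
void union_sets(std::vector<int> &parent, std::vector<int> &rank_, int a,
                int b) {
  a = find(parent, a);
  b = find(parent, b);
  if (a == b) return;
  if (rank_[a] < rank_[b]) std::swap(a, b);
  parent[b] = a;
  if (rank_[a] == rank_[b]) ++rank_[a];
}
```

Path splitting suits the asynchronous setting better than full path compression because each visited node can be updated with local information (its grandparent) without waiting for the root to be found.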
* Updated test to reflect bag rebalancing sizes being congruent with ygm arrays
* Merged with updated develop branch to prep for pull request.
* Several small fixes to make rebalancing run with better space efficiency
* Fixed async_insert vector pass by value error. Made small fix for accessing map elements to save memory.
* Fixed O(p) time complexity issue when sending values for rebalance
* Adding support for Arrow 12.0. (#162)
* arrow parquet file reader
* arrow parquet file reader test
* Update arrow_parquet_stream_reader.cpp
Fixes typo in schema_to_string name
* Update arrow_parquet_parser.hpp
Fixes typo in schema_to_string name
* arrow parquet file reader - updated how files in a directory are read
* merge develop into develop-arrow
* adding support for Arrow 12.0
* merge develop into develop-arrow
* Added missing for_all test for counting set (#165)
Co-authored-by: Stephen Thaddaeus Youd
* added tagged bag container (#163)
Added tagged_bag container
* Feature/container traits (#161)
Added ygm_container_type tags to containers. Additionally added compile-time functions to check container type.
* Move Constructor for Set (#166)
* Adds test of std::vector of YGM sets
* Fixes size check in test of vector of YGM sets
* Adds move constructor to YGM set
* Removes comm_impl and bag_impl (#167)
* Removes comm::impl
* Properly passes lambdas to pack_lambda operations and removes mention of comm::impl from interrupt_mask
* Removes comm copy constructor
* Changes comm in arrow_parquet_parser to a reference
* Removes comm_impl.hpp includes from comments
* Removes bag_impl
* Added missing limits include (#168)
* Feature/comm progress (#171)
* Added comm::progress() and comm::wait_until(Fn).
* Added comm::process_incoming()
* Adding consume_all to set. (#169)
* Adding consume_all to set.
* Made adapter more clear.
* Hotfix/multi output filesystem (#176)
* Adds missing filesystem include to multi_output.hpp
* Replaces csv_parser.hpp include in ndjson_parser.hpp with line_parser.hpp
* Removes unnecessary ygm::io::detail namespace
* Renamed 3 local comm member functions. (#173)
* fix type of the get_ygm_ptr call in set.hpp (#178)
* fix type of set get_ygm_ptr
* add a test for the ygm set pointer
* also fix the set pointer for multiset
---------
Co-authored-by: Grace Jingying Li
* Hotfix/arrow_ci (#180)
* Adding recent version of Arrow to CMakeLists.txt
* make -j
* Adds CMake flag to require a version of Arrow to be found, specifically for use in CI to make it obvious when Apache Arrow is not being found
* Adds disjoint_set::clear() function (#183)
* Feature/context aware progress (#182)
* Check calling context in comm.local_progress()
* Fix spelling error and increase test trip count
* Add Parquet -> JSON Converter (#181)
* Change parquet parser wrt schema
Specifically, schema() returns physical type rather than logical type.
* Add parquet -> json converter
* Update Parquet JSON reader example
* Arrow 14.0 (#185)
Change the cmake file to support Apache Arrow v14.0
* Fixes array partitioning and bag rebalancing to be more balanced. (#187)
* Copies all private variables in ygm::container::detail::array_impl copy constructor (#189)
* Removes impl from ygm::container::array (#190)
* Parallel Parquet File Reading (v2) (#188)
* Parallel parquet reader v2
* Add comments
* Add tests for parquet reader (#191)
* Add tests for parquet reader
- Add test for parquet2json converter
- Add additional parallel parquet reading test
* Fix wrong indent
* Brush up on parquet reader test
* Updates Regarding Arrow Parquet (#193)
* Read values using std::optional in the ParquetToJson converter
* Support Arrow v15
* Bugfix/disjoint set compress (#195)
* Adds print statements to see if obvious bugs exist in disjoint_set::all_compress()
* Removes race condition in disjoint_set::all_compress() by preparing all parent queries before sending any
* Removing print statements from disjoint_set
* Adds check of parent ranks before compressing disjoint sets
* Avoid querying parent during disjoint_set::all_compress() when item is a root
* Fixes missing reference for local_rep_query inside of update_rep_functor of disjoint_set::all_compress()
* More detailed debugging output in all_compress()
* Fixing logic for when a parent request comes for an item that had to query its own parent in current all_compress() round
* Cleaning up all_compress() queries as they return
* Adding additional debug printing to disjoint_set::all_compress()
* Avoids potential race condition where all_compress() is using whether parent of item being queried has returned in place of directly tracking whether item has found its root
* Cleaning up disjoint_set::all_compress() code
* Removing unused code
* Add parquet2variant converter (#194)
Co-authored-by: Keita Iwabuchi
* Bugfix/disjoint set compress (#196)
* Adds missing async_visit function to disjoint_set
* Fixes stopping criteria on while loop in disjoint_set::all_compress()
* Fixes tracking of ongoing compression updates (#197)
* Updating version number to 0.6 (#198)
* Removes performance directory (#199)
* Fixing Github actions to run on pull requests to master branch
---------
Co-authored-by: Benjamin Priest
Co-authored-by: Roger Pearce
Co-authored-by: ancysarahtom
Co-authored-by: Ancy Sarah Sarah Tom
Co-authored-by: Sudharshan <32466131+suds12@users.noreply.github.com>
Co-authored-by: Sudharshan Srinivasan
Co-authored-by: Ryan Dozier
Co-authored-by: Dozier
Co-authored-by: tahsinreza
Co-authored-by: tahsinreza
Co-authored-by: jleidel
Co-authored-by: Roger Pearce
Co-authored-by: Lance <57380441+LFletch1@users.noreply.github.com>
Co-authored-by: youd3 <123604071+youd3@users.noreply.github.com>
Co-authored-by: Stephen Thaddaeus Youd
Co-authored-by: Seth Bromberger
Co-authored-by: Grace
<68253130+graceli24@users.noreply.github.com> Co-authored-by: Grace Jingying Li Co-authored-by: Preston Piercey <112522643+prestonpiercey-tamu@users.noreply.github.com> Co-authored-by: Keita Iwabuchi Co-authored-by: Keita Iwabuchi --- .github/workflows/ci-test.yaml | 12 +- CMakeLists.txt | 29 +- examples/container/alg_pagerank.cpp | 28 +- examples/container/disjoint_set_cc.cpp | 5 +- examples/container/map_insert_if_missing.cpp | 8 +- examples/container/map_set.cpp | 6 +- examples/container/map_visit.cpp | 7 +- .../map_visit_optional_arguments.cpp | 19 +- examples/container/multimap_visit_group.cpp | 5 +- examples/io/CMakeLists.txt | 17 +- .../io/arrow_parquet_stream_reader_json.cpp | 53 + .../arrow_parquet_stream_reader_variant.cpp | 74 ++ include/ygm/collective.hpp | 172 +++ include/ygm/comm.hpp | 178 ++- include/ygm/container/array.hpp | 133 ++- include/ygm/container/bag.hpp | 66 +- include/ygm/container/container_traits.hpp | 78 ++ include/ygm/container/counting_set.hpp | 52 +- include/ygm/container/detail/array.ipp | 230 ++++ include/ygm/container/detail/array_impl.hpp | 156 --- include/ygm/container/detail/bag.ipp | 271 +++++ include/ygm/container/detail/bag_impl.hpp | 120 -- .../container/detail/disjoint_set_impl.hpp | 647 +++++++---- include/ygm/container/detail/map_impl.hpp | 108 +- .../ygm/container/detail/reducing_adapter.hpp | 128 ++ include/ygm/container/detail/set_impl.hpp | 109 +- include/ygm/container/disjoint_set.hpp | 31 +- .../experimental/detail/adj_impl.hpp | 32 +- .../experimental/detail/algorithms/spmv.hpp | 17 +- .../experimental/detail/column_view_impl.hpp | 16 +- .../experimental/detail/maptrix_impl.hpp | 18 +- .../experimental/detail/row_view_impl.hpp | 14 +- include/ygm/container/map.hpp | 81 +- include/ygm/container/reduce_by_key.hpp | 66 ++ include/ygm/container/set.hpp | 66 +- include/ygm/container/tagged_bag.hpp | 123 ++ include/ygm/detail/assert.hpp | 2 +- include/ygm/detail/comm.ipp | 991 ++++++++++++++++ 
include/ygm/detail/comm_environment.hpp | 18 +- include/ygm/detail/comm_impl.hpp | 1028 ----------------- include/ygm/detail/comm_router.hpp | 84 ++ include/ygm/detail/comm_stats.hpp | 55 +- include/ygm/detail/interrupt_mask.hpp | 6 +- include/ygm/detail/lambda_map.hpp | 3 +- include/ygm/detail/mpi.hpp | 92 +- include/ygm/detail/random.hpp | 51 + include/ygm/detail/std_traits.hpp | 22 + include/ygm/detail/ygm_cereal_archive.hpp | 1 + include/ygm/detail/ygm_traits.hpp | 66 ++ include/ygm/for_all_adapter.hpp | 50 + include/ygm/io/arrow_parquet_parser.hpp | 241 +++- .../detail/arrow_parquet_json_converter.hpp | 129 +++ .../arrow_parquet_variant_converter.hpp | 122 ++ include/ygm/io/line_parser.hpp | 2 +- include/ygm/io/multi_output.hpp | 7 +- include/ygm/io/ndjson_parser.hpp | 11 +- include/ygm/random.hpp | 18 + performance/CMakeLists.txt | 17 - performance/counter_scaling_test.cpp | 75 -- performance/disjoint_set_union_chain.cpp | 138 --- test/CMakeLists.txt | 23 +- .../parquet_files_different_sizes/0.parquet | Bin 0 -> 1461 bytes .../parquet_files_different_sizes/1.parquet | Bin 0 -> 1684 bytes .../parquet_files_different_sizes/2.parquet | Bin 0 -> 1461 bytes .../parquet_files_different_sizes/3.parquet | Bin 0 -> 1461 bytes .../parquet_files_different_sizes/4.parquet | Bin 0 -> 1658 bytes .../parquet_files_different_sizes/5.parquet | Bin 0 -> 1650 bytes .../parquet_files_different_sizes/6.parquet | Bin 0 -> 1650 bytes .../parquet_files_different_sizes/7.parquet | Bin 0 -> 1461 bytes test/data/parquet_files_json/data.parquet | Bin 0 -> 5165 bytes test/test_array.cpp | 116 ++ test/test_arrow_parquet_stream_reader.cpp | 61 +- .../test_arrow_parquet_stream_reader_json.cpp | 95 ++ test/test_bag.cpp | 157 ++- test/test_collective.cpp | 73 ++ test/test_comm.cpp | 186 +-- test/test_container_traits.cpp | 92 ++ test/test_counting_set.cpp | 47 + test/test_disjoint_set.cpp | 45 +- test/test_map.cpp | 116 +- test/test_multimap.cpp | 90 +- test/test_random.cpp | 48 + 
test/test_recursion_large_messages.cpp | 56 + test/test_recursion_progress.cpp | 46 + test/test_reduce_by_key.cpp | 56 + test/test_reducing_adapter.cpp | 81 ++ test/test_set.cpp | 168 ++- test/test_tagged_bag.cpp | 63 + test/test_traits.cpp | 83 ++ 89 files changed, 5741 insertions(+), 2365 deletions(-) create mode 100644 examples/io/arrow_parquet_stream_reader_json.cpp create mode 100644 examples/io/arrow_parquet_stream_reader_variant.cpp create mode 100644 include/ygm/collective.hpp create mode 100644 include/ygm/container/container_traits.hpp create mode 100644 include/ygm/container/detail/array.ipp delete mode 100644 include/ygm/container/detail/array_impl.hpp create mode 100644 include/ygm/container/detail/bag.ipp delete mode 100644 include/ygm/container/detail/bag_impl.hpp create mode 100644 include/ygm/container/detail/reducing_adapter.hpp create mode 100644 include/ygm/container/reduce_by_key.hpp create mode 100644 include/ygm/container/tagged_bag.hpp create mode 100644 include/ygm/detail/comm.ipp delete mode 100644 include/ygm/detail/comm_impl.hpp create mode 100644 include/ygm/detail/comm_router.hpp create mode 100644 include/ygm/detail/random.hpp create mode 100644 include/ygm/detail/std_traits.hpp create mode 100644 include/ygm/detail/ygm_traits.hpp create mode 100644 include/ygm/for_all_adapter.hpp create mode 100644 include/ygm/io/detail/arrow_parquet_json_converter.hpp create mode 100644 include/ygm/io/detail/arrow_parquet_variant_converter.hpp create mode 100644 include/ygm/random.hpp delete mode 100644 performance/CMakeLists.txt delete mode 100644 performance/counter_scaling_test.cpp delete mode 100644 performance/disjoint_set_union_chain.cpp create mode 100644 test/data/parquet_files_different_sizes/0.parquet create mode 100644 test/data/parquet_files_different_sizes/1.parquet create mode 100644 test/data/parquet_files_different_sizes/2.parquet create mode 100644 test/data/parquet_files_different_sizes/3.parquet create mode 100644 
test/data/parquet_files_different_sizes/4.parquet create mode 100644 test/data/parquet_files_different_sizes/5.parquet create mode 100644 test/data/parquet_files_different_sizes/6.parquet create mode 100644 test/data/parquet_files_different_sizes/7.parquet create mode 100644 test/data/parquet_files_json/data.parquet create mode 100644 test/test_arrow_parquet_stream_reader_json.cpp create mode 100644 test/test_collective.cpp create mode 100644 test/test_container_traits.cpp create mode 100644 test/test_random.cpp create mode 100644 test/test_recursion_large_messages.cpp create mode 100644 test/test_recursion_progress.cpp create mode 100644 test/test_reduce_by_key.cpp create mode 100644 test/test_reducing_adapter.cpp create mode 100644 test/test_tagged_bag.cpp create mode 100644 test/test_traits.cpp diff --git a/.github/workflows/ci-test.yaml b/.github/workflows/ci-test.yaml index 79205b29..732b52ac 100644 --- a/.github/workflows/ci-test.yaml +++ b/.github/workflows/ci-test.yaml @@ -2,7 +2,7 @@ name: CI Test on: pull_request: - branches: [ main, develop ] + branches: [ master, develop ] push: branches: [ 'feature/**', 'hotfix/**'] @@ -39,9 +39,9 @@ jobs: run: | cd ~ wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb - sudo apt install ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb - sudo apt update - sudo apt install libarrow-dev libparquet-dev + sudo apt-get install ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb + sudo apt-get update + sudo apt-get install libarrow-dev libparquet-dev - name: Install mpich if: matrix.mpi-type == 'mpich' run: sudo apt-get install mpich @@ -58,8 +58,8 @@ jobs: g++-${{ matrix.gcc-version }} --version mkdir build cd build - cmake ../ -DCMAKE_BUILD_TYPE=${{ env.BUILD_TYPE }} -DCMAKE_CXX_COMPILER=g++-${{ matrix.gcc-version }} -DBOOST_ROOT=~/boost_1_77_0 - make + cmake ../ 
-DCMAKE_BUILD_TYPE=${{ env.BUILD_TYPE }} -DCMAKE_CXX_COMPILER=g++-${{ matrix.gcc-version }} -DBOOST_ROOT=~/boost_1_77_0 -DYGM_REQUIRE_ARROW=ON
+        make -j
     - name: Make test (mpich)
       if: matrix.mpi-type == 'mpich'
       run: |
diff --git a/CMakeLists.txt b/CMakeLists.txt
index f5985da4..f8dba5e7 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -7,7 +7,7 @@ cmake_minimum_required(VERSION 3.14)
 project(
     ygm
-    VERSION 0.5
+    VERSION 0.6
     DESCRIPTION "HPC Communication Library"
     LANGUAGES CXX
 )
@@ -141,6 +141,21 @@ endif()
 if (NOT Arrow_FOUND)
     find_package(Arrow 10.0 QUIET)
 endif()
+if (NOT Arrow_FOUND)
+    find_package(Arrow 11.0 QUIET)
+endif()
+if (NOT Arrow_FOUND)
+    find_package(Arrow 12.0 QUIET)
+endif()
+if (NOT Arrow_FOUND)
+    find_package(Arrow 13.0 QUIET)
+endif()
+if (NOT Arrow_FOUND)
+    find_package(Arrow 14.0 QUIET)
+endif()
+if (NOT Arrow_FOUND)
+    find_package(Arrow 15.0 QUIET)
+endif()
 if (Arrow_FOUND)
     message(STATUS ${PROJECT_NAME} " found Arrow ")
     message(STATUS "Arrow version: ${ARROW_VERSION}")
@@ -150,12 +165,15 @@ if (Arrow_FOUND)
     if (Parquet_FOUND)
         message(STATUS ${PROJECT_NAME} " found Parquet ")
         message(STATUS "Parquet version: ${PARQUET_VERSION}")
-        message(STATUS "Parquet SO version: ${PARQUET_FULL_SO_VERSION}")
+        message(STATUS "Parquet SO version: ${PARQUET_FULL_SO_VERSION}")
     else ()
-        message(WARNING ${PROJECT_NAME} " did not find Parquet. Building without Parquet.")
-    endif ()
+        message(WARNING ${PROJECT_NAME} " did not find Parquet. Building without Parquet.")
+    endif ()
 else ()
-    message(WARNING ${PROJECT_NAME} " did not find Arrow >= 8.0. Building without Arrow.")
+    message(WARNING ${PROJECT_NAME} " did not find Arrow >= 8.0. Building without Arrow.")
+    if (YGM_REQUIRE_ARROW)
+        message(FATAL_ERROR "YGM configured to require Arrow, but Arrow could not be found")
+    endif ()
 endif ()
 
 #
@@ -235,7 +253,6 @@ endif ()
 # Testing & examples are only available if this is the main app
 if (YGM_MAIN_PROJECT)
     add_subdirectory(test)
-    add_subdirectory(performance)
     # Example codes are here.
     add_subdirectory(examples)
 endif ()
diff --git a/examples/container/alg_pagerank.cpp b/examples/container/alg_pagerank.cpp
index a350995a..69ff60fd 100644
--- a/examples/container/alg_pagerank.cpp
+++ b/examples/container/alg_pagerank.cpp
@@ -34,8 +34,8 @@ int main(int argc, char **argv) {
     value = value + update_val;
   };
-  auto deg_acc_lambda = [](auto &rv_pair, const auto &update_val) {
-    rv_pair.second = rv_pair.second + update_val;
+  auto deg_acc_lambda = [](auto &row, auto &val, const auto &update_val) {
+    val = val + update_val;
   };
   std::string key1, key2;
@@ -59,12 +59,12 @@ int main(int argc, char **argv) {
   int N   = pr.size();
   init_pr = ((double)1) / N;
-  auto mod_pr_lambda = [&init_pr](auto &rv_pair) { rv_pair.second = init_pr; };
+  auto mod_pr_lambda = [&init_pr](const auto &vtx, auto &pg_rnk) {
+    pg_rnk = init_pr;
+  };
   pr.for_all(mod_pr_lambda);
-  auto deg_lambda = [&A](const auto &kv_pair) {
-    auto vtx = kv_pair.first;
-    auto deg = kv_pair.second;
+  auto deg_lambda = [&A](const auto &vtx, const auto &deg) {
     auto scale_A_lambda = [](const auto &row, const auto &col, auto &value,
                              const auto &deg) {
       value = ((double)value) / deg;
@@ -83,20 +83,18 @@ int main(int argc, char **argv) {
   ns_spmv::spmv(A, pr, std::plus(), std::multiplies());
   world.barrier();
-  auto adding_damping_pr_lambda = [&map_res, d_val, N](auto &vtx_pr) {
-    auto vtx_id = vtx_pr.first;
-    auto pg_rnk = vtx_pr.second;
-    auto visit_lambda = [](auto &vtx_pr_pair, auto &da_val, auto &d_val) {
-      vtx_pr_pair.second = da_val + d_val * vtx_pr_pair.second;
-    };
-    map_res.async_insert_if_missing_else_visit(vtx_id, (float(1 - d_val) / N),
+  auto
adding_damping_pr_lambda = [&map_res, d_val, N](const auto &vtx,
+                                                  const auto &pg_rnk) {
+    auto visit_lambda = [](const auto &vtx_id, auto &pr, auto &da_val,
+                           auto &d_val) { pr = da_val + d_val * pr; };
+    map_res.async_insert_if_missing_else_visit(vtx, (float(1 - d_val) / N),
                                                visit_lambda, d_val);
   };
   pr.for_all(adding_damping_pr_lambda);
   pr.swap(map_res);
-  auto agg_pr_lambda = [&agg_pr](auto &vtx_pr_pair) {
-    agg_pr = agg_pr + vtx_pr_pair.second;
+  auto agg_pr_lambda = [&agg_pr](const auto &vtx, const auto &pg_rnk) {
+    agg_pr = agg_pr + pg_rnk;
   };
   pr.for_all(agg_pr_lambda);
   world.barrier();
diff --git a/examples/container/disjoint_set_cc.cpp b/examples/container/disjoint_set_cc.cpp
index 77fb770e..0cbb35e0 100644
--- a/examples/container/disjoint_set_cc.cpp
+++ b/examples/container/disjoint_set_cc.cpp
@@ -37,8 +37,7 @@ int main(int argc, char** argv) {
   connected_components.all_compress();
 
   world.cout0("Person : Representative");
-  connected_components.for_all([&world](const auto& person_rep_pair) {
-    std::cout << person_rep_pair.first << " : " << person_rep_pair.second
-              << std::endl;
+  connected_components.for_all([&world](const auto& person, const auto& rep) {
+    std::cout << person << " : " << rep << std::endl;
   });
 }
diff --git a/examples/container/map_insert_if_missing.cpp b/examples/container/map_insert_if_missing.cpp
index dc95301c..27ace5cb 100644
--- a/examples/container/map_insert_if_missing.cpp
+++ b/examples/container/map_insert_if_missing.cpp
@@ -22,10 +22,10 @@ int main(int argc, char **argv) {
   world.barrier();
 
-  auto sounds_lambda = [](auto &kv_pair, const auto &new_value,
-                          const int origin_rank) {
-    std::cout << "The " << kv_pair.first << " says " << kv_pair.second
-              << " for rank " << origin_rank << std::endl;
+  auto sounds_lambda = [](const auto &key, const auto &value,
+                          const auto &new_value, const int origin_rank) {
+    std::cout << "The " << key << " says " << value << " for rank "
+              << origin_rank << std::endl;
   };
 
   // Keys already exist. Visits occur instead.
diff --git a/examples/container/map_set.cpp b/examples/container/map_set.cpp
index d2348b f8..d31ad6ad 100644
--- a/examples/container/map_set.cpp
+++ b/examples/container/map_set.cpp
@@ -7,7 +7,7 @@
 #include
 #include
 
-int main(int argc, char** argv) {
+int main(int argc, char **argv) {
   ygm::comm world(&argc, &argv);
 
   ygm::container::set str_set(world);
@@ -27,8 +27,8 @@ int main(int argc, char** argv) {
   str_set.for_all([](auto k) { std::cout << "str_set: " << k << std::endl; });
 
-  str_map.for_all([](auto kv) {
-    std::cout << "str_map: " << kv.first << " -> " << kv.second << std::endl;
+  str_map.for_all([](const auto &key, auto &value) {
+    std::cout << "str_map: " << key << " -> " << value << std::endl;
   });
   return 0;
 }
diff --git a/examples/container/map_visit.cpp b/examples/container/map_visit.cpp
index 0e7e793f..49b44ab6 100644
--- a/examples/container/map_visit.cpp
+++ b/examples/container/map_visit.cpp
@@ -18,10 +18,9 @@ int main(int argc, char **argv) {
   world.barrier();
 
-  auto favorites_lambda = [](auto kv_pair, const int favorite_num) {
-    std::cout << "My favorite animal is a " << kv_pair.first << ". It says '"
-              << kv_pair.second << "!' My favorite number is " << favorite_num
-              << std::endl;
+  auto favorites_lambda = [](auto key, auto &value, const int favorite_num) {
+    std::cout << "My favorite animal is a " << key << ". It says '" << value
+              << "!' My favorite number is " << favorite_num << std::endl;
   };
 
   // Send visitors to map
diff --git a/examples/container/map_visit_optional_arguments.cpp b/examples/container/map_visit_optional_arguments.cpp
index 366c4252..2a19b9e7 100644
--- a/examples/container/map_visit_optional_arguments.cpp
+++ b/examples/container/map_visit_optional_arguments.cpp
@@ -17,20 +17,19 @@ int main(int argc, char **argv) {
   world.barrier();
 
-  auto visit_lambda = [](auto pmap, auto kv_pair) {
+  auto visit_lambda = [](auto pmap, auto key, auto value) {
     std::cout << "Rank " << pmap->comm().rank() << " is receiving a lookup\n"
-              << "\tKey: " << kv_pair.first << " Value: " << kv_pair.second
+              << "\tKey: " << key << " Value: " << value
               << "\n\tGoing to ask rank 0 to say something." << std::endl;
 
     // Send message to rank 0 to introduce himself
-    pmap->comm().async(0,
-                       [](auto pcomm, int from) {
-                         std::cout << "Hi. I'm rank " << pcomm->rank()
-                                   << ". Rank " << from
-                                   << " wanted me to say something."
-                                   << std::endl;
-                       },
-                       pmap->comm().rank());
+    pmap->comm().async(
+        0,
+        [](auto pcomm, int from) {
+          std::cout << "Hi. I'm rank " << pcomm->rank() << ". Rank " << from
+                    << " wanted me to say something." << std::endl;
+        },
+        pmap->comm().rank());
   };
 
   // Send lookup from odd-numbered ranks
diff --git a/examples/container/multimap_visit_group.cpp b/examples/container/multimap_visit_group.cpp
index 3b9d70a8..37498d96 100644
--- a/examples/container/multimap_visit_group.cpp
+++ b/examples/container/multimap_visit_group.cpp
@@ -21,9 +21,8 @@ int main(int argc, char **argv) {
   world.cout0("Visiting individual key-value pairs with async_visit");
   // async_visit gives access to individual key-value pairs
-  auto visit_lambda = [](auto kv_pair) {
-    std::cout << "One thing a " << kv_pair.first << " says is "
-              << kv_pair.second << std::endl;
+  auto visit_lambda = [](const auto &key, const auto &value) {
+    std::cout << "One thing a " << key << " says is " << value << std::endl;
   };
 
   if (world.rank() % 2) {
diff --git a/examples/io/CMakeLists.txt b/examples/io/CMakeLists.txt
index a82066a2..405c0339 100644
--- a/examples/io/CMakeLists.txt
+++ b/examples/io/CMakeLists.txt
@@ -1,10 +1,21 @@
-# Copyright 2019-2021 Lawrence Livermore National Security, LLC and other YGM
+# Copyright 2019-2023 Lawrence Livermore National Security, LLC and other YGM
 # Project Developers. See the top-level COPYRIGHT file for details.
 #
 # SPDX-License-Identifier: MIT
 
 if (Arrow_FOUND AND Parquet_FOUND)
     add_ygm_example(arrow_parquet_stream_reader)
-    target_link_libraries(arrow_parquet_stream_reader PUBLIC arrow_shared parquet_shared)
-endif()
+    target_link_libraries(arrow_parquet_stream_reader PUBLIC
+                          Arrow::arrow_shared Parquet::parquet_shared)
+    add_ygm_example(arrow_parquet_stream_reader_variant)
+    target_link_libraries(arrow_parquet_stream_reader_variant PUBLIC
+                          Arrow::arrow_shared Parquet::parquet_shared)
+
+    if (Boost_FOUND)
+        add_ygm_example(arrow_parquet_stream_reader_json)
+        target_include_directories(arrow_parquet_stream_reader_json PUBLIC ${Boost_INCLUDE_DIRS})
+        target_link_libraries(arrow_parquet_stream_reader_json PUBLIC
+                              Arrow::arrow_shared Parquet::parquet_shared)
+    endif()
+endif()
\ No newline at end of file
diff --git a/examples/io/arrow_parquet_stream_reader_json.cpp b/examples/io/arrow_parquet_stream_reader_json.cpp
new file mode 100644
index 00000000..5b17d1dc
--- /dev/null
+++ b/examples/io/arrow_parquet_stream_reader_json.cpp
@@ -0,0 +1,53 @@
+// Copyright 2019-2023 Lawrence Livermore National Security, LLC and other YGM
+// Project Developers. See the top-level COPYRIGHT file for details.
+//
+// SPDX-License-Identifier: MIT
+
+// Usage:
+// cd /ygm/build/dir
+// mpirun -np 2 ./arrow_parquet_stream_reader_json \
+// [(option) /path/to/parquet/file/or/dir]
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+#include
+#include
+#include
+
+int main(int argc, char** argv) {
+  ygm::comm world(&argc, &argv);
+
+  world.cout0()
+      << "Arrow Parquet file parser example (reads data as JSON objects)"
+      << std::endl;
+
+  // assuming the build directory is inside the YGM root directory
+  std::string dir_name = "../test/data/parquet_files_json/";
+  if (argc == 2) {
+    dir_name = argv[1];
+  }
+
+  ygm::io::arrow_parquet_parser parquetp(world, {dir_name});
+
+  world.cout0() << "Schema:\n" << parquetp.schema_to_string() << std::endl;
+
+  world.cout0() << "Read data as JSON:" << std::endl;
+  const auto& schema = parquetp.schema();
+  parquetp.for_all([&schema, &world](auto& stream_reader, const auto&) {
+    // obj's type is boost::json::object
+    const auto obj =
+        ygm::io::detail::read_parquet_as_json(stream_reader, schema);
+
+    world.async(
+        0, [](auto, const auto& obj) { std::cout << obj << std::endl; }, obj);
+  });
+
+  return 0;
+}
diff --git a/examples/io/arrow_parquet_stream_reader_variant.cpp b/examples/io/arrow_parquet_stream_reader_variant.cpp
new file mode 100644
index 00000000..41d31c7f
--- /dev/null
+++ b/examples/io/arrow_parquet_stream_reader_variant.cpp
@@ -0,0 +1,74 @@
+// Copyright 2019-2024 Lawrence Livermore National Security, LLC and other YGM
+// Project Developers. See the top-level COPYRIGHT file for details.
+//
+// SPDX-License-Identifier: MIT
+
+// Usage:
+// cd /ygm/build/dir
+// mpirun -np 2 ./arrow_parquet_stream_reader_variant \
+// [(option) /path/to/parquet/file/or/dir]
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+
+int main(int argc, char** argv) {
+  ygm::comm world(&argc, &argv);
+
+  world.cout0()
+      << "Arrow Parquet file parser example (reads data as JSON objects)"
+      << std::endl;
+
+  // assuming the build directory is inside the YGM root directory
+  std::string dir_name = "../test/data/parquet_files_json/";
+  if (argc == 2) {
+    dir_name = argv[1];
+  }
+
+  ygm::io::arrow_parquet_parser parquetp(world, {dir_name});
+
+  const auto& schema = parquetp.schema();
+
+  // Print column name
+  world.cout0() << "Column names:" << std::endl;
+  for (size_t i = 0; i < schema.size(); ++i) {
+    world.cout0() << std::get<1>(schema[i]);
+    if (i < schema.size() - 1) {
+      world.cout0() << "\t";
+    }
+  }
+  world.cout0() << std::endl;
+
+  world.cout0() << "Read data as variants:" << std::endl;
+  std::size_t num_rows     = 0;
+  std::size_t num_valids   = 0;
+  std::size_t num_invalids = 0;
+  parquetp.for_all([&schema, &num_valids, &num_invalids, &num_rows](
+                       auto& stream_reader, const auto&) {
+    const std::vector row =
+        ygm::io::detail::read_parquet_as_variant(stream_reader, schema);
+    ++num_rows;
+    for (const auto& field : row) {
+      if (std::holds_alternative(field)) {
+        ++num_invalids;
+      } else {
+        ++num_valids;
+      }
+    }
+  });
+
+  world.cout0() << "#of rows = " << world.all_reduce_sum(num_rows) << std::endl;
+  world.cout0() << "#of valid items = " << world.all_reduce_sum(num_valids)
+                << std::endl;
+  world.cout0() << "#of invalid items = " << world.all_reduce_sum(num_invalids)
+                << std::endl;
+
+  return 0;
+}
diff --git a/include/ygm/collective.hpp b/include/ygm/collective.hpp
new file mode 100644
index 00000000..e181b431
--- /dev/null
+++ b/include/ygm/collective.hpp
@@ -0,0 +1,172 @@
+// Copyright 2019-2021 Lawrence Livermore National Security, LLC and other YGM
+// Project Developers. See the top-level COPYRIGHT file for details.
+//
+// SPDX-License-Identifier: MIT
+
+#pragma once
+
+#include
+
+namespace ygm {
+
+/**
+ * @brief Collective computes the prefix sum of value across all ranks in the
+ * communicator.
+ *
+ * @tparam T
+ * @param value
+ * @param c
+ * @return T
+ */
+template
+T prefix_sum(const T &value, comm &c) {
+  T to_return{0};
+  c.barrier();
+  MPI_Comm mpi_comm = c.get_mpi_comm();
+  ASSERT_MPI(MPI_Exscan(&value, &to_return, 1, detail::mpi_typeof(value),
+                        MPI_SUM, mpi_comm));
+  return to_return;
+}
+
+/**
+ * @brief Collective computes the sum of value across all ranks in the
+ * communicator.
+ *
+ * @tparam T
+ * @param value
+ * @param c
+ * @return T
+ */
+template
+T sum(const T &value, comm &c) {
+  T to_return;
+  c.barrier();
+  MPI_Comm mpi_comm = c.get_mpi_comm();
+  ASSERT_MPI(MPI_Allreduce(&value, &to_return, 1, detail::mpi_typeof(T()),
+                           MPI_SUM, mpi_comm));
+  return to_return;
+}
+
+/**
+ * @brief Collective computes the min of value across all ranks in the
+ * communicator.
+ *
+ * @tparam T
+ * @param value
+ * @param c
+ * @return T
+ */
+template
+T min(const T &value, comm &c) {
+  T to_return;
+  c.barrier();
+  MPI_Comm mpi_comm = c.get_mpi_comm();
+  ASSERT_MPI(MPI_Allreduce(&value, &to_return, 1, detail::mpi_typeof(T()),
+                           MPI_MIN, mpi_comm));
+  return to_return;
+}
+
+/**
+ * @brief Collective computes the max of value across all ranks in the
+ * communicator.
+ *
+ * @tparam T
+ * @param value
+ * @param c
+ * @return T
+ */
+template
+T max(const T &value, comm &c) {
+  T to_return;
+  c.barrier();
+  MPI_Comm mpi_comm = c.get_mpi_comm();
+  ASSERT_MPI(MPI_Allreduce(&value, &to_return, 1, detail::mpi_typeof(T()),
+                           MPI_MAX, mpi_comm));
+  return to_return;
+}
+
+/**
+ * @brief Collective computes the logical and of value across all ranks in the
+ * communicator.
+ *
+ * @tparam T
+ * @param value
+ * @param c
+ * @return T
+ */
+inline bool logical_and(bool value, comm &c) {
+  bool to_return;
+  c.barrier();
+  MPI_Comm mpi_comm = c.get_mpi_comm();
+  ASSERT_MPI(MPI_Allreduce(&value, &to_return, 1, detail::mpi_typeof(bool()),
+                           MPI_LAND, mpi_comm));
+  return to_return;
+}
+
+/**
+ * @brief Collective computes the logical or of value across all ranks in the
+ * communicator.
+ *
+ * @tparam T
+ * @param value
+ * @param c
+ * @return T
+ */
+inline bool logical_or(bool value, comm &c) {
+  bool to_return;
+  c.barrier();
+  MPI_Comm mpi_comm = c.get_mpi_comm();
+  ASSERT_MPI(MPI_Allreduce(&value, &to_return, 1, detail::mpi_typeof(bool()),
+                           MPI_LOR, mpi_comm));
+  return to_return;
+}
+
+/**
+ * @brief Broadcasts to_bcast from root to all other ranks in communicator.
+ *
+ * @tparam T
+ * @param to_bcast
+ * @param root
+ * @param cm
+ */
+template
+void bcast(T &to_bcast, int root, comm &cm) {
+  if constexpr (std::is_trivially_copyable::value &&
+                std::is_standard_layout::value) {
+    ASSERT_MPI(
+        MPI_Bcast(&to_bcast, sizeof(T), MPI_BYTE, root, cm.get_mpi_comm()));
+  } else {
+    std::vector packed;
+    cereal::YGMOutputArchive oarchive(packed);
+    if (cm.rank() == root) {
+      oarchive(to_bcast);
+    }
+    size_t packed_size = packed.size();
+    ASSERT_RELEASE(packed_size < 1024 * 1024 * 1024);
+    ASSERT_MPI(MPI_Bcast(&packed_size, 1, ygm::detail::mpi_typeof(packed_size),
+                         root, cm.get_mpi_comm()));
+    if (cm.rank() != root) {
+      packed.resize(packed_size);
+    }
+    ASSERT_MPI(MPI_Bcast(packed.data(), packed_size, MPI_BYTE, root,
+                         cm.get_mpi_comm()));
+
+    if (cm.rank() != root) {
+      cereal::YGMInputArchive iarchive(packed.data(), packed.size());
+      iarchive(to_bcast);
+    }
+  }
+}
+
+template >
+bool is_same(const T &to_check, comm &cm, const Equal &equals = Equal()) {
+  T to_bcast;
+  if (cm.rank() == 0) {
+    to_bcast = to_check;
+  }
+  bcast(to_bcast, 0, cm);
+  bool local_is_same = equals(to_check, to_bcast);
+  return logical_and(local_is_same, cm);
+}
+
+}  // namespace ygm
\ No newline at end of file
diff --git a/include/ygm/comm.hpp b/include/ygm/comm.hpp
index ced70263..40eb0a6f 100644
--- a/include/ygm/comm.hpp
+++ b/include/ygm/comm.hpp
@@ -5,11 +5,19 @@
 #pragma once
 
+#include
 #include
 #include
 #include
+
+#include
+#include
+#include
+#include
 #include
+#include
 #include
+#include
 #include
 
 namespace ygm {
@@ -17,11 +25,15 @@ namespace ygm {
 namespace detail {
 class interrupt_mask;
 class comm_stats;
+class layout;
+class comm_router;
 }  // namespace detail
 
 class comm {
  private:
-  class impl;
+  class mpi_irecv_request;
+  class mpi_isend_request;
+  class header_t;
 
   friend class detail::interrupt_mask;
   friend class detail::comm_stats;
@@ -32,10 +44,6 @@ class comm {
   //  map
   comm(MPI_Comm comm);
 
-  // Constructor to allow comm::impl to build temporary comm using itself as the
-  // impl
-  comm(std::shared_ptr impl_ptr);
-
   ~comm();
 
   /**
@@ -78,6 +86,13 @@ class comm {
    */
   void barrier();
 
+  void local_progress();
+
+  bool local_process_incoming();
+
+  template
+  void local_wait_until(Function fn);
+
   template
   ygm_ptr make_ygm_ptr(T &t);
 
@@ -99,7 +114,7 @@ class comm {
   T all_reduce_max(const T &t) const;
 
   template
-  inline T all_reduce(const T &t, MergeFunction merge);
+  inline T all_reduce(const T &t, MergeFunction merge) const;
 
   //
   // Communicator information
@@ -107,83 +122,124 @@ class comm {
   int size() const;
   int rank() const;
 
+  MPI_Comm get_mpi_comm() const;
+
   const detail::layout &layout() const;
 
-  std::ostream &cout0() const {
-    static std::ostringstream dummy;
-    dummy.clear();
-    if (rank() == 0) {
-      return std::cout;
-    }
-    return dummy;
-  }
-
-  std::ostream &cerr0() const {
-    static std::ostringstream dummy;
-    dummy.clear();
-    if (rank() == 0) {
-      return std::cerr;
-    }
-    return dummy;
-  }
-
-  std::ostream &cout() const {
-    std::cout << rank() << ": ";
-    return std::cout;
-  }
-
-  std::ostream &cerr() const {
-    std::cerr << rank() << ": ";
-    return std::cerr;
-  }
+  const detail::comm_router &router() const;
 
   bool rank0() const { return rank() == 0; }
 
+  template
+  void mpi_send(const T &data, int dest, int tag, MPI_Comm comm) const;
+
+  template
+  T mpi_recv(int source, int tag, MPI_Comm comm) const;
+
+  template
+  T mpi_bcast(const T &to_bcast, int root, MPI_Comm comm) const;
+
+  std::ostream &cout0() const;
+  std::ostream &cerr0() const;
+  std::ostream &cout() const;
+  std::ostream &cerr() const;
+
   template
-  void cout(Args &&...args) const {
-    std::cout << outstr(args...) << std::endl;
-  }
+  void cout(Args &&...args) const;
 
   template
-  void cerr(Args &&...args) const {
-    std::cerr << outstr(args...) << std::endl;
-  }
+  void cerr(Args &&...args) const;
 
   template
-  void cout0(Args &&...args) const {
-    if (rank0()) {
-      std::cout << outstr0(args...) << std::endl;
-    }
-  }
+  void cout0(Args &&...args) const;
 
   template
-  void cerr0(Args &&...args) const {
-    if (rank0()) {
-      std::cerr << outstr0(args...) << std::endl;
-    }
-  }
+  void cerr0(Args &&...args) const;
 
+  // Private member functions
  private:
+  void comm_setup(MPI_Comm comm);
+
+  size_t pack_header(std::vector &packed, const int dest,
+                     size_t size);
+
+  std::pair barrier_reduce_counts();
+
+  void flush_send_buffer(int dest);
+
+  void check_if_production_halt_required();
+
+  void flush_all_local_and_process_incoming();
+
+  void flush_to_capacity();
+
+  void post_new_irecv(std::shared_ptr &recv_buffer);
+
+  template
+  size_t pack_lambda(std::vector &packed, Lambda l,
+                     const PackArgs &...args);
+
+  template
+  void pack_lambda_broadcast(Lambda l, const PackArgs &...args);
+
+  template
+  size_t pack_lambda_generic(std::vector &packed, Lambda l,
+                             RemoteLogicLambda rll, const PackArgs &...args);
+
+  void queue_message_bytes(const std::vector &packed,
+                           const int dest);
+
+  void handle_next_receive(MPI_Status status,
+                           std::shared_ptr buffer);
+
+  bool process_receive_queue();
+
   template
-  std::string outstr0(Args &&...args) const {
-    std::stringstream ss;
-    (ss << ... << args);
-    return ss.str();
-  }
+  std::string outstr(Args &&...args) const;
 
   template
-  std::string outstr(Args &&...args) const {
-    std::stringstream ss;
-    (ss << rank() << ": " << ... << args);
-    return ss.str();
-  }
+  std::string outstr0(Args &&...args) const;
 
   comm() = delete;
-  std::shared_ptr pimpl;
+  comm(const comm &c) = delete;
+
+  // Private member variables
+ private:
   std::shared_ptr pimpl_if;
+
+  MPI_Comm m_comm_async;
+  MPI_Comm m_comm_barrier;
+  MPI_Comm m_comm_other;
+
+  std::vector> m_vec_send_buffers;
+  size_t m_send_buffer_bytes = 0;
+  std::deque m_send_dest_queue;
+
+  std::deque m_recv_queue;
+  std::deque m_send_queue;
+  std::vector>> m_free_send_buffers;
+
+  size_t m_pending_isend_bytes = 0;
+
+  std::deque> m_pre_barrier_callbacks;
+
+  bool m_enable_interrupts = true;
+
+  uint64_t m_recv_count = 0;
+  uint64_t m_send_count = 0;
+
+  bool m_in_process_receive_queue = false;
+
+  detail::comm_stats stats;
+  const detail::comm_environment config;
+  const detail::layout m_layout;
+  detail::comm_router m_router;
+
+  detail::lambda_map
+      m_lambda_map;
 };
 
 }  // end namespace ygm
 
-#include
+#include
diff --git a/include/ygm/container/array.hpp b/include/ygm/container/array.hpp
index 61cb7c30..a8049630 100644
--- a/include/ygm/container/array.hpp
+++ b/include/ygm/container/array.hpp
@@ -5,110 +5,131 @@
 #pragma once
 
-#include
+#include
+#include
+
 namespace ygm::container {
 
 template
 class array {
  public:
-  using self_type  = array;
-  using value_type = Value;
-  using index_type = Index;
-  using impl_type  = detail::array_impl;
+  using self_type          = array;
+  using mapped_type        = Value;
+  using key_type           = Index;
+  using size_type          = Index;
+  using ygm_for_all_types  = std::tuple;
+  using ygm_container_type = ygm::container::array_tag;
+  using ptr_type           = typename ygm::ygm_ptr;
 
   array() = delete;
 
-  array(ygm::comm& comm, const index_type size) : m_impl(comm, size) {}
+  array(ygm::comm& comm, const size_type size);
 
-  array(ygm::comm& comm, const index_type size, const
value_type& default_value)
-      : m_impl(comm, size, default_value) {}
+  array(ygm::comm& comm, const size_type size,
+        const mapped_type& default_value);
 
-  array(const self_type& rhs) : m_impl(rhs.m_impl) {}
+  array(const self_type& rhs);
 
-  void async_set(const index_type index, const value_type& value) {
-    m_impl.async_set(index, value);
-  }
+  ~array();
+
+  void async_set(const key_type index, const mapped_type& value);
 
   template
-  void async_binary_op_update_value(const index_type index,
-                                    const value_type& value,
-                                    const BinaryOp& b) {
-    m_impl.async_binary_op_update_value(index, value, b);
-  }
+  void async_binary_op_update_value(const key_type index,
+                                    const mapped_type& value,
+                                    const BinaryOp& b);
 
-  void async_bit_and(const index_type index, const value_type& value) {
-    async_binary_op_update_value(index, value, std::bit_and());
+  void async_bit_and(const key_type index, const mapped_type& value) {
+    async_binary_op_update_value(index, value, std::bit_and());
   }
 
-  void async_bit_or(const index_type index, const value_type& value) {
-    async_binary_op_update_value(index, value, std::bit_or());
+  void async_bit_or(const key_type index, const mapped_type& value) {
+    async_binary_op_update_value(index, value, std::bit_or());
   }
 
-  void async_bit_xor(const index_type index, const value_type& value) {
-    async_binary_op_update_value(index, value, std::bit_xor());
+  void async_bit_xor(const key_type index, const mapped_type& value) {
+    async_binary_op_update_value(index, value, std::bit_xor());
   }
 
-  void async_logical_and(const index_type index, const value_type& value) {
-    async_binary_op_update_value(index, value, std::logical_and());
+  void async_logical_and(const key_type index, const mapped_type& value) {
+    async_binary_op_update_value(index, value, std::logical_and());
   }
 
-  void async_logical_or(const index_type index, const value_type& value) {
-    async_binary_op_update_value(index, value, std::logical_or());
+  void async_logical_or(const key_type index, const mapped_type& value) {
+    async_binary_op_update_value(index, value, std::logical_or());
   }
 
-  void async_multiplies(const index_type index, const value_type& value) {
-    async_binary_op_update_value(index, value, std::multiplies());
+  void async_multiplies(const key_type index, const mapped_type& value) {
+    async_binary_op_update_value(index, value, std::multiplies());
   }
 
-  void async_divides(const index_type index, const value_type& value) {
-    async_binary_op_update_value(index, value, std::divides());
+  void async_divides(const key_type index, const mapped_type& value) {
+    async_binary_op_update_value(index, value, std::divides());
   }
 
-  void async_plus(const index_type index, const value_type& value) {
-    async_binary_op_update_value(index, value, std::plus());
+  void async_plus(const key_type index, const mapped_type& value) {
+    async_binary_op_update_value(index, value, std::plus());
   }
 
-  void async_minus(const index_type index, const value_type& value) {
-    async_binary_op_update_value(index, value, std::minus());
+  void async_minus(const key_type index, const mapped_type& value) {
+    async_binary_op_update_value(index, value, std::minus());
   }
 
   template
-  void async_unary_op_update_value(const index_type index, const UnaryOp& u) {
-    m_impl.async_unary_op_update_value(index, u);
-  }
+  void async_unary_op_update_value(const key_type index, const UnaryOp& u);
 
-  void async_increment(const index_type index) {
+  void async_increment(const key_type index) {
     async_unary_op_update_value(index,
-                                [](const value_type& v) { return v + 1; });
+                                [](const mapped_type& v) { return v + 1; });
   }
 
-  void async_decrement(const index_type index) {
+  void async_decrement(const key_type index) {
     async_unary_op_update_value(index,
-                                [](const value_type& v) { return v - 1; });
+                                [](const mapped_type& v) { return v - 1; });
   }
 
   template
-  void async_visit(const index_type index, Visitor visitor,
-                   const VisitorArgs&... args) {
-    m_impl.async_visit(index, visitor,
-                       std::forward(args)...);
-  }
+  void async_visit(const key_type index, Visitor visitor,
+                   const VisitorArgs&... args);
 
   template
-  void for_all(Function fn) {
-    m_impl.for_all(fn);
-  }
+  void for_all(Function fn);
 
-  index_type size() { return m_impl.size(); }
+  size_type size();
 
-  typename ygm::ygm_ptr get_ygm_ptr() const {
-    return m_impl.get_ygm_ptr();
-  }
+  typename ygm::ygm_ptr get_ygm_ptr() const;
+
+  int owner(const key_type index) const;
+
+  bool is_mine(const key_type index) const;
 
-  ygm::comm& comm() { return m_impl.comm(); }
+  ygm::comm& comm();
+
+  const mapped_type& default_value() const;
+
+  void resize(const size_type size, const mapped_type& fill_value);
+
+  void resize(const size_type size);
 
  private:
-  impl_type m_impl;
+  template
+  void local_for_all(Function fn);
+
+  key_type local_index(key_type index);
+
+  key_type global_index(key_type index);
+
+ private:
+  size_type m_global_size;
+  size_type m_small_block_size;
+  size_type m_large_block_size;
+  size_type m_local_start_index;
+  mapped_type m_default_value;
+  std::vector m_local_vec;
+  ygm::comm& m_comm;
+  typename ygm::ygm_ptr pthis;
 };
 
 }  // namespace ygm::container
+
+#include
diff --git a/include/ygm/container/bag.hpp b/include/ygm/container/bag.hpp
index 1b498025..b2cb5579 100644
--- a/include/ygm/container/bag.hpp
+++ b/include/ygm/container/bag.hpp
@@ -4,45 +4,69 @@
 // SPDX-License-Identifier: MIT
 
 #pragma once
-#include
+
+#include
+#include
 
 namespace ygm::container {
 
 template >
 class bag {
  public:
-  using self_type  = bag;
-  using value_type = Item;
-  using impl_type  = detail::bag_impl;
+  using self_type          = bag;
+  using value_type         = Item;
+  using size_type          = size_t;
+  using ygm_for_all_types  = std::tuple;
+  using ygm_container_type = ygm::container::bag_tag;
 
-  bag(ygm::comm &comm) : m_impl(comm) {}
+  bag(ygm::comm &comm);
+  ~bag();
 
-  void async_insert(const value_type &item) { m_impl.async_insert(item); }
+  void async_insert(const value_type &item);
+  void async_insert(const value_type &item, int dest);
+  void async_insert(const std::vector &items, int dest);
 
   template
-  void for_all(Function fn) {
-    m_impl.for_all(fn);
-  }
+  void for_all(Function fn);
+
+  void clear();
+
+  size_type size();
+  size_type local_size();
+
+  void rebalance();
 
-  void clear() { m_impl.clear(); }
+  void swap(self_type &s);
 
-  size_t size() { return m_impl.size(); }
+  template
+  void local_shuffle(RandomFunc &r);
+  void local_shuffle();
 
-  void swap(self_type &s) { m_impl.swap(s.m_impl); }
+  template
+  void global_shuffle(RandomFunc &r);
+  void global_shuffle();
 
   template
-  void local_for_all(Function fn) {
-    m_impl.local_for_all(fn);
-  }
+  void local_for_all(Function fn);
 
-  ygm::comm &comm() { return m_impl.comm(); }
+  ygm::comm &comm();
 
-  void serialize(const std::string &fname) { m_impl.serialize(fname); }
-  void deserialize(const std::string &fname) { m_impl.deserialize(fname); }
-  std::vector gather_to_vector(int dest) { return m_impl.gather_to_vector(dest); }
-  std::vector gather_to_vector() { return m_impl.gather_to_vector(); }
+  void serialize(const std::string &fname);
+  void deserialize(const std::string &fname);
+  std::vector gather_to_vector(int dest);
+  std::vector gather_to_vector();
 
+ private:
+  std::vector local_pop(int n);
+
+  template
+  void local_for_all_pair_types(Function fn);
 
  private:
-  detail::bag_impl m_impl;
+  size_t m_round_robin = 0;
+  ygm::comm &m_comm;
+  std::vector m_local_bag;
+  typename ygm::ygm_ptr pthis;
 };
 
 }  // namespace ygm::container
+
+#include
diff --git a/include/ygm/container/container_traits.hpp b/include/ygm/container/container_traits.hpp
new file mode 100644
index 00000000..1f989fdb
--- /dev/null
+++ b/include/ygm/container/container_traits.hpp
@@ -0,0 +1,78 @@
+// Copyright 2019-2021 Lawrence Livermore National Security, LLC and other YGM
+// Project Developers. See the top-level COPYRIGHT file for details.
+//
+// SPDX-License-Identifier: MIT
+
+#pragma once
+#include
+
+
+namespace ygm::container {
+// Identifiable ygm container tags usable for comparison
+struct array_tag;
+struct bag_tag;
+struct counting_set_tag;
+struct disjoint_set_tag;
+struct map_tag;
+struct set_tag;
+
+
+// General template used as a base case
+template
+struct has_ygm_container_type : std::false_type {};
+
+// Specialized template to ensure a tested container has a ygm::container::type
+template
+struct has_ygm_container_type<
+    Container,
+    std::void_t< typename Container::ygm_container_type >
+> : std::true_type {};
+
+/* Helper function which:
+ *   1) Checks if the input container type is part of YGM
+ *   2) If so, compares the ygm container's tag against desired tag
+ */
+template
+constexpr bool check_ygm_container_type() {
+  if constexpr(has_ygm_container_type< Container >::value) {
+    return std::is_same<
+        typename Container::ygm_container_type,
+        Tag
+    >::value;
+  } else {
+    return false;
+  }
+}
+
+// Tag checking functions for every YGM container
+template
+constexpr bool is_array(Container &c) {
+  return check_ygm_container_type();
+}
+
+template
+constexpr bool is_bag(Container &c) {
+  return check_ygm_container_type();
+}
+
+template
+constexpr bool is_counting_set(Container &c) {
+  return check_ygm_container_type();
+}
+
+template
+constexpr bool is_disjoint_set(Container &c) {
+  return check_ygm_container_type();
+}
+
+template
+constexpr bool is_map(Container &c) {
+  return check_ygm_container_type();
+}
+
+template
+constexpr bool is_set(Container &c) {
+  return check_ygm_container_type();
+}
+
+}  // ygm::container
diff --git a/include/ygm/container/counting_set.hpp b/include/ygm/container/counting_set.hpp
index 54929411..63134317 100644
--- a/include/ygm/container/counting_set.hpp
+++ b/include/ygm/container/counting_set.hpp
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include
 
 namespace ygm::container {
 
@@ -16,12 +17,16 @@ template ,
           class Alloc = std::allocator>>
 class counting_set {
  public:
-  using self_type  = counting_set;
-  using key_type   = Key;
-  using value_type = size_t;
-  const size_t count_cache_size = 1024 * 1024;
+  using self_type          = counting_set;
+  using mapped_type        = size_t;
+  using key_type           = Key;
+  using size_type          = size_t;
+  using ygm_for_all_types  = std::tuple< Key, size_t >;
+  using ygm_container_type = ygm::container::counting_set_tag;
 
-  counting_set(ygm::comm &comm) : m_map(comm, value_type(0)), pthis(this) {
+  const size_type count_cache_size = 1024 * 1024;
+
+  counting_set(ygm::comm &comm) : m_map(comm, mapped_type(0)), pthis(this) {
     m_count_cache.resize(count_cache_size, {key_type(), -1});
   }
 
@@ -36,38 +41,39 @@ class counting_set {
 
   void clear() { m_map.clear(); }
 
-  size_t size() { return m_map.size(); }
+  size_type size() { return m_map.size(); }
 
-  size_t count(const key_type &key) {
+  mapped_type count(const key_type &key) {
     m_map.comm().barrier();
     auto vals = m_map.local_get(key);
-    size_t local_count{0};
+    mapped_type local_count{0};
     for (auto v : vals) {
       local_count += v;
     }
     return m_map.comm().all_reduce_sum(local_count);
   }
 
-  size_t count_all() {
-    size_t local_count{0};
-    for_all([&local_count](const auto &kv) { local_count += kv.second; });
+  mapped_type count_all() {
+    mapped_type local_count{0};
+    for_all(
+        [&local_count](const auto &key, auto &value) { local_count += value; });
     return m_map.comm().all_reduce_sum(local_count);
   }
 
   bool is_mine(const key_type &key) const { return m_map.is_mine(key); }
 
   template
-  std::vector> topk(size_t k,
+  std::vector> topk(size_t k,
                                  CompareFunction cfn) {
     return m_map.topk(k, cfn);
   }
 
   template
-  std::map all_gather(const STLKeyContainer &keys) {
+  std::map all_gather(const STLKeyContainer &keys) {
     return m_map.all_gather(keys);
   }
 
-  std::map all_gather(const std::vector &keys) {
+  std::map all_gather(const std::vector &keys) {
     return m_map.all_gather(keys);
   }
 
@@ -119,10 +125,12 @@ class counting_set {
       auto key          = m_count_cache[slot].first;
       auto cached_count = m_count_cache[slot].second;
       ASSERT_DEBUG(cached_count > 0);
-      m_map.async_visit(key,
-                        [](std::pair &key_count,
-                           int32_t to_add) { key_count.second += to_add; },
-                        cached_count);
+      m_map.async_visit(
+          key,
+          [](const key_type &key, size_t &count, int32_t to_add) {
+            count += to_add;
+          },
+          cached_count);
       m_count_cache[slot].first  = key_type();
       m_count_cache[slot].second = -1;
     }
@@ -137,10 +145,10 @@ class counting_set {
   }
 
   counting_set() = delete;
 
-  std::vector> m_count_cache;
-  bool m_cache_empty = true;
-  map m_map;
-  typename ygm::ygm_ptr pthis;
+  std::vector> m_count_cache;
+  bool m_cache_empty = true;
+  map m_map;
+  typename ygm::ygm_ptr pthis;
 };
 
 }  // namespace ygm::container
diff --git a/include/ygm/container/detail/array.ipp b/include/ygm/container/detail/array.ipp
new file mode 100644
index 00000000..bcb7cde9
--- /dev/null
+++ b/include/ygm/container/detail/array.ipp
@@ -0,0 +1,230 @@
+// Copyright 2019-2021 Lawrence Livermore National Security, LLC and other YGM
+// Project Developers. See the top-level COPYRIGHT file for details.
+// +// SPDX-License-Identifier: MIT + +#pragma once + +namespace ygm::container { + +template +array::array(ygm::comm &comm, const size_type size) + : m_global_size(size), m_default_value{}, m_comm(comm), pthis(this) { + pthis.check(m_comm); + + resize(size); +} + +template +array::array(ygm::comm &comm, const size_type size, + const mapped_type &dv) + : m_default_value(dv), m_comm(comm), pthis(this) { + pthis.check(m_comm); + + resize(size); +} + +template +array::array(const self_type &rhs) + : m_default_value(rhs.m_default_value), + m_comm(rhs.m_comm), + m_global_size(rhs.m_global_size), + m_small_block_size(rhs.m_small_block_size), + m_large_block_size(rhs.m_large_block_size), + m_local_start_index(rhs.m_local_start_index), + m_local_vec(rhs.m_local_vec), + pthis(this) { + pthis.check(m_comm); +} + +template +array::~array() { + m_comm.barrier(); +} + +template +void array::resize(const size_type size, + const mapped_type &fill_value) { + m_comm.barrier(); + + m_global_size = size; + m_small_block_size = size / m_comm.size(); + m_large_block_size = m_small_block_size + ((size / m_comm.size()) > 0); + + m_local_vec.resize( + m_small_block_size + (m_comm.rank() < (size % m_comm.size())), + fill_value); + + if (m_comm.rank() < (size % m_comm.size())) { + m_local_start_index = m_comm.rank() * m_large_block_size; + } else { + m_local_start_index = + (size % m_comm.size()) * m_large_block_size + + (m_comm.rank() - (size % m_comm.size())) * m_small_block_size; + } + + m_comm.barrier(); +} + +template +void array::resize(const size_type size) { + resize(size, m_default_value); +} + +template +void array::async_set(const key_type index, + const mapped_type &value) { + ASSERT_RELEASE(index < m_global_size); + auto putter = [](auto parray, const key_type i, const mapped_type &v) { + key_type l_index = parray->local_index(i); + ASSERT_RELEASE(l_index < parray->m_local_vec.size()); + parray->m_local_vec[l_index] = v; + }; + + int dest = owner(index); + m_comm.async(dest, 
putter, pthis, index, value); +} + +template +template +void array::async_binary_op_update_value(const key_type index, + const mapped_type &value, + const BinaryOp &b) { + ASSERT_RELEASE(index < m_global_size); + auto updater = [](const key_type i, mapped_type &v, + const mapped_type &new_value) { + BinaryOp *binary_op; + v = (*binary_op)(v, new_value); + }; + + async_visit(index, updater, value); +} +template +template +void array::async_unary_op_update_value(const key_type index, + const UnaryOp &u) { + ASSERT_RELEASE(index < m_global_size); + auto updater = [](const key_type i, mapped_type &v) { + UnaryOp *u; + v = (*u)(v); + }; + + async_visit(index, updater); +} + +template +template +void array::async_visit(const key_type index, Visitor visitor, + const VisitorArgs &...args) { + ASSERT_RELEASE(index < m_global_size); + int dest = owner(index); + auto visit_wrapper = [](auto parray, const key_type i, + const VisitorArgs &...args) { + key_type l_index = parray->local_index(i); + ASSERT_RELEASE(l_index < parray->m_local_vec.size()); + mapped_type &l_value = parray->m_local_vec[l_index]; + Visitor *vis = nullptr; + if constexpr (std::is_invocable() || + std::is_invocable()) { + ygm::meta::apply_optional(*vis, std::make_tuple(parray), + std::forward_as_tuple(i, l_value, args...)); + } else { + static_assert( + ygm::detail::always_false<>, + "remote array lambda signature must be invocable with (const " + "&key_type, mapped_type&, ...) or (ptr_type, const " + "&key_type, mapped_type&, ...) 
signatures"); + } + }; + + m_comm.async(dest, visit_wrapper, pthis, index, + std::forward(args)...); +} + +template +template +void array::for_all(Function fn) { + m_comm.barrier(); + local_for_all(fn); +} + +template +template +void array::local_for_all(Function fn) { + if constexpr (std::is_invocable()) { + for (int i = 0; i < m_local_vec.size(); ++i) { + key_type g_index = global_index(i); + fn(g_index, m_local_vec[i]); + } + } else if constexpr (std::is_invocable()) { + std::for_each(std::begin(m_local_vec), std::end(m_local_vec), fn); + } else { + static_assert(ygm::detail::always_false<>, + "local array lambda must be invocable with (const " + "key_type, mapped_type &) or (mapped_type &) signatures"); + } +} + +template +typename array::size_type array::size() { + return m_global_size; +} + +template +typename array::ptr_type array::get_ygm_ptr() + const { + return pthis; +} + +template +ygm::comm &array::comm() { + return m_comm; +} + +template +const typename array::mapped_type & +array::default_value() const { + return m_default_value; +} + +template +int array::owner(const key_type index) const { + int to_return; + // Owner depends on whether index is before switching to small blocks + if (index < (m_global_size % m_comm.size()) * m_large_block_size) { + to_return = index / m_large_block_size; + } else { + to_return = (m_global_size % m_comm.size()) + + (index - (m_global_size % m_comm.size()) * m_large_block_size) / + m_small_block_size; + } + ASSERT_RELEASE((to_return >= 0) && (to_return < m_comm.size())); + + return to_return; +} + +template +bool array::is_mine(const key_type index) const { + return owner(index) == m_comm.rank(); +} + +template +typename array::key_type array::local_index( + const key_type index) { + key_type to_return = index - m_local_start_index; + ASSERT_RELEASE((to_return >= 0) && (to_return <= m_small_block_size)); + return to_return; +} + +template +typename array::key_type array::global_index( + const key_type index) { + 
key_type to_return; + return m_local_start_index + index; +} + +}; // namespace ygm::container diff --git a/include/ygm/container/detail/array_impl.hpp b/include/ygm/container/detail/array_impl.hpp deleted file mode 100644 index ae01276d..00000000 --- a/include/ygm/container/detail/array_impl.hpp +++ /dev/null @@ -1,156 +0,0 @@ -// Copyright 2019-2021 Lawrence Livermore National Security, LLC and other YGM -// Project Developers. See the top-level COPYRIGHT file for details. -// -// SPDX-License-Identifier: MIT - -#pragma once -#include -#include -#include - -namespace ygm::container::detail { - -template -class array_impl { - public: - using self_type = array_impl; - using value_type = Value; - using index_type = Index; - - array_impl(ygm::comm &comm, const index_type size) - : m_global_size(size), m_default_value{}, m_comm(comm), pthis(this) { - pthis.check(m_comm); - - resize(size); - } - - array_impl(ygm::comm &comm, const index_type size, const value_type &dv) - : m_default_value(dv), m_comm(comm), pthis(this) { - pthis.check(m_comm); - - resize(size); - } - - array_impl(const self_type &rhs) - : m_default_value(rhs.m_default_value), - m_comm(rhs.m_comm), - m_global_size(rhs.m_global_size), - m_local_vec(rhs.m_local_vec), - pthis(this) {} - - ~array_impl() { m_comm.barrier(); } - - void resize(const index_type size, const value_type &fill_value) { - m_comm.barrier(); - - m_global_size = size; - m_block_size = size / m_comm.size() + (size % m_comm.size() > 0); - - if (m_comm.rank() != m_comm.size() - 1) { - m_local_vec.resize(m_block_size, fill_value); - } else { - // Last rank may get less data - index_type block_size = m_global_size % m_block_size; - if (block_size == 0) { - block_size = m_block_size; - } - m_local_vec.resize(block_size, fill_value); - } - - m_comm.barrier(); - } - - void resize(const index_type size) { resize(size, m_default_value); } - - void async_set(const index_type index, const value_type &value) { - ASSERT_RELEASE(index < 
m_global_size); - auto putter = [](auto parray, const index_type i, const value_type &v) { - index_type l_index = parray->local_index(i); - ASSERT_RELEASE(l_index < parray->m_local_vec.size()); - parray->m_local_vec[l_index] = v; - }; - - int dest = owner(index); - m_comm.async(dest, putter, pthis, index, value); - } - - template - void async_binary_op_update_value(const index_type index, - const value_type &value, - const BinaryOp &b) { - ASSERT_RELEASE(index < m_global_size); - auto updater = [](const index_type i, value_type &v, - const value_type &new_value) { - BinaryOp *binary_op; - v = (*binary_op)(v, new_value); - }; - - async_visit(index, updater, value); - } - - template - void async_unary_op_update_value(const index_type index, const UnaryOp &u) { - ASSERT_RELEASE(index < m_global_size); - auto updater = [](const index_type i, value_type &v) { - UnaryOp *u; - v = (*u)(v); - }; - - async_visit(index, updater); - } - - template - void async_visit(const index_type index, Visitor visitor, - const VisitorArgs &...args) { - ASSERT_RELEASE(index < m_global_size); - int dest = owner(index); - auto visit_wrapper = [](auto parray, const index_type i, - const VisitorArgs &...args) { - index_type l_index = parray->local_index(i); - ASSERT_RELEASE(l_index < parray->m_local_vec.size()); - value_type &l_value = parray->m_local_vec[l_index]; - Visitor *vis = nullptr; - ygm::meta::apply_optional(*vis, std::make_tuple(parray), - std::forward_as_tuple(i, l_value, args...)); - }; - - m_comm.async(dest, visit_wrapper, pthis, index, - std::forward(args)...); - } - - template - void for_all(Function fn) { - m_comm.barrier(); - for (int i = 0; i < m_local_vec.size(); ++i) { - index_type g_index = global_index(i); - fn(g_index, m_local_vec[i]); - } - } - - index_type size() { return m_global_size; } - - typename ygm::ygm_ptr get_ygm_ptr() const { return pthis; } - - ygm::comm &comm() { return m_comm; } - - int owner(const index_type index) { return index / m_block_size; } - - 
index_type local_index(const index_type index) { - return index % m_block_size; - } - - index_type global_index(const index_type index) { - return m_comm.rank() * m_block_size + index; - } - - protected: - array_impl() = delete; - - index_type m_global_size; - index_type m_block_size; - value_type m_default_value; - std::vector m_local_vec; - ygm::comm &m_comm; - typename ygm::ygm_ptr pthis; -}; -} // namespace ygm::container::detail diff --git a/include/ygm/container/detail/bag.ipp b/include/ygm/container/detail/bag.ipp new file mode 100644 index 00000000..f64f1e9c --- /dev/null +++ b/include/ygm/container/detail/bag.ipp @@ -0,0 +1,271 @@ +// Copyright 2019-2021 Lawrence Livermore National Security, LLC and other YGM +// Project Developers. See the top-level COPYRIGHT file for details. +// +// SPDX-License-Identifier: MIT + +#pragma once + +#include +#include +#include + +namespace ygm::container { + +template +bag::bag(ygm::comm &comm) : m_comm(comm), pthis(this) { + pthis.check(m_comm); +} + +template +bag::~bag() { + m_comm.barrier(); +} + +template +void bag::async_insert(const value_type &item) { + auto inserter = [](auto mailbox, auto map, const value_type &item) { + map->m_local_bag.push_back(item); + }; + int dest = (m_round_robin++ + m_comm.rank()) % m_comm.size(); + m_comm.async(dest, inserter, pthis, item); +} + +template +void bag::async_insert(const value_type &item, int dest) { + auto inserter = [](auto mailbox, auto map, const value_type &item) { + map->m_local_bag.push_back(item); + }; + m_comm.async(dest, inserter, pthis, item); +} + +template +void bag::async_insert(const std::vector &items, + int dest) { + auto inserter = [](auto mailbox, auto map, + const std::vector &item) { + map->m_local_bag.insert(map->m_local_bag.end(), item.begin(), item.end()); + }; + m_comm.async(dest, inserter, pthis, items); +} + +template +template +void bag::for_all(Function fn) { + m_comm.barrier(); + local_for_all(fn); +} + +template +void bag::clear() { + 
m_comm.barrier(); + m_local_bag.clear(); +} + +template +typename bag::size_type bag::size() { + m_comm.barrier(); + return m_comm.all_reduce_sum(m_local_bag.size()); +} + +template +typename bag::size_type bag::local_size() { + return m_local_bag.size(); +} + +template +void bag::rebalance() { + m_comm.barrier(); + + // Find current rank's prefix val and desired target size + size_t prefix_val = ygm::prefix_sum(local_size(), m_comm); + size_t target_size = std::ceil((size() * 1.0) / m_comm.size()); + + // Init to_send array where index is dest and value is the num to send + // int to_send[m_comm.size()] = {0}; + std::unordered_map to_send; + + auto global_size = size(); + size_t small_block_size = global_size / m_comm.size(); + size_t large_block_size = + global_size / m_comm.size() + ((global_size / m_comm.size()) > 0); + + for (size_t i = 0; i < local_size(); i++) { + size_t idx = prefix_val + i; + size_t target_rank; + + // Determine target rank to match partitioning in ygm::container::array + if (idx < (global_size % m_comm.size()) * large_block_size) { + target_rank = idx / large_block_size; + } else { + target_rank = (global_size % m_comm.size()) + + (idx - (global_size % m_comm.size()) * large_block_size) / + small_block_size; + } + + if (target_rank != m_comm.rank()) { + to_send[target_rank]++; + } + } + m_comm.barrier(); + + // Build and send bag indexes as calculated by to_send + for (auto &kv_pair : to_send) { + async_insert(local_pop(kv_pair.second), kv_pair.first); + } + + m_comm.barrier(); +} + +template +void bag::swap(self_type &s) { + m_comm.barrier(); + m_local_bag.swap(s.m_local_bag); +} + +template +template +void bag::local_shuffle(RandomFunc &r) { + m_comm.barrier(); + std::shuffle(m_local_bag.begin(), m_local_bag.end(), r); +} + +template +void bag::local_shuffle() { + ygm::default_random_engine<> r(m_comm, std::random_device()()); + local_shuffle(r); +} + +template +template +void bag::global_shuffle(RandomFunc &r) { + m_comm.barrier(); + 
std::vector old_local_bag; + std::swap(old_local_bag, m_local_bag); + + auto send_item = [](auto bag, const value_type &item) { + bag->m_local_bag.push_back(item); + }; + + std::uniform_int_distribution<> distrib(0, m_comm.size() - 1); + for (value_type i : old_local_bag) { + m_comm.async(distrib(r), send_item, pthis, i); + } +} + +template +void bag::global_shuffle() { + ygm::default_random_engine<> r(m_comm, std::random_device()()); + global_shuffle(r); +} + +template +ygm::comm &bag::comm() { + return m_comm; +} + +template +void bag::serialize(const std::string &fname) { + m_comm.barrier(); + std::string rank_fname = fname + std::to_string(m_comm.rank()); + std::ofstream os(rank_fname, std::ios::binary); + cereal::JSONOutputArchive oarchive(os); + oarchive(m_local_bag, m_round_robin, m_comm.size()); +} + +template +void bag::deserialize(const std::string &fname) { + m_comm.barrier(); + + std::string rank_fname = fname + std::to_string(m_comm.rank()); + std::ifstream is(rank_fname, std::ios::binary); + + cereal::JSONInputArchive iarchive(is); + int comm_size; + iarchive(m_local_bag, m_round_robin, comm_size); + + if (comm_size != m_comm.size()) { + m_comm.cerr0( + "Attempting to deserialize bag_impl using communicator of " + "different size than serialized with"); + } +} + +template +template +void bag::local_for_all(Function fn) { + if constexpr (ygm::detail::is_std_pair) { + local_for_all_pair_types(fn); // pairs get special handling + } else { + if constexpr (std::is_invocable()) { + std::for_each(m_local_bag.begin(), m_local_bag.end(), fn); + } else { + static_assert(ygm::detail::always_false<>, + "local bag lambdas must be invocable with (value_type &) " + "signatures"); + } + } +} + +template +std::vector::value_type> +bag::gather_to_vector(int dest) { + std::vector result; + auto p_res = m_comm.make_ygm_ptr(result); + m_comm.barrier(); + auto gatherer = [](auto res, const std::vector &outer_data) { + res->insert(res->end(), outer_data.begin(), 
outer_data.end()); + }; + m_comm.async(dest, gatherer, p_res, m_local_bag); + m_comm.barrier(); + return result; +} + +template +std::vector::value_type> +bag::gather_to_vector() { + std::vector result; + auto p_res = m_comm.make_ygm_ptr(result); + m_comm.barrier(); + auto result0 = gather_to_vector(0); + if (m_comm.rank0()) { + auto distribute = [](auto res, const std::vector &data) { + res->insert(res->end(), data.begin(), data.end()); + }; + m_comm.async_bcast(distribute, p_res, result0); + } + m_comm.barrier(); + return result; +} + +template +std::vector::value_type> bag::local_pop( + int n) { + ASSERT_RELEASE(n <= local_size()); + + size_t new_size = local_size() - n; + auto pop_start = m_local_bag.begin() + new_size; + std::vector ret; + ret.assign(pop_start, m_local_bag.end()); + m_local_bag.resize(new_size); + return ret; +} + +template +template +void bag::local_for_all_pair_types(Function fn) { + if constexpr (std::is_invocable()) { + std::for_each(m_local_bag.begin(), m_local_bag.end(), fn); + } else if constexpr (std::is_invocable()) { + for (auto &kv : m_local_bag) { + fn(kv.first, kv.second); + } + } else { + static_assert(ygm::detail::always_false<>, + "local bag lambdas must be invocable with (pair &) " + "or (pair::first_type &, pair::second_type &) signatures"); + } +} + +} // namespace ygm::container diff --git a/include/ygm/container/detail/bag_impl.hpp b/include/ygm/container/detail/bag_impl.hpp deleted file mode 100644 index 445f019a..00000000 --- a/include/ygm/container/detail/bag_impl.hpp +++ /dev/null @@ -1,120 +0,0 @@ -// Copyright 2019-2021 Lawrence Livermore National Security, LLC and other YGM -// Project Developers. See the top-level COPYRIGHT file for details. 
-// -// SPDX-License-Identifier: MIT - -#pragma once -#include -#include -#include -#include -#include - -namespace ygm::container::detail { -template > -class bag_impl { - public: - using value_type = Item; - using self_type = bag_impl; - - bag_impl(ygm::comm &comm) : m_comm(comm), pthis(this) { pthis.check(m_comm); } - - ~bag_impl() { m_comm.barrier(); } - - void async_insert(const value_type &item) { - auto inserter = [](auto mailbox, auto map, const value_type &item) { - map->m_local_bag.push_back(item); - }; - int dest = (m_round_robin++ + m_comm.rank()) % m_comm.size(); - m_comm.async(dest, inserter, pthis, item); - } - - template - void for_all(Function fn) { - m_comm.barrier(); - local_for_all(fn); - } - - void clear() { - m_comm.barrier(); - m_local_bag.clear(); - } - - size_t size() { - m_comm.barrier(); - return m_comm.all_reduce_sum(m_local_bag.size()); - } - - void swap(self_type &s) { - m_comm.barrier(); - m_local_bag.swap(s.m_local_bag); - } - - ygm::comm &comm() { return m_comm; } - - void serialize(const std::string &fname) { - m_comm.barrier(); - std::string rank_fname = fname + std::to_string(m_comm.rank()); - std::ofstream os(rank_fname, std::ios::binary); - cereal::JSONOutputArchive oarchive(os); - oarchive(m_local_bag, m_round_robin, m_comm.size()); - } - - void deserialize(const std::string &fname) { - m_comm.barrier(); - - std::string rank_fname = fname + std::to_string(m_comm.rank()); - std::ifstream is(rank_fname, std::ios::binary); - - cereal::JSONInputArchive iarchive(is); - int comm_size; - iarchive(m_local_bag, m_round_robin, comm_size); - - if (comm_size != m_comm.size()) { - m_comm.cerr0( - "Attempting to deserialize bag_impl using communicator of " - "different size than serialized with"); - } - } - - template - void local_for_all(Function fn) { - std::for_each(m_local_bag.begin(), m_local_bag.end(), fn); - } - - - std::vector gather_to_vector(int dest) { - std::vector result; - auto p_res = m_comm.make_ygm_ptr(result); - 
m_comm.barrier(); - auto gatherer = [](auto res, const std::vector &outer_data) { - res->insert(res->end(), outer_data.begin(), outer_data.end()); - }; - m_comm.async(dest, gatherer, p_res, m_local_bag); - m_comm.barrier(); - return result; - } - - std::vector gather_to_vector() { - std::vector result; - auto p_res = m_comm.make_ygm_ptr(result); - m_comm.barrier(); - auto result0 = gather_to_vector(0); - if(m_comm.rank0()){ - auto distribute = [](auto res, const std::vector &data) { - res->insert(res->end(), data.begin(), data.end()); - }; - m_comm.async_bcast(distribute, p_res, result0); - } - m_comm.barrier(); - return result; - } - - - protected: - size_t m_round_robin = 0; - ygm::comm m_comm; - std::vector m_local_bag; - typename ygm::ygm_ptr pthis; -}; -} // namespace ygm::container::detail diff --git a/include/ygm/container/detail/disjoint_set_impl.hpp b/include/ygm/container/detail/disjoint_set_impl.hpp index b7f66441..3bb05c32 100644 --- a/include/ygm/container/detail/disjoint_set_impl.hpp +++ b/include/ygm/container/detail/disjoint_set_impl.hpp @@ -4,279 +4,429 @@ // SPDX-License-Identifier: MIT #pragma once +#include #include #include +#include #include #include +#include namespace ygm::container::detail { template class disjoint_set_impl { public: - using self_type = disjoint_set_impl; - using self_ygm_ptr_type = typename ygm::ygm_ptr; - using value_type = Item; + class rank_parent_t; + using self_type = disjoint_set_impl; + using self_ygm_ptr_type = typename ygm::ygm_ptr; + using value_type = Item; + using size_type = size_t; + using ygm_for_all_types = std::tuple; + using ygm_container_type = ygm::container::disjoint_set_tag; + using rank_type = int16_t; + using parent_map_type = std::map; Partitioner partitioner; + class rank_parent_t { + public: + rank_parent_t() : m_rank{-1} {} + + rank_parent_t(const rank_type rank, const value_type &parent) + : m_rank(rank), m_parent(parent) {} + + bool increase_rank(rank_type new_rank) { + if (new_rank > m_rank) 
{ + m_rank = new_rank; + return true; + } else { + return false; + } + } + + void set_parent(const value_type &new_parent) { m_parent = new_parent; } + + const rank_type get_rank() const { return m_rank; } + const value_type &get_parent() const { return m_parent; } + + template + void serialize(Archive &ar) { + ar(m_parent, m_rank); + } + + private: + rank_type m_rank; + value_type m_parent; + }; + disjoint_set_impl(ygm::comm &comm) : m_comm(comm), pthis(this) { pthis.check(m_comm); } ~disjoint_set_impl() { m_comm.barrier(); } + typename ygm::ygm_ptr get_ygm_ptr() const { return pthis; } + + template + void async_visit(const value_type &item, Visitor visitor, + const VisitorArgs &...args) { + int dest = owner(item); + auto visit_wrapper = [](auto p_dset, const value_type &item, + const VisitorArgs &...args) { + auto rank_parent_pair_iter = p_dset->m_local_item_parent_map.find(item); + if (rank_parent_pair_iter == p_dset->m_local_item_parent_map.end()) { + rank_parent_t new_ranked_item = rank_parent_t(0, item); + rank_parent_pair_iter = + p_dset->m_local_item_parent_map + .insert(std::make_pair(item, new_ranked_item)) + .first; + } + Visitor *vis = nullptr; + + ygm::meta::apply_optional( + *vis, std::make_tuple(p_dset), + std::forward_as_tuple(*rank_parent_pair_iter, args...)); + }; + + m_comm.async(dest, visit_wrapper, pthis, item, + std::forward(args)...); + } + void async_union(const value_type &a, const value_type &b) { + static auto update_parent_lambda = [](auto &item_info, + const value_type &new_parent) { + item_info.second.set_parent(new_parent); + }; + + static auto resolve_merge_lambda = [](auto p_dset, auto &item_info, + const value_type &merging_item, + const rank_type merging_rank) { + const auto &my_item = item_info.first; + const auto my_rank = item_info.second.get_rank(); + const auto &my_parent = item_info.second.get_parent(); + ASSERT_RELEASE(my_rank >= merging_rank); + + if (my_rank > merging_rank) { + return; + } else { + ASSERT_RELEASE(my_rank 
== merging_rank); + if (my_parent == + my_item) { // Merging new item onto root. Need to increase rank. + item_info.second.increase_rank(merging_rank + 1); + } else { // Tell merging item about new parent + p_dset->async_visit( + merging_item, + [](auto &item_info, const value_type &new_parent) { + item_info.second.set_parent(new_parent); + }, + my_parent); + } + } + }; + // Walking up parent trees can be expressed as a recursive operation struct simul_parent_walk_functor { - void operator()(self_ygm_ptr_type pdset, const value_type &my_item, - const value_type &other_item) { - const auto my_parent = pdset->local_get_parent(my_item); + void operator()(self_ygm_ptr_type p_dset, + std::pair &my_item_info, + const value_type &my_child, + const value_type &other_parent, + const value_type &other_item, + const rank_type other_rank) { + // Note: other_item needs rank info for comparison with my_item's + // parent. All others need rank and item to determine if other_item + // has been visited/initialized. + + const value_type &my_item = my_item_info.first; + const rank_type &my_rank = my_item_info.second.get_rank(); + const value_type &my_parent = my_item_info.second.get_parent(); + + // Path splitting + if (my_child != my_item) { + p_dset->async_visit(my_child, update_parent_lambda, my_parent); + } - // Found root - if (my_parent == my_item) { - pdset->local_set_parent(my_item, other_item); + if (my_parent == other_parent || my_parent == other_item) { return; } - // Switch branches - if (my_parent < other_item) { - int dest = pdset->owner(other_item); - pdset->comm().async(dest, simul_parent_walk_functor(), pdset, - other_item, my_parent); - } - // Keep walking up current branch - else if (my_parent > other_item) { - pdset->local_set_parent(my_item, other_item); // Splicing - int dest = pdset->owner(my_parent); - pdset->comm().async(dest, simul_parent_walk_functor(), pdset, - my_parent, other_item); - } - // Paths converged. Sets were already merged. 
-      else {
-        return;
+      if (my_rank > other_rank) {  // Other path has lower rank
+        p_dset->async_visit(other_parent, simul_parent_walk_functor(),
+                            other_item, my_parent, my_item, my_rank);
+      } else if (my_rank == other_rank) {
+        if (my_parent == my_item) {  // At a root
+
+          if (my_item < other_parent) {  // Need to break ties in rank before
+                                         // merging to avoid cycles of merges
+                                         // creating cycles in disjoint set
+            // Perform merge
+            my_item_info.second.set_parent(
+                other_parent);  // other_parent may be of same rank as my_item
+            p_dset->async_visit(other_parent, resolve_merge_lambda, my_item,
+                                my_rank);
+          } else {
+            // Switch to other path to attempt merge
+            p_dset->async_visit(other_parent, simul_parent_walk_functor(),
+                                other_item, my_parent, my_item, my_rank);
+          }
+        } else {  // Not at a root
+          // Continue walking current path
+          p_dset->async_visit(my_parent, simul_parent_walk_functor(), my_item,
+                              other_parent, other_item, other_rank);
+        }
+      } else {  // Current path has lower rank
+        if (my_parent == my_item) {  // At a root
+          my_item_info.second.set_parent(
+              other_parent);  // Safe to attach to other path
+        } else {  // Not at a root
+          // Continue walking current path
+          p_dset->async_visit(my_parent, simul_parent_walk_functor(), my_item,
+                              other_parent, other_item, other_rank);
+        }
       }
     }
   };

-    // Visit a first
-    if (a > b) {
-      int main_dest = owner(a);
-      int sub_dest  = owner(b);
-      m_comm.async(main_dest, simul_parent_walk_functor(), pthis, a, b);
-      // Side-effect of looking up parent of b is setting b's parent to be
-      // itself if b has no parent
-      m_comm.async(
-          sub_dest,
-          [](self_ygm_ptr_type pdset, const value_type &item) {
-            pdset->local_get_parent(item);
-          },
-          pthis, b);
-    }
-    // Visit b first
-    else if (a < b) {
-      int main_dest = owner(b);
-      int sub_dest  = owner(a);
-      m_comm.async(main_dest, simul_parent_walk_functor(), pthis, b, a);
-      m_comm.async(
-          sub_dest,
-          [](self_ygm_ptr_type pdset, const value_type &item) {
-            pdset->local_get_parent(item);
-          },
-          pthis, a);
-    } else {
-      // Set item as own parent
-      m_comm.async(
-          owner(a),
-          [](self_ygm_ptr_type pdset, const value_type &item) {
-            pdset->local_get_parent(item);
-          },
-          pthis, a);
-    }
+    async_visit(a, simul_parent_walk_functor(), a, b, b, -1);
   }

   template <typename Function, typename... FunctionArgs>
   void async_union_and_execute(const value_type &a, const value_type &b,
-                               Function fn, const FunctionArgs &... args) {
-    // Walking up parent trees can be expressed as a recursive operation
-    struct simul_parent_walk_functor {
-      void operator()(self_ygm_ptr_type pdset, const value_type &my_item,
-                      const value_type &other_item, const value_type &orig_a,
-                      const value_type &orig_b, const FunctionArgs &... args) {
-        const auto my_parent = pdset->local_get_parent(my_item);
-
-        // Found root
-        if (my_parent == my_item) {
-          pdset->local_set_parent(my_item, other_item);
+                               Function fn, const FunctionArgs &...args) {
+    static auto update_parent_lambda = [](auto &item_info,
+                                          const value_type &new_parent) {
+      item_info.second.set_parent(new_parent);
+    };

-          // Perform user function after merge
-          Function *f = nullptr;
-          ygm::meta::apply_optional(
-              *f, std::make_tuple(pdset),
-              std::forward_as_tuple(orig_a, orig_b, args...));
+    static auto resolve_merge_lambda = [](auto p_dset, auto &item_info,
+                                          const value_type &merging_item,
+                                          const rank_type merging_rank) {
+      const auto &my_item   = item_info.first;
+      const auto  my_rank   = item_info.second.get_rank();
+      const auto &my_parent = item_info.second.get_parent();
+      ASSERT_RELEASE(my_rank >= merging_rank);

-          return;
+      if (my_rank > merging_rank) {
+        return;
+      } else {
+        ASSERT_RELEASE(my_rank == merging_rank);
+        if (my_parent == my_item) {  // Has not found new parent
+          item_info.second.increase_rank(merging_rank + 1);
+        } else {  // Tell merging item about new parent
+          p_dset->async_visit(
+              merging_item,
+              [](auto &item_info, const value_type &new_parent) {
+                item_info.second.set_parent(new_parent);
+              },
+              my_parent);
         }
+      }
+    };

-        // Switch branches
-        if (my_parent < other_item) {
-          int dest = pdset->owner(other_item);
-          pdset->comm().async(dest, simul_parent_walk_functor(), pdset,
-                              other_item, my_parent, orig_a, orig_b, args...);
-        }
-        // Keep walking up current branch
-        else if (my_parent > other_item) {
-          pdset->local_set_parent(my_item, other_item);  // Splicing
-          int dest = pdset->owner(my_parent);
-          pdset->comm().async(dest, simul_parent_walk_functor(), pdset,
-                              my_parent, other_item, orig_a, orig_b, args...);
+    // Walking up parent trees can be expressed as a recursive operation
+    struct simul_parent_walk_functor {
+      void operator()(self_ygm_ptr_type p_dset,
+                      std::pair<const value_type, rank_parent_t> &my_item_info,
+                      const value_type &my_child,
+                      const value_type &other_parent,
+                      const value_type &other_item, const rank_type other_rank,
+                      const value_type &orig_a, const value_type &orig_b,
+                      const FunctionArgs &...args) {
+        // Note: other_item needs rank info for comparison with my_item's
+        // parent. All others need rank and item to determine if other_item
+        // has been visited/initialized.
+
+        const value_type &my_item   = my_item_info.first;
+        const rank_type  &my_rank   = my_item_info.second.get_rank();
+        const value_type &my_parent = my_item_info.second.get_parent();
+
+        // Path splitting
+        if (my_child != my_item) {
+          p_dset->async_visit(my_child, update_parent_lambda, my_parent);
         }
-        // Paths converged. Sets were already merged.
-        else {
+
+        if (my_parent == other_parent || my_parent == other_item) {
           return;
         }
+
+        if (my_rank > other_rank) {  // Other path has lower rank
+          p_dset->async_visit(other_parent, simul_parent_walk_functor(),
+                              other_item, my_parent, my_item, my_rank, orig_a,
+                              orig_b, args...);
+        } else if (my_rank == other_rank) {
+          if (my_parent == my_item) {  // At a root
+
+            if (my_item < other_parent) {  // Need to break ties in rank before
+                                           // merging to avoid cycles of merges
+                                           // creating cycles in disjoint set
+              // Perform merge
+              my_item_info.second.set_parent(
+                  other_parent);  // Guaranteed any path through current
+                                  // item will find an item with rank >=
+                                  // my_rank+1 by going to other_parent
+
+              p_dset->async_visit(other_parent, resolve_merge_lambda, my_item,
+                                  my_rank);
+
+              // Perform user function after merge
+              Function *f = nullptr;
+              if constexpr (std::is_invocable<Function, const value_type &,
+                                              const value_type &,
+                                              FunctionArgs &...>() ||
+                            std::is_invocable<Function, self_ygm_ptr_type,
+                                              const value_type &,
+                                              const value_type &,
+                                              FunctionArgs &...>()) {
+                ygm::meta::apply_optional(
+                    *f, std::make_tuple(p_dset),
+                    std::forward_as_tuple(orig_a, orig_b, args...));
+              } else {
+                static_assert(
+                    ygm::detail::always_false<>,
+                    "remote disjoint_set lambda signature must be invocable "
+                    "with (const value_type &, const value_type &) signature");
+              }
+
+              return;
+            } else {
+              // Switch to other path to attempt merge
+              p_dset->async_visit(other_parent, simul_parent_walk_functor(),
+                                  other_item, my_parent, my_item, my_rank,
+                                  orig_a, orig_b, args...);
+            }
+          } else {  // Not at a root
+            // Continue walking current path
+            p_dset->async_visit(my_parent, simul_parent_walk_functor(),
+                                my_item, other_parent, other_item, other_rank,
+                                orig_a, orig_b, args...);
+          }
+        } else {  // Current path has lower rank
+          if (my_parent == my_item) {  // At a root
+            my_item_info.second.set_parent(
+                other_parent);  // Safe to attach to other path
+
+            // Perform user function after merge
+            Function *f = nullptr;
+            ygm::meta::apply_optional(
+                *f, std::make_tuple(p_dset),
+                std::forward_as_tuple(orig_a, orig_b, args...));
+
+            return;
+          } else {  // Not at a root
+            // Continue walking current path
+            p_dset->async_visit(my_parent, simul_parent_walk_functor(),
+                                my_item, other_parent, other_item, other_rank,
+                                orig_a, orig_b, args...);
+          }
+        }
       }
     };

-    // Visit a first
-    if (a > b) {
-      int main_dest = owner(a);
-      int sub_dest  = owner(b);
-      m_comm.async(main_dest, simul_parent_walk_functor(), pthis, a, b, a, b,
-                   args...);
-      // Side-effect of looking up parent of b is setting b's parent to be
-      // itself if b has no parent
-      m_comm.async(
-          sub_dest,
-          [](self_ygm_ptr_type pdset, const value_type &item) {
-            pdset->local_get_parent(item);
-          },
-          pthis, b);
-    }
-    // Visit b first
-    else if (a < b) {
-      int main_dest = owner(b);
-      int sub_dest  = owner(a);
-      m_comm.async(main_dest, simul_parent_walk_functor(), pthis, b, a, a, b,
-                   args...);
-      m_comm.async(
-          sub_dest,
-          [](self_ygm_ptr_type pdset, const value_type &item) {
-            pdset->local_get_parent(item);
-          },
-          pthis, a);
-    } else {
-      // Set item as own parent
-      m_comm.async(
-          owner(a),
-          [](self_ygm_ptr_type pdset, const value_type &item) {
-            pdset->local_get_parent(item);
-          },
-          pthis, a);
-    }
+    async_visit(a, simul_parent_walk_functor(), a, b, b, -1, a, b, args...);
   }

   void all_compress() {
-    m_comm.barrier();
-
-    static std::set<value_type>    active_set;
-    static std::vector<value_type> active_set_to_remove;
-    // parents being looked up -> vector,
-    // grandparent (if returned), active parent (if returned), lookup returned
-    // flag
-    static std::map<value_type, std::tuple<std::vector<value_type>, value_type,
-                                           bool, bool>>
-        parent_lookup_map;
-
-    active_set.clear();
-    active_set_to_remove.clear();
-    parent_lookup_map.clear();
-
-    auto find_grandparent_lambda = [](auto p_dset, const value_type &parent,
-                                      const int inquiring_rank) {
-      const value_type &grandparent = p_dset->local_get_parent(parent);
-
-      if (active_set.count(parent)) {
-        p_dset->comm().async(
-            inquiring_rank,
-            [](auto p_dset, const value_type &parent,
-               const value_type &grandparent) {
-              auto &inquiry_tuple = parent_lookup_map[parent];
-              std::get<1>(inquiry_tuple) = grandparent;
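[Editor's note: the two hunks above implement a distributed union-by-rank walk with path splitting. As a reading aid (not part of the patch, and not YGM API), the same invariants can be sketched sequentially; `UnionFind`, `find`, and `merge` below are illustrative names.]

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>

// Sequential sketch of union by rank with path splitting: find() points each
// visited node at its grandparent while walking, and merge() only attaches a
// root under another root of greater-or-equal rank, increasing rank on ties.
struct UnionFind {
  struct info {
    int parent;
    int rank = 0;
  };
  std::unordered_map<int, info> items;

  int find(int x) {
    if (!items.count(x)) items[x] = {x, 0};  // item is its own parent at first
    while (items[x].parent != x) {
      int p = items[x].parent;
      items[x].parent = items[p].parent;  // path splitting: skip to grandparent
      x = p;
    }
    return x;
  }

  // Returns true if two sets were merged, false if already in the same set.
  bool merge(int a, int b) {
    int ra = find(a), rb = find(b);
    if (ra == rb) return false;  // paths converged; sets were already merged
    if (items[ra].rank < items[rb].rank) std::swap(ra, rb);
    items[rb].parent = ra;                  // attach lower rank under higher
    if (items[ra].rank == items[rb].rank)   // tie: break by increasing rank,
      ++items[ra].rank;                     // mirroring resolve_merge_lambda
    return true;
  }
};
```

The asynchronous version in the patch must break rank ties explicitly (the `my_item < other_parent` branch) because two roots of equal rank may try to merge into each other concurrently; the sequential sketch sidesteps this by doing both finds before linking.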
-              std::get<2>(inquiry_tuple) = true;
-              std::get<3>(inquiry_tuple) = true;
-
-              // Process all waiting lookups
-              auto &child_vec = std::get<0>(inquiry_tuple);
-              for (const auto &child : child_vec) {
-                p_dset->local_set_parent(child, grandparent);
-              }
+    struct rep_query {
+      value_type              rep;
+      std::vector<value_type> local_inquiring_items;
+    };

-              child_vec.clear();
-            },
-            p_dset, parent, grandparent);
-      } else {
-        p_dset->comm().async(
-            inquiring_rank,
-            [](auto p_dset, const value_type &parent,
-               const value_type &grandparent) {
-              auto &inquiry_tuple = parent_lookup_map[parent];
-              std::get<1>(inquiry_tuple) = grandparent;
-              std::get<2>(inquiry_tuple) = true;
-              std::get<3>(inquiry_tuple) = false;
-
-              // Process all waiting lookups
-              auto &child_vec = std::get<0>(inquiry_tuple);
-              for (const auto &child : child_vec) {
-                p_dset->local_set_parent(child, grandparent);
-                active_set_to_remove.push_back(child);
-              }
+    struct item_status {
+      bool             found_root;
+      std::vector<int> held_responses;
+    };

-              child_vec.clear();
-            },
-            p_dset, parent, grandparent);
+    static rank_type                                   level;
+    static std::unordered_map<value_type, rep_query>   queries;
+    static std::unordered_map<value_type, item_status>
+        local_item_status;  // For holding incoming queries while my items are
+                            // waiting for their representatives (only needed
+                            // for when parent rank is same as mine)
+
+    struct update_rep_functor {
+     public:
+      void operator()(self_ygm_ptr_type p_dset, const value_type &parent,
+                      const value_type &rep) {
+        auto &local_rep_query = queries.at(parent);
+        local_rep_query.rep   = rep;
+
+        for (const auto &local_item : local_rep_query.local_inquiring_items) {
+          p_dset->local_set_parent(local_item, rep);
+
+          // Forward rep for any held responses
+          auto local_item_statuses_iter = local_item_status.find(local_item);
+          if (local_item_statuses_iter != local_item_status.end()) {
+            for (int dest : local_item_statuses_iter->second.held_responses) {
+              p_dset->comm().async(dest, update_rep_functor(), p_dset,
+                                   local_item, rep);
+            }
+            local_item_statuses_iter->second.found_root = true;
+            local_item_statuses_iter->second.held_responses.clear();
+          }
+        }
+        local_rep_query.local_inquiring_items.clear();
       }
     };

-    // Initialize active set to contain all non-roots
-    for (const auto &item_parent_pair : m_local_item_parent_map) {
-      if (item_parent_pair.first != item_parent_pair.second) {
-        active_set.emplace(item_parent_pair.first);
+    auto query_rep_lambda = [](self_ygm_ptr_type p_dset,
+                               const value_type &item, int inquiring_rank) {
+      const auto &item_info = p_dset->m_local_item_parent_map[item];
+
+      if (item_info.get_rank() > level) {
+        const value_type &rep = item_info.get_parent();
+
+        p_dset->comm().async(inquiring_rank, update_rep_functor(), p_dset,
+                             item, rep);
+      } else {  // May need to hold because this item is in the current level
+        auto local_item_status_iter = local_item_status.find(item);
+        // If query is ongoing for my parent, hold response
+        if ((local_item_status_iter != local_item_status.end()) &&
+            (local_item_status_iter->second.found_root == false)) {
+          local_item_status[item].held_responses.push_back(inquiring_rank);
+        } else {
+          p_dset->comm().async(inquiring_rank, update_rep_functor(), p_dset,
+                               item, item_info.get_parent());
+        }
       }
-    }
+    };

-    while (m_comm.all_reduce_sum(active_set.size())) {
-      for (const auto &item : active_set) {
-        const value_type &parent = local_get_parent(item);
-
-        auto parent_lookup_iter = parent_lookup_map.find(parent);
-        // Already seen this parent
-        if (parent_lookup_iter != parent_lookup_map.end()) {
-          // Already found grandparent
-          if (std::get<2>(parent_lookup_iter->second)) {
-            local_set_parent(item, std::get<1>(parent_lookup_iter->second));
-            if (!std::get<3>(parent_lookup_iter->second)) {
-              active_set_to_remove.push_back(item);
-            }
-          } else {  // Grandparent hasn't returned yet
-            std::get<0>(parent_lookup_iter->second).push_back(item);
+    m_comm.barrier();
+
+    level = max_rank();
+    while (level >= 0) {
+      queries.clear();
+      local_item_status.clear();
+
+      // Prepare all queries for this round
+      for (const auto &[local_item, item_info] : m_local_item_parent_map) {
+        if (item_info.get_rank() == level &&
+            item_info.get_parent() != local_item) {
+          local_item_status[local_item].found_root = false;
+
+          auto query_iter = queries.find(item_info.get_parent());
+          if (query_iter == queries.end()) {  // Have not queried for parent's
+                                              // rep. Begin new query.
+            auto &new_query = queries[item_info.get_parent()];
+            new_query.rep   = item_info.get_parent();
+            new_query.local_inquiring_items.push_back(local_item);
+          } else {
+            query_iter->second.local_inquiring_items.push_back(local_item);
           }
-        } else {  // Need to look up grandparent
-          parent_lookup_map.emplace(std::make_pair(
-              parent, std::make_tuple(std::vector<value_type>({item}), parent,
-                                      false, true)));
-
-          const int dest = owner(parent);
-          m_comm.async(dest, find_grandparent_lambda, pthis, parent,
-                       m_comm.rank());
         }
       }
-      m_comm.barrier();
-      for (const auto &item : active_set_to_remove) {
-        active_set.erase(item);
+      m_comm.cf_barrier();
+
+      // Start all queries for this round
+      for (const auto &[item, query] : queries) {
+        int dest = owner(item);
+        m_comm.async(dest, query_rep_lambda, pthis, item, m_comm.rank());
       }
-      active_set_to_remove.clear();
-      parent_lookup_map.clear();
+
+      m_comm.barrier();
+
+      --level;
     }
   }

@@ -284,8 +434,18 @@ class disjoint_set_impl {
   void for_all(Function fn) {
     all_compress();

-    std::for_each(m_local_item_parent_map.begin(),
-                  m_local_item_parent_map.end(), fn);
+    if constexpr (std::is_invocable<Function, const value_type &,
+                                    const value_type &>()) {
+      const auto end = m_local_item_parent_map.end();
+      for (auto iter = m_local_item_parent_map.begin(); iter != end; ++iter) {
+        const auto &[item, rank_parent_pair] = *iter;
+        fn(item, rank_parent_pair.get_parent());
+      }
+    } else {
+      static_assert(ygm::detail::always_false<>,
+                    "local disjoint_set lambda signature must be invocable "
+                    "with (const value_type &, const value_type &) signature");
+    }
   }

   std::map<value_type, value_type> all_find(
@@ -317,7 +477,7 @@ class disjoint_set_impl {
         pdset->comm().async(
             source_rank,
             [](ygm_ptr<std::map<value_type, value_type>> p_to_return,
-               const value_type & source_item,
+               const value_type &source_item,
               const value_type &rep) { (*p_to_return)[source_item] = rep; },
             p_to_return, source_item, parent);
       } else {
@@ -328,7 +488,7 @@ class disjoint_set_impl {
     }
   };

-    for (size_t i = 0; i < items.size(); ++i) {
+    for (size_type i = 0; i < items.size(); ++i) {
       int dest = owner(items[i]);
       m_comm.async(dest, find_rep_functor(), pthis, p_to_return, items[i],
                    m_comm.rank(), items[i]);
@@ -338,20 +498,26 @@ class disjoint_set_impl {
     return to_return;
   }

-  size_t size() {
+  void clear() {
+    m_comm.barrier();
+    m_local_item_parent_map.clear();
+  }
+
+  size_type size() {
     m_comm.barrier();
     return m_comm.all_reduce_sum(m_local_item_parent_map.size());
   }

-  size_t num_sets() {
+  size_type num_sets() {
     m_comm.barrier();
     size_t num_local_sets{0};
     for (const auto &item_parent_pair : m_local_item_parent_map) {
-      if (item_parent_pair.first == item_parent_pair.second) {
+      if (item_parent_pair.first == item_parent_pair.second.get_parent()) {
         ++num_local_sets;
       }
     }
     return m_comm.all_reduce_sum(num_local_sets);
   }

   int owner(const value_type &item) const {
@@ -370,15 +536,38 @@ class disjoint_set_impl {

     // Create new set if item is not found
     if (itr == m_local_item_parent_map.end()) {
-      m_local_item_parent_map.insert(std::make_pair(item, item));
-      return item;
+      m_local_item_parent_map.insert(
+          std::make_pair(item, rank_parent_t(0, item)));
+      return m_local_item_parent_map[item].get_parent();
     } else {
-      return itr->second;
+      return itr->second.get_parent();
     }
+  }
+
+  const rank_type local_get_rank(const value_type &item) {
+    ASSERT_DEBUG(is_mine(item) == true);
+
+    auto itr = m_local_item_parent_map.find(item);
+
+    if (itr != m_local_item_parent_map.end()) {
+      return itr->second.get_rank();
+    }
+    return 0;
   }

   void local_set_parent(const value_type &item, const value_type &parent) {
-    m_local_item_parent_map[item] = parent;
+    m_local_item_parent_map[item].set_parent(parent);
+  }
+
+  rank_type max_rank() {
+    rank_type local_max{0};
+
+    for (const auto &local_item : m_local_item_parent_map) {
+      local_max = std::max(local_max, local_item.second.get_rank());
+    }
+
+    return m_comm.all_reduce_max(local_max);
   }

   ygm::comm &comm() { return m_comm; }

 protected:
   disjoint_set_impl() = delete;

-  ygm::comm         m_comm;
-  self_ygm_ptr_type pthis;
-  std::map<value_type, value_type> m_local_item_parent_map;
+  ygm::comm        &m_comm;
+  self_ygm_ptr_type pthis;
+  parent_map_type   m_local_item_parent_map;
 };
 }  // namespace ygm::container::detail
diff --git a/include/ygm/container/detail/map_impl.hpp b/include/ygm/container/detail/map_impl.hpp
index 68c8c90d..d424e757 100644
--- a/include/ygm/container/detail/map_impl.hpp
+++ b/include/ygm/container/detail/map_impl.hpp
@@ -9,9 +9,11 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
+#include

 namespace ygm::container::detail {

@@ -21,9 +23,13 @@ template <typename Key, typename Value, typename Partitioner,
           class Alloc = std::allocator<std::pair<const Key, Value>>>
 class map_impl {
 public:
-  using self_type  = map_impl;
-  using value_type = Value;
-  using key_type   = Key;
+  using self_type          = map_impl<Key, Value, Partitioner, Compare, Alloc>;
+  using ptr_type           = typename ygm::ygm_ptr<self_type>;
+  using mapped_type        = Value;
+  using key_type           = Key;
+  using size_type          = size_t;
+  using ygm_for_all_types  = std::tuple<Key, Value>;
+  using ygm_container_type = ygm::container::map_tag;

   Partitioner partitioner;

   map_impl(ygm::comm &comm) : m_comm(comm), pthis(this) {
     pthis.check(m_comm);
   }

-  map_impl(ygm::comm &comm, const value_type &dv)
+  map_impl(ygm::comm &comm, const mapped_type &dv)
       : m_default_value(dv), m_comm(comm), pthis(this) {
     pthis.check(m_comm);
   }
@@ -44,9 +50,9 @@ class map_impl {

   ~map_impl() { m_comm.barrier(); }

-  void async_insert_unique(const key_type &key, const value_type &value) {
+  void async_insert_unique(const key_type &key, const mapped_type &value) {
     auto inserter = [](auto mailbox, auto map, const key_type &key,
-                       const value_type &value) {
+                       const mapped_type &value) {
       auto itr = map->m_local_map.find(key);
       if (itr != map->m_local_map.end()) {
         itr->second = value;
@@ -58,16 +64,16 @@ class map_impl {
     m_comm.async(dest, inserter, pthis, key, value);
   }

-  void async_insert_if_missing(const key_type &key, const value_type &value) {
+  void async_insert_if_missing(const key_type &key, const mapped_type &value) {
     async_insert_if_missing_else_visit(
         key, value,
-        [](const std::pair<key_type, value_type> &kv,
-           const value_type &new_value) {});
+        [](const key_type &k, const mapped_type &v,
+           const mapped_type &new_value) {});
   }

-  void async_insert_multi(const key_type &key, const value_type &value) {
+  void async_insert_multi(const key_type &key, const mapped_type &value) {
     auto inserter = [](auto mailbox, auto map, const key_type &key,
-                       const value_type &value) {
+                       const mapped_type &value) {
       map->m_local_map.insert(std::make_pair(key, value));
     };
     int dest = owner(key);
@@ -134,13 +140,13 @@ class map_impl {
   }

   template <typename Visitor, typename... VisitorArgs>
-  void async_insert_if_missing_else_visit(const key_type &key,
-                                          const value_type &value,
-                                          Visitor visitor,
+  void async_insert_if_missing_else_visit(const key_type &key,
+                                          const mapped_type &value,
+                                          Visitor visitor,
                                           const VisitorArgs &...args) {
     int dest = owner(key);
     auto insert_else_visit_wrapper = [](auto pmap, const key_type &key,
-                                        const value_type &value,
+                                        const mapped_type &value,
                                         const VisitorArgs &...args) {
       auto itr = pmap->m_local_map.find(key);
       if (itr == pmap->m_local_map.end()) {
@@ -155,6 +161,24 @@ class map_impl {
                  std::forward<const VisitorArgs>(args)...);
   }

+  template <typename ReductionOp>
+  void async_reduce(const key_type &key, const mapped_type &value,
+                    ReductionOp reducer) {
+    int dest = owner(key);
+    auto reduce_wrapper = [](auto pmap, const key_type &key,
+                             const mapped_type &value) {
+      auto itr = pmap->m_local_map.find(key);
+      if (itr == pmap->m_local_map.end()) {
+        pmap->m_local_map.insert(std::make_pair(key, value));
+      } else {
+        ReductionOp *reducer = nullptr;
+        itr->second = (*reducer)(itr->second, value);
+      }
+    };
+
+    m_comm.async(dest, reduce_wrapper, pthis, key, value);
+  }
+
   void async_erase(const key_type &key) {
     int dest = owner(key);
     auto erase_wrapper = [](auto pcomm, auto pmap, const key_type &key) {
@@ -177,7 +201,7 @@ class map_impl {
     m_local_map.clear();
   }

-  size_t size() {
+  size_type size() {
     m_comm.barrier();
     return m_comm.all_reduce_sum(m_local_map.size());
   }
@@ -202,7 +226,7 @@ class map_impl {
     auto fetcher = [](auto pcomm, int from, const key_type &key, auto pmap,
                       auto pcont) {
       auto returner = [](auto pcomm, const key_type &key,
-                         const std::vector<value_type> &values, auto pcont) {
+                         const std::vector<mapped_type> &values, auto pcont) {
         for (const auto &v : values) {
           pcont->insert(std::make_pair(key, v));
         }
@@ -255,8 +279,8 @@ class map_impl {
     return owner(key) == m_comm.rank();
   }

-  std::vector<value_type> local_get(const key_type &key) {
-    std::vector<value_type> to_return;
+  std::vector<mapped_type> local_get(const key_type &key) {
+    std::vector<mapped_type> to_return;

     auto range = m_local_map.equal_range(key);
     for (auto itr = range.first; itr != range.second; ++itr) {
@@ -272,9 +296,20 @@ class map_impl {
     ygm::detail::interrupt_mask mask(m_comm);

     auto range = m_local_map.equal_range(key);
-    for (auto itr = range.first; itr != range.second; ++itr) {
-      ygm::meta::apply_optional(fn, std::make_tuple(pthis),
-                                std::forward_as_tuple(*itr, args...));
+    if constexpr (std::is_invocable<Visitor, const key_type &, mapped_type &,
+                                    VisitorArgs &...>() ||
+                  std::is_invocable<Visitor, ptr_type, const key_type &,
+                                    mapped_type &, VisitorArgs &...>()) {
+      for (auto itr = range.first; itr != range.second; ++itr) {
+        ygm::meta::apply_optional(
+            fn, std::make_tuple(pthis),
+            std::forward_as_tuple(itr->first, itr->second, args...));
+      }
+    } else {
+      static_assert(ygm::detail::always_false<>,
+                    "remote map lambda signature must be invocable with (const "
+                    "&key_type, mapped_type&, ...) or (ptr_type, const "
+                    "&key_type, mapped_type&, ...) signatures");
     }
   }

@@ -282,7 +317,7 @@ class map_impl {

   void local_clear() { m_local_map.clear(); }

-  size_t local_size() const { return m_local_map.size(); }
+  size_type local_size() const { return m_local_map.size(); }

   size_t local_count(const key_type &k) const { return m_local_map.count(k); }

   template <typename Function>
   void local_for_all(Function fn) {
-    std::for_each(m_local_map.begin(), m_local_map.end(), fn);
+    if constexpr (std::is_invocable<Function, const key_type &,
+                                    mapped_type &>()) {
+      for (std::pair<const key_type, mapped_type> &kv : m_local_map) {
+        fn(kv.first, kv.second);
+      }
+    } else {
+      static_assert(ygm::detail::always_false<>,
+                    "local map lambda signature must be invocable with (const "
+                    "&key_type, mapped_type&) signature");
+    }
   }

   template <typename CompareFunction>
-  std::vector<std::pair<key_type, value_type>> topk(size_t k,
-                                                    CompareFunction cfn) {
-    using vec_type = std::vector<std::pair<key_type, value_type>>;
+  std::vector<std::pair<key_type, mapped_type>> topk(size_t k,
+                                                     CompareFunction cfn) {
+    using vec_type = std::vector<std::pair<key_type, mapped_type>>;

     m_comm.barrier();

@@ -322,14 +366,14 @@ class map_impl {
     return to_return;
   }

-  const value_type &default_value() const { return m_default_value; }
+  const mapped_type &default_value() const { return m_default_value; }

 protected:
   map_impl() = delete;

-  value_type                                   m_default_value;
-  std::multimap<key_type, value_type, Compare> m_local_map;
-  ygm::comm                                    m_comm;
-  typename ygm::ygm_ptr<self_type>             pthis;
+  mapped_type                                   m_default_value;
+  std::multimap<key_type, mapped_type, Compare> m_local_map;
+  ygm::comm                                    &m_comm;
+  ptr_type                                      pthis;
 };
 }  // namespace ygm::container::detail
diff --git a/include/ygm/container/detail/reducing_adapter.hpp b/include/ygm/container/detail/reducing_adapter.hpp
new file mode 100644
index 00000000..01d17f33
--- /dev/null
+++ b/include/ygm/container/detail/reducing_adapter.hpp
@@ -0,0 +1,128 @@
+// Copyright 2019-2021 Lawrence Livermore National Security, LLC and other YGM
+// Project Developers. See the top-level COPYRIGHT file for details.
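[Editor's note: the new `reducing_adapter` file that begins here combines updates in a fixed-size, hash-indexed cache before forwarding them to the owning container. A single-process sketch of that caching scheme (illustrative only — `reducing_cache` is not YGM API, and a plain `std::map` stands in for the distributed container):]

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Combining write-cache: each update hashes to one slot. Updates to the same
// key merge in place via the reduction op; a slot holding a different key is
// flushed to the backing map first, exactly as cache_reduce() does below.
template <typename Key, typename Value, typename Reducer>
struct reducing_cache {
  struct entry {
    Key   key;
    Value value;
    bool  occupied = false;
  };

  std::map<Key, Value> &container;
  Reducer               reduce;
  std::vector<entry>    cache;

  reducing_cache(std::map<Key, Value> &c, Reducer r, std::size_t slots)
      : container(c), reduce(r), cache(slots) {}

  void flush(std::size_t slot) {
    entry &e = cache[slot];
    if (!e.occupied) return;
    auto itr = container.find(e.key);
    if (itr == container.end()) {
      container.emplace(e.key, e.value);  // first value for this key
    } else {
      itr->second = reduce(itr->second, e.value);  // merge into container
    }
    e.occupied = false;
  }

  void async_reduce(const Key &k, const Value &v) {
    std::size_t slot = std::hash<Key>{}(k) % cache.size();
    entry      &e    = cache[slot];
    if (e.occupied && e.key == k) {
      e.value = reduce(e.value, v);  // combine locally, no container access
      return;
    }
    flush(slot);  // evict a different key (no-op if the slot is empty)
    e = {k, v, true};
  }

  void flush_all() {  // analogous to the pre-barrier cache_flush_all()
    for (std::size_t i = 0; i < cache.size(); ++i) flush(i);
  }
};
```

In the distributed version, `flush` sends the cached pair toward the owner rank over the comm layer instead of touching a local map, and `flush_all` is registered as a pre-barrier callback so no partial sums are lost at a `barrier()`.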
+//
+// SPDX-License-Identifier: MIT
+
+#pragma once
+#include
+#include
+#include
+
+namespace ygm::container::detail {
+
+template <typename Container, typename ReductionOp>
+class reducing_adapter {
+ public:
+  using self_type   = reducing_adapter<Container, ReductionOp>;
+  using mapped_type = typename Container::mapped_type;
+  using key_type    = typename Container::key_type;
+  // using value_type = typename Container::value_type;
+
+  const size_t cache_size = 1024 * 1024;
+
+  reducing_adapter(Container &c, ReductionOp reducer)
+      : m_container(c), m_reducer(reducer), pthis(this) {
+    pthis.check(c.comm());
+    m_cache.resize(cache_size);
+  }
+
+  ~reducing_adapter() { m_container.comm().barrier(); }
+
+  void async_reduce(const key_type &key, const mapped_type &value) {
+    cache_reduce(key, value);
+  }
+
+ private:
+  struct cache_entry {
+    key_type    key;
+    mapped_type value;
+    bool        occupied = false;
+  };
+
+  void cache_reduce(const key_type &key, const mapped_type &value) {
+    // Bypass cache if current rank owns key
+    if (m_container.comm().rank() == m_container.owner(key)) {
+      container_reduction(key, value);
+    } else {
+      if (m_cache_empty) {
+        m_cache_empty = false;
+        m_container.comm().register_pre_barrier_callback(
+            [this]() { this->cache_flush_all(); });
+      }
+
+      size_t slot = std::hash<key_type>{}(key) % cache_size;
+
+      if (m_cache[slot].occupied == false) {
+        m_cache[slot].key      = key;
+        m_cache[slot].value    = value;
+        m_cache[slot].occupied = true;
+      } else {  // Slot is occupied
+        if (m_cache[slot].key == key) {
+          m_cache[slot].value = m_reducer(m_cache[slot].value, value);
+        } else {
+          cache_flush(slot);
+          ASSERT_DEBUG(m_cache[slot].occupied == false);
+          m_cache[slot].key      = key;
+          m_cache[slot].value    = value;
+          m_cache[slot].occupied = true;
+        }
+      }
+    }
+  }
+
+  void cache_flush(const size_t slot) {
+    // Use NLNR for reductions
+    int next_dest = m_container.comm().router().next_hop(
+        m_container.owner(m_cache[slot].key), ygm::detail::routing_type::NLNR);
+
+    m_container.comm().async(
+        next_dest,
+        [](auto p_reducing_adapter, const key_type &key,
+           const mapped_type &value) {
+          p_reducing_adapter->cache_reduce(key, value);
+        },
+        pthis, m_cache[slot].key, m_cache[slot].value);
+
+    m_cache[slot].occupied = false;
+  }
+
+  void cache_flush_all() {
+    for (size_t i = 0; i < cache_size; ++i) {
+      if (m_cache[i].occupied) {
+        cache_flush(i);
+      }
+    }
+    m_cache_empty = true;
+  }
+
+  void container_reduction(const key_type &key, const mapped_type &value) {
+    if constexpr (ygm::container::check_ygm_container_type<
+                      Container, ygm::container::map_tag>()) {
+      m_container.async_reduce(key, value, m_reducer);
+    } else if constexpr (ygm::container::check_ygm_container_type<
+                             Container, ygm::container::array_tag>()) {
+      m_container.async_binary_op_update_value(key, value, m_reducer);
+    } else {
+      static_assert(ygm::detail::always_false<>,
+                    "Container unsuitable for reducing_adapter");
+    }
+  }
+
+  std::vector<cache_entry> m_cache;
+  bool                     m_cache_empty = true;
+
+  Container                       &m_container;
+  ReductionOp                      m_reducer;
+  typename ygm::ygm_ptr<self_type> pthis;
+};
+
+template <typename Container, typename ReductionOp>
+reducing_adapter<Container, ReductionOp> make_reducing_adapter(
+    Container &c, ReductionOp reducer) {
+  return reducing_adapter<Container, ReductionOp>(c, reducer);
+}
+
+}  // namespace ygm::container::detail
diff --git a/include/ygm/container/detail/set_impl.hpp b/include/ygm/container/detail/set_impl.hpp
index 72ccee88..d68b40ea 100644
--- a/include/ygm/container/detail/set_impl.hpp
+++ b/include/ygm/container/detail/set_impl.hpp
@@ -8,8 +8,10 @@
 #include
 #include
 #include
+#include
 #include
 #include
+#include

 namespace ygm::container::detail {
 template <typename Key, typename Partitioner,
@@ -17,12 +19,18 @@ template <typename Key, typename Partitioner,
           class Alloc = std::allocator<Key>>
 class set_impl {
 public:
-  using self_type = set_impl;
-  using key_type  = Key;
+  using self_type          = set_impl<Key, Partitioner, Compare, Alloc>;
+  using key_type           = Key;
+  using size_type          = size_t;
+  using ygm_container_type = ygm::container::set_tag;

   Partitioner partitioner;

   set_impl(ygm::comm &comm) : m_comm(comm), pthis(this) { pthis.check(m_comm); }

+  set_impl(set_impl &&s) noexcept
+      : m_comm(s.m_comm), pthis(this), m_local_set(std::move(s.m_local_set)) {
+    pthis.check(m_comm);
+  }

   ~set_impl() { m_comm.barrier(); }

@@ -53,18 +61,87 @@ class set_impl {
     m_comm.async(dest, erase_wrapper, pthis, key);
   }

+  template <typename Visitor, typename... VisitorArgs>
+  void async_insert_exe_if_missing(const key_type &key, Visitor visitor,
+                                   const VisitorArgs &...args) {
+    auto insert_and_visit = [](auto mailbox, auto pset, const key_type &key,
+                               const VisitorArgs &...args) {
+      if (pset->m_local_set.count(key) == 0) {
+        pset->m_local_set.insert(key);
+        Visitor *vis = nullptr;
+        std::apply(*vis, std::forward_as_tuple(key, args...));
+      }
+    };
+    int dest = owner(key);
+    m_comm.async(dest, insert_and_visit, pthis, key,
+                 std::forward<const VisitorArgs>(args)...);
+  }
+
+  template <typename Visitor, typename... VisitorArgs>
+  void async_insert_exe_if_contains(const key_type &key, Visitor visitor,
+                                    const VisitorArgs &...args) {
+    auto insert_and_visit = [](auto mailbox, auto pset, const key_type &key,
+                               const VisitorArgs &...args) {
+      if (pset->m_local_set.count(key) == 0) {
+        pset->m_local_set.insert(key);
+      } else {
+        Visitor *vis = nullptr;
+        std::apply(*vis, std::forward_as_tuple(key, args...));
+      }
+    };
+    int dest = owner(key);
+    m_comm.async(dest, insert_and_visit, pthis, key,
+                 std::forward<const VisitorArgs>(args)...);
+  }
+
+  template <typename Visitor, typename... VisitorArgs>
+  void async_exe_if_missing(const key_type &key, Visitor visitor,
+                            const VisitorArgs &...args) {
+    auto checker = [](auto mailbox, auto pset, const key_type &key,
+                      const VisitorArgs &...args) {
+      if (pset->m_local_set.count(key) == 0) {
+        Visitor *vis = nullptr;
+        std::apply(*vis, std::forward_as_tuple(key, args...));
+      }
+    };
+    int dest = owner(key);
+    m_comm.async(dest, checker, pthis, key,
+                 std::forward<const VisitorArgs>(args)...);
+  }
+
+  template <typename Visitor, typename... VisitorArgs>
+  void async_exe_if_contains(const key_type &key, Visitor visitor,
+                             const VisitorArgs &...args) {
+    auto checker = [](auto mailbox, auto pset, const key_type &key,
+                      const VisitorArgs &...args) {
+      if (pset->m_local_set.count(key) == 1) {
+        Visitor *vis = nullptr;
+        std::apply(*vis, std::forward_as_tuple(key, args...));
+      }
+    };
+    int dest = owner(key);
+    m_comm.async(dest, checker, pthis, key,
+                 std::forward<const VisitorArgs>(args)...);
+  }
+
   template <typename Function>
   void for_all(Function fn) {
     m_comm.barrier();
     local_for_all(fn);
   }

+  template <typename Function>
+  void consume_all(Function fn) {
+    m_comm.barrier();
+    local_consume_all(fn);
+  }
+
   void clear() {
     m_comm.barrier();
     m_local_set.clear();
   }

-  size_t size() {
+  size_type size() {
     m_comm.barrier();
     return m_comm.all_reduce_sum(m_local_set.size());
   }
@@ -110,10 +187,30 @@ class set_impl {

   ygm::comm &comm() { return m_comm; }

-  // protected:
   template <typename Function>
   void local_for_all(Function fn) {
-    std::for_each(m_local_set.begin(), m_local_set.end(), fn);
+    if constexpr (std::is_invocable<Function, const key_type &>()) {
+      std::for_each(m_local_set.begin(), m_local_set.end(), fn);
+    } else {
+      static_assert(ygm::detail::always_false<>,
+                    "local set lambda signature must be invocable with (const "
+                    "key_type &) signature");
+    }
+  }
+
+  template <typename Function>
+  void local_consume_all(Function fn) {
+    if constexpr (std::is_invocable<Function, const key_type &>()) {
+      while (!m_local_set.empty()) {
+        auto tmp = *(m_local_set.begin());
+        m_local_set.erase(m_local_set.begin());
+        fn(tmp);
+      }
+    } else {
+      static_assert(ygm::detail::always_false<>,
+                    "local set lambda signature must be invocable with (const "
+                    "key_type &) signature");
+    }
   }

   int owner(const key_type &key) const {
@@ -123,7 +220,7 @@ class set_impl {
   set_impl() = delete;

   std::multiset<key_type, Compare, Alloc> m_local_set;
-  ygm::comm                               m_comm;
+  ygm::comm                              &m_comm;
   typename ygm::ygm_ptr<self_type>        pthis;
 };
 }  // namespace ygm::container::detail
diff --git a/include/ygm/container/disjoint_set.hpp b/include/ygm/container/disjoint_set.hpp
index b5afc9b7..29364ff6 100644
--- a/include/ygm/container/disjoint_set.hpp
+++ b/include/ygm/container/disjoint_set.hpp
@@ -4,27 +4,38 @@
 // SPDX-License-Identifier: MIT

 #pragma once
+
+#include
 #include

 namespace ygm::container {

 template <typename Item, typename Partitioner>
 class disjoint_set {
 public:
-  using self_type  = disjoint_set;
-  using value_type = Item;
-  using impl_type  = detail::disjoint_set_impl;
+  using self_type          = disjoint_set<Item, Partitioner>;
+  using value_type         = Item;
+  using size_type          = size_t;
+  using ygm_for_all_types  = std::tuple<Item, Item>;
+  using ygm_container_type = ygm::container::disjoint_set_tag;
+  using impl_type = detail::disjoint_set_impl<value_type, Partitioner>;

   disjoint_set() = delete;

   disjoint_set(ygm::comm &comm) : m_impl(comm) {}

+  template <typename Visitor, typename... VisitorArgs>
+  void async_visit(const value_type &item, Visitor visitor,
+                   const VisitorArgs &...args) {
+    m_impl.async_visit(item, visitor,
+                       std::forward<const VisitorArgs>(args)...);
+  }
+
   void async_union(const value_type &a, const value_type &b) {
     m_impl.async_union(a, b);
   }

   template <typename Function, typename... FunctionArgs>
   void async_union_and_execute(const value_type &a, const value_type &b,
-                               Function fn, const FunctionArgs &... args) {
+                               Function fn, const FunctionArgs &...args) {
     m_impl.async_union_and_execute(a, b, fn,
                                    std::forward<const FunctionArgs>(args)...);
   }
@@ -41,15 +52,17 @@ class disjoint_set {
     return m_impl.all_find(items);
   }

-  size_t size() { return m_impl.size(); }
+  void clear() { m_impl.clear(); }

-  size_t num_sets() { return m_impl.num_sets(); }
+  size_type size() { return m_impl.size(); }
+
+  size_type num_sets() { return m_impl.num_sets(); }

   typename ygm::ygm_ptr<impl_type> get_ygm_ptr() const {
     return m_impl.get_ygm_ptr();
-  }
-
-private:
+  }
+
+ private:
   impl_type m_impl;
 };
 }  // namespace ygm::container
diff --git a/include/ygm/container/experimental/detail/adj_impl.hpp b/include/ygm/container/experimental/detail/adj_impl.hpp
index 4b737612..ceac190f 100644
--- a/include/ygm/container/experimental/detail/adj_impl.hpp
+++ b/include/ygm/container/experimental/detail/adj_impl.hpp
@@ -97,9 +97,9 @@ class adj_impl {
   template <typename Visitor, typename... VisitorArgs>
   void async_visit_if_exists(const key_type &row, const key_type &col,
-                             Visitor visitor, const VisitorArgs &... args) {
+                             Visitor visitor, const VisitorArgs &...args) {
     auto visit_wrapper = [](auto pcomm, auto padj, const key_type &row,
-                            const key_type &col, const VisitorArgs &... args) {
+                            const key_type &col, const VisitorArgs &...args) {
       Visitor *vis;
       padj->local_visit(row, col, *vis, args...);
     };
@@ -111,10 +111,10 @@ class adj_impl {

   template <typename Function, typename... VisitorArgs>
   void local_visit(const key_type &row, const key_type &col, Function &fn,
-                   const VisitorArgs &... args) {
+                   const VisitorArgs &...args) {
     /* Fetch the row map, key: col id, value: val. */
     inner_map_type &inner_map = m_map[row];
-    value_type &    value     = inner_map[col];
+    value_type     &value     = inner_map[col];

     /* Assuming this changes the value at row, col. */
     ygm::meta::apply_optional(fn, std::make_tuple(pthis),
@@ -123,10 +123,10 @@ class adj_impl {

   template <typename Visitor, typename... VisitorArgs>
   void async_visit_const(const key_type &key, Visitor visitor,
-                         const VisitorArgs &... args) {
+                         const VisitorArgs &...args) {
     int dest = owner(key);
     auto visit_wrapper = [](auto pcomm, auto padj, const key_type &key,
-                            const VisitorArgs &... args) {
+                            const VisitorArgs &...args) {
       Visitor *vis;
       padj->inner_local_for_all(key, *vis, args...);
     };
@@ -137,7 +137,7 @@ class adj_impl {

   template <typename Function, typename... VisitorArgs>
   void inner_local_for_all(const key_type &key, Function fn,
-                           const VisitorArgs &... args) {
+                           const VisitorArgs &...args) {
     auto &inner_map = m_map[key];
     for (auto itr = inner_map.begin(); itr != inner_map.end(); ++itr) {
       key_type outer_key = key;
@@ -148,14 +148,14 @@ class adj_impl {
   }

   template <typename Visitor, typename... VisitorArgs>
-  void async_insert_if_missing_else_visit(const key_type &  row,
-                                          const key_type &  col,
+  void async_insert_if_missing_else_visit(const key_type &row,
+                                          const key_type &col,
                                           const value_type &value,
                                           Visitor visitor,
-                                          const VisitorArgs &... args) {
+                                          const VisitorArgs &...args) {
     auto visit_wrapper = [](auto pcomm, auto padj, const key_type &row,
                             const key_type &col, const value_type &value,
-                            const VisitorArgs &... args) {
+                            const VisitorArgs &...args) {
       Visitor *vis;
       padj->local_insert_if_missing_else_visit(row, col, value, *vis, args...);
     };
@@ -167,10 +167,10 @@ class adj_impl {

   /* Do we really need a value here? */
   template <typename Function, typename... VisitorArgs>
-  void local_insert_if_missing_else_visit(const key_type &  row,
-                                          const key_type &  col,
+  void local_insert_if_missing_else_visit(const key_type &row,
+                                          const key_type &col,
                                           const value_type &value,
                                           Function &fn,
-                                          const VisitorArgs &... args) {
+                                          const VisitorArgs &...args) {
     inner_map_type &inner_map = m_map[row];
     if (inner_map.find(col) == inner_map.end()) {
       inner_map.insert(std::make_pair(col, value));
@@ -184,7 +184,7 @@ class adj_impl {

   template <typename Visitor, typename... VisitorArgs>
   void async_visit_mutate(const key_type &outer_key, Visitor visitor,
-                          const VisitorArgs &... args) {
+                          const VisitorArgs &...args) {
     auto &inner_map = m_map.find(outer_key)->second;
     for (auto i_itr = inner_map.begin(); i_itr != inner_map.end(); ++i_itr) {
       auto inner_key = i_itr->first;
@@ -203,7 +203,7 @@ class adj_impl {
 protected:
   value_type                       m_default_value;
   std::map<key_type, inner_map_type> m_map;
-  ygm::comm                        m_comm;
+  ygm::comm                       &m_comm;
   typename ygm::ygm_ptr<self_type> pthis;
 };
 }  // namespace ygm::container::experimental::detail
diff --git a/include/ygm/container/experimental/detail/algorithms/spmv.hpp b/include/ygm/container/experimental/detail/algorithms/spmv.hpp
index e638dc8c..d8cab385 100644
--- a/include/ygm/container/experimental/detail/algorithms/spmv.hpp
+++ b/include/ygm/container/experimental/detail/algorithms/spmv.hpp
@@ -29,8 +29,8 @@ class times {
 template <typename Key, typename Value, typename OpPlus, typename OpMultiply>
 ygm::container::map<Key, Value> spmv(
     ygm::container::experimental::maptrix<Key, Value> &A,
-    ygm::container::map<Key, Value> &  x,
-    const OpPlus &                     plus_op = std::plus<Value>(),
+    ygm::container::map<Key, Value> &x,
+    const OpPlus &plus_op = std::plus<Value>(),
     const OpMultiply &times_op = times<Value>()) {
   using key_type   = Key;
   using value_type = Value;
@@ -39,20 +39,17 @@ ygm::container::map<Key, Value> spmv(
   map_type y(A.comm());
   auto     y_ptr = y.get_ygm_ptr();

-  auto kv_lambda = [&A, &y_ptr, &plus_op, &times_op](const auto &kv_pair) {
-    auto &col       = kv_pair.first;
-    auto &col_value = kv_pair.second;
-
+  auto kv_lambda = [&A, &y_ptr, &plus_op, &times_op](const auto &col,
+                                                     const auto &col_value) {
     auto csc_visit_lambda =
         [](const auto &col, const auto &row, const auto &A_value,
           const auto &x_value, const auto &y_ptr, const auto &plus_op,
           const auto &times_op) {
          auto element_wise = times_op(A_value, x_value);

-          auto update_lambda = [](auto &rv_pair, const auto &update_val,
-                                  const auto &plus_op) {
-            auto row_id    = rv_pair.first;
-            rv_pair.second = plus_op(rv_pair.second, update_val);
+          auto update_lambda = [](const auto &row_id, auto &row_val,
+                                  const auto &update_val, const auto &plus_op) {
+            row_val = plus_op(row_val, update_val);
           };

           y_ptr->async_insert_if_missing_else_visit(row, element_wise,
diff --git a/include/ygm/container/experimental/detail/column_view_impl.hpp b/include/ygm/container/experimental/detail/column_view_impl.hpp
index cca06215..6beae517 100644
--- a/include/ygm/container/experimental/detail/column_view_impl.hpp
+++ b/include/ygm/container/experimental/detail/column_view_impl.hpp
@@ -71,37 +71,37 @@ class column_view_impl {
   }

   template <typename... VisitorArgs>
-  void print_all(std::ostream &os, VisitorArgs const &... args) {
+  void print_all(std::ostream &os, VisitorArgs const &...args) {
     ((os << args), ...);
   }

   template <typename Visitor, typename... VisitorArgs>
   void async_visit_if_exists(const key_type &row, const key_type &col,
-                             Visitor visitor, const VisitorArgs &... args) {
+                             Visitor visitor, const VisitorArgs &...args) {
     m_column_view.async_visit_if_exists(
        col, row, visitor, std::forward<const VisitorArgs>(args)...);
   }

   template <typename Visitor, typename... VisitorArgs>
   void async_visit_col_mutate(const key_type &col, Visitor visitor,
-                              const VisitorArgs &... args) {
+                              const VisitorArgs &...args) {
     m_column_view.async_visit_mutate(col, visitor,
                                      std::forward<const VisitorArgs>(args)...);
   }

   template <typename Visitor, typename... VisitorArgs>
   void async_visit_col_const(const key_type &col, Visitor visitor,
-                             const VisitorArgs &... args) {
+                             const VisitorArgs &...args) {
     m_column_view.async_visit_const(col, visitor,
                                     std::forward<const VisitorArgs>(args)...);
   }

   template <typename Visitor, typename... VisitorArgs>
-  void async_insert_if_missing_else_visit(const key_type &  row,
-                                          const key_type &  col,
+  void async_insert_if_missing_else_visit(const key_type &row,
+                                          const key_type &col,
                                           const value_type &value,
                                           Visitor visitor,
-                                          const VisitorArgs &...
args) { + const VisitorArgs &...args) { m_column_view.async_insert_if_missing_else_visit( col, row, value, visitor, std::forward(args)...); } @@ -117,7 +117,7 @@ class column_view_impl { value_type m_default_value; adj_impl m_column_view; - ygm::comm m_comm; + ygm::comm &m_comm; typename ygm::ygm_ptr pthis; }; } // namespace ygm::container::experimental::detail diff --git a/include/ygm/container/experimental/detail/maptrix_impl.hpp b/include/ygm/container/experimental/detail/maptrix_impl.hpp index c3ab0196..2e834c47 100644 --- a/include/ygm/container/experimental/detail/maptrix_impl.hpp +++ b/include/ygm/container/experimental/detail/maptrix_impl.hpp @@ -87,13 +87,13 @@ class maptrix_impl { } template - void print_all(std::ostream &os, VisitorArgs const &... args) { + void print_all(std::ostream &os, VisitorArgs const &...args) { ((os << args), ...); } template void async_visit_if_exists(const key_type &row, const key_type &col, - Visitor visitor, const VisitorArgs &... args) { + Visitor visitor, const VisitorArgs &...args) { m_row_view.async_visit_if_exists(row, col, visitor, std::forward(args)...); m_column_view.async_visit_if_exists( @@ -102,7 +102,7 @@ class maptrix_impl { template void async_visit_col_mutate(const key_type &col, Visitor visitor, - const VisitorArgs &... args) { + const VisitorArgs &...args) { auto &m_map = m_column_view.column_view(); auto &inner_map = m_map.find(col)->second; for (auto itr = inner_map.begin(); itr != inner_map.end(); ++itr) { @@ -116,24 +116,24 @@ class maptrix_impl { template void async_visit_row_const(const key_type &row, Visitor visitor, - const VisitorArgs &... args) { + const VisitorArgs &...args) { m_row_view.async_visit_row_const(row, visitor, std::forward(args)...); } template void async_visit_col_const(const key_type &col, Visitor visitor, - const VisitorArgs &... 
args) { + const VisitorArgs &...args) { m_column_view.async_visit_col_const( col, visitor, std::forward(args)...); } template - void async_insert_if_missing_else_visit(const key_type & row, - const key_type & col, + void async_insert_if_missing_else_visit(const key_type &row, + const key_type &col, const value_type &value, Visitor visitor, - const VisitorArgs &... args) { + const VisitorArgs &...args) { m_row_view.async_insert_if_missing_else_visit( row, col, value, visitor, std::forward(args)...); m_column_view.async_insert_if_missing_else_visit( @@ -158,7 +158,7 @@ class maptrix_impl { value_type m_default_value; row_view_impl m_row_view; column_view_impl m_column_view; - ygm::comm m_comm; + ygm::comm &m_comm; typename ygm::ygm_ptr pthis; }; } // namespace ygm::container::experimental::detail diff --git a/include/ygm/container/experimental/detail/row_view_impl.hpp b/include/ygm/container/experimental/detail/row_view_impl.hpp index ee3fd117..9e6a94e5 100644 --- a/include/ygm/container/experimental/detail/row_view_impl.hpp +++ b/include/ygm/container/experimental/detail/row_view_impl.hpp @@ -59,30 +59,30 @@ class row_view_impl { } template - void print_all(std::ostream &os, VisitorArgs const &... args) { + void print_all(std::ostream &os, VisitorArgs const &...args) { ((os << args), ...); } template void async_visit_if_exists(const key_type &row, const key_type &col, - Visitor visitor, const VisitorArgs &... args) { + Visitor visitor, const VisitorArgs &...args) { m_row_view.async_visit_if_exists(row, col, visitor, std::forward(args)...); } template - void async_insert_if_missing_else_visit(const key_type & row, - const key_type & col, + void async_insert_if_missing_else_visit(const key_type &row, + const key_type &col, const value_type &value, Visitor visitor, - const VisitorArgs &... 
args) { + const VisitorArgs &...args) { m_row_view.async_insert_if_missing_else_visit( row, col, value, visitor, std::forward(args)...); } template void async_visit_row_const(const key_type &row, Visitor visitor, - const VisitorArgs &... args) { + const VisitorArgs &...args) { m_row_view.async_visit_const(row, visitor, std::forward(args)...); } @@ -99,7 +99,7 @@ class row_view_impl { value_type m_default_value; adj_impl m_row_view; - ygm::comm m_comm; + ygm::comm &m_comm; typename ygm::ygm_ptr pthis; }; } // namespace ygm::container::experimental::detail diff --git a/include/ygm/container/map.hpp b/include/ygm/container/map.hpp index 43d17100..86848582 100644 --- a/include/ygm/container/map.hpp +++ b/include/ygm/container/map.hpp @@ -6,6 +6,8 @@ #pragma once #include +#include + namespace ygm::container { template >> class map { public: - using self_type = map; - using value_type = Value; - using key_type = Key; + using self_type = map; + using mapped_type = Value; + using key_type = Key; + using size_type = size_t; + using ygm_for_all_types = std::tuple< Key, Value >; + using ygm_container_type = ygm::container::map_tag; using impl_type = - detail::map_impl; + detail::map_impl; + map() = delete; map(ygm::comm& comm) : m_impl(comm) {} - map(ygm::comm& comm, const value_type& dv) : m_impl(comm, dv) {} + map(ygm::comm& comm, const mapped_type& dv) : m_impl(comm, dv) {} map(const self_type& rhs) : m_impl(rhs.m_impl) {} - void async_insert(const std::pair& kv) { + void async_insert(const std::pair& kv) { async_insert(kv.first, kv.second); } - void async_insert(const key_type& key, const value_type& value) { + void async_insert(const key_type& key, const mapped_type& value) { m_impl.async_insert_unique(key, value); } - void async_insert_if_missing(const std::pair& kv) { + void async_insert_if_missing(const std::pair& kv) { async_insert_if_missing(kv.first, kv.second); } - void async_insert_if_missing(const key_type& key, const value_type& value) { + void 
async_insert_if_missing(const key_type& key, const mapped_type& value) { m_impl.async_insert_if_missing(key, value); } - void async_set(const key_type& key, const value_type& value) { + void async_set(const key_type& key, const mapped_type& value) { async_insert(key, value); } @@ -60,13 +66,19 @@ class map { template void async_insert_if_missing_else_visit(const key_type& key, - const value_type& value, + const mapped_type& value, Visitor visitor, const VisitorArgs&... args) { m_impl.async_insert_if_missing_else_visit( key, value, visitor, std::forward(args)...); } + template + void async_reduce(const key_type& key, const mapped_type& value, + ReductionOp reducer) { + m_impl.async_reduce(key, value, reducer); + } + void async_erase(const key_type& key) { m_impl.async_erase(key); } size_t local_count(const key_type& key) { return m_impl.local_count(key); } @@ -78,7 +90,7 @@ class map { void clear() { m_impl.clear(); } - size_t size() { return m_impl.size(); } + size_type size() { return m_impl.size(); } size_t count(const key_type& key) { return m_impl.count(key); } @@ -93,21 +105,21 @@ class map { bool is_mine(const key_type& key) const { return m_impl.is_mine(key); } - std::vector local_get(const key_type& key) { + std::vector local_get(const key_type& key) { return m_impl.local_get(key); } void swap(self_type& s) { m_impl.swap(s.m_impl); } template - std::map all_gather(const STLKeyContainer& keys) { - std::map to_return; + std::map all_gather(const STLKeyContainer& keys) { + std::map to_return; m_impl.all_gather(keys, to_return); return to_return; } - std::map all_gather(const std::vector& keys) { - std::map to_return; + std::map all_gather(const std::vector& keys) { + std::map to_return; m_impl.all_gather(keys, to_return); return to_return; } @@ -115,12 +127,12 @@ class map { ygm::comm& comm() { return m_impl.comm(); } template - std::vector> topk(size_t k, + std::vector> topk(size_t k, CompareFunction cfn) { return m_impl.topk(k, cfn); } - const value_type& 
default_value() const { return m_impl.default_value(); } + const mapped_type& default_value() const { return m_impl.default_value(); } private: impl_type m_impl; @@ -132,23 +144,24 @@ template >> class multimap { public: - using self_type = multimap; - using value_type = Value; - using key_type = Key; + using self_type = multimap; + using mapped_type = Value; + using key_type = Key; + using size_type = size_t; using impl_type = - detail::map_impl; + detail::map_impl; multimap() = delete; multimap(ygm::comm& comm) : m_impl(comm) {} - multimap(ygm::comm& comm, const value_type& dv) : m_impl(comm, dv) {} + multimap(ygm::comm& comm, const mapped_type& dv) : m_impl(comm, dv) {} multimap(const self_type& rhs) : m_impl(rhs.m_impl) {} - void async_insert(const std::pair& kv) { + void async_insert(const std::pair& kv) { async_insert(kv.first, kv.second); } - void async_insert(const key_type& key, const value_type& value) { + void async_insert(const key_type& key, const mapped_type& value) { m_impl.async_insert_multi(key, value); } @@ -183,7 +196,7 @@ class multimap { void clear() { m_impl.clear(); } - size_t size() { return m_impl.size(); } + size_type size() { return m_impl.size(); } size_t count(const key_type& key) { return m_impl.count(key); } @@ -198,22 +211,22 @@ class multimap { bool is_mine(const key_type& key) const { return m_impl.is_mine(key); } - std::vector local_get(const key_type& key) { + std::vector local_get(const key_type& key) { return m_impl.local_get(key); } void swap(self_type& s) { m_impl.swap(s.m_impl); } template - std::multimap all_gather(const STLKeyContainer& keys) { - std::multimap to_return; + std::multimap all_gather(const STLKeyContainer& keys) { + std::multimap to_return; m_impl.all_gather(keys, to_return); return to_return; } - std::multimap all_gather( + std::multimap all_gather( const std::vector& keys) { - std::multimap to_return; + std::multimap to_return; m_impl.all_gather(keys, to_return); return to_return; } @@ -221,12 +234,12 @@ 
class multimap { ygm::comm& comm() { return m_impl.comm(); } template - std::vector> topk(size_t k, + std::vector> topk(size_t k, CompareFunction cfn) { return m_impl.topk(k, cfn); } - const value_type& default_value() const { return m_impl.default_value(); } + const mapped_type& default_value() const { return m_impl.default_value(); } private: impl_type m_impl; diff --git a/include/ygm/container/reduce_by_key.hpp b/include/ygm/container/reduce_by_key.hpp new file mode 100644 index 00000000..135bd944 --- /dev/null +++ b/include/ygm/container/reduce_by_key.hpp @@ -0,0 +1,66 @@ +// Copyright 2019-2021 Lawrence Livermore National Security, LLC and other YGM +// Project Developers. See the top-level COPYRIGHT file for details. +// +// SPDX-License-Identifier: MIT + +#pragma once + +#include +#include +#include +#include + +namespace ygm::container { + +/** + * @brief Collective reduce_by_key that outputs a ygm::map + * + * @tparam Key + * @tparam Value + * @tparam Container + * @tparam ReductionFunction + * @param container + * @param reducer + * @param cm + * @return ygm::map + */ +template +ygm::container::map reduce_by_key_map(Container& container, + ReductionFunction reducer, + comm& cm) { + cm.barrier(); + ygm::container::map to_return(cm); + + auto the_reducer = + ygm::container::detail::make_reducing_adapter(to_return, reducer); + + auto lambda_two = [&the_reducer](const Key& k, const Value& v) { + the_reducer.async_reduce(k, v); + }; + + auto lambda_pair = [&the_reducer](const std::pair& kv) { + the_reducer.async_reduce(kv.first, kv.second); + }; + + if constexpr (ygm::detail::is_for_each_invocable< + Container, decltype(lambda_two)>::value) { + std::for_each(container.begin(), container.end(), lambda_two); + } else if constexpr (ygm::detail::is_for_each_invocable< + Container, decltype(lambda_pair)>::value) { + std::for_each(container.begin(), container.end(), lambda_pair); + } else if constexpr (ygm::detail::is_for_all_invocable< + Container, 
decltype(lambda_two)>::value) { + container.for_all(lambda_two); + } else if constexpr (ygm::detail::is_for_all_invocable< + Container, decltype(lambda_pair)>::value) { + container.for_all(lambda_pair); + } else { + static_assert(ygm::detail::always_false<>, + "Unsupported Lambda or Container"); + } + + cm.barrier(); + return to_return; +} +} // namespace ygm::container diff --git a/include/ygm/container/set.hpp b/include/ygm/container/set.hpp index 78f349d7..609e9db8 100644 --- a/include/ygm/container/set.hpp +++ b/include/ygm/container/set.hpp @@ -4,6 +4,8 @@ // SPDX-License-Identifier: MIT #pragma once + +#include #include namespace ygm::container { @@ -13,8 +15,10 @@ template , class Alloc = std::allocator> class multiset { public: - using self_type = multiset; - using key_type = Key; + using self_type = multiset; + using key_type = Key; + using size_type = size_t; + using ygm_for_all_types = std::tuple; using impl_type = detail::set_impl; Partitioner partitioner; @@ -32,9 +36,16 @@ class multiset { m_impl.for_all(fn); } + template + void consume_all(Function fn) { + m_impl.consume_all(fn); + } + void clear() { m_impl.clear(); } - size_t size() { return m_impl.size(); } + size_type size() { return m_impl.size(); } + + bool empty() { return m_impl.size() == 0; } size_t count(const key_type& key) { return m_impl.count(key); } @@ -43,7 +54,7 @@ class multiset { void serialize(const std::string& fname) { m_impl.serialize(fname); } void deserialize(const std::string& fname) { m_impl.deserialize(fname); } - typename ygm::ygm_ptr get_ygm_ptr() const { + typename ygm::ygm_ptr get_ygm_ptr() const { return m_impl.get_ygm_ptr(); } @@ -59,13 +70,17 @@ class multiset { private: impl_type m_impl; }; + template , typename Compare = std::less, class Alloc = std::allocator> class set { public: - using self_type = set; - using key_type = Key; + using self_type = set; + using key_type = Key; + using size_type = size_t; + using ygm_container_type = ygm::container::set_tag; + using 
ygm_for_all_types = std::tuple; using impl_type = detail::set_impl; Partitioner partitioner; @@ -78,14 +93,49 @@ class set { void async_erase(const key_type& key) { m_impl.async_erase(key); } + template + void async_insert_exe_if_missing(const key_type& key, Visitor visitor, + const VisitorArgs&... args) { + m_impl.async_insert_exe_if_missing( + key, visitor, std::forward(args)...); + } + + template + void async_insert_exe_if_contains(const key_type& key, Visitor visitor, + const VisitorArgs&... args) { + m_impl.async_insert_exe_if_contains( + key, visitor, std::forward(args)...); + } + + template + void async_exe_if_missing(const key_type& key, Visitor visitor, + const VisitorArgs&... args) { + m_impl.async_exe_if_missing(key, visitor, + std::forward(args)...); + } + + template + void async_exe_if_contains(const key_type& key, Visitor visitor, + const VisitorArgs&... args) { + m_impl.async_exe_if_contains(key, visitor, + std::forward(args)...); + } + template void for_all(Function fn) { m_impl.for_all(fn); } + template + void consume_all(Function fn) { + m_impl.consume_all(fn); + } + void clear() { m_impl.clear(); } - size_t size() { return m_impl.size(); } + size_type size() { return m_impl.size(); } + + bool empty() { return m_impl.size() == 0; } size_t count(const key_type& key) { return m_impl.count(key); } @@ -94,7 +144,7 @@ class set { void serialize(const std::string& fname) { m_impl.serialize(fname); } void deserialize(const std::string& fname) { m_impl.deserialize(fname); } - typename ygm::ygm_ptr get_ygm_ptr() const { + typename ygm::ygm_ptr get_ygm_ptr() const { return m_impl.get_ygm_ptr(); } diff --git a/include/ygm/container/tagged_bag.hpp b/include/ygm/container/tagged_bag.hpp new file mode 100644 index 00000000..728670bf --- /dev/null +++ b/include/ygm/container/tagged_bag.hpp @@ -0,0 +1,123 @@ +// Copyright 2019-2021 Lawrence Livermore National Security, LLC and other YGM +// Project Developers. See the top-level COPYRIGHT file for details. 
+// +// SPDX-License-Identifier: MIT + +#pragma once +#include +#include +#include + +namespace ygm::container { + +// A tagged bag defines a bag-like data structure in which each +// element gets a (globally) unique reference to itself. +// Under the covers, this is simply a map with autogenerated keys. +// The keys are generated using a combination of rank (24 bits) +// and a rank-specific serial number (40 bits). +template > +class tagged_bag { + public: + using tag_type = size_t; + using value_type = Item; + using self_type = tagged_bag; + + tagged_bag(const tagged_bag &) = delete; + tagged_bag(tagged_bag &&) noexcept = delete; + tagged_bag &operator=(const tagged_bag &) = delete; + tagged_bag &operator=(tagged_bag &&) noexcept = delete; + tagged_bag(ygm::comm &comm) + : m_next_tag(tag_type(comm.rank()) << TAG_BITS), + m_tagged_bag(ygm::container::map(comm)), + pthis(this) {} + ~tagged_bag() = default; + + tag_type async_insert(const value_type &item) { + tag_type tag = m_next_tag++; + m_tagged_bag.async_insert(tag, item); + return tag; + } + + template + void async_visit(const tag_type &tag, Visitor visitor, + const VisitorArgs &...args) { + return m_tagged_bag.async_visit(tag, visitor, args...); + } + + template + void async_visit_if_exists(const tag_type &tag, Visitor visitor, + const VisitorArgs &...args) { + return m_tagged_bag.async_visit_if_exists(tag, visitor, args...); + } + + void async_erase(const tag_type &tag) { + return m_tagged_bag.async_erase(tag); + } + + template + void for_all(Function fn) { + return m_tagged_bag.for_all(fn); + } + + void clear() { return m_tagged_bag.clear(); } + + size_t size() { return m_tagged_bag.size(); } + + void swap(self_type &s) { + m_tagged_bag.swap(s.m_tagged_bag); + std::swap(m_next_tag, s.m_next_tag); + } + + ygm::comm &comm() { return m_tagged_bag.comm(); } + + // TODO sbromberger 20230626: serialize and deserialize + + [[nodiscard]] int owner(const tag_type &tag) const { + return m_tagged_bag.owner(tag); + } + 
+ [[nodiscard]] bool is_mine(const tag_type &tag) const { + return m_tagged_bag.is_mine(tag); + } + + std::vector local_get(const tag_type &tag) { + return m_tagged_bag.local_get(tag); + } + + template + void local_visit(const tag_type &tag, Function &fn, + const VisitorArgs &...args) { + return m_tagged_bag.local_visit(tag, fn, args...); + } + + void local_erase(const tag_type &tag) { m_tagged_bag.m_local_map.erase(tag); } + + void local_clear() { m_tagged_bag.m_local_map.clear(); } + + [[nodiscard]] size_t local_size() const { + return m_tagged_bag.m_local_map.size(); + } + + template + std::map all_gather(const STLKeyContainer &tags) { + return m_tagged_bag.all_gather(tags); + } + + std::map all_gather(const std::vector &tags) { + return m_tagged_bag.all_gather(tags); + } + template + void local_for_all(Function fn) { + return m_tagged_bag.local_for_all(fn); + } + + private: + // TODO 20230628: these consts should probably migrate to a configuration + // file. + const tag_type TAG_BITS = 40; + const tag_type MAX_TAGS = (size_t(1) << TAG_BITS) - 1; + tag_type m_next_tag; + ygm::container::map m_tagged_bag; + typename ygm::ygm_ptr pthis; +}; +} // namespace ygm::container diff --git a/include/ygm/detail/assert.hpp b/include/ygm/detail/assert.hpp index b124e3a0..7e286858 100644 --- a/include/ygm/detail/assert.hpp +++ b/include/ygm/detail/assert.hpp @@ -10,7 +10,7 @@ #include // work on this: https://github.com/lattera/glibc/blob/master/assert/assert.c -void release_assert_fail(const char *assertion, const char *file, +inline void release_assert_fail(const char *assertion, const char *file, unsigned int line, const char *function) { std::stringstream ss; ss << " " << assertion << " " << file << ":" << line << " " << function diff --git a/include/ygm/detail/comm.ipp b/include/ygm/detail/comm.ipp new file mode 100644 index 00000000..a46b76ac --- /dev/null +++ b/include/ygm/detail/comm.ipp @@ -0,0 +1,991 @@ +// Copyright 2019-2021 Lawrence Livermore National Security, 
LLC and other YGM +// Project Developers. See the top-level COPYRIGHT file for details. +// +// SPDX-License-Identifier: MIT + +#pragma once +#include +#include + +namespace ygm { + +struct comm::mpi_irecv_request { + std::shared_ptr buffer; + MPI_Request request; +}; + +struct comm::mpi_isend_request { + std::shared_ptr> buffer; + MPI_Request request; +}; + +struct comm::header_t { + uint32_t message_size; + int32_t dest; +}; + +inline comm::comm(int *argc, char ***argv) + : pimpl_if(std::make_shared(argc, argv)), + m_layout(MPI_COMM_WORLD), + m_router(m_layout, config.routing) { + // pimpl_if = std::make_shared(argc, argv); + comm_setup(MPI_COMM_WORLD); +} + +inline comm::comm(MPI_Comm mcomm) + : m_layout(mcomm), m_router(m_layout, config.routing) { + pimpl_if.reset(); + int flag(0); + ASSERT_MPI(MPI_Initialized(&flag)); + if (!flag) { + throw std::runtime_error("YGM::COMM ERROR: MPI not initialized"); + } + comm_setup(mcomm); +} + +inline void comm::comm_setup(MPI_Comm c) { + ASSERT_MPI(MPI_Comm_dup(c, &m_comm_async)); + ASSERT_MPI(MPI_Comm_dup(c, &m_comm_barrier)); + ASSERT_MPI(MPI_Comm_dup(c, &m_comm_other)); + + m_vec_send_buffers.resize(m_layout.size()); + + if (config.welcome) { + welcome(std::cout); + } + + for (size_t i = 0; i < config.num_irecvs; ++i) { + std::shared_ptr recv_buffer{new std::byte[config.irecv_size]}; + post_new_irecv(recv_buffer); + } +} + +inline void comm::welcome(std::ostream &os) { + static bool already_printed = false; + if (already_printed) return; + already_printed = true; + std::stringstream sstr; + sstr << "======================================\n" + << " YY YY GGGGGG MM MM \n" + << " YY YY GG GG MMM MMM \n" + << " YYYY GG MMMM MMMM \n" + << " YY GG GGGG MM MMM MM \n" + << " YY GG GG MM MM \n" + << " YY GG GG MM MM \n" + << " YY GGGGGG MM MM \n" + << "======================================\n" + << "COMM_SIZE = " << m_layout.size() << "\n" + << "RANKS_PER_NODE = " << m_layout.local_size() << "\n" + << "NUM_NODES = " << 
m_layout.node_size() << "\n"; + + config.print(sstr); + + if (rank() == 0) { + os << sstr.str(); + } +} + +inline void comm::stats_reset() { stats.reset(); } +inline void comm::stats_print(const std::string &name, std::ostream &os) { + std::stringstream sstr; + sstr << "============== STATS =================\n" + << "NAME = " << name << "\n" + << "TIME = " << stats.get_elapsed_time() << "\n" + << "GLOBAL_ASYNC_COUNT = " + << all_reduce_sum(stats.get_async_count()) << "\n" + << "GLOBAL_ISEND_COUNT = " + << all_reduce_sum(stats.get_isend_count()) << "\n" + << "GLOBAL_ISEND_BYTES = " + << all_reduce_sum(stats.get_isend_bytes()) << "\n" + << "MAX_WAITSOME_ISEND_IRECV = " + << all_reduce_max(stats.get_waitsome_isend_irecv_time()) << "\n" + << "MAX_WAITSOME_IALLREDUCE = " + << all_reduce_max(stats.get_waitsome_iallreduce_time()) << "\n" + << "COUNT_IALLREDUCE = " << stats.get_iallreduce_count() << "\n" + << "======================================"; + + if (rank0()) { + os << sstr.str() << std::endl; + } +} + +inline comm::~comm() { + barrier(); + + ASSERT_RELEASE(MPI_Barrier(m_comm_async) == MPI_SUCCESS); + + ASSERT_RELEASE(m_send_queue.empty()); + ASSERT_RELEASE(m_send_dest_queue.empty()); + ASSERT_RELEASE(m_send_buffer_bytes == 0); + ASSERT_RELEASE(m_pending_isend_bytes == 0); + + for (size_t i = 0; i < m_recv_queue.size(); ++i) { + ASSERT_RELEASE(MPI_Cancel(&(m_recv_queue[i].request)) == MPI_SUCCESS); + } + ASSERT_RELEASE(MPI_Barrier(m_comm_async) == MPI_SUCCESS); + ASSERT_RELEASE(MPI_Comm_free(&m_comm_async) == MPI_SUCCESS); + ASSERT_RELEASE(MPI_Comm_free(&m_comm_barrier) == MPI_SUCCESS); + ASSERT_RELEASE(MPI_Comm_free(&m_comm_other) == MPI_SUCCESS); + + pimpl_if.reset(); +} + +template +inline void comm::async(int dest, AsyncFunction fn, const SendArgs &...args) { + static_assert(std::is_trivially_copyable::value && + std::is_standard_layout::value, + "comm::async() AsyncFunction must be is_trivially_copyable & " + "is_standard_layout."); + ASSERT_RELEASE(dest < 
m_layout.size()); + stats.async(dest); + + check_if_production_halt_required(); + m_send_count++; + + // + // + int next_dest = dest; + if (config.routing != detail::routing_type::NONE) { + // next_dest = next_hop(dest); + next_dest = m_router.next_hop(dest); + } + + // + // add data to the to dest buffer + if (m_vec_send_buffers[next_dest].empty()) { + m_send_dest_queue.push_back(next_dest); + m_vec_send_buffers[next_dest].reserve(config.buffer_size / + m_layout.node_size()); + } + + // // Add header without message size + size_t header_bytes = 0; + if (config.routing != detail::routing_type::NONE) { + header_bytes = pack_header(m_vec_send_buffers[next_dest], dest, 0); + m_send_buffer_bytes += header_bytes; + } + + uint32_t bytes = pack_lambda(m_vec_send_buffers[next_dest], fn, + std::forward(args)...); + m_send_buffer_bytes += bytes; + + // // Add message size to header + if (config.routing != detail::routing_type::NONE) { + auto iter = m_vec_send_buffers[next_dest].end(); + iter -= (header_bytes + bytes); + std::memcpy(&*iter, &bytes, sizeof(header_t::dest)); + } + + // + // Check if send buffer capacity has been exceeded + if (!m_in_process_receive_queue) { + flush_to_capacity(); + } +} + +template +inline void comm::async_bcast(AsyncFunction fn, const SendArgs &...args) { + static_assert( + std::is_trivially_copyable::value && + std::is_standard_layout::value, + "comm::async_bcast() AsyncFunction must be is_trivially_copyable & " + "is_standard_layout."); + check_if_production_halt_required(); + + pack_lambda_broadcast(fn, std::forward(args)...); + + // + // Check if send buffer capacity has been exceeded + if (!m_in_process_receive_queue) { + flush_to_capacity(); + } +} + +template +inline void comm::async_mcast(const std::vector &dests, AsyncFunction fn, + const SendArgs &...args) { + static_assert( + std::is_trivially_copyable::value && + std::is_standard_layout::value, + "comm::async_mcast() AsyncFunction must be is_trivially_copyable & " + 
"is_standard_layout."); + for (auto dest : dests) { + async(dest, fn, std::forward(args)...); + } +} + +inline const detail::layout &comm::layout() const { return m_layout; } + +inline const detail::comm_router &comm::router() const { return m_router; } + +inline int comm::size() const { + return m_layout.size(); + ; +} +inline int comm::rank() const { return m_layout.rank(); } + +inline MPI_Comm comm::get_mpi_comm() const { return m_comm_other; } + +/** + * @brief Full communicator barrier + * + */ +inline void comm::barrier() { + flush_all_local_and_process_incoming(); + std::pair previous_counts{1, 2}; + std::pair current_counts{3, 4}; + while (!(current_counts.first == current_counts.second && + previous_counts == current_counts)) { + previous_counts = current_counts; + current_counts = barrier_reduce_counts(); + if (current_counts.first != current_counts.second) { + flush_all_local_and_process_incoming(); + } + } + ASSERT_RELEASE(m_pre_barrier_callbacks.empty()); + ASSERT_RELEASE(m_send_dest_queue.empty()); +} + +/** + * @brief Control Flow Barrier + * Only blocks the control flow until all processes in the communicator have + * called it. 
See: MPI_Barrier() + */ +inline void comm::cf_barrier() { ASSERT_MPI(MPI_Barrier(m_comm_barrier)); } + +template +inline ygm_ptr comm::make_ygm_ptr(T &t) { + ygm_ptr to_return(&t); + to_return.check(*this); + return to_return; +} + +inline void comm::register_pre_barrier_callback( + const std::function &fn) { + m_pre_barrier_callbacks.push_back(fn); +} + +template +inline T comm::all_reduce_sum(const T &t) const { + T to_return; + ASSERT_MPI(MPI_Allreduce(&t, &to_return, 1, detail::mpi_typeof(T()), MPI_SUM, + m_comm_other)); + return to_return; +} + +template +inline T comm::all_reduce_min(const T &t) const { + T to_return; + ASSERT_MPI(MPI_Allreduce(&t, &to_return, 1, detail::mpi_typeof(T()), MPI_MIN, + m_comm_other)); + return to_return; +} + +template +inline T comm::all_reduce_max(const T &t) const { + T to_return; + ASSERT_MPI(MPI_Allreduce(&t, &to_return, 1, detail::mpi_typeof(T()), MPI_MAX, + m_comm_other)); + return to_return; +} + +/** + * @brief Tree based reduction, could be optimized significantly + * + * @tparam T + * @tparam MergeFunction + * @param in + * @param merge + * @return T + */ +template +inline T comm::all_reduce(const T &in, MergeFunction merge) const { + int first_child = 2 * rank() + 1; + int second_child = 2 * (rank() + 1); + int parent = (rank() - 1) / 2; + + // Step 1: Receive from children, merge into tmp + T tmp = in; + if (first_child < size()) { + T fc = mpi_recv(first_child, 0, m_comm_other); + tmp = merge(tmp, fc); + } + if (second_child < size()) { + T sc = mpi_recv(second_child, 0, m_comm_other); + tmp = merge(tmp, sc); + } + + // Step 2: Send merged to parent + if (rank() != 0) { + mpi_send(tmp, parent, 0, m_comm_other); + } + + // Step 3: Rank 0 bcasts + T to_return = mpi_bcast(tmp, 0, m_comm_other); + return to_return; +} + +template +inline void comm::mpi_send(const T &data, int dest, int tag, + MPI_Comm comm) const { + std::vector packed; + cereal::YGMOutputArchive oarchive(packed); + oarchive(data); + size_t packed_size 
= packed.size();
+  ASSERT_RELEASE(packed_size < 1024 * 1024 * 1024);
+  ASSERT_MPI(MPI_Send(&packed_size, 1, detail::mpi_typeof(packed_size), dest,
+                      tag, comm));
+  ASSERT_MPI(MPI_Send(packed.data(), packed_size, MPI_BYTE, dest, tag, comm));
+}
+
+template <typename T>
+inline T comm::mpi_recv(int source, int tag, MPI_Comm comm) const {
+  std::vector<std::byte> packed;
+  size_t packed_size{0};
+  ASSERT_MPI(MPI_Recv(&packed_size, 1, detail::mpi_typeof(packed_size), source,
+                      tag, comm, MPI_STATUS_IGNORE));
+  packed.resize(packed_size);
+  ASSERT_MPI(MPI_Recv(packed.data(), packed_size, MPI_BYTE, source, tag, comm,
+                      MPI_STATUS_IGNORE));
+
+  T to_return;
+  cereal::YGMInputArchive iarchive(packed.data(), packed.size());
+  iarchive(to_return);
+  return to_return;
+}
+
+template <typename T>
+inline T comm::mpi_bcast(const T &to_bcast, int root, MPI_Comm comm) const {
+  std::vector<std::byte> packed;
+  cereal::YGMOutputArchive oarchive(packed);
+  if (rank() == root) {
+    oarchive(to_bcast);
+  }
+  size_t packed_size = packed.size();
+  ASSERT_RELEASE(packed_size < 1024 * 1024 * 1024);
+  ASSERT_MPI(
+      MPI_Bcast(&packed_size, 1, detail::mpi_typeof(packed_size), root, comm));
+  if (rank() != root) {
+    packed.resize(packed_size);
+  }
+  ASSERT_MPI(MPI_Bcast(packed.data(), packed_size, MPI_BYTE, root, comm));
+
+  cereal::YGMInputArchive iarchive(packed.data(), packed.size());
+  T to_return;
+  iarchive(to_return);
+  return to_return;
+}
+
+inline std::ostream &comm::cout0() const {
+  static std::ostringstream dummy;
+  dummy.clear();
+  if (rank() == 0) {
+    return std::cout;
+  }
+  return dummy;
+}
+
+inline std::ostream &comm::cerr0() const {
+  static std::ostringstream dummy;
+  dummy.clear();
+  if (rank() == 0) {
+    return std::cerr;
+  }
+  return dummy;
+}
+
+inline std::ostream &comm::cout() const {
+  std::cout << rank() << ": ";
+  return std::cout;
+}
+
+inline std::ostream &comm::cerr() const {
+  std::cerr << rank() << ": ";
+  return std::cerr;
+}
+
+template <typename... Args>
+inline void comm::cout(Args &&...args) const {
+  std::cout <<
outstr(args...) << std::endl;
+}
+
+template <typename... Args>
+inline void comm::cerr(Args &&...args) const {
+  std::cerr << outstr(args...) << std::endl;
+}
+
+template <typename... Args>
+inline void comm::cout0(Args &&...args) const {
+  if (rank0()) {
+    std::cout << outstr0(args...) << std::endl;
+  }
+}
+
+template <typename... Args>
+inline void comm::cerr0(Args &&...args) const {
+  if (rank0()) {
+    std::cerr << outstr0(args...) << std::endl;
+  }
+}
+
+template <typename... Args>
+inline std::string comm::outstr0(Args &&...args) const {
+  std::stringstream ss;
+  (ss << ... << args);
+  return ss.str();
+}
+
+template <typename... Args>
+inline std::string comm::outstr(Args &&...args) const {
+  std::stringstream ss;
+  (ss << rank() << ": " << ... << args);
+  return ss.str();
+}
+
+inline size_t comm::pack_header(std::vector<std::byte> &packed, const int dest,
+                                size_t size) {
+  size_t size_before = packed.size();
+
+  header_t h;
+  h.dest         = dest;
+  h.message_size = size;
+
+  packed.resize(size_before + sizeof(header_t));
+  std::memcpy(packed.data() + size_before, &h, sizeof(header_t));
+
+  // cereal::YGMOutputArchive oarchive(packed);
+  // oarchive(h);
+
+  return packed.size() - size_before;
+}
+
+inline std::pair<uint64_t, uint64_t> comm::barrier_reduce_counts() {
+  uint64_t local_counts[2]  = {m_recv_count, m_send_count};
+  uint64_t global_counts[2] = {0, 0};
+
+  ASSERT_RELEASE(m_pending_isend_bytes == 0);
+  ASSERT_RELEASE(m_send_buffer_bytes == 0);
+
+  MPI_Request req = MPI_REQUEST_NULL;
+  ASSERT_MPI(MPI_Iallreduce(local_counts, global_counts, 2, MPI_UINT64_T,
+                            MPI_SUM, m_comm_barrier, &req));
+  stats.iallreduce();
+  bool iallreduce_complete(false);
+  while (!iallreduce_complete) {
+    MPI_Request twin_req[2];
+    twin_req[0] = req;
+    twin_req[1] = m_recv_queue.front().request;
+
+    int        outcount;
+    int        twin_indices[2];
+    MPI_Status twin_status[2];
+
+    {
+      auto timer = stats.waitsome_iallreduce();
+      ASSERT_MPI(
+          MPI_Waitsome(2, twin_req, &outcount, twin_indices, twin_status));
+    }
+
+    for (int i = 0; i < outcount; ++i) {
+      if (twin_indices[i] == 0) {  // completed an Iallreduce
+
iallreduce_complete = true; + // std::cout << m_layout.rank() << ": iallreduce_complete: " << + // global_counts[0] << " " << global_counts[1] << std::endl; + } else { + mpi_irecv_request req_buffer = m_recv_queue.front(); + m_recv_queue.pop_front(); + handle_next_receive(twin_status[i], req_buffer.buffer); + flush_all_local_and_process_incoming(); + } + } + } + return {global_counts[0], global_counts[1]}; +} + +/** + * @brief Flushes send buffer to dest + * + * @param dest + */ +inline void comm::flush_send_buffer(int dest) { + static size_t counter = 0; + if (m_vec_send_buffers[dest].size() > 0) { + mpi_isend_request request; + if (m_free_send_buffers.empty()) { + request.buffer = std::make_shared>(); + } else { + request.buffer = m_free_send_buffers.back(); + m_free_send_buffers.pop_back(); + } + request.buffer->swap(m_vec_send_buffers[dest]); + if (config.freq_issend > 0 && counter++ % config.freq_issend == 0) { + ASSERT_MPI(MPI_Issend(request.buffer->data(), request.buffer->size(), + MPI_BYTE, dest, 0, m_comm_async, + &(request.request))); + } else { + ASSERT_MPI(MPI_Isend(request.buffer->data(), request.buffer->size(), + MPI_BYTE, dest, 0, m_comm_async, + &(request.request))); + } + stats.isend(dest, request.buffer->size()); + m_pending_isend_bytes += request.buffer->size(); + m_send_buffer_bytes -= request.buffer->size(); + m_send_queue.push_back(request); + if (!m_in_process_receive_queue) { + process_receive_queue(); + } + } +} + +inline void comm::check_if_production_halt_required() { + while (m_enable_interrupts && !m_in_process_receive_queue && + m_pending_isend_bytes > config.buffer_size) { + process_receive_queue(); + } +} + +/** + * @brief Checks for incoming unless called from receive queue and flushes + * one buffer. 
+ */ +inline void comm::local_progress() { + if (not m_in_process_receive_queue) { + process_receive_queue(); + } + if (not m_send_dest_queue.empty()) { + int dest = m_send_dest_queue.front(); + m_send_dest_queue.pop_front(); + flush_send_buffer(dest); + } +} + +/** + * @brief Waits until provided condition function returns true. + * + * @tparam Function + * @param fn Wait condition function, must match []() -> bool + */ +template +inline void comm::local_wait_until(Function fn) { + while (not fn()) { + local_progress(); + } +} + +/** + * @brief Flushes all local state and buffers. + * Notifies any registered barrier watchers. + */ +inline void comm::flush_all_local_and_process_incoming() { + // Keep flushing until all local work is complete + bool did_something = true; + while (did_something) { + did_something = process_receive_queue(); + // + // Notify registered barrier watchers + while (!m_pre_barrier_callbacks.empty()) { + did_something = true; + std::function fn = m_pre_barrier_callbacks.front(); + m_pre_barrier_callbacks.pop_front(); + fn(); + } + + // + // Flush each send buffer + while (!m_send_dest_queue.empty()) { + did_something = true; + int dest = m_send_dest_queue.front(); + m_send_dest_queue.pop_front(); + flush_send_buffer(dest); + process_receive_queue(); + } + + // + // Wait on isends + while (!m_send_queue.empty()) { + did_something |= process_receive_queue(); + } + } +} + +/** + * @brief Flush send buffers until queued sends are smaller than buffer + * capacity + */ +inline void comm::flush_to_capacity() { + while (m_send_buffer_bytes > config.buffer_size) { + ASSERT_DEBUG(!m_send_dest_queue.empty()); + int dest = m_send_dest_queue.front(); + m_send_dest_queue.pop_front(); + flush_send_buffer(dest); + } +} + +inline void comm::post_new_irecv(std::shared_ptr &recv_buffer) { + mpi_irecv_request recv_req; + recv_req.buffer = recv_buffer; + + //::madvise(recv_req.buffer.get(), config.irecv_size, MADV_DONTNEED); + 
ASSERT_MPI(MPI_Irecv(recv_req.buffer.get(), config.irecv_size, MPI_BYTE, + MPI_ANY_SOURCE, MPI_ANY_TAG, m_comm_async, + &(recv_req.request))); + m_recv_queue.push_back(recv_req); +} + +template +inline size_t comm::pack_lambda(std::vector &packed, Lambda l, + const PackArgs &...args) { + size_t size_before = packed.size(); + const std::tuple tuple_args( + std::forward(args)...); + + auto dispatch_lambda = [](comm *c, cereal::YGMInputArchive *bia, Lambda l) { + Lambda *pl = nullptr; + size_t l_storage[sizeof(Lambda) / sizeof(size_t) + + (sizeof(Lambda) % sizeof(size_t) > 0)]; + if constexpr (!std::is_empty::value) { + bia->loadBinary(l_storage, sizeof(Lambda)); + pl = (Lambda *)l_storage; + } + + std::tuple ta; + if constexpr (!std::is_empty>::value) { + (*bia)(ta); + } + + auto t1 = std::make_tuple((comm *)c); + + // \pp was: std::apply(*pl, std::tuple_cat(t1, ta)); + ygm::meta::apply_optional(*pl, std::move(t1), std::move(ta)); + }; + + return pack_lambda_generic(packed, l, dispatch_lambda, + std::forward(args)...); +} + +template +inline void comm::pack_lambda_broadcast(Lambda l, const PackArgs &...args) { + const std::tuple tuple_args( + std::forward(args)...); + + auto forward_remote_and_dispatch_lambda = [](comm *c, + cereal::YGMInputArchive *bia, + Lambda l) { + Lambda *pl = nullptr; + size_t l_storage[sizeof(Lambda) / sizeof(size_t) + + (sizeof(Lambda) % sizeof(size_t) > 0)]; + if constexpr (!std::is_empty::value) { + bia->loadBinary(l_storage, sizeof(Lambda)); + pl = (Lambda *)l_storage; + } + + std::tuple ta; + if constexpr (!std::is_empty>::value) { + (*bia)(ta); + } + + auto forward_local_and_dispatch_lambda = + [](comm *c, cereal::YGMInputArchive *bia, Lambda l) { + Lambda *pl = nullptr; + size_t l_storage[sizeof(Lambda) / sizeof(size_t) + + (sizeof(Lambda) % sizeof(size_t) > 0)]; + if constexpr (!std::is_empty::value) { + bia->loadBinary(l_storage, sizeof(Lambda)); + pl = (Lambda *)l_storage; + } + + std::tuple ta; + if constexpr 
(!std::is_empty>::value) { + (*bia)(ta); + } + + auto local_dispatch_lambda = [](comm *c, cereal::YGMInputArchive *bia, + Lambda l) { + Lambda *pl = nullptr; + size_t l_storage[sizeof(Lambda) / sizeof(size_t) + + (sizeof(Lambda) % sizeof(size_t) > 0)]; + if constexpr (!std::is_empty::value) { + bia->loadBinary(l_storage, sizeof(Lambda)); + pl = (Lambda *)l_storage; + } + + std::tuple ta; + if constexpr (!std::is_empty>::value) { + (*bia)(ta); + } + + auto t1 = std::make_tuple((comm *)c); + + // \pp was: std::apply(*pl, std::tuple_cat(t1, ta)); + ygm::meta::apply_optional(*pl, std::move(t1), std::move(ta)); + }; + + // Pack lambda telling terminal ranks to execute user lambda. + // TODO: Why does this work? Passing ta (tuple of args) to a function + // expecting a parameter pack shouldn't work... + std::vector packed_msg; + c->pack_lambda_generic(packed_msg, *pl, local_dispatch_lambda, ta); + + for (auto dest : c->layout().local_ranks()) { + if (dest != c->layout().rank()) { + c->queue_message_bytes(packed_msg, dest); + } + } + + auto t1 = std::make_tuple((comm *)c); + + // \pp was: std::apply(*pl, std::tuple_cat(t1, ta)); + ygm::meta::apply_optional(*pl, std::move(t1), std::move(ta)); + }; + + std::vector packed_msg; + c->pack_lambda_generic(packed_msg, *pl, forward_local_and_dispatch_lambda, + ta); + + int num_layers = c->layout().node_size() / c->layout().local_size() + + (c->layout().node_size() % c->layout().local_size() > 0); + int num_ranks_per_layer = + c->layout().local_size() * c->layout().local_size(); + int node_partner_offset = (c->layout().local_id() - c->layout().node_id()) % + c->layout().local_size(); + + // % operator is remainder, not actually mod. 
Need to fix result if result + // was negative + if (node_partner_offset < 0) { + node_partner_offset += c->layout().local_size(); + } + + // Only forward remotely if initial remote node exists + if (node_partner_offset < c->layout().node_size()) { + int curr_partner = c->layout().strided_ranks()[node_partner_offset]; + for (int l = 0; l < num_layers; l++) { + if (curr_partner >= c->layout().size()) { + break; + } + if (!c->layout().is_local(curr_partner)) { + c->queue_message_bytes(packed_msg, curr_partner); + } + + curr_partner += num_ranks_per_layer; + } + } + + auto t1 = std::make_tuple((comm *)c); + + // \pp was: std::apply(*pl, std::tuple_cat(t1, ta)); + ygm::meta::apply_optional(*pl, std::move(t1), std::move(ta)); + }; + + std::vector packed_msg; + pack_lambda_generic(packed_msg, l, forward_remote_and_dispatch_lambda, + std::forward(args)...); + + // Initial send to all local ranks + for (auto dest : layout().local_ranks()) { + queue_message_bytes(packed_msg, dest); + } +} + +template +inline size_t comm::pack_lambda_generic(std::vector &packed, + Lambda l, RemoteLogicLambda rll, + const PackArgs &...args) { + size_t size_before = packed.size(); + const std::tuple tuple_args( + std::forward(args)...); + + auto remote_dispatch_lambda = [](comm *c, cereal::YGMInputArchive *bia) { + RemoteLogicLambda *rll = nullptr; + Lambda *pl = nullptr; + + (*rll)(c, bia, *pl); + }; + + uint16_t lid = m_lambda_map.register_lambda(remote_dispatch_lambda); + + { + size_t size_before = packed.size(); + packed.resize(size_before + sizeof(lid)); + std::memcpy(packed.data() + size_before, &lid, sizeof(lid)); + } + + if constexpr (!std::is_empty::value) { + // oarchive.saveBinary(&l, sizeof(Lambda)); + size_t size_before = packed.size(); + packed.resize(size_before + sizeof(Lambda)); + std::memcpy(packed.data() + size_before, &l, sizeof(Lambda)); + } + + if constexpr (!std::is_empty>::value) { + // Only create cereal archive is tuple needs serialization + cereal::YGMOutputArchive 
oarchive(packed); // Create an output archive + oarchive(tuple_args); + } + return packed.size() - size_before; +} + +/** + * @brief Adds packed message directly to send buffer for specific + * destination. Does not modify packed message to add headers for routing. + * + */ +inline void comm::queue_message_bytes(const std::vector &packed, + const int dest) { + m_send_count++; + + // + // add data to the dest buffer + if (m_vec_send_buffers[dest].empty()) { + m_send_dest_queue.push_back(dest); + m_vec_send_buffers[dest].reserve(config.buffer_size / m_layout.node_size()); + } + + std::vector &send_buff = m_vec_send_buffers[dest]; + + // Add dummy header with dest of -1 and size of 0. + // This is to avoid peeling off and replacing the dest as messages are + // forwarded in a bcast + if (config.routing != detail::routing_type::NONE) { + size_t header_bytes = pack_header(send_buff, -1, 0); + m_send_buffer_bytes += header_bytes; + } + + size_t size_before = send_buff.size(); + send_buff.resize(size_before + packed.size()); + std::memcpy(send_buff.data() + size_before, packed.data(), packed.size()); + + m_send_buffer_bytes += packed.size(); +} + +inline void comm::handle_next_receive(MPI_Status status, + std::shared_ptr buffer) { + int count{0}; + ASSERT_MPI(MPI_Get_count(&status, MPI_BYTE, &count)); + stats.irecv(status.MPI_SOURCE, count); + cereal::YGMInputArchive iarchive(buffer.get(), count); + while (!iarchive.empty()) { + if (config.routing != detail::routing_type::NONE) { + header_t h; + iarchive.loadBinary(&h, sizeof(header_t)); + if (h.dest == m_layout.rank() || (h.dest == -1 && h.message_size == 0)) { + uint16_t lid; + iarchive.loadBinary(&lid, sizeof(lid)); + m_lambda_map.execute(lid, this, &iarchive); + m_recv_count++; + stats.rpc_execute(); + } else { + int next_dest = m_router.next_hop(h.dest); + + if (m_vec_send_buffers[next_dest].empty()) { + m_send_dest_queue.push_back(next_dest); + } + + size_t header_bytes = + pack_header(m_vec_send_buffers[next_dest], 
h.dest, h.message_size); + m_send_buffer_bytes += header_bytes; + + size_t precopy_size = m_vec_send_buffers[next_dest].size(); + m_vec_send_buffers[next_dest].resize(precopy_size + h.message_size); + iarchive.loadBinary(&m_vec_send_buffers[next_dest][precopy_size], + h.message_size); + + m_send_buffer_bytes += h.message_size; + + flush_to_capacity(); + } + } else { + uint16_t lid; + iarchive.loadBinary(&lid, sizeof(lid)); + m_lambda_map.execute(lid, this, &iarchive); + m_recv_count++; + stats.rpc_execute(); + } + } + post_new_irecv(buffer); + flush_to_capacity(); +} + +/** + * @brief Process receive queue of messages received by the listener thread. + * + * @return True if receive queue was non-empty, else false + */ +inline bool comm::process_receive_queue() { + ASSERT_RELEASE(!m_in_process_receive_queue); + m_in_process_receive_queue = true; + bool received_to_return = false; + + if (!m_enable_interrupts) { + m_in_process_receive_queue = false; + return received_to_return; + } + + // + // if we have a pending iRecv, then we can issue a Waitsome + if (m_send_queue.size() > config.num_isends_wait) { + MPI_Request twin_req[2]; + twin_req[0] = m_send_queue.front().request; + twin_req[1] = m_recv_queue.front().request; + + int outcount; + int twin_indices[2]; + MPI_Status twin_status[2]; + { + auto timer = stats.waitsome_isend_irecv(); + ASSERT_MPI( + MPI_Waitsome(2, twin_req, &outcount, twin_indices, twin_status)); + } + for (int i = 0; i < outcount; ++i) { + if (twin_indices[i] == 0) { // completed a iSend + m_pending_isend_bytes -= m_send_queue.front().buffer->size(); + m_send_queue.front().buffer->clear(); + m_free_send_buffers.push_back(m_send_queue.front().buffer); + m_send_queue.pop_front(); + } else { // completed an iRecv -- COPIED FROM BELOW + received_to_return = true; + mpi_irecv_request req_buffer = m_recv_queue.front(); + m_recv_queue.pop_front(); + handle_next_receive(twin_status[i], req_buffer.buffer); + } + } + } else { + if (!m_send_queue.empty()) { 
+    int flag(0);
+    ASSERT_MPI(
+        MPI_Test(&(m_send_queue.front().request), &flag, MPI_STATUS_IGNORE));
+    stats.isend_test();
+    if (flag) {
+      m_pending_isend_bytes -= m_send_queue.front().buffer->size();
+      m_send_queue.front().buffer->clear();
+      m_free_send_buffers.push_back(m_send_queue.front().buffer);
+      m_send_queue.pop_front();
+    }
+    }
+  }
+
+  received_to_return |= local_process_incoming();
+
+  m_in_process_receive_queue = false;
+  return received_to_return;
+}
+
+inline bool comm::local_process_incoming() {
+  bool received_to_return = false;
+
+  while (true) {
+    int        flag(0);
+    MPI_Status status;
+    ASSERT_MPI(MPI_Test(&(m_recv_queue.front().request), &flag, &status));
+    stats.irecv_test();
+    if (flag) {
+      received_to_return = true;
+      mpi_irecv_request req_buffer = m_recv_queue.front();
+      m_recv_queue.pop_front();
+      handle_next_receive(status, req_buffer.buffer);
+    } else {
+      break;  // not ready yet
+    }
+  }
+  return received_to_return;
+}
+};  // namespace ygm
diff --git a/include/ygm/detail/comm_environment.hpp b/include/ygm/detail/comm_environment.hpp
index fec797d0..3b833a7d 100644
--- a/include/ygm/detail/comm_environment.hpp
+++ b/include/ygm/detail/comm_environment.hpp
@@ -6,6 +6,7 @@
 #pragma once
 
 #include
+#include
 #include
 #include
@@ -13,6 +14,8 @@
 namespace ygm {
 namespace detail {
 
+enum class routing_type { NONE, NR, NLNR };
+
 /**
  * @brief Configuration enviornment for ygm::comm.
* @@ -53,11 +56,11 @@ class comm_environment { } if (const char* cc = std::getenv("YGM_COMM_ROUTING")) { if (std::string(cc) == "NONE") { - routing = NONE; + routing = routing_type::NONE; } else if (std::string(cc) == "NR") { - routing = NR; + routing = routing_type::NR; } else if (std::string(cc) == "NLNR") { - routing = NLNR; + routing = routing_type::NLNR; } else { throw std::runtime_error("comm_enviornment -- unknown routing type"); } @@ -73,13 +76,13 @@ class comm_environment { << "YGM_COMM_ISSEND_FREQ = " << freq_issend << "\n" << "YGM_COMM_ROUTING = "; switch (routing) { - case NONE: + case routing_type::NONE: os << "NONE\n"; break; - case NR: + case routing_type::NR: os << "NR\n"; break; - case NLNR: + case routing_type::NLNR: os << "NLNR\n"; break; } @@ -96,8 +99,7 @@ class comm_environment { size_t num_isends_wait = 4; size_t freq_issend = 8; - enum routing_type { NONE = 0, NR = 1, NLNR = 2 }; - routing_type routing = NONE; + routing_type routing = routing_type::NONE; bool welcome = false; }; diff --git a/include/ygm/detail/comm_impl.hpp b/include/ygm/detail/comm_impl.hpp deleted file mode 100644 index 08195a28..00000000 --- a/include/ygm/detail/comm_impl.hpp +++ /dev/null @@ -1,1028 +0,0 @@ -// Copyright 2019-2021 Lawrence Livermore National Security, LLC and other YGM -// Project Developers. See the top-level COPYRIGHT file for details. 
-// -// SPDX-License-Identifier: MIT - -#pragma once - -#include -#include -#include -#include -#include -#include -#include - -#include -#include -#include -#include -#include -#include -#include -#include - -namespace ygm { - -class comm::impl : public std::enable_shared_from_this { - private: - friend class ygm::detail::interrupt_mask; - friend class ygm::detail::comm_stats; - - struct mpi_irecv_request { - std::shared_ptr buffer; - MPI_Request request; - }; - - struct mpi_isend_request { - std::shared_ptr> buffer; - MPI_Request request; - }; - - struct header_t { - uint32_t message_size; - uint32_t dest; - - // template - // void serialize(Archive &ar) { - // ar(message_size, dest); - // } - }; - - // NR Routing - int next_hop(const int dest) { - ASSERT_RELEASE(config.routing); - // - // Trevor's - // if (m_layout.is_local(dest)) { - // return dest; - // } else { - // if (config.routing == detail::comm_environment::routing_type::NR) { - // return m_layout.strided_ranks()[m_layout.node_id(dest)]; - // } // else is NLNR - // const auto [dest_node, dest_local] = m_layout.rank_to_nl(dest); - // auto dest_layer_offset = dest_node % m_layout.local_size(); - // if (m_layout.local_id() == dest_layer_offset) { - // auto my_layer_offset = m_layout.node_id() % m_layout.local_size(); - // return m_layout.nl_to_rank(dest_node, my_layer_offset); - // } else { - // return m_layout.nl_to_rank(m_layout.node_id(), dest_layer_offset); - // } - // } - - // - // Roger's hack - static int my_node_id = m_layout.rank() / m_layout.local_size(); - static int my_offset = m_layout.rank() % m_layout.local_size(); - static int my_node_r0 = my_node_id * m_layout.local_size(); - static int my_node_nlnr_offset = my_node_id % m_layout.local_size(); - int dest_node = dest / m_layout.local_size(); - if (my_node_id == dest_node) { - return dest; - } else { - if (config.routing == detail::comm_environment::routing_type::NR) { - return dest_node * m_layout.local_size() + my_offset; - } // else is 
NLNR - - int responsible_core = my_node_r0 + (dest_node % m_layout.local_size()); - - if (m_layout.rank() == responsible_core) { - return (dest_node * m_layout.local_size()) + my_node_nlnr_offset; - } - return responsible_core; - } - } - - size_t pack_header(std::vector &packed, const int dest, - size_t size) { - size_t size_before = packed.size(); - - header_t h; - h.dest = dest; - h.message_size = size; - - packed.resize(size_before + sizeof(header_t)); - std::memcpy(packed.data() + size_before, &h, sizeof(header_t)); - - // cereal::YGMOutputArchive oarchive(packed); - // oarchive(h); - - return packed.size() - size_before; - } - - public: - impl(MPI_Comm c) : m_layout(c) { - ASSERT_MPI(MPI_Comm_dup(c, &m_comm_async)); - ASSERT_MPI(MPI_Comm_dup(c, &m_comm_barrier)); - ASSERT_MPI(MPI_Comm_dup(c, &m_comm_other)); - - m_vec_send_buffers.resize(m_layout.size()); - - if (config.welcome) { - welcome(std::cout); - } - - for (size_t i = 0; i < config.num_irecvs; ++i) { - std::shared_ptr recv_buffer{ - new std::byte[config.irecv_size]}; - post_new_irecv(recv_buffer); - } - } - - ~impl() { - ASSERT_RELEASE(MPI_Barrier(m_comm_async) == MPI_SUCCESS); - // print_stats(); - - ASSERT_RELEASE(m_send_queue.empty()); - ASSERT_RELEASE(m_send_dest_queue.empty()); - ASSERT_RELEASE(m_send_buffer_bytes == 0); - ASSERT_RELEASE(m_pending_isend_bytes == 0); - - for (size_t i = 0; i < m_recv_queue.size(); ++i) { - ASSERT_RELEASE(MPI_Cancel(&(m_recv_queue[i].request)) == MPI_SUCCESS); - } - ASSERT_RELEASE(MPI_Barrier(m_comm_async) == MPI_SUCCESS); - ASSERT_RELEASE(MPI_Comm_free(&m_comm_async) == MPI_SUCCESS); - ASSERT_RELEASE(MPI_Comm_free(&m_comm_barrier) == MPI_SUCCESS); - ASSERT_RELEASE(MPI_Comm_free(&m_comm_other) == MPI_SUCCESS); - } - - void welcome(std::ostream &os) { - static bool already_printed = false; - if (already_printed) return; - already_printed = true; - std::stringstream sstr; - sstr << "======================================\n" - << " YY YY GGGGGG MM MM \n" - << " YY YY 
GG GG MMM MMM \n" - << " YYYY GG MMMM MMMM \n" - << " YY GG GGGG MM MMM MM \n" - << " YY GG GG MM MM \n" - << " YY GG GG MM MM \n" - << " YY GGGGGG MM MM \n" - << "======================================\n" - << "COMM_SIZE = " << m_layout.size() << "\n" - << "RANKS_PER_NODE = " << m_layout.local_size() << "\n" - << "NUM_NODES = " << m_layout.node_size() << "\n"; - - config.print(sstr); - - if (rank() == 0) { - os << sstr.str(); - } - } - - int size() const { return m_layout.size(); } - int rank() const { return m_layout.rank(); } - - void stats_reset() { stats.reset(); } - void stats_print(const std::string &name, std::ostream &os) { - comm tmp_comm(shared_from_this()); - stats.print(name, os, tmp_comm); - } - - template - void async(int dest, const SendArgs &...args) { - ASSERT_RELEASE(dest < m_layout.size()); - stats.async(dest); - - check_if_production_halt_required(); - m_send_count++; - - // - // - int next_dest = dest; - if (config.routing) { - next_dest = next_hop(dest); - } - - // - // add data to the to dest buffer - if (m_vec_send_buffers[next_dest].empty()) { - m_send_dest_queue.push_back(next_dest); - m_vec_send_buffers[next_dest].reserve(config.buffer_size / - m_layout.node_size()); - } - - // // Add header without message size - size_t header_bytes = 0; - if (config.routing) { - header_bytes = pack_header(m_vec_send_buffers[next_dest], dest, 0); - m_send_buffer_bytes += header_bytes; - } - - uint32_t bytes = pack_lambda(m_vec_send_buffers[next_dest], - std::forward(args)...); - m_send_buffer_bytes += bytes; - - // // Add message size to header - if (config.routing) { - auto iter = m_vec_send_buffers[next_dest].end(); - iter -= (header_bytes + bytes); - std::memcpy(&*iter, &bytes, sizeof(header_t::dest)); - } - - // - // Check if send buffer capacity has been exceeded - if (!m_in_process_receive_queue) { - flush_to_capacity(); - } - } - - template - void async_bcast(const SendArgs &...args) { - check_if_production_halt_required(); - - 
pack_lambda_broadcast(std::forward(args)...); - - // - // Check if send buffer capacity has been exceeded - if (!m_in_process_receive_queue) { - flush_to_capacity(); - } - } - - template - void async_mcast(const std::vector &dests, const SendArgs &...args) { - for (auto dest : dests) { - async(dest, std::forward(args)...); - } - } - - /** - * @brief Control Flow Barrier - * Only blocks the control flow until all processes in the communicator have - * called it. See: MPI_Barrier() - */ - void cf_barrier() { ASSERT_MPI(MPI_Barrier(m_comm_barrier)); } - - /** - * @brief Full communicator barrier - * - */ - void barrier() { - flush_all_local_and_process_incoming(); - std::pair previous_counts{1, 2}; - std::pair current_counts{3, 4}; - while (!(current_counts.first == current_counts.second && - previous_counts == current_counts)) { - previous_counts = current_counts; - current_counts = barrier_reduce_counts(); - if (current_counts.first != current_counts.second) { - flush_all_local_and_process_incoming(); - } - } - ASSERT_RELEASE(m_pre_barrier_callbacks.empty()); - ASSERT_RELEASE(m_send_dest_queue.empty()); - } - - /** - * @brief Registers a callback that will be executed prior to the barrier - * completion - * - * @param fn callback function - */ - void register_pre_barrier_callback(const std::function &fn) { - m_pre_barrier_callbacks.push_back(fn); - } - - template - ygm_ptr make_ygm_ptr(T &t) { - ygm_ptr to_return(&t); - to_return.check(*this); - return to_return; - } - - template - T all_reduce_sum(const T &t) const { - T to_return; - ASSERT_MPI(MPI_Allreduce(&t, &to_return, 1, detail::mpi_typeof(T()), - MPI_SUM, m_comm_other)); - return to_return; - } - - template - T all_reduce_min(const T &t) const { - T to_return; - ASSERT_MPI(MPI_Allreduce(&t, &to_return, 1, detail::mpi_typeof(T()), - MPI_MIN, m_comm_other)); - return to_return; - } - - template - T all_reduce_max(const T &t) const { - T to_return; - ASSERT_MPI(MPI_Allreduce(&t, &to_return, 1, 
detail::mpi_typeof(T()), - MPI_MAX, m_comm_other)); - return to_return; - } - - template - void mpi_send(const T &data, int dest, int tag, MPI_Comm comm) const { - std::vector packed; - cereal::YGMOutputArchive oarchive(packed); - oarchive(data); - size_t packed_size = packed.size(); - ASSERT_RELEASE(packed_size < 1024 * 1024 * 1024); - ASSERT_MPI(MPI_Send(&packed_size, 1, detail::mpi_typeof(packed_size), dest, - tag, comm)); - ASSERT_MPI(MPI_Send(packed.data(), packed_size, MPI_BYTE, dest, tag, comm)); - } - - template - T mpi_recv(int source, int tag, MPI_Comm comm) const { - std::vector packed; - size_t packed_size{0}; - ASSERT_MPI(MPI_Recv(&packed_size, 1, detail::mpi_typeof(packed_size), - source, tag, comm, MPI_STATUS_IGNORE)); - packed.resize(packed_size); - ASSERT_MPI(MPI_Recv(packed.data(), packed_size, MPI_BYTE, source, tag, comm, - MPI_STATUS_IGNORE)); - - T to_return; - cereal::YGMInputArchive iarchive(packed.data(), packed.size()); - iarchive(to_return); - return to_return; - } - - template - T mpi_bcast(const T &to_bcast, int root, MPI_Comm comm) const { - std::vector packed; - cereal::YGMOutputArchive oarchive(packed); - if (rank() == root) { - oarchive(to_bcast); - } - size_t packed_size = packed.size(); - ASSERT_RELEASE(packed_size < 1024 * 1024 * 1024); - ASSERT_MPI(MPI_Bcast(&packed_size, 1, detail::mpi_typeof(packed_size), root, - comm)); - if (rank() != root) { - packed.resize(packed_size); - } - ASSERT_MPI(MPI_Bcast(packed.data(), packed_size, MPI_BYTE, root, comm)); - - cereal::YGMInputArchive iarchive(packed.data(), packed.size()); - T to_return; - iarchive(to_return); - return to_return; - } - - /** - * @brief Tree based reduction, could be optimized significantly - * - * @tparam T - * @tparam MergeFunction - * @param in - * @param merge - * @return T - */ - template - T all_reduce(const T &in, MergeFunction merge) const { - int first_child = 2 * rank() + 1; - int second_child = 2 * (rank() + 1); - int parent = (rank() - 1) / 2; - - // Step 
1: Receive from children, merge into tmp - T tmp = in; - if (first_child < size()) { - T fc = mpi_recv(first_child, 0, m_comm_other); - tmp = merge(tmp, fc); - } - if (second_child < size()) { - T sc = mpi_recv(second_child, 0, m_comm_other); - tmp = merge(tmp, sc); - } - - // Step 2: Send merged to parent - if (rank() != 0) { - mpi_send(tmp, parent, 0, m_comm_other); - } - - // Step 3: Rank 0 bcasts - T to_return = mpi_bcast(tmp, 0, m_comm_other); - return to_return; - } - - const detail::layout &layout() const { return m_layout; } - - private: - std::pair barrier_reduce_counts() { - uint64_t local_counts[2] = {m_recv_count, m_send_count}; - uint64_t global_counts[2] = {0, 0}; - - ASSERT_RELEASE(m_pending_isend_bytes == 0); - ASSERT_RELEASE(m_send_buffer_bytes == 0); - - MPI_Request req = MPI_REQUEST_NULL; - ASSERT_MPI(MPI_Iallreduce(local_counts, global_counts, 2, MPI_UINT64_T, - MPI_SUM, m_comm_barrier, &req)); - stats.iallreduce(); - bool iallreduce_complete(false); - while (!iallreduce_complete) { - MPI_Request twin_req[2]; - twin_req[0] = req; - twin_req[1] = m_recv_queue.front().request; - - int outcount; - int twin_indices[2]; - MPI_Status twin_status[2]; - - { - auto timer = stats.waitsome_iallreduce(); - ASSERT_MPI( - MPI_Waitsome(2, twin_req, &outcount, twin_indices, twin_status)); - } - - for (int i = 0; i < outcount; ++i) { - if (twin_indices[i] == 0) { // completed a Iallreduce - iallreduce_complete = true; - // std::cout << m_layout.rank() << ": iallreduce_complete: " << - // global_counts[0] << " " << global_counts[1] << std::endl; - } else { - handle_next_receive(twin_status[i]); - flush_all_local_and_process_incoming(); - } - } - } - return {global_counts[0], global_counts[1]}; - } - - /** - * @brief Flushes send buffer to dest - * - * @param dest - */ - void flush_send_buffer(int dest) { - static size_t counter = 0; - if (m_vec_send_buffers[dest].size() > 0) { - mpi_isend_request request; - if (m_free_send_buffers.empty()) { - request.buffer = 
std::make_shared>(); - } else { - request.buffer = m_free_send_buffers.back(); - m_free_send_buffers.pop_back(); - } - request.buffer->swap(m_vec_send_buffers[dest]); - if (config.freq_issend > 0 && counter++ % config.freq_issend == 0) { - ASSERT_MPI(MPI_Issend(request.buffer->data(), request.buffer->size(), - MPI_BYTE, dest, 0, m_comm_async, - &(request.request))); - } else { - ASSERT_MPI(MPI_Isend(request.buffer->data(), request.buffer->size(), - MPI_BYTE, dest, 0, m_comm_async, - &(request.request))); - } - stats.isend(dest, request.buffer->size()); - m_pending_isend_bytes += request.buffer->size(); - m_send_buffer_bytes -= request.buffer->size(); - m_send_queue.push_back(request); - if (!m_in_process_receive_queue) { - process_receive_queue(); - } - } - } - - void check_if_production_halt_required() { - while (m_enable_interrupts && !m_in_process_receive_queue && - m_pending_isend_bytes > config.buffer_size) { - process_receive_queue(); - } - } - - /** - * @brief Flushes all local state and buffers. - * Notifies any registered barrier watchers. 
- */ - void flush_all_local_and_process_incoming() { - // Keep flushing until all local work is complete - bool did_something = true; - while (did_something) { - did_something = process_receive_queue(); - // - // Notify registered barrier watchers - while (!m_pre_barrier_callbacks.empty()) { - did_something = true; - std::function fn = m_pre_barrier_callbacks.front(); - m_pre_barrier_callbacks.pop_front(); - fn(); - } - - // - // Flush each send buffer - while (!m_send_dest_queue.empty()) { - did_something = true; - int dest = m_send_dest_queue.front(); - m_send_dest_queue.pop_front(); - flush_send_buffer(dest); - process_receive_queue(); - } - - // - // Wait on isends - while (!m_send_queue.empty()) { - did_something |= process_receive_queue(); - } - } - } - - /** - * @brief Flush send buffers until queued sends are smaller than buffer - * capacity - */ - void flush_to_capacity() { - while (m_send_buffer_bytes > config.buffer_size) { - ASSERT_DEBUG(!m_send_dest_queue.empty()); - int dest = m_send_dest_queue.front(); - m_send_dest_queue.pop_front(); - flush_send_buffer(dest); - } - } - - void post_new_irecv(std::shared_ptr &recv_buffer) { - mpi_irecv_request recv_req; - recv_req.buffer = recv_buffer; - - //::madvise(recv_req.buffer.get(), config.irecv_size, MADV_DONTNEED); - ASSERT_MPI(MPI_Irecv(recv_req.buffer.get(), config.irecv_size, MPI_BYTE, - MPI_ANY_SOURCE, MPI_ANY_TAG, m_comm_async, - &(recv_req.request))); - m_recv_queue.push_back(recv_req); - } - - template - size_t pack_lambda(std::vector &packed, Lambda l, - const PackArgs &...args) { - size_t size_before = packed.size(); - const std::tuple tuple_args( - std::forward(args)...); - ASSERT_DEBUG(sizeof(Lambda) == 1); - - auto dispatch_lambda = [](comm *c, cereal::YGMInputArchive *bia, Lambda l) { - Lambda *pl = nullptr; - size_t l_storage[sizeof(Lambda) / sizeof(size_t) + - (sizeof(Lambda) % sizeof(size_t) > 0)]; - if constexpr (!std::is_empty::value) { - bia->loadBinary(l_storage, sizeof(Lambda)); - pl = 
(Lambda *)l_storage; - } - - std::tuple ta; - if constexpr (!std::is_empty>::value) { - (*bia)(ta); - } - - auto t1 = std::make_tuple((comm *)c); - - // \pp was: std::apply(*pl, std::tuple_cat(t1, ta)); - ygm::meta::apply_optional(*pl, std::move(t1), std::move(ta)); - }; - - return pack_lambda_generic(packed, l, dispatch_lambda, - std::forward(args)...); - } - - template - void pack_lambda_broadcast(Lambda l, const PackArgs &...args) { - const std::tuple tuple_args( - std::forward(args)...); - ASSERT_DEBUG(sizeof(Lambda) == 1); - - auto forward_remote_and_dispatch_lambda = [](comm *c, - cereal::YGMInputArchive *bia, - Lambda l) { - Lambda *pl = nullptr; - size_t l_storage[sizeof(Lambda) / sizeof(size_t) + - (sizeof(Lambda) % sizeof(size_t) > 0)]; - if constexpr (!std::is_empty::value) { - bia->loadBinary(l_storage, sizeof(Lambda)); - pl = (Lambda *)l_storage; - } - - std::tuple ta; - if constexpr (!std::is_empty>::value) { - (*bia)(ta); - } - - auto forward_local_and_dispatch_lambda = [](comm *c, - cereal::YGMInputArchive *bia, - Lambda l) { - Lambda *pl = nullptr; - size_t l_storage[sizeof(Lambda) / sizeof(size_t) + - (sizeof(Lambda) % sizeof(size_t) > 0)]; - if constexpr (!std::is_empty::value) { - bia->loadBinary(l_storage, sizeof(Lambda)); - pl = (Lambda *)l_storage; - } - - std::tuple ta; - if constexpr (!std::is_empty>::value) { - (*bia)(ta); - } - - auto local_dispatch_lambda = [](comm *c, cereal::YGMInputArchive *bia, - Lambda l) { - Lambda *pl = nullptr; - size_t l_storage[sizeof(Lambda) / sizeof(size_t) + - (sizeof(Lambda) % sizeof(size_t) > 0)]; - if constexpr (!std::is_empty::value) { - bia->loadBinary(l_storage, sizeof(Lambda)); - pl = (Lambda *)l_storage; - } - - std::tuple ta; - if constexpr (!std::is_empty>::value) { - (*bia)(ta); - } - - auto t1 = std::make_tuple((comm *)c); - - // \pp was: std::apply(*pl, std::tuple_cat(t1, ta)); - ygm::meta::apply_optional(*pl, std::move(t1), std::move(ta)); - }; - - // Pack lambda telling terminal ranks to 
execute user lambda. - // TODO: Why does this work? Passing ta (tuple of args) to a function - // expecting a parameter pack shouldn't work... - std::vector packed_msg; - c->pimpl->pack_lambda_generic(packed_msg, *pl, local_dispatch_lambda, - ta); - - for (auto dest : c->layout().local_ranks()) { - if (dest != c->layout().rank()) { - c->pimpl->queue_message_bytes(packed_msg, dest); - } - } - - auto t1 = std::make_tuple((comm *)c); - - // \pp was: std::apply(*pl, std::tuple_cat(t1, ta)); - ygm::meta::apply_optional(*pl, std::move(t1), std::move(ta)); - }; - - std::vector packed_msg; - c->pimpl->pack_lambda_generic(packed_msg, *pl, - forward_local_and_dispatch_lambda, ta); - - int num_layers = c->layout().node_size() / c->layout().local_size() + - (c->layout().node_size() % c->layout().local_size() > 0); - int num_ranks_per_layer = - c->layout().local_size() * c->layout().local_size(); - int layer_comm_partner_offset = - c->layout().local_id() * c->layout().local_size() + - c->layout().node_id() % c->layout().local_size(); - int curr_partner = layer_comm_partner_offset; - for (int l = 0; l < num_layers; l++) { - if (curr_partner >= c->layout().size()) { - break; - } - if (!c->layout().is_local(curr_partner)) { - c->pimpl->queue_message_bytes(packed_msg, curr_partner); - } - - curr_partner += num_ranks_per_layer; - } - - auto t1 = std::make_tuple((comm *)c); - - // \pp was: std::apply(*pl, std::tuple_cat(t1, ta)); - ygm::meta::apply_optional(*pl, std::move(t1), std::move(ta)); - }; - - std::vector packed_msg; - pack_lambda_generic(packed_msg, l, forward_remote_and_dispatch_lambda, - std::forward(args)...); - - // Initial send to all local ranks - for (auto dest : layout().local_ranks()) { - queue_message_bytes(packed_msg, dest); - } - } - - template - size_t pack_lambda_generic(std::vector &packed, Lambda l, - RemoteLogicLambda rll, const PackArgs &...args) { - size_t size_before = packed.size(); - const std::tuple tuple_args( - std::forward(args)...); - 
-    ASSERT_DEBUG(sizeof(Lambda) == 1);
-
-    auto remote_dispatch_lambda = [](comm *c, cereal::YGMInputArchive *bia) {
-      RemoteLogicLambda *rll = nullptr;
-      Lambda *pl = nullptr;
-
-      (*rll)(c, bia, *pl);
-    };
-
-    uint16_t lid = m_lambda_map.register_lambda(remote_dispatch_lambda);
-
-    {
-      size_t size_before = packed.size();
-      packed.resize(size_before + sizeof(lid));
-      std::memcpy(packed.data() + size_before, &lid, sizeof(lid));
-    }
-
-    if constexpr (!std::is_empty<Lambda>::value) {
-      // oarchive.saveBinary(&l, sizeof(Lambda));
-      size_t size_before = packed.size();
-      packed.resize(size_before + sizeof(Lambda));
-      std::memcpy(packed.data() + size_before, &l, sizeof(Lambda));
-    }
-
-    if constexpr (!std::is_empty<std::tuple<PackArgs...>>::value) {
-      // Only create a cereal archive if the tuple needs serialization
-      cereal::YGMOutputArchive oarchive(packed);  // Create an output archive
-      oarchive(tuple_args);
-    }
-    return packed.size() - size_before;
-  }
-
-  /**
-   * @brief Adds packed message directly to send buffer for specific
-   * destination. Does not modify packed message to add headers for routing.
-   *
-   */
-  void queue_message_bytes(const std::vector<std::byte> &packed,
-                           const int dest) {
-    m_send_count++;
-
-    //
-    // add data to the dest buffer
-    if (m_vec_send_buffers[dest].empty()) {
-      m_send_dest_queue.push_back(dest);
-      m_vec_send_buffers[dest].reserve(config.buffer_size /
-                                       m_layout.node_size());
-    }
-
-    std::vector<std::byte> &send_buff = m_vec_send_buffers[dest];
-
-    size_t size_before = send_buff.size();
-    send_buff.resize(size_before + packed.size());
-    std::memcpy(send_buff.data() + size_before, packed.data(), packed.size());
-
-    m_send_buffer_bytes += packed.size();
-  }
-
-  /**
-   * @brief Static reference point to anchor address space randomization.
- * - */ - static void reference() {} - - void handle_next_receive(MPI_Status status) { - comm tmp_comm(shared_from_this()); - int count{0}; - ASSERT_MPI(MPI_Get_count(&status, MPI_BYTE, &count)); - stats.irecv(status.MPI_SOURCE, count); - // std::cout << m_layout.rank() << ": received " << count << std::endl; - cereal::YGMInputArchive iarchive(m_recv_queue.front().buffer.get(), count); - while (!iarchive.empty()) { - if (config.routing) { - header_t h; - iarchive.loadBinary(&h, sizeof(header_t)); - if (h.dest == m_layout.rank()) { - uint16_t lid; - iarchive.loadBinary(&lid, sizeof(lid)); - m_lambda_map.execute(lid, &tmp_comm, &iarchive); - m_recv_count++; - stats.rpc_execute(); - } else { - int next_dest = next_hop(h.dest); - - if (m_vec_send_buffers[next_dest].empty()) { - m_send_dest_queue.push_back(next_dest); - } - - size_t header_bytes = pack_header(m_vec_send_buffers[next_dest], - h.dest, h.message_size); - m_send_buffer_bytes += header_bytes; - - size_t precopy_size = m_vec_send_buffers[next_dest].size(); - m_vec_send_buffers[next_dest].resize(precopy_size + h.message_size); - iarchive.loadBinary(&m_vec_send_buffers[next_dest][precopy_size], - h.message_size); - - m_send_buffer_bytes += h.message_size; - - flush_to_capacity(); - } - } else { - uint16_t lid; - iarchive.loadBinary(&lid, sizeof(lid)); - m_lambda_map.execute(lid, &tmp_comm, &iarchive); - m_recv_count++; - stats.rpc_execute(); - } - } - post_new_irecv(m_recv_queue.front().buffer); - m_recv_queue.pop_front(); - flush_to_capacity(); - } - - /** - * @brief Process receive queue of messages received by the listener thread. 
- * - * @return True if receive queue was non-empty, else false - */ - bool process_receive_queue() { - ASSERT_RELEASE(!m_in_process_receive_queue); - m_in_process_receive_queue = true; - bool received_to_return = false; - - if (!m_enable_interrupts) { - m_in_process_receive_queue = false; - return received_to_return; - } - - // - // if we have a pending iRecv, then we can issue a Waitsome - if (m_send_queue.size() > config.num_isends_wait) { - MPI_Request twin_req[2]; - twin_req[0] = m_send_queue.front().request; - twin_req[1] = m_recv_queue.front().request; - - int outcount; - int twin_indices[2]; - MPI_Status twin_status[2]; - { - auto timer = stats.waitsome_isend_irecv(); - ASSERT_MPI( - MPI_Waitsome(2, twin_req, &outcount, twin_indices, twin_status)); - } - for (int i = 0; i < outcount; ++i) { - if (twin_indices[i] == 0) { // completed a iSend - m_pending_isend_bytes -= m_send_queue.front().buffer->size(); - m_send_queue.front().buffer->clear(); - m_free_send_buffers.push_back(m_send_queue.front().buffer); - m_send_queue.pop_front(); - } else { // completed an iRecv -- COPIED FROM BELOW - received_to_return = true; - handle_next_receive(twin_status[i]); - } - } - } else { - if (!m_send_queue.empty()) { - int flag(0); - ASSERT_MPI(MPI_Test(&(m_send_queue.front().request), &flag, - MPI_STATUS_IGNORE)); - stats.isend_test(); - if (flag) { - m_pending_isend_bytes -= m_send_queue.front().buffer->size(); - m_send_queue.front().buffer->clear(); - m_free_send_buffers.push_back(m_send_queue.front().buffer); - m_send_queue.pop_front(); - } - } - } - - while (true) { - int flag(0); - MPI_Status status; - ASSERT_MPI(MPI_Test(&(m_recv_queue.front().request), &flag, &status)); - stats.irecv_test(); - if (flag) { - received_to_return = true; - handle_next_receive(status); - } else { - break; // not ready yet - } - } - - m_in_process_receive_queue = false; - return received_to_return; - } - - MPI_Comm m_comm_async; - MPI_Comm m_comm_barrier; - MPI_Comm m_comm_other; - // int 
m_layout.size(); - // int m_layout.rank(); - - std::vector> m_vec_send_buffers; - size_t m_send_buffer_bytes = 0; - std::deque m_send_dest_queue; - - std::deque m_recv_queue; - std::deque m_send_queue; - std::vector>> m_free_send_buffers; - - size_t m_pending_isend_bytes = 0; - - std::deque> m_pre_barrier_callbacks; - - bool m_enable_interrupts = true; - - uint64_t m_recv_count = 0; - uint64_t m_send_count = 0; - - bool m_in_process_receive_queue = false; - - detail::comm_stats stats; - const detail::comm_environment config; - const detail::layout m_layout; - - ygm::detail::lambda_map - m_lambda_map; -}; - -inline comm::comm(int *argc, char ***argv) { - pimpl_if = std::make_shared(argc, argv); - pimpl = std::make_shared(MPI_COMM_WORLD); -} - -inline comm::comm(MPI_Comm mcomm) { - pimpl_if.reset(); - int flag(0); - ASSERT_MPI(MPI_Initialized(&flag)); - if (!flag) { - throw std::runtime_error("YGM::COMM ERROR: MPI not initialized"); - } - pimpl = std::make_shared(mcomm); -} - -inline void comm::welcome(std::ostream &os) { pimpl->welcome(os); } - -inline void comm::stats_reset() { pimpl->stats_reset(); } -inline void comm::stats_print(const std::string &name, std::ostream &os) { - pimpl->stats_print(name, os); -} - -inline comm::comm(std::shared_ptr impl_ptr) : pimpl(impl_ptr) {} - -inline comm::~comm() { - if (pimpl.use_count() == 1) { - barrier(); - } - pimpl.reset(); - pimpl_if.reset(); -} - -template -inline void comm::async(int dest, AsyncFunction fn, const SendArgs &...args) { - static_assert(std::is_trivially_copyable::value && - std::is_standard_layout::value, - "comm::async() AsyncFunction must be is_trivially_copyable & " - "is_standard_layout."); - pimpl->async(dest, fn, std::forward(args)...); -} - -template -inline void comm::async_bcast(AsyncFunction fn, const SendArgs &...args) { - static_assert( - std::is_trivially_copyable::value && - std::is_standard_layout::value, - "comm::async_bcast() AsyncFunction must be is_trivially_copyable & " - 
"is_standard_layout."); - pimpl->async_bcast(fn, std::forward(args)...); -} - -template -inline void comm::async_mcast(const std::vector &dests, AsyncFunction fn, - const SendArgs &...args) { - static_assert( - std::is_trivially_copyable::value && - std::is_standard_layout::value, - "comm::async_mcast() AsyncFunction must be is_trivially_copyable & " - "is_standard_layout."); - pimpl->async_mcast(dests, fn, std::forward(args)...); -} - -inline const detail::layout &comm::layout() const { return pimpl->layout(); } - -inline int comm::size() const { return pimpl->size(); } -inline int comm::rank() const { return pimpl->rank(); } - -inline void comm::barrier() { pimpl->barrier(); } - -inline void comm::cf_barrier() { pimpl->cf_barrier(); } - -template -inline ygm_ptr comm::make_ygm_ptr(T &t) { - return pimpl->make_ygm_ptr(t); -} - -inline void comm::register_pre_barrier_callback( - const std::function &fn) { - pimpl->register_pre_barrier_callback(fn); -} - -template -inline T comm::all_reduce_sum(const T &t) const { - return pimpl->all_reduce_sum(t); -} - -template -inline T comm::all_reduce_min(const T &t) const { - return pimpl->all_reduce_min(t); -} - -template -inline T comm::all_reduce_max(const T &t) const { - return pimpl->all_reduce_max(t); -} - -template -inline T comm::all_reduce(const T &t, MergeFunction merge) { - return pimpl->all_reduce(t, merge); -} - -} // namespace ygm diff --git a/include/ygm/detail/comm_router.hpp b/include/ygm/detail/comm_router.hpp new file mode 100644 index 00000000..aeb98e50 --- /dev/null +++ b/include/ygm/detail/comm_router.hpp @@ -0,0 +1,84 @@ +// Copyright 2019-2021 Lawrence Livermore National Security, LLC and other YGM +// Project Developers. See the top-level COPYRIGHT file for details. 
+//
+// SPDX-License-Identifier: MIT
+
+#pragma once
+
+#include
+#include
+
+namespace ygm {
+
+namespace detail {
+
+/**
+ * @brief Provides routing destinations for messages in comm
+ */
+class comm_router {
+ public:
+  comm_router(const layout &l, const routing_type route = routing_type::NONE)
+      : m_layout(l), m_default_route(route) {}
+
+  /**
+   * @brief Calculates the next hop based on the given routing scheme and final
+   * destination
+   *
+   * @note The routes calculated should always satisfy the following
+   * assumptions:
+   * 1. routing_type::NONE sends directly to the destination
+   * 2. routing_type::NR makes at most 2 hops, a remote hop followed by an
+   * on-node hop
+   * 3. routing_type::NLNR makes at most 3 hops, an on-node hop, followed by a
+   * remote hop, followed by an on-node hop
+   * 4. The pairs of remote processes communicating in routing_type::NLNR are a
+   * subset of those communicating in routing_type::NR
+   */
+  int next_hop(const int dest, const routing_type route) const {
+    int to_return;
+    switch (route) {
+      case routing_type::NONE:
+        to_return = dest;
+        break;
+      case routing_type::NR:
+        if (m_layout.is_local(dest)) {
+          to_return = dest;
+        } else {
+          to_return = m_layout.strided_ranks()[m_layout.node_id(dest)];
+        }
+        break;
+      case routing_type::NLNR:
+        if (m_layout.is_local(dest)) {
+          to_return = dest;
+        } else {
+          int dest_node = m_layout.node_id(dest);
+
+          // Determine core offset for off-node communication
+          int comm_channel_offset =
+              (dest_node + m_layout.node_id()) % m_layout.local_size();
+          int local_comm_rank = m_layout.local_ranks()[comm_channel_offset];
+
+          if (m_layout.rank() == local_comm_rank) {
+            to_return = m_layout.strided_ranks()[dest_node];
+          } else {
+            to_return = local_comm_rank;
+          }
+        }
+        break;
+      default:
+        std::cerr << "Unknown routing type" << std::endl;
+        return -1;
+    }
+
+    return to_return;
+  }
+
+  int next_hop(const int dest) const { return next_hop(dest, m_default_route); }
+
+ private:
+  routing_type
m_default_route; + const layout &m_layout; +}; + +} // namespace detail +} // namespace ygm diff --git a/include/ygm/detail/comm_stats.hpp b/include/ygm/detail/comm_stats.hpp index a5cd3e72..20ae670f 100644 --- a/include/ygm/detail/comm_stats.hpp +++ b/include/ygm/detail/comm_stats.hpp @@ -5,7 +5,7 @@ #pragma once -#include +#include namespace ygm { namespace detail { @@ -24,28 +24,6 @@ class comm_stats { comm_stats() : m_time_start(MPI_Wtime()) {} - void print(const std::string& name, std::ostream& os, ygm::comm& comm) { - std::stringstream sstr; - sstr << "============== STATS =================\n" - << "NAME = " << name << "\n" - << "TIME = " << MPI_Wtime() - m_time_start << "\n" - << "GLOBAL_ASYNC_COUNT = " << comm.all_reduce_sum(m_async_count) - << "\n" - << "GLOBAL_ISEND_COUNT = " << comm.all_reduce_sum(m_isend_count) - << "\n" - << "GLOBAL_ISEND_BYTES = " << comm.all_reduce_sum(m_isend_bytes) - << "\n" - << "MAX_WAITSOME_ISEND_IRECV = " - << comm.all_reduce_max(m_waitsome_isend_irecv_time) << "\n" - << "MAX_WAITSOME_IALLREDUCE = " - << comm.all_reduce_max(m_waitsome_iallreduce_time) << "\n" - << "COUNT_IALLREDUCE = " << m_iallreduce_count << "\n" - << "======================================"; - if (comm.rank0()) { - os << sstr.str() << std::endl; - } - } - void isend(int dest, size_t bytes) { m_isend_count += 1; m_isend_bytes += bytes; @@ -96,6 +74,35 @@ class comm_stats { m_time_start = MPI_Wtime(); } + size_t get_async_count() const { return m_async_count; } + size_t get_rpc_count() const { return m_rpc_count; } + size_t get_route_count() const { return m_route_count; } + + size_t get_isend_count() const { return m_isend_count; } + size_t get_isend_bytes() const { return m_isend_bytes; } + size_t get_isend_test_count() const { return m_isend_test_count; } + + size_t get_irecv_count() const { return m_irecv_count; } + size_t get_irecv_bytes() const { return m_irecv_bytes; } + size_t get_irecv_test_count() const { return m_irecv_test_count; } + + double 
get_waitsome_isend_irecv_time() const { + return m_waitsome_isend_irecv_time; + } + size_t get_waitsome_isend_irecv_count() const { + return m_waitsome_isend_irecv_count; + } + + size_t get_iallreduce_count() const { return m_iallreduce_count; } + double get_waitsome_iallreduce_time() const { + return m_waitsome_iallreduce_time; + } + size_t get_waitsome_iallreduce_count() const { + return m_waitsome_iallreduce_count; + } + + double get_elapsed_time() const { return MPI_Wtime() - m_time_start; } + private: size_t m_async_count = 0; size_t m_rpc_count = 0; @@ -119,4 +126,4 @@ class comm_stats { double m_time_start = 0.0; }; } // namespace detail -} // namespace ygm \ No newline at end of file +} // namespace ygm diff --git a/include/ygm/detail/interrupt_mask.hpp b/include/ygm/detail/interrupt_mask.hpp index f8c489ef..5ed92019 100644 --- a/include/ygm/detail/interrupt_mask.hpp +++ b/include/ygm/detail/interrupt_mask.hpp @@ -14,12 +14,12 @@ namespace detail { class interrupt_mask { public: interrupt_mask(ygm::comm &c) : m_comm(c) { - m_comm.pimpl->m_enable_interrupts = false; + m_comm.m_enable_interrupts = false; } ~interrupt_mask() { - m_comm.pimpl->m_enable_interrupts = true; - // m_comm.pimpl->process_receive_queue(); //causes recursion into + m_comm.m_enable_interrupts = true; + // m_comm.process_receive_queue(); //causes recursion into // process_receive_queue } diff --git a/include/ygm/detail/lambda_map.hpp b/include/ygm/detail/lambda_map.hpp index ff87923a..9b9bd978 100644 --- a/include/ygm/detail/lambda_map.hpp +++ b/include/ygm/detail/lambda_map.hpp @@ -5,6 +5,7 @@ #pragma once +#include #include #include @@ -50,4 +51,4 @@ const FuncId lambda_map::lambda_enumerator::id = lambda_map::record(); } // namespace detail -} // namespace ygm \ No newline at end of file +} // namespace ygm diff --git a/include/ygm/detail/mpi.hpp b/include/ygm/detail/mpi.hpp index 784e8993..e026aad7 100644 --- a/include/ygm/detail/mpi.hpp +++ b/include/ygm/detail/mpi.hpp @@ -7,18 +7,13 
@@
 #include
 #include
+#include

 namespace ygm::detail {

 class mpi_init_finalize {
  public:
   mpi_init_finalize(int *argc, char ***argv) {
     ASSERT_MPI(MPI_Init(argc, argv));
-    // int provided;
-    // ASSERT_MPI(MPI_Init_thread(argc, argv, MPI_THREAD_MULTIPLE, &provided));
-    // if (provided != MPI_THREAD_MULTIPLE) {
-    //   throw std::runtime_error(
-    //       "MPI_Init_thread: MPI_THREAD_MULTIPLE not provided.");
-    // }
   }
   ~mpi_init_finalize() {
     ASSERT_RELEASE(MPI_Barrier(MPI_COMM_WORLD) == MPI_SUCCESS);
@@ -29,20 +24,75 @@ class mpi_init_finalize {
   }
 };

-inline MPI_Datatype mpi_typeof(char) { return MPI_CHAR; }
-inline MPI_Datatype mpi_typeof(signed short) { return MPI_SHORT; }
-inline MPI_Datatype mpi_typeof(signed int) { return MPI_INT; }
-inline MPI_Datatype mpi_typeof(signed long) { return MPI_LONG; }
-inline MPI_Datatype mpi_typeof(unsigned char) { return MPI_UNSIGNED_CHAR; }
-inline MPI_Datatype mpi_typeof(unsigned short) { return MPI_UNSIGNED_SHORT; }
-inline MPI_Datatype mpi_typeof(unsigned) { return MPI_UNSIGNED; }
-inline MPI_Datatype mpi_typeof(unsigned long) { return MPI_UNSIGNED_LONG; }
-inline MPI_Datatype mpi_typeof(unsigned long long) {
-  return MPI_UNSIGNED_LONG_LONG;
-}
-inline MPI_Datatype mpi_typeof(signed long long) { return MPI_LONG_LONG_INT; }
-inline MPI_Datatype mpi_typeof(float) { return MPI_FLOAT; }
-inline MPI_Datatype mpi_typeof(double) { return MPI_DOUBLE; }
-inline MPI_Datatype mpi_typeof(long double) { return MPI_LONG_DOUBLE; }
+template <typename T>
+inline MPI_Datatype mpi_typeof(T) {
+  static_assert(always_false<>, "Unknown MPI Type");
+  return 0;
+}
+
+template <>
+inline MPI_Datatype mpi_typeof(char) {
+  return MPI_CHAR;
+}
+
+template <>
+inline MPI_Datatype mpi_typeof(bool) {
+  return MPI_CXX_BOOL;
+}
+
+template <>
+inline MPI_Datatype mpi_typeof(int8_t) {
+  return MPI_INT8_T;
+}
+
+template <>
+inline MPI_Datatype mpi_typeof(int16_t) {
+  return MPI_INT16_T;
+}
+
+template <>
+inline MPI_Datatype mpi_typeof(int32_t) {
+  return MPI_INT32_T;
+}
+
+template <>
+inline MPI_Datatype mpi_typeof(int64_t) {
+  return MPI_INT64_T;
+}
+
+template <>
+inline MPI_Datatype mpi_typeof(uint8_t) {
+  return MPI_UINT8_T;
+}
+
+template <>
+inline MPI_Datatype mpi_typeof(uint16_t) {
+  return MPI_UINT16_T;
+}
+
+template <>
+inline MPI_Datatype mpi_typeof(uint32_t) {
+  return MPI_UINT32_T;
+}
+
+template <>
+inline MPI_Datatype mpi_typeof(uint64_t) {
+  return MPI_UINT64_T;
+}
+
+template <>
+inline MPI_Datatype mpi_typeof(float) {
+  return MPI_FLOAT;
+}
+
+template <>
+inline MPI_Datatype mpi_typeof(double) {
+  return MPI_DOUBLE;
+}
+
+template <>
+inline MPI_Datatype mpi_typeof(long double) {
+  return MPI_LONG_DOUBLE;
+}
 }  // namespace ygm::detail
diff --git a/include/ygm/detail/random.hpp b/include/ygm/detail/random.hpp
new file mode 100644
index 00000000..1047c49f
--- /dev/null
+++ b/include/ygm/detail/random.hpp
@@ -0,0 +1,51 @@
+// Copyright 2019-2021 Lawrence Livermore National Security, LLC and other YGM
+// Project Developers. See the top-level COPYRIGHT file for details.
+//
+// SPDX-License-Identifier: MIT
+
+#pragma once
+
+#include
+
+#include
+
+namespace ygm::detail {
+
+/// @brief Applies a simple offset to the specified seed according to rank index
+/// @tparam ResultType The random number (seed) type; defaults to std::mt19937
+/// @param comm The ygm::comm to be used
+/// @param seed The specified seed
+/// @return simply returns seed + rank
+template
+ResultType simple_offset(ygm::comm &comm, ResultType seed) {
+  return seed + comm.rank();
+}
+
+/// @brief A wrapper around a per-rank random engine that manipulates each
+/// rank's seed according to a specified strategy
+/// @tparam RandomEngine The underlying random engine, e.g.
std::mt19937
+/// @tparam Function A `(ygm::comm, result_type) -> result_type` function that
+/// modifies seeds for each rank
+template
+class random_engine {
+ public:
+  using rng_type = RandomEngine;
+  using result_type = typename RandomEngine::result_type;
+
+  random_engine(ygm::comm &comm, result_type seed = std::random_device{}())
+      : m_seed(Function(comm, seed)), m_rng(Function(comm, seed)) {}
+
+  result_type operator()() { return m_rng(); }
+
+  constexpr const result_type &seed() const { return m_seed; }
+
+  static constexpr result_type min() { return rng_type::min(); }
+  static constexpr result_type max() { return rng_type::max(); }
+
+ private:
+  rng_type m_rng;
+  result_type m_seed;
+};
+}  // namespace ygm::detail
\ No newline at end of file
diff --git a/include/ygm/detail/std_traits.hpp b/include/ygm/detail/std_traits.hpp
new file mode 100644
index 00000000..66e35888
--- /dev/null
+++ b/include/ygm/detail/std_traits.hpp
@@ -0,0 +1,22 @@
+// Copyright 2019-2021 Lawrence Livermore National Security, LLC and other YGM
+// Project Developers. See the top-level COPYRIGHT file for details.
+//
+// SPDX-License-Identifier: MIT
+
+#pragma once
+#include
+#include
+
+namespace ygm::detail {
+
+// is_std_pair
+template <typename T>
+struct is_std_pair_impl : std::false_type {};
+
+template <typename A, typename B>
+struct is_std_pair_impl<std::pair<A, B>> : std::true_type {};
+
+template <typename T>
+constexpr bool is_std_pair = is_std_pair_impl<T>::value;
+
+}  // namespace ygm::detail
diff --git a/include/ygm/detail/ygm_cereal_archive.hpp b/include/ygm/detail/ygm_cereal_archive.hpp
index a601bd59..92d77771 100644
--- a/include/ygm/detail/ygm_cereal_archive.hpp
+++ b/include/ygm/detail/ygm_cereal_archive.hpp
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
diff --git a/include/ygm/detail/ygm_traits.hpp b/include/ygm/detail/ygm_traits.hpp
new file mode 100644
index 00000000..c1ac6c12
--- /dev/null
+++ b/include/ygm/detail/ygm_traits.hpp
@@ -0,0 +1,66 @@
+// Copyright 2019-2021 Lawrence Livermore National Security, LLC and other YGM
+// Project Developers. See the top-level COPYRIGHT file for details.
+//
+// SPDX-License-Identifier: MIT
+
+#pragma once
+#include
+
+namespace ygm::detail {
+
+template <typename...>
+constexpr std::false_type always_false{};
+
+namespace detector_detail {
+// based on https://en.cppreference.com/w/cpp/experimental/is_detected
+template <class Default, class AlwaysVoid, template <class...> class Op,
+          class... Args>
+struct detector {
+  using value_t = std::false_type;
+  using type = Default;
+};
+
+template <class Default, template <class...> class Op, class... Args>
+struct detector<Default, std::void_t<Op<Args...>>, Op, Args...> {
+  using value_t = std::true_type;
+  using type = Op<Args...>;
+};
+
+struct nonesuch {
+  nonesuch() = delete;
+  ~nonesuch() = delete;
+  nonesuch(nonesuch const&) = delete;
+  void operator=(nonesuch const&) = delete;
+};
+
+template