Change Log

2.9.00 (2019-06-24)

Full Changelog

Implemented enhancements:

  • Capability: CUDA Streams #1723
  • Capability: CUDA Stream support for parallel_reduce #2061
  • Capability: Feature Request: TeamVectorRange #713
  • Capability: Adding HPX backend #2080
  • Capability: TaskScheduler to have multiple queues #565
  • Capability: Support for additional reductions in ScatterView #1674
  • Capability: Request: deep_copy within parallel regions #689
  • Capability: Feature Request: create_mirror_view_without_initializing #1765
  • View: Use SFINAE to restrict possible View type conversions #2127
  • Deprecation: Deprecate ExecutionSpace::fence() as static function and make it non-static #2140
  • Deprecation: Deprecate LayoutTileLeft #2122
  • Macros: KOKKOS_RESTRICT defined for non-Intel compilers #2038

Fixed bugs:

  • Cuda: TeamThreadRange loop count on device is passed by reference to host static constexpr #1733
  • Cuda: Build error with relocatable device code with CUDA 10.1 GCC 7.3 #2134
  • Cuda: cudaFuncSetCacheConfig is setting CachePreferShared too often #2066
  • Cuda: TeamPolicy doesn't throw when created with non-viable vector length and also doesn't backscale to viable one #2020
  • Cuda: cudaMemcpy error for large league sizes on V100 #1991
  • Cuda: illegal warp sync in parallel_reduce by functor on Turing 75 #1958
  • TeamThreadRange: Inconsistent results from TeamThreadRange reduction #1905
  • Atomics: atomic_fetch_oper & atomic_oper_fetch don't build for complex<float> #1964
  • Views: Kokkos randomread Views leak memory #2155
  • ScatterView: LayoutLeft overload currently non-functional #2165
  • KNL: With intel 17.2.174 illegal instruction in random number test #2078
  • Bitset: Enable copy constructor on device #2094
  • Examples: do not compile due to template deduction error (multi_fem) #1928

2.8.00 (2019-02-05)

Full Changelog

Implemented enhancements:

  • Capability, Tests: C++14 support and testing #1914
  • Capability: Add environment variables for all command line arguments #1798
  • Capability: --kokkos-ndevices not working for Slurm #1920
  • View: Undefined behavior when deep copying from and to an empty unmanaged view #1967
  • BuildSystem: nvcc_wrapper should stop immediately if nvcc is not in PATH #1861

Fixed bugs:

  • Cuda: Fix Volta Issues 1 Non-deterministic behavior on Volta, runs fine on Pascal #1949
  • Cuda: Fix Volta Issues 2 CUDA Team Scan gives wrong values on Volta with -G compile flag #1942
  • Cuda: illegal warp sync in parallel_reduce by functor on Turing 75 #1958
  • Threads: Pthreads backend does not handle RangePolicy with offset correctly #1976
  • Atomics: atomic_fetch_oper has no case for Kokkos::complex<double> or other 16-byte types #1951
  • MDRangePolicy: Fix zero-length range #1948
  • TeamThreadRange: TeamThreadRange MaxLoc reduce doesn't compile #1909

2.7.24 (2018-11-04)

Full Changelog

Implemented enhancements:

  • DualView: Add non-templated functions for sync, need_sync, view, modify #1858
  • DualView: Avoid needlessly allocating and initializing modify_host and modify_device flag views #1831
  • DualView: Incorrect deduction of "not device type" #1659
  • BuildSystem: Add KOKKOS_ENABLE_CXX14 and KOKKOS_ENABLE_CXX17 #1602
  • BuildSystem: Installed kokkos_generated_settings.cmake contains build directories instead of install directories #1838
  • BuildSystem: KOKKOS_ARCH: add ticks to printout of improper arch setting #1649
  • BuildSystem: Make core/src/Makefile for Cuda use needed nvcc_wrapper #1296
  • Build: Support PGI as host compiler for NVCC #1828
  • Build: Many Warnings Fixed e.g. #1786
  • Capability: OffsetView with non-zero begin index #567
  • Capability: Reductions into device side view #1788
  • Capability: Add max_size to Kokkos::Array #1760
  • Capability: View Assignment: LayoutStride -> LayoutLeft and LayoutStride -> LayoutRight #1594
  • Capability: Atomic function allow implicit conversion of update argument #1571
  • Capability: Add team_size_max with tagged functors #663
  • Capability: Fix alignment of views from Kokkos_ScratchSpace should use different alignment #1700
  • Capability: create_mirror_view_and_copy for DynRankView #1651
  • Capability: DeepCopy HBWSpace / HostSpace #548
  • ROCm: support team vector scan #1645
  • ROCm: Merge from rocm-hackathon2 #1636
  • ROCm: Add ParallelScanWithTotal #1611
  • ROCm: Implement MDRange in ROCm #1314
  • ROCm: Implement Reducers for Nested Parallelism Levels #963
  • ROCm: Add asynchronous deep copy #959
  • Tests: Memory pool test seems to allocate 8GB #1830
  • Tests: Add unit_test for team_broadcast #734

Fixed bugs:

  • BuildSystem: Makefile.kokkos gets gcc-toolchain wrong if gcc is cached #1841
  • BuildSystem: kokkos_generated_settings.cmake placement is inconsistent #1771
  • BuildSystem: Invalid escape sequence . in kokkos_functions.cmake #1661
  • BuildSystem: Problem in Kokkos generated cmake file #1770
  • BuildSystem: invalid file names on windows #1671
  • Tests: reducers min/max_loc test fails randomly due to multiple min values and thus multiple valid locations #1681
  • Tests: cuda.scatterview unit test causes "Bus error" when force_uvm and enable_lambda are enabled #1852
  • Tests: cuda.cxx11 unit test fails when force_uvm and enable_lambda are enabled #1850
  • Tests: threads.reduce_device_view_range_policy failing with Cuda/8.0.44 and RDC #1836
  • Build: compile error when compiling Kokkos with hwloc 2.0.1 (on OSX 10.12.6, with g++ 7.2.0) #1506
  • Build: dual_view.view broken with UVM #1834
  • Build: White cuda/9.2 + gcc/7.2 warnings triggering errors #1833
  • Build: warning: enum constant in boolean context #1813
  • Capability: Fix overly conservative max_team_size thingy #1808
  • DynRankView: Ctors taking ViewAllocateWithoutInitializing broken #1783
  • Cuda: Apollo cuda.team_broadcast test fail with clang-6.0 #1762
  • Cuda: Clang spurious test failure in impl_view_accessible #1753
  • Cuda: Kokkos::complex<double> atomic deadlocks with Clang 6 Cuda build with -O0 #1752
  • Cuda: LayoutStride Test fails for UVM as default memory space #1688
  • Cuda: Scan wrong values on Volta #1676
  • Cuda: Kokkos::deep_copy error with CudaUVM and Kokkos::Serial spaces #1652
  • Cuda: cudaErrorInvalidConfiguration with debug build #1647
  • Cuda: parallel_for with TeamPolicy::team_size_recommended with launch bounds not working -- reported by Daniel Holladay #1283
  • Cuda: Using KOKKOS_CLASS_LAMBDA in a class with Kokkos::Random_XorShift64_Pool member data #1696
  • Long Build Times on Darwin #1721
  • Capability: Typo in Kokkos_Sort.hpp - BinOp3D - wrong comparison #1720
  • Buffer overflow in SharedAllocationRecord in Kokkos_HostSpace.cpp #1673
  • Serial unit test failure #1632

2.7.00 (2018-05-24)

Full Changelog

Part of the Kokkos C++ Performance Portability Programming EcoSystem 2.7

Implemented enhancements:

  • Deprecate team_size auto adjusting to maximal value possible #1618
  • DynamicView - remove restrictions to std::is_trivial types and value_type is power of two #1586
  • Kokkos::StaticCrsGraph does not propagate memory traits (e.g., Unmanaged) #1581
  • Adding ETI for DeepCopy / ViewFill etc. #1578
  • Deprecate all the left over KOKKOS_HAVE_ Macros and Kokkos_OldMacros.hpp #1572
  • Error if Kokkos_ARCH set in CMake #1555
  • Deprecate ExecSpace::initialize / ExecSpace::finalize #1532
  • New API for TeamPolicy property setting #1531
  • clang 6.0 + cuda debug out-of-memory test failure #1521
  • Cuda UniqueToken interface not consistent with other backends #1505
  • Move Reducers out of Experimental namespace #1494
  • Provide scope guard for initialize/finalize #1479
  • Check Kokkos::is_initialized in SharedAllocationRecord dtor #1465
  • Remove static list of allocations #1464
  • Makefiles: Support single compile/link line use case #1402
  • ThreadVectorRange with a range #1400
  • Exclusive scan + last value API #1358
  • Install kokkos_generated_settings.cmake #1348
  • Kokkos arrays (not views!) don't do bounds checking in debug mode #1342
  • Expose round-robin GPU assignment outside of initialize(int, char**) #1318
  • DynamicView misses use_count and label function #1298
  • View constructor should check arguments #1286
  • False Positive on Oversubscription Warning #1207
  • Allow (require) execution space for 1st arg of VerifyExecutionCanAccessMemorySpace #1192
  • ROCm: Add ROCmHostPinnedSpace #958
  • power of two functions #656
  • CUDA 8 has 64bit __shfl #361
  • Add TriBITS/CMake configure information about node types #243

Fixed bugs:

  • CUDA atomic_fetch_sub for doubles is hitting CAS instead of intrinsic #1624
  • Bug: use of ballot on Volta #1612
  • Kokkos::deep_copy memory access failures #1583
  • g++ -std option doubly set for cmake project #1548
  • ViewFill for 1D Views of larger 32bit entries fails #1541
  • CUDA Volta another warpsync bug #1520
  • triple_nested_parallelism fails with KOKKOS_DEBUG and CUDA #1513
  • Jenkins errors in Kokkos_SharedAlloc.cpp with debug build #1511
  • Kokkos::Sort out-of-bounds with empty bins #1504
  • Get rid of deprecated functions inside Kokkos #1484
  • get_work_partition casts int64_t to int, causing a seg fault #1481
  • NVCC bug with __device__ on defaulted function #1470
  • CMake example broken with CUDA backend #1468

2.6.00 (2018-03-07)

Full Changelog

Part of the Kokkos C++ Performance Portability Programming EcoSystem 2.6

Implemented enhancements:

  • Support NVIDIA Volta microarchitecture #1466
  • Kokkos - Define empty functions when profiling disabled #1424
  • Don't use __constant__ cache for lock arrays, enable once per run update instead of once per call #1385
  • task dag enhancement. #1354
  • Cuda task team collectives and stack size #1353
  • Replace View operator acceptance of more than rank integers with 'access' function #1333
  • Interoperability: Do not shut down backend execution space runtimes upon calling finalize. #1305
  • shmem_size for LayoutStride #1291
  • Kokkos::resize performs poorly on 1D Views #1270
  • stride() is inconsistent with dimension(), extent(), etc. #1214
  • Kokkos::sort defaults to std::sort on host #1208
  • DynamicView with host size grow #1206
  • Unmanaged View with Anonymous Memory Space #1175
  • Sort subset of Kokkos::DynamicView #1160
  • MDRange policy doesn't support lambda reductions #1054
  • Add ability to set hook on Kokkos::finalize #714
  • Atomics with Serial Backend - Default should be Disable? #549
  • KOKKOS_ENABLE_DEPRECATED_CODE #1359

Fixed bugs:

  • cuda_internal_maximum_warp_count returns 8, but I believe it should return 16 for P100 #1269
  • Cuda: level 1 scratch memory bug (reported by Stan Moore) #1434
  • MDRangePolicy Reduction requires value_type typedef in Functor #1379
  • Kokkos DeepCopy between empty views fails #1369
  • Several issues with new CMake build infrastructure (reported by Eric Phipps) #1365
  • deep_copy between rank-1 host/device views of differing layouts without UVM no longer works (reported by Eric Phipps) #1363
  • Profiling can't be disabled in CMake, and a parallel_for is missing for tasks (reported by Kyungjoo Kim) #1349
  • get_work_partition int overflow (reported by berryj5) #1327
  • Kokkos::deep_copy must fence even if the two views are the same #1303
  • CudaUVMSpace::allocate/deallocate must fence #1302
  • ViewResize on CUDA fails in Debug because of too many resources requested #1299
  • Cuda 9 and intrepid2 calls from Panzer. #1183
  • Slowdown due to tracking_enabled() in 2.04.00 (found by Albany app) #1016
  • Bounds checking fails with zero-span Views (reported by Stan Moore) #1411

2.5.00 (2017-12-15)

Full Changelog

Part of the Kokkos C++ Performance Portability Programming EcoSystem 2.5

Implemented enhancements:

  • Provide Makefile.kokkos logic for CMake and TriBITS #878
  • Add Scatter View #825
  • Drop gcc 4.7 and intel 14 from supported compiler list #603
  • Enable construction of unmanaged view using common_view_alloc_prop #1170
  • Unused Function Warning with XL #1267
  • Add memory pool parameter check #1218
  • CUDA9: Fix warning for unsupported long double #1189
  • CUDA9: fix warning on defaulted function marking #1188
  • CUDA9: fix warnings for deprecated warp level functions #1187
  • Add CUDA 9.0 nightly testing #1174
  • {OMPI,MPICH}_CXX hack breaks nvcc_wrapper use case #1166
  • KOKKOS_HAVE_CUDA_LAMBDA became KOKKOS_CUDA_USE_LAMBDA #1274

Fixed bugs:

  • MinMax Reducer with tagged operator doesn't compile #1251
  • Reducers for Tagged operators give wrong answer #1250
  • Kokkos not Compatible with Big Endian Machines? #1235
  • Parallel Scan hangs forever on BG/Q #1234
  • Threads backend doesn't compile with Clang on OS X #1232
  • $(shell date) needs quote #1264
  • Unqualified parallel_for call conflicts with user-defined parallel_for #1219
  • KokkosAlgorithms: CMake issue in unit tests #1212
  • Intel 18 Error: "simd pragma has been deprecated" #1210
  • Memory leak in Kokkos::initialize #1194
  • CUDA9: compiler error with static assert template arguments #1190
  • Kokkos::Serial::is_initialized returns always true #1184
  • Triple nested parallelism still fails on bowman #1093
  • OpenMP openmp.range on Develop Runs Forever on POWER7+ with RHEL7 and GCC4.8.5 #995
  • Rendezvous performance at global scope #985

2.04.11 (2017-10-28)

Full Changelog

Implemented enhancements:

  • Add Subview pattern. #648
  • Add Kokkos "global" is_initialized #1060
  • Add create_mirror_view_and_copy #1161
  • Add KokkosConcepts SpaceAccessibility function #1092
  • Option to Disable Initialize Warnings #1142
  • Mature task-DAG capability #320
  • Promote Work DAG from experimental #1126
  • Implement new WorkGraph push/pop #1108
  • Kokkos_ENABLE_Cuda_Lambda should default ON #1101
  • Add multidimensional parallel for example and improve unit test #1064
  • Fix ROCm: Performance tests not building #1038
  • Make KOKKOS_ALIGN_SIZE a configure-time option #1004
  • Make alignment consistent #809
  • Improve subview construction on Cuda backend #615

Fixed bugs:

  • Kokkos::vector fixes for application #1134
  • DynamicView non-power of two value_type #1177
  • Memory pool bug #1154
  • Cuda launch bounds performance regression bug #1140
  • Significant performance regression in LAMMPS after updating Kokkos #1139
  • CUDA compile error #1128
  • MDRangePolicy neg idx test failure in debug mode #1113
  • subview construction on Cuda backend #615

2.04.04 (2017-09-11)

Full Changelog

Implemented enhancements:

  • OpenMP partition: set number of threads on nested level #1082
  • Add StaticCrsGraph row() method #1071
  • Enhance Kokkos complex operator overloading #1052
  • Tell Trilinos packages about host+device lambda #1019
  • Function markup for defaulted class members #952
  • Add deterministic random number generator #857

Fixed bugs:

  • Fix reduction_identity<T>::max for floating point numbers #1048
  • Fix MD iteration policy ignores lower bound on GPUs #1041
  • (Experimental) HBWSpace Linking issues in KokkosKernels #1094
  • (Experimental) ROCm: algorithms/unit_tests test_sort failing with segfault #1070

2.04.00 (2017-08-16)

Full Changelog

Implemented enhancements:

  • Added ROCm backend to support AMD GPUs
  • Kokkos::complex<T> behaves slightly differently from std::complex<T> #1011
  • Kokkos::Experimental::Crs constructor arguments were in the wrong order #992
  • Work graph construction ease-of-use (one lambda for count and fill) #991
  • when_all returns pointer of futures (improved interface) #990
  • Allow assignment of LayoutLeft to LayoutRight or vice versa for rank-0 Views #594
  • Changed the meaning of Kokkos_ENABLE_CXX11_DISPATCH_LAMBDA #1035

Fixed bugs:

  • memory pool default constructor does not properly set member variables. #1007

2.03.13 (2017-07-27)

Full Changelog

Implemented enhancements:

  • Disallow enabling both OpenMP and Threads in the same executable #406
  • Make Kokkos::OpenMP respect OMP environment even if hwloc is available #630
  • Improve Atomics Performance on KNL/Broadwell where PREFETCHW/RFO is Available #898
  • Kokkos::resize should test whether dimensions have changed before resizing #904
  • Develop performance-regression/acceptance tests #737
  • Make the deep_copy Profiling hook a start/end system #890
  • Add deep_copy Profiling hook #843
  • Append tag name to parallel construct name for Profiling #842
  • Add view label to View bounds error message for CUDA backend #870
  • Disable printing the loaded profiling library #824
  • "Declared but never referenced" warnings #853
  • Warnings about lock_address_cuda_space #852
  • WorkGraph execution policy #771
  • Simplify makefiles by guarding compilation with appropriate KOKKOS_ENABLE_### macros #716
  • Cmake build: wrong include install directory #668
  • Derived View type and allocation #566
  • Fix Compiler warnings when compiling core unit tests for Cuda #214

Fixed bugs:

  • Out-of-bounds read in Kokkos_Layout.hpp #975
  • CudaClang: Fix failing test with Clang 4.0 #941
  • Respawn when memory pool allocation fails (not available memory) #940
  • Memory pool aborts on zero allocation request, returns NULL for < minimum #939
  • Error with TaskScheduler query of underlying memory pool #917
  • Profiling::*Callee static variables declared in header #863
  • calling *Space::name() causes compile error #862
  • bug in Profiling::deallocateData #860
  • task_depend test failing, CUDA 8.0 + Pascal + RDC #829
  • [develop branch] Standalone cmake issues #826
  • Kokkos CUDA fails to compile with OMPI_CXX and MPICH_CXX wrappers #776
  • Task Team reduction on Pascal #767
  • CUDA stack overflow with TaskDAG test #758
  • TeamVector test on Cuda #670
  • Clang 4.0 Cuda Build broken again #560

2.03.05 (2017-05-27)

Full Changelog

Implemented enhancements:

  • Harmonize Custom Reductions over nesting levels #802
  • Prevent users directly including KokkosCore_config.h #815
  • DualView aborts on concurrent host/device modify (in debug mode) #814
  • Abort when running on a NVIDIA CC5.0 or higher architecture with code compiled for CC < 5.0 #813
  • Add "name" function to ExecSpaces #806
  • Allow null Future in task spawn dependences #795
  • Add Unit Tests for Kokkos::complex #785
  • Add pow function for Kokkos::complex #784
  • Square root of a complex #729
  • Command line processing of --threads argument prevents users from having any commandline arguments starting with --threads #760
  • Protected deprecated API with appropriate macro #756
  • Allow task scheduler memory pool to be used by tasks #747
  • View bounds checking on host-side performance: constructing a std::string #723
  • Add check for AppleClang as compiler distinct from check for Clang. #705
  • Uninclude source files for specific configurations to prevent link warning. #701
  • Add --small option to snapshot script #697
  • CMake Standalone Support #674
  • CMake build unit test and install #808
  • CMake: Fix having kokkos as a subdirectory in a pure cmake project #629
  • Tribits macro assumes build directory is in top level source directory #654
  • Use bin/nvcc_wrapper, not config/nvcc_wrapper #562
  • Allow MemoryPool::allocate() to be called from multiple threads per warp. #487
  • Move OpenMP 4.5 OpenMPTarget backend into Develop #456
  • Testing on ARM testbed #288

Fixed bugs:

  • Fix label in OpenMP parallel_reduce verify_initialized #834
  • TeamScratch Level 1 on Cuda hangs #820
  • [bug] memory pool. #786
  • Some Reduction Tests fail on Intel 18 with aggressive vectorization on #774
  • Error copying dynamic view on copy of memory pool #773
  • CUDA stack overflow with TaskDAG test #758
  • ThreadVectorRange Customized Reduction Bug #739
  • set_scratch_size overflows #726
  • Get wrong results for compiler checks in Makefile on OS X. #706
  • Fix check if multiple host architectures enabled. #702
  • Threads Backend Does not Pass on Cray Compilers #609
  • Rare bug in memory pool where allocation can finish on superblock in empty state #452
  • LDFLAGS in core/unit_test/Makefile: potential "undefined reference" to pthread lib #148

2.03.00 (2017-04-25)

Full Changelog

Implemented enhancements:

  • UnorderedMap: make it accept Devices or MemorySpaces #711
  • sort to accept DynamicView and [begin,end) indices #691
  • ENABLE Macros should only be used via #ifdef or #if defined #675
  • Remove impl/Kokkos_Synchronic_* #666
  • Turning off IVDEP for Intel 14. #638
  • Using an installed Kokkos in a target application using CMake #633
  • Create Kokkos Bill of Materials #632
  • MDRangePolicy and tagged evaluators #547
  • Add PGI support #289

Fixed bugs:

  • Output from PerTeam fails #733
  • Cuda: architecture flag not added to link line #688
  • Getting large chunks of memory for a thread team in a universal way #664
  • Kokkos RNG normal() function hangs for small seed value #655
  • Kokkos Tests Errors on Shepard/HSW Builds #644

2.02.15 (2017-02-10)

Full Changelog

Implemented enhancements:

  • Containers: Adding block partitioning to StaticCrsGraph #625
  • Kokkos Make System can induce Errors on Cray Volta System #610
  • OpenMP: error out if KOKKOS_HAVE_OPENMP is defined but not _OPENMP #605
  • CMake: fix standalone build with tests #604
  • Change README (that GitHub shows when opening Kokkos project page) to tell users how to submit PRs #597
  • Add correctness testing for all operators of Atomic View #420
  • Allow assignment of Views with compatible memory spaces #290
  • Build only one version of Kokkos library for tests #213
  • Clean out old KOKKOS_HAVE_CXX11 macros clauses #156
  • Harmonize Macro names #150

Fixed bugs:

  • Cray and PGI: Kokkos_Parallel_Reduce #634
  • Kokkos Make System can induce Errors on Cray Volta System #610
  • Normal() function random number generator doesn't give the expected distribution #592

2.02.07 (2016-12-16)

Full Changelog

Implemented enhancements:

  • Add CMake option to enable Cuda Lambda support #589
  • Add CMake option to enable Cuda RDC support #588
  • Add Initial Intel Sky Lake Xeon-HPC Compiler Support to Kokkos Make System #584
  • Building Tutorial Examples #582
  • Internal way for using ThreadVectorRange without TeamHandle #574
  • Testing: Add testing for uvm and rdc #571
  • Profiling: Add Memory Tracing and Region Markers #557
  • nvcc_wrapper not installed with Kokkos built with CUDA through CMake #543
  • Improve DynRankView debug check #541
  • Benchmarks: Add Gather benchmark #536
  • Testing: add spot_check option to test_all_sandia #535
  • Deprecate Kokkos::Impl::VerifyExecutionCanAccessMemorySpace #527
  • Add AtomicAdd support for 64bit float for Pascal #522
  • Add Restrict and Aligned memory trait #517
  • Kokkos Tests are Not Run using Compiler Optimization #501
  • Add support for clang 3.7 w/ openmp backend #393
  • Provide an error throw class #79

Fixed bugs:

  • Cuda UVM Allocation test broken with UVM as default space #586
  • Bug (develop branch only): multiple tests are now failing when forcing uvm usage. #570
  • Error in generate_makefile.sh for Kokkos when Compiler is Empty String/Fails #568
  • XL 13.1.4 incorrect C++11 flag #553
  • Improve DynRankView debug check #541
  • Installing Library on MAC broken due to cp -u #539
  • Intel Nightly Testing with Debug enabled fails #534

2.02.01 (2016-11-01)

Full Changelog

Implemented enhancements:

  • Add Changelog generation to our process. #506

Fixed bugs:

  • Test scratch_request fails in Serial with Debug enabled #520
  • Bug In BoundsCheck for DynRankView #516

2.02.00 (2016-10-30)

Full Changelog

Implemented enhancements:

  • Add PowerPC assembly for grabbing clock register in memory pool #511
  • Add GCC 6.x support #508
  • Test install and build against installed library #498
  • Makefile.kokkos adds expt-extended-lambda to cuda build with clang #490
  • Add top-level makefile option to just test kokkos-core unit-test #485
  • Split and harmonize Object Files of Core UnitTests to increase build parallelism #484
  • LayoutLeft to LayoutLeft subview for 3D and 4D views #473
  • Add official Cuda 8.0 support #468
  • Allow C++1Z Flag for Class Lambda capture #465
  • Add Clang 4.0+ compilation of Cuda code #455
  • Possible Issue with Intel 17.0.098 and GCC 6.1.0 in Develop Branch #445
  • Add name of view to "View bounds error" #432
  • Move Sort Binning Operators into Kokkos namespace #421
  • TaskPolicy - generate error when attempt to use uninitialized #396
  • Import WithoutInitializing and AllowPadding into Kokkos namespace #325
  • TeamThreadRange requires begin, end to be the same type #305
  • CudaUVMSpace should track # allocations, due to CUDA limit on # UVM allocations #300
  • Remove old View and its infrastructure #259

Fixed bugs:

  • Bug in TestCuda_Other.cpp: most likely assembly inserted into Device code #515
  • Cuda Compute Capability check of GPU is outdated #509
  • multi_scratch test with hwloc and pthreads seg-faults. #504
  • generate_makefile.bash: "make install" is broken #503
  • make clean in Out of Source Build/Tests Does Not Work Correctly #502
  • Makefiles for test and examples have issues in Cuda when CXX is not explicitly specified #497
  • Dispatch lambda test directly inside GTEST macro doesn't work with nvcc #491
  • UnitTests with HWLOC enabled fail if run with mpirun bound to a single core #489
  • Failing Reducer Test on Mac with Pthreads #479
  • make test Dumps Error with Clang Not Found #471
  • OpenMP TeamPolicy member broadcast not using correct volatile shared variable #424
  • TaskPolicy - generate error when attempt to use uninitialized #396
  • New task policy implementation is pulling in old experimental code. #372
  • MemoryPool unit test hangs on Power8 with GCC 6.1.0 #298

2.01.10 (2016-09-27)

Full Changelog

Implemented enhancements:

  • Enable Profiling by default in Tribits build #438
  • parallel_reduce(0), parallel_scan(0) unit tests #436
  • data()==NULL after realloc with LayoutStride #351
  • Fix tutorials to track new Kokkos::View #323
  • Rename team policy set_scratch_size. #195

Fixed bugs:

  • Possible Issue with Intel 17.0.098 and GCC 6.1.0 in Develop Branch #445
  • Makefile spits syntax error #435
  • Kokkos::sort fails for view with all the same values #422
  • Generic Reducers: can't accept inline constructed reducer #404
  • data()==NULL after realloc with LayoutStride #351
  • const subview of const view with compile time dimensions on Cuda backend #310
  • Kokkos (in Trilinos) Causes Internal Compiler Error on CUDA 8.0.21-EA on POWER8 #307
  • Core Oversubscription Detection Broken? #159

2.01.06 (2016-09-02)

Full Changelog

Implemented enhancements:

  • Add "standard" reducers for lambda-supportable customized reduce #411
  • TaskPolicy - single thread back-end execution #390
  • Kokkos master clone tag #387
  • Query memory requirements from task policy #378
  • Output order of test_atomic.cpp is confusing #373
  • Missing testing for atomics #341
  • Feature request for Kokkos to provide Kokkos::atomic_fetch_max and atomic_fetch_min #336
  • TaskPolicy<Cuda> performance requires teams mapped to warps #218

Fixed bugs:

  • Reduce with Teams broken for custom initialize #407
  • Failing Kokkos build on Debian #402
  • Failing Tests on NVIDIA Pascal GPUs #398
  • Algorithms: fill_random assumes dimensions fit in unsigned int #389
  • Kokkos::subview with RandomAccess Memory Trait #385
  • Build warning (signed / unsigned comparison) in Cuda implementation #365
  • wrong results for a parallel_reduce with CUDA8 / Maxwell50 #352
  • Hierarchical parallelism - 3 level unit test #344
  • Can I allocate a View w/ both WithoutInitializing & AllowPadding? #324
  • subview View layout determination #309
  • Unit tests with Cuda - Maxwell #196

2.01.00 (2016-07-21)

Full Changelog

Implemented enhancements:

  • Edit ViewMapping so assigning Views with the same custom layout compiles when const casting #327
  • DynRankView: Performance improvement for operator() #321
  • Interoperability between static and dynamic rank views #295
  • subview member function ? #280
  • Interoperability between View and DynRankView. #245
  • (Trilinos) build warning in atomic_assign, with Kokkos::complex #177
  • View<>::shmem_size should runtime check for number of arguments equal to rank #176
  • Custom reduction join via lambda argument #99
  • DynRankView with 0 dimensions passed in at construction #293
  • Inject view_alloc and friends into Kokkos namespace #292
  • Less restrictive TeamPolicy reduction on Cuda #286
  • deep_copy using remap with source execution space #267
  • Suggestion: Enable opt-in L1 caching via nvcc-wrapper #261
  • More flexible create_mirror functions #260
  • Rename View::memory_span to View::required_allocation_size #256
  • Use of subviews and views with compile-time dimensions #237
  • Kokkos::Timer #234
  • Fence CudaUVMSpace allocations #230
  • View::operator() accept std::is_integral and std::is_enum #227
  • Allocating zero size View #216
  • Thread scalable memory pool #212
  • Add a way to disable memory leak output #194
  • Kokkos exec space init should init Kokkos profiling #192
  • Runtime rank wrapper for View #189
  • Profiling Interface #158
  • Fix View assignment (of managed to unmanaged) #153
  • Add unit test for assignment of managed View to unmanaged View #152
  • Check for oversubscription of threads with MPI in Kokkos::initialize #149
  • Dynamic resizable 1-dimensional view #143
  • Develop TaskPolicy for CUDA #142
  • New View : Test Compilation Downstream #138
  • New View Implementation #135
  • Add variant of subview that lets users add traits #134
  • NVCC-WRAPPER: Add --host-only flag #121
  • Address gtest issue with TriBITS Kokkos build outside of Trilinos #117
  • Make tests pass with -expt-extended-lambda on CUDA #108
  • Dynamic scheduling for parallel_for and parallel_reduce #106
  • Runtime or compile time error when reduce functor's join is not properly specified as const member function or with volatile arguments #105
  • Error out when the number of threads is modified after kokkos is initialized #104
  • Porting to POWER and remove assumption of X86 default #103
  • Dynamic scheduling option for RangePolicy #100
  • SharedMemory Support for Lambdas #81
  • Recommended TeamSize for Lambdas #80
  • Add Aggressive Vectorization Compilation mode #72
  • Dynamic scheduling team execution policy #53
  • UVM allocations in multi-GPU systems #50
  • Synchronic in Kokkos::Impl #44
  • index and dimension types in for loops #28
  • Subview assign of 1D Strided with stride 1 to LayoutLeft/Right #1

Fixed bugs:

  • misspelled variable name in Kokkos_Atomic_Fetch + missing unit tests #340
  • seg fault Kokkos::Impl::CudaInternal::print_configuration #338
  • Clang compiler error with named parallel_reduce, tags, and TeamPolicy. #335
  • Shared Memory Allocation Error at parallel_reduce #311
  • DynRankView: Fix resize and realloc #303
  • Scratch memory and dynamic scheduling #279
  • MemoryPool infinite loop when out of memory #312
  • Kokkos DynRankView changes break Sacado and Panzer #299
  • MemoryPool fails to compile on non-cuda non-x86 #297
  • Random Number Generator Fix #296
  • View template parameter ordering Bug #282
  • Serial task policy broken. #281
  • deep_copy with LayoutStride should not memcpy #262
  • DualView::need_sync should be a const method #248
  • Arbitrary-sized atomics on GPUs broken; loop forever #238
  • boolean reduction value_type changes answer #225
  • Custom init() function for parallel_reduce with array value_type #210
  • unit_test Makefile is Broken - Recursively Calls itself until Machine Apocalypse. #202
  • nvcc_wrapper Does Not Support -Xcompiler <compiler option> #198
  • Kokkos exec space init should init Kokkos profiling #192
  • Kokkos Threads Backend impl_shared_alloc Broken on Intel 16.1 (Shepard Haswell) #186
  • pthread back end hangs if used uninitialized #182
  • parallel_reduce of size 0, not calling init/join #175
  • Bug in Threads with OpenMP enabled #173
  • KokkosExp_SharedAlloc, m_team_work_index inaccessible #166
  • 128-bit CAS without Assembly Broken? #161
  • fatal error: Cuda/Kokkos_Cuda_abort.hpp: No such file or directory #157
  • Power8: Fix OpenMP backend #139
  • Data race in Kokkos OpenMP initialization #131
  • parallel_launch_local_memory and cuda 7.5 #125
  • Resize can fail with Cuda due to asynchronous dispatch #119
  • Qthread taskpolicy initialization bug. #92
  • Windows: sys/mman.h #89
  • Windows: atomic_fetch_sub() #88
  • Windows: snprintf #87
  • Parallel_Reduce with TeamPolicy and league size of 0 returns garbage #85
  • Throw with Cuda when using (2D) team_policy parallel_reduce with less than a warp size #76
  • Scalar views don't work with Kokkos::Atomic memory trait #69
  • Reduce the number of threads per team for Cuda #63
  • Named Kernels fail for reductions with CUDA #60
  • Kokkos View dimension_() for long returning unsigned int #20
  • atomic test hangs with LLVM #6
  • OpenMP Test should set omp_set_num_threads to 1 #4

Closed issues:

  • develop branch broken with CUDA 8 and --expt-extended-lambda #354
  • --arch=KNL with Intel 2016 build failure #349
  • Error building with Cuda when passing -DKOKKOS_CUDA_USE_LAMBDA to generate_makefile.bash #343
  • Can I safely use int indices in a 2-D View with capacity > 2B? #318
  • Kokkos::ViewAllocateWithoutInitializing is not working #317
  • Intel build on Mac OS X #277
  • deleted #271
  • Broken Mira build #268
  • 32-bit build #246
  • parallel_reduce with RDC crashes linker #232
  • build of Kokkos_Sparse_MV_impl_spmv_Serial.cpp.o fails if you use nvcc and have cuda disabled #209
  • Kokkos Serial execution space is not tested with TeamPolicy. #207
  • Unit test failure on Hansen KokkosCore_UnitTest_Cuda_MPI_1 #200
  • nvcc compiler warning: calling a __host__ function from a __host__ __device__ function is not allowed #180
  • Intel 15 build error with defaulted "move" operators #171
  • missing libkokkos.a during Trilinos 12.4.2 build, yet other libkokkos*.a libs are there #165
  • Tie atomic updates to execution space or even to thread team? (speculation) #144
  • New View: Compiletime/size Test #137
  • New View : Performance Test #136
  • Signed/unsigned comparison warning in CUDA parallel #130
  • Kokkos::complex: Need op* w/ std::complex & real #126
  • Use uintptr_t for casting pointers #110
  • Default thread mapping behavior between P and Q threads. #91
  • Windows: Atomic_Fetch_Exchange() return type #90
  • Synchronic unit test is way too long #84
  • nvcc_wrapper -> $(NVCC_WRAPPER) #42
  • Check compiler version and print helpful message #39
  • Kokkos shared memory on Cuda uses a lot of registers #31
  • Can not pass unit test cuda.space without a GT 720 #25
  • Makefile.kokkos lacks bounds checking option that CMake has #24
  • Kokkos can not complete unit tests with CUDA UVM enabled #23
  • Simplify teams + shared memory histogram example to remove vectorization #21
  • Kokkos needs to refer to ${PROJECT_NAME}_ENABLE_CXX11 not Trilinos_ENABLE_CXX11 #17
  • Kokkos Base Makefile adds AVX to KNC Build #16
  • MS Visual Studio 2013 Build Errors #9
  • subview(X, ALL(), j) for 2-D LayoutRight View X: should it view a column? #5

End_C++98 (2015-04-15)

* This Change Log was automatically generated by github_changelog_generator