cuda-api-wrappers latest version

cuda-api-wrappers v0.4 Release Notes

Release Date: 2020-10-14

Main changes since 0.3.3:
- The runtime API wrappers are now a header-only library.
- Split the NVTX wrappers and the Runtime API wrappers into two separate libraries.
- Added several fundamental types which were implicit in previous versions: cuda::size_t, cuda::dimensionality_t.
Minor API tweaks:
- 📇 Renamed launch -> enqueue_launch
- ⏱ Can now schedule managed memory region attachment on streams
- Now wrapping cudaMemAdvise() advice.
- Array copying uses typed pointers
- Added: A cuda::managed::device_side_pointer_for() standalone function
- ➕ Added: A container facade for the sequence of all devices, so you can now write for (auto device : cuda::devices() ) { }.
- De-templatized: device setter RAII class
- ➕ Added: a freestanding cuda::synchronize() function instead of some wrapper methods
- Moved some type definitions from inside device_t into the device:: namespace
- ➕ Added: A subclass of memory::region_t for managed memory
- Using memory::region_t in more API functions
- Dropped cuda::kernel::maximum_dynamic_shared_memory_per_block().
- Centralized the definitions of take_ownership and do_not_take_ownership
- Made stream_t& parameters into const stream_t&, almost universally.
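The container facade for the sequence of devices, listed above, can be pictured with a small self-contained sketch. This is not the library's implementation: the `facade_sketch` namespace, its `device_t` stand-in, and the fixed `device_count()` are all hypothetical, chosen so the snippet compiles without CUDA. It only shows how a lightweight object with `begin()` and `end()` makes `for (auto device : devices())` work.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

namespace facade_sketch {

// Hypothetical stand-in for a device wrapper; it only carries an id.
struct device_t {
    int id;
};

// Hypothetical: a real implementation would query the CUDA runtime instead.
inline int device_count() { return 3; }

// The facade owns no storage; iterators build device proxies on the fly.
class all_devices {
public:
    class iterator {
    public:
        using iterator_category = std::input_iterator_tag;
        using value_type = device_t;
        using difference_type = std::ptrdiff_t;
        using pointer = const device_t*;
        using reference = device_t;

        explicit iterator(int id) : id_(id) {}
        device_t operator*() const { return device_t{id_}; }
        iterator& operator++() { ++id_; return *this; }
        bool operator==(const iterator& other) const { return id_ == other.id_; }
        bool operator!=(const iterator& other) const { return id_ != other.id_; }

    private:
        int id_;
    };

    iterator begin() const { return iterator{0}; }
    iterator end() const { return iterator{device_count()}; }
};

inline all_devices devices() { return all_devices{}; }

// Walks the facade with a range-based for loop and collects the ids.
inline std::vector<int> collect_device_ids() {
    std::vector<int> ids;
    for (auto device : devices()) {
        assert(device.id >= 0);
        ids.push_back(device.id);
    }
    return ids;
}

} // namespace facade_sketch
```

Calling `facade_sketch::collect_device_ids()` visits ids 0 up to `device_count() - 1` in order. Because the facade holds no state, returning it by value from `devices()` is cheap.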
🐛 Bug fixes:
- Cross-device waiting on events
- 🛠 Error message fixes
- 0️⃣ Not assuming the uintNN_t types are in the global namespace
🏗 Build, compatibility, usability:
- 🛠 Fix support for CMake 3.8 (CMakeLists.txt was using some post-3.8 features)
- Clang-related:
  - Skipping examples which clang++ doesn't support yet (need
  - Only enabling separable compilation and CUDA
  - const-cast'ing const void * kernel function pointers before reinterpretation, since clang won't allow it otherwise
  - GNU extensions dropped when compiling examples with CUDA (clang doesn't support this)
  - Fixed std::max() call issue
- CMake targets depending on the wrappers should now have a C++11 language standard requirement for compilation
- The wrappers now assert C++11 or later is used, instead of letting you just fail somewhere.
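One common way to make a header fail early on a pre-C++11 compiler is a preprocessor guard on `__cplusplus`. The sketch below is an assumption about how such a check could look, not a copy of what the wrappers do; the `standard_check_sketch` namespace and its helper are hypothetical and exist only so the condition can be inspected.

```cpp
// Hypothetical guard, placed at the top of a header. A preprocessor check is
// used because static_assert is itself a C++11 feature.
// Note: MSVC reports 199711L here unless /Zc:__cplusplus is passed, so a
// production guard would need to account for that compiler.
#if __cplusplus < 201103L
#error "This header requires C++11 or later"
#endif

namespace standard_check_sketch {

// Past the guard, C++11 features such as constexpr can be used safely.
constexpr bool cxx11_or_later() { return __cplusplus >= 201103L; }

} // namespace standard_check_sketch
```

With a guard like this, the user sees a single clear diagnostic instead of a cascade of unrelated errors from the first C++11 construct the compiler hits.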

Previous changes from v0.3.3

🚀 This release includes both significant additions to the coverage by the wrappers, as well as major changes to the existing wrappers API.

Main changes since 0.2.0:
- Forget about numeric handles! The wrapper classes no longer take numeric handles as parameters, in methods exposed to the user. You'll be dealing with device_t's, event_t's, stream_t's etc. - not device::id_t, device::stream_t and device::event_t's.
- Wrapper classes are no longer templated. That means, on one hand, you don't have to worry about the template argument of "do we assume the wrapper's device is the current one?"; but on the other hand, every use of the wrapper will set the current device (even if it's already the right one). A lot of code was simplified or even removed thanks to this change.
- device_function_t is now named kernel_t, as only kernels are accepted by the CUDA Runtime API calls mentioning "device functions". Also, a kernel_t is now a pair of (kernel, device), as the settings which can be made for a kernel are mostly or entirely device-specific.
- 🚚 The examples CMakeLists.txt has been split off from the main CMakeLists.txt and moved into a subdirectory, removing any dependencies it may have.
- Kernel launching now uses perfect forwarding of all parameters.
- 👻 The library is now almost completely header-only. The single exception to this rule is profiling-related code. If you don't use it - the library is header-only for you.
- 🔄 Changed my email address in the code...
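To illustrate what perfect forwarding of launch parameters means, here is a host-only sketch in plain C++. It does not use the wrappers or CUDA: `forward_launch` and `probe` are hypothetical names, and `probe` stands in for a kernel only to reveal whether each argument arrives as an lvalue or an rvalue.

```cpp
#include <cassert>
#include <string>
#include <utility>

namespace forwarding_sketch {

// Hypothetical callable that reports the value category of its argument.
struct probe {
    std::string operator()(const std::string&) const { return "lvalue"; }
    std::string operator()(std::string&&) const { return "rvalue"; }
};

// Forwards every parameter with its original value category preserved,
// so no argument is copied or decayed on the way to the callee.
template <typename Function, typename... Params>
auto forward_launch(Function&& function, Params&&... params)
    -> decltype(std::forward<Function>(function)(std::forward<Params>(params)...))
{
    return std::forward<Function>(function)(std::forward<Params>(params)...);
}

// Small self-check covering both value categories.
inline void demo() {
    std::string text = "abc";
    assert(forward_launch(probe{}, text) == "lvalue");
    assert(forward_launch(probe{}, std::move(text)) == "rvalue");
    assert(forward_launch(probe{}, std::string("tmp")) == "rvalue");
}

} // namespace forwarding_sketch
```

The `Params&&...` pack combined with `std::forward` is what keeps a named variable an lvalue and a temporary an rvalue at the call site of the wrapped function.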
Main additions since 0.2.0:
- 👍 2D and 3D Array support.
- 👍 2D and 3D texture support.
- A single set() and get() for all memory spaces.
🛠 Plus a few bug fixes, and another example program from the CUDA samples.

🔄 Changes from 0.3.0:
- 🛠 Fixed: Self-recursion in one of the memory allocation functions.
- 🛠 Fixed: Added missing inline specifiers to some functions
- White space tweaks