cuda-api-wrappers v0.4 Release Notes
Release Date: 2020-10-14
Main changes since 0.3.3:
- The runtime API wrappers are now a header-only library.
- Split the NVTX wrappers and the Runtime API wrappers into two separate libraries.
- Added several fundamental types which were implicit in previous versions: cuda::size_t, cuda::dimensionality_t.
Minor API tweaks:
- 📇 Renamed launch -> enqueue_launch
- ⏱ Can now schedule managed memory region attachment on streams
- Now wrapping cudaMemAdvise() advice.
- Array copying uses typed pointers
- Added: A cuda::managed::device_side_pointer_for() standalone function
- ➕ Added: A container facade for the sequence of all devices, so you can now write for (auto device : cuda::devices() ) { }.
- De-templatized: device setter RAII class
- ➕ Added: a freestanding cuda::synchronize() function instead of some wrapper methods
- Moved some type definitions from inside device_t to the device:: namespace
namespace - ➕ Added: A subclass of
memory::region_t
for managed memory - Using
memory::region_t
in more API functions - Dropped
cuda::kernel::maximum_dynamic_shared_memory_per_block()
. - Centralized the definitions of
take_ownership
anddo_not_take_ownership
- Made stream_t& parameters into const stream_t&, almost universally.
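To illustrate what a "container facade" for devices means, here is a minimal, self-contained sketch. It does not use the library and is not its implementation: the names sketch::device, sketch::devices_facade and sketch::collect_ids are hypothetical, and the device count is passed in by hand rather than queried from CUDA. The point is only that an object holding nothing but a count can still be iterated with a range-based for loop, producing a device object on demand at each step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical stand-in for a device wrapper; it holds only a numeric id.
struct device {
    int id;
};

// Hypothetical facade: behaves like a container of devices, but stores only
// a count and creates each device object on demand during iteration.
class devices_facade {
public:
    class iterator {
    public:
        explicit iterator(int index) : index_(index) {}
        device operator*() const { return device{index_}; }
        iterator& operator++() { ++index_; return *this; }
        bool operator!=(const iterator& other) const { return index_ != other.index_; }
    private:
        int index_;
    };

    explicit devices_facade(int count) : count_(count) { assert(count >= 0); }

    iterator begin() const { return iterator(0); }
    iterator end() const { return iterator(count_); }
    std::size_t size() const { return static_cast<std::size_t>(count_); }

private:
    int count_;
};

// Collects the ids visited by a range-based for loop over the facade.
inline std::vector<int> collect_ids(const devices_facade& all_devices) {
    std::vector<int> ids;
    for (auto dev : all_devices) {
        ids.push_back(dev.id);
    }
    return ids;
}

} // namespace sketch
```

With a facade constructed for three devices, collect_ids visits ids 0, 1 and 2 in order; a facade constructed with a count of zero yields an empty loop.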
🐛 Bug fixes:
- Cross-device waiting on events
- 🛠 Error message fixes
- 0️⃣ Not assuming the uintNN_t types are in the default namespace
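As a small illustration of the last fix: the <cstdint> header is only guaranteed to declare the fixed-width integer types inside namespace std, so portable code qualifies them. The function pack_pair below is a hypothetical example written for these notes, not something taken from the wrappers.

```cpp
#include <cassert>
#include <cstdint>

// <cstdint> guarantees std::uint32_t and std::uint64_t (where the platform
// provides them); whether the unqualified ::uint32_t is also visible is left
// to the implementation, so the std:: prefix is the portable spelling.
inline std::uint64_t pack_pair(std::uint32_t high, std::uint32_t low) {
    // Place `high` in the upper 32 bits and `low` in the lower 32 bits.
    return (static_cast<std::uint64_t>(high) << 32) | low;
}
```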
🏗 Build, compatibility, usability:
- 🛠 Fix support for CMake 3.8 (CMakeLists.txt was using some post-3.8 features)
- Clang-related:
- Skipping examples which clang++ doesn't support yet (need
- Only enabling separable compilation and CUDA
- const-cast'ing const void * kernel function pointers before reinterpretation - clang won't allow it otherwise
- GNU extension dropped when compiling examples with CUDA (clang doesn't support this)
- Fixed std::max() call issue
- CMake targets depending on the wrappers should now have a C++11 language standard requirement for compilation
- The wrappers now assert C++11 or later is used, instead of letting you just fail somewhere.
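One common way to make a header fail early with a single clear message is a preprocessor check on __cplusplus. The sketch below shows that general technique under the assumption that the compiler reports __cplusplus accurately; it is not copied from the wrappers, and the helper language_standard_is_sufficient() is a hypothetical name added only so the check can be exercised.

```cpp
// Stop with one clear message if the translation unit is compiled as pre-C++11,
// instead of failing later on the first C++11 construct.
// Caveat: some compilers (for example MSVC without /Zc:__cplusplus) report an
// older __cplusplus value, so a real header may need compiler-specific checks.
#if __cplusplus < 201103L
#error "C++11 or later is required"
#endif

// Hypothetical helper mirroring the preprocessor check as a compile-time constant.
constexpr bool language_standard_is_sufficient() {
    return __cplusplus >= 201103L;
}
```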
Previous changes from v0.3.3
🚀 This release includes both significant additions to the coverage by the wrappers, as well as major changes to the existing wrappers API.
Main changes since 0.2.0:
- Forget about numeric handles! The wrapper classes no longer take numeric handles as parameters, in methods exposed to the user. You'll be dealing with device_t's, event_t's, stream_t's etc. - not device::id_t, device::stream_t and device::event_t's.
- Wrapper classes are no longer templated. That means, on one hand, you don't have to worry about the template argument of "do we assume the wrapper's device is the current one?"; but on the other hand, every use of the wrapper will set the current device (even if it's already the right one). A lot of code was simplified or even removed thanks to this change.
- device_function_t is now named kernel_t, as only kernels are accepted by the CUDA Runtime API calls mentioning "device functions". Also, kernel_t's are now a pair of (kernel, device), as the settings which can be made for a kernel are mostly/entirely device-specific.
- 🚚 The examples CMakeLists.txt has been split off from the main CMakeLists.txt and moved into a subdirectory, removing any dependencies it may have.
- Kernel launching now uses perfect forwarding of all parameters.
- 👻 The library is now almost completely header-only. The single exception to this rule is profiling-related code. If you don't use it - the library is header-only for you.
- 🔄 Changed my email address in the code...
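For readers unfamiliar with the term, "perfect forwarding" means passing each argument on with its original value category (lvalue or rvalue) and constness intact. The sketch below demonstrates the general C++11 technique on an ordinary host-side callable; forward_call, report_category and category are hypothetical names, and nothing here reflects how the wrappers actually launch kernels.

```cpp
#include <cassert>
#include <string>
#include <utility>

namespace sketch {

// Reports whether an argument arrived as an lvalue or as an rvalue.
inline std::string category(const std::string&) { return "lvalue"; }
inline std::string category(std::string&&)      { return "rvalue"; }

// Hypothetical launcher: forwards every argument to the callable unchanged,
// preserving its value category and constness.
template <typename Callable, typename... Args>
auto forward_call(Callable&& callable, Args&&... args)
    -> decltype(std::forward<Callable>(callable)(std::forward<Args>(args)...))
{
    return std::forward<Callable>(callable)(std::forward<Args>(args)...);
}

// A callable with separate overloads for lvalue and rvalue arguments, so the
// effect of forwarding is observable from the returned string.
struct report_category {
    std::string operator()(const std::string& s) const { return category(s); }
    std::string operator()(std::string&& s) const { return category(std::move(s)); }
};

} // namespace sketch
```

Passing a named std::string through forward_call reaches the lvalue overload, while passing a temporary reaches the rvalue overload; without std::forward, both would arrive as lvalues.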
Main additions since 0.2.0:
- 👍 2D and 3D Array support.
- 👍 2D and 3D texture support.
- A single set() and get() for all memory spaces.
🛠 Plus a few bug fixes, and another example program from the CUDA samples.
🔄 Changes from 0.3.0:
- 🛠 Fixed: Self-recursion in one of the memory allocation functions.
- 🛠 Fixed: Added missing inline specifiers to some functions
- White space tweaks