cuda-api-wrappers v0.4 Release Notes

Release Date: 2020-10-14
  • Main changes since 0.3.3:

    • The runtime API wrappers are now a header-only library.
    • Split the NVTX wrappers and the Runtime API wrappers into two separate libraries.
    • Added several fundamental types which were implicit in previous versions: cuda::size_t, cuda::dimensionality_t.

    Minor API tweaks:

    • 📇 Renamed launch -> enqueue_launch
    • ⏱ Can now schedule managed memory region attachment on streams
    • Now wrapping cudaMemAdvise() advice.
    • Array copying uses typed pointers
    • Added: A cuda::managed::device_side_pointer_for() standalone function
    • ➕ Added: A container facade for the sequence of all devices, so you can now write for (auto device : cuda::devices() ) { }.
    • De-templatized: device setter RAII class
    • ➕ Added: a freestanding cuda::synchronize() function instead of some wrapper methods
    • Moved some type definitions from inside device_t to the device:: namespace
    • ➕ Added: A subclass of memory::region_t for managed memory
    • Using memory::region_t in more API functions
    • Dropped cuda::kernel::maximum_dynamic_shared_memory_per_block().
    • Centralized the definitions of take_ownership and do_not_take_ownership
    • Made stream_t& parameters into const stream_t&, almost universally.
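
    The container facade mentioned above is what makes a range-based for loop over all devices possible. The following is a minimal, self-contained sketch of how such a facade can be built; it is not the library's actual implementation. The sketch namespace, the device_t stand-in, and the hard-coded device count are all hypothetical, chosen so that the example compiles without CUDA.

```cpp
#include <cassert>
#include <vector>

namespace sketch {

// Hypothetical stand-in for the wrapper's device class: it only holds an id.
struct device_t {
    int id;
};

// Hypothetical stand-in for a device-count query; a real implementation
// would ask the CUDA runtime instead of returning a constant.
inline int device_count() { return 3; }

// A lightweight facade: it stores no devices and only produces
// device_t values on demand while being iterated.
class all_devices {
public:
    class iterator {
    public:
        explicit iterator(int id) : id_(id) {}
        device_t operator*() const { return device_t{id_}; }
        iterator& operator++() { ++id_; return *this; }
        bool operator!=(const iterator& other) const { return id_ != other.id_; }
    private:
        int id_;
    };

    iterator begin() const { return iterator(0); }
    iterator end() const { return iterator(device_count()); }
};

inline all_devices devices() { return all_devices(); }

// Usage: collect the ids of all devices with a range-based for loop.
inline std::vector<int> collect_device_ids() {
    std::vector<int> ids;
    for (auto device : devices()) {
        ids.push_back(device.id);
    }
    return ids;
}

} // namespace sketch
```

    Because the facade only needs begin(), end(), and an iterator with operator*, operator++ and operator!=, it can stay stateless and cheap to construct.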

    🐛 Bug fixes:

    • Cross-device waiting on events
    • 🛠 Error message fixes
    • 0️⃣ Not assuming the uintNN_t types are in the default namespace

    🏗 Build, compatibility, usability:

    • 🛠 Fix support for CMake 3.8 (CMakeLists.txt was using some post-3.8 features)
    • Clang-related:
      • Skipping examples which clang++ doesn't support yet (need
      • Only enabling separable compilation and CUDA
      • const-cast'ing const void * kernel function pointers before reinterpretation - clang won't allow it otherwise
      • GNU extensions dropped when compiling examples with CUDA (clang doesn't support this)
      • Fixed std::max() call issue
    • CMake targets depending on the wrappers should now have a C++11 language standard requirement for compilation
    • The wrappers now assert that C++11 or later is used, instead of letting the build fail at some arbitrary point.
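
    A language-standard check of this kind can be sketched as follows. This is an illustration of the general technique, not the check the wrappers actually use; the SKETCH_CPLUSPLUS macro and the sketch_has_cpp11() helper are hypothetical names. The guard has to be a preprocessor test, since static_assert is itself a C++11 feature.

```cpp
// Sketch of a language-standard guard placed at the top of a header.
// __cplusplus is defined by the standard; 201103L corresponds to C++11.
// MSVC reports 199711L in __cplusplus unless /Zc:__cplusplus is passed,
// so _MSVC_LANG is consulted when it is available.
#if defined(_MSVC_LANG)
#define SKETCH_CPLUSPLUS _MSVC_LANG
#else
#define SKETCH_CPLUSPLUS __cplusplus
#endif

#if SKETCH_CPLUSPLUS < 201103L
#error "This header requires C++11 or later"
#endif

// Hypothetical helper exposing the same condition to C++ code.
// It is only ever compiled when the guard above has passed.
constexpr bool sketch_has_cpp11() { return SKETCH_CPLUSPLUS >= 201103L; }
```

    With such a guard, a pre-C++11 build stops at the first include with a single readable message.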

Previous changes from v0.3.3

  • 🚀 This release includes both significant additions to the wrappers' coverage and major changes to the existing wrapper API.

    Main changes since 0.2.0:

    • Forget about numeric handles! The wrapper classes no longer take numeric handles as parameters, in methods exposed to the user. You'll be dealing with device_t's, event_t's, stream_t's etc. - not device::id_t, device::stream_t and device::event_t's.
    • Wrapper classes are no longer templated. That means, on one hand, you don't have to worry about the template argument of "do we assume the wrapper's device is the current one?"; but on the other hand, every use of the wrapper will set the current device (even if it's already the right one). A lot of code was simplified or even removed thanks to this change.
    • device_function_t is now named kernel_t, as only kernels are accepted by the CUDA Runtime API calls mentioning "device functions". Also, kernel_t's are now a pair of (kernel, device), as the settings which can be made for a kernel are mostly/entirely device-specific.
    • 🚚 The examples CMakeLists.txt has been split off from the main CMakeLists.txt and moved into a subdirectory, removing any dependencies it may have.
    • Kernel launching now uses perfect forwarding of all parameters.
    • 👻 The library is now almost completely header-only. The single exception to this rule is profiling-related code. If you don't use it - the library is header-only for you.
    • 🔄 Changed my email address in the code...
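
    The perfect-forwarding change listed above can be illustrated with a host-only sketch. It shows the general C++11 technique (forwarding references plus std::forward), not the library's launch code; launch_config, enqueue_launch_sketch, fake_kernel and demo are hypothetical names, and an ordinary host function stands in for a kernel so that the example compiles without CUDA.

```cpp
#include <cassert>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical stand-in for a launch configuration (grid and block sizes).
struct launch_config {
    unsigned grid_size;
    unsigned block_size;
};

// Forwards every argument to the callable with its value category intact:
// lvalues arrive as lvalue references, rvalues arrive as rvalues.
template <typename Kernel, typename... Params>
void enqueue_launch_sketch(Kernel&& kernel, launch_config config, Params&&... params) {
    std::forward<Kernel>(kernel)(config, std::forward<Params>(params)...);
}

// A host-side function playing the role of a kernel, for demonstration only.
inline void fake_kernel(launch_config config, int& counter, std::string label) {
    counter += static_cast<int>(config.grid_size * config.block_size);
    (void) label; // unused; present only to show an rvalue argument
}

inline int demo() {
    int counter = 0;
    std::string label = "example";
    // `counter` is forwarded as an lvalue reference, so the callee's update
    // is visible here; `std::move(label)` is forwarded as an rvalue.
    enqueue_launch_sketch(fake_kernel, launch_config{2, 4}, counter, std::move(label));
    return counter; // 2 * 4 added to the initial 0
}

} // namespace sketch
```

    The benefit of forwarding is that the wrapper neither copies arguments unnecessarily nor changes how they bind to the callee's parameters.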

    Main additions since 0.2.0:

    • 👍 2D and 3D Array support.
    • 👍 2D and 3D texture support.
    • A single set() and get() for all memory spaces.

    🛠 Plus a few bug fixes, and another example program from the CUDA samples.

    🔄 Changes from 0.3.0:

    • 🛠 Fixed: Self-recursion in one of the memory allocation functions.
    • 🛠 Fixed: Added missing inline specifiers to some functions
    • White space tweaks