ArrayFire v3.6.3 Release Notes

Release Date: 2019-04-22
  • v3.6.3

    The source code with sub-modules can be downloaded directly from the following link:

    http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.3.tar.bz2

    Improvements

    • Graphics are now a runtime dependency instead of a link time dependency #2365
    • Reduce the CUDA backend binary size using runtime compilation of kernels #2437
    • Improved batched matrix multiplication on the CPU backend by using Intel MKL's cblas_Xgemm_batched #2206
    • Print JIT kernels to disk or stream using the AF_JIT_KERNEL_TRACE environment variable #2404
    • void* pointers are now allowed as arguments to af::array::write() #2367
    • Slightly improve the efficiency of JITed tile operations #2472
    • Made random number generation on the CPU backend consistent with the CUDA and OpenCL backends #2435
    • Handled very large JIT tree generations #2484 #2487

    Bug Fixes

    • Fixed af::array::array_proxy move assignment operator #2479
    • Fixed input array dimensions validation in svdInplace() #2331
    • Fixed the typedef declaration for window resource handle #2357
    • Increase compatibility with GCC 8 #2379
    • Fixed af::write tests #2380
    • Fixed a bug in broadcast step of 1D exclusive scan #2366
    • Fixed OpenGL related build errors on OSX #2382
    • Fixed multiple array evaluation. Performance improvement. #2384
    • Fixed buffer overflow and expected output of kNN SSD small test #2445
    • Fixed MKL linking order to enable threaded BLAS #2444
    • Added validations for forge module plugin availability before calling resource cleanup #2443
    • Improve compatibility on MSVC toolchain (_MSC_VER > 1914) with the CUDA backend #2443
    • Fixed BLAS gemm func generators for newest MSVC 19 on VS 2017 #2464
    • Fixed errors on exit when using the CUDA backend with the unified backend #2470

    Documentation

    • Updated svdInplace() documentation following a bugfix #2331
    • Fixed a typo in matrix multiplication documentation #2358
    • Fixed a code snippet demonstrating C-API use #2406
    • Updated hamming matcher implementation limitation #2434
    • Added illustration for the rotate function #2453

    Misc

    • Use cudaMemcpyAsync instead of cudaMemcpy throughout the codebase #2362
    • Display a more informative error message if CUDA driver is incompatible #2421 #2448
    • Changed forge resource management to use smart pointers #2452
    • Deprecated intl and uintl typedefs in API #2360
    • Enabled graphics by default for all builds starting with v3.6.3 #2365
    • Fixed several warnings #2344 #2356 #2361
    • Refactored initArray() calls to use createEmptyArray(). initArray() is for internal use only by Array class. #2361
    • Refactored void* memory allocations to use unsigned char type #2459
    • Replaced deprecated MKL API with in-house implementations for sparse to sparse/dense conversions #2312
    • Reorganized and fixed some internal backend API #2356
    • Updated compilation order of CUDA files to speed up compile time #2368
    • Removed conditional graphics support builds after enabling runtime loading of graphics dependencies #2365
    • Marked graphics dependencies as optional in CPack RPM config #2365
    • Refactored a sparse arithmetic backend API #2379
    • Fixed const correctness of af_device_array API #2396
    • Update Forge to v1.0.4 #2466
    • Manage Forge resources from the DeviceManager class #2381
    • Fixed non-MKL and non-batch BLAS upstream call arguments #2401
    • Link MKL with OpenMP instead of TBB by default
    • Use clang-format to format source code

    Contributions

    Special thanks to our contributors:
    Alessandro Bessi
    zhihaoy
    Jacob Khan
    William Tambellini