ArrayFire v3.5.0 Release Notes

Release Date: 2017-06-23
  • v3.5.0

    The source code with submodules can be downloaded directly from the following link:
    http://arrayfire.com/arrayfire_source/arrayfire-full-3.5.0.tar.bz2

    Installer CUDA Version: 8.0 (Required)
    Installer OpenCL Version: 1.2 (Minimum)

    Major Updates

    • ArrayFire now supports threaded applications.
    • Added Canny edge detector.
    • Added Sparse-Dense arithmetic operations.

    Features

    • ArrayFire Threading
      • af::array can be read by multiple threads
      • All ArrayFire functions can be executed concurrently by multiple threads
      • Threads can operate on different devices to simplify multi-device workloads
    • New Canny edge detector function, af::canny().
      • Can automatically calculate the high threshold with AF_CANNY_THRESHOLD_AUTO_OTSU
      • Supports both L1 and L2 norms to calculate gradients
    • New tuned OpenCL BLAS backend, CLBlast.
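
To make the Canny options above more concrete, here is a minimal, library-independent sketch of the two ideas they name: the L1 and L2 gradient norms, and Otsu's method for choosing a threshold automatically. It does not call ArrayFire and is not ArrayFire's implementation. The function names gradient_l1, gradient_l2 and otsu_threshold are hypothetical, and the sketch assumes 8-bit input samples.

```cpp
#include <array>
#include <cmath>
#include <cstdint>
#include <vector>

// L1 gradient magnitude: |gx| + |gy|. Cheaper than L2, and never smaller than it.
inline float gradient_l1(float gx, float gy) {
    return std::fabs(gx) + std::fabs(gy);
}

// L2 gradient magnitude: sqrt(gx^2 + gy^2), the Euclidean length of the gradient.
inline float gradient_l2(float gx, float gy) {
    return std::sqrt(gx * gx + gy * gy);
}

// Otsu's method on 8-bit samples (hypothetical helper, not an ArrayFire API).
// Returns the threshold t in [0, 255] that maximizes the between-class variance
// when samples <= t form one class and samples > t form the other.
// Returns 0 if the input is empty or contains a single distinct value.
inline int otsu_threshold(const std::vector<std::uint8_t>& samples) {
    std::array<double, 256> hist{};  // zero-initialized histogram
    for (std::uint8_t s : samples) hist[s] += 1.0;

    const double total = static_cast<double>(samples.size());
    double sum_all = 0.0;
    for (int i = 0; i < 256; ++i) sum_all += i * hist[i];

    double w0 = 0.0;         // number of samples in the lower class
    double sum0 = 0.0;       // sum of sample values in the lower class
    double best_var = -1.0;  // best between-class variance seen so far
    int best_t = 0;

    for (int t = 0; t < 256; ++t) {
        w0 += hist[t];
        sum0 += t * hist[t];
        if (w0 == 0.0) continue;      // lower class still empty
        const double w1 = total - w0;
        if (w1 == 0.0) break;         // upper class empty from here on

        const double mean0 = sum0 / w0;
        const double mean1 = (sum_all - sum0) / w1;
        const double diff = mean0 - mean1;
        const double between_var = w0 * w1 * diff * diff;
        if (between_var > best_var) {  // strict: keep the first maximizer
            best_var = between_var;
            best_t = t;
        }
    }
    return best_t;
}
```

In a Canny pipeline, a threshold chosen this way would typically be computed from the gradient magnitudes and used as the high threshold, with the low threshold taken as a fraction of it. Whether af::canny() follows exactly this scheme is an assumption here, so check the ArrayFire documentation for the actual behavior.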

    Improvements

    • Converted CUDA JIT to use NVRTC instead of NVVM.
    • Performance improvements in af::reorder().
    • Performance improvements in array::scalar().
    • Improved unified backend performance.
    • ArrayFire now depends on Forge v1.0.
    • The FFT plan cache size can now be specified using the af::setFFTPlanCacheSize() function.
    • The number of physical bytes allocated by the memory manager can now be queried with af_get_allocated_bytes().
    • af::dot() can now return a scalar value to the host.

    Bug Fixes

    • Fixed improper release of default Mersenne random engine.
    • Fixed af::randu() and af::randn() ranges for floating point types.
    • Fixed assignment bug in CPU backend.
    • Fixed complex (c32,c64) multiplication in OpenCL convolution kernels.
    • Fixed inconsistent behavior with af::replace() and replace_scalar().
    • Fixed memory leak in af_fir().
    • Fixed memory leaks in af_cast for sparse arrays.
    • Fixed correctness of af_pow for complex numbers by using Cartesian form.
    • Corrected af::select() with indexing in CUDA and OpenCL backends.
    • Added workaround for VS2015 compiler ternary bug.
    • Fixed memory corruption in cuda::findPlan().
    • Argument checks in af_create_sparse_array now avoid inputs of type int64.

    Build fixes

    • On OSX, utilize new GLFW package from the brew package manager.
    • Fixed CUDA PTX names generated by CMake v3.7.
    • Support gcc > 5.x for CUDA.

    Examples

    • New genetic algorithm example.

    Documentation

    • Updated README.md to improve readability and formatting.
    • Updated README.md to mention Julia and Nim wrappers.
    • Improved installation instructions - docs/pages/install.md.

    Miscellaneous

    • A few improvements for ROCm support.
    • Removed CUDA 6.5 support.

    Known issues

    • Windows
      • The Windows NVIDIA driver version 37x.xx contains a bug which causes fftconvolve_opencl to fail. Upgrade or downgrade to a different version of the driver to avoid this failure.
      • The following tests fail on Windows with NVIDIA hardware: threading_cuda, qr_dense_opencl, solve_dense_opencl.
    • macOS
      • The Accelerate framework, used by the CPU backend on macOS, leverages Intel graphics cards (Iris) when there are no discrete GPUs available. This OpenCL implementation is known to give incorrect results on the following tests: lu_dense_{cpu,opencl}, solve_dense_{cpu,opencl}, inverse_dense_{cpu,opencl}.
      • Certain tests intermittently fail on macOS with NVIDIA GPUs apparently due to inconsistent driver behavior: fft_large_cuda and svd_dense_cuda.
      • The following tests are currently failing on macOS with AMD GPUs: cholesky_dense_opencl and scan_by_key_opencl.