ArrayFire v3.4.1 Release Notes

Release Date: 2016-10-15
  • v3.4.1

    The source code with submodules can be downloaded directly from the following link:
    http://arrayfire.com/arrayfire_source/arrayfire-full-3.4.1.tar.bz2

    Installer CUDA Version: 8.0 (Required)
    Installer OpenCL Version: 1.2 (Minimum)

    Installers

    • ๐Ÿง Installers for Linux, OS X and Windows
      • CUDA backend now uses CUDA 8.0.
      • Uses Intel MKL 2017.
      • CUDA Compute 2.x (Fermi) is no longer compiled into the library.
    • Installer for OS X
      • The libraries shipping in the OS X Installer are now compiled with Apple
        Clang v7.3.1 (previously v6.1.0).
      • The OS X version used is 10.11.6 (previously 10.10.5).
    • Installer for Jetson TX1 / Tegra X1
      • Requires JetPack for L4T 2.3
        (containing Linux for Tegra r24.2 for TX1).
      • CUDA backend now uses CUDA 8.0 64-bit.
      • Uses CUDA's cuSOLVER instead of the CPU fallback.
      • Uses OpenBLAS for CPU BLAS.
      • All ArrayFire libraries are now 64-bit.

    Improvements

    • Add sparse array support to af::eval().
    • Add OpenCL-CPU fallback support for sparse af::matmul() when running on
      a unified memory device. Uses MKL Sparse BLAS.
    • When using CUDA libdevice, pick the correct compute version based on device.
    • OpenCL FFT now also supports prime factors 7, 11 and 13.
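
To make the FFT item concrete, here is a minimal sketch of what "supports prime factors 7, 11 and 13" could mean for a transform length. It assumes the previously supported factors were 2, 3 and 5 (the note only says "also"), and the helper name `hasOnlySmallPrimeFactors` is hypothetical, not an ArrayFire function. The note does not say how other lengths are handled, so the helper only classifies a length.

```cpp
#include <cstddef>
#include <initializer_list>

// Hypothetical helper, not part of ArrayFire.
// Returns true when n is a product of the primes 2, 3, 5, 7, 11 and 13 only.
// Assumption: 2, 3 and 5 were the factors supported before this release.
bool hasOnlySmallPrimeFactors(std::size_t n) {
    if (n == 0) return false;
    for (std::size_t p : {2u, 3u, 5u, 7u, 11u, 13u}) {
        while (n % p == 0) n /= p;  // strip every occurrence of this prime
    }
    return n == 1;  // nothing left means no other prime factor was present
}
```

For example, a length of 1001 (7 x 11 x 13) passes this check, while a length of 34 (2 x 17) does not.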

    ๐Ÿ› Bug Fixes

    • ๐Ÿ‘ Allow CUDA libdevice to be detected from custom directory.
    • ๐Ÿ›  Fix aarch64 detection on Jetson TX1 64-bit OS.
      1
    • Add missing definition of af_set_fft_plan_cache_size in unified backend.
      1
    • ๐Ÿ›  Fix intial values for af::min() and af::max() operations.
      1
      2
    • ๐Ÿ›  Fix distance calculation in af::nearestNeighbour for CUDA and OpenCL backend.
      1
      2
    • ๐Ÿ›  Fix OpenCL bug where scalars where are passed incorrectly to compile options.
      1
    • ๐Ÿ›  Fix bug in af::Window::surface() with respect to dimensions and ranges.
      1
    • Fix possible double free corruption in af_assign_seq().
      1
    • โž• Add missing eval for key in af::scanByKey in CPU backend.
      1
    • Fixed creation of sparse values array using AF_STORAGE_COO.
      1
      1
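
The note on initial values for af::min() and af::max() does not describe the bug itself. As general background, the sketch below shows why the starting value of a min or max reduction matters. It is plain C++, not ArrayFire's implementation, and the names `reduceMin` and `reduceMax` are hypothetical.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Illustrative reductions only; not ArrayFire code.
template <typename T>
T reduceMin(const std::vector<T>& v) {
    // Start from the largest finite value so any element can replace it.
    T acc = std::numeric_limits<T>::max();
    for (const T& x : v) acc = std::min(acc, x);
    return acc;
}

template <typename T>
T reduceMax(const std::vector<T>& v) {
    // Start from lowest(), not min(): for floating-point types min() is the
    // smallest positive normal value, which would be wrong for all-negative input.
    T acc = std::numeric_limits<T>::lowest();
    for (const T& x : v) acc = std::max(acc, x);
    return acc;
}
```

With an all-negative float input such as {-3.0, -1.5, -2.0}, starting `reduceMax` from `std::numeric_limits<float>::min()` would return a small positive number instead of -1.5.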

    Examples

    • Add a Conjugate Gradient solver example
      to demonstrate sparse and dense matrix operations.
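
For readers unfamiliar with the method, the following is a minimal Conjugate Gradient sketch in plain C++. It is not the example shipped with ArrayFire and does not use the ArrayFire API. The `CsrMatrix` struct and the function names are hypothetical, chosen only to show a sparse matrix working together with dense vectors. It assumes the matrix is symmetric positive-definite.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR container, for illustration only.
struct CsrMatrix {
    std::size_t rows;
    std::vector<std::size_t> rowPtr;  // size rows + 1
    std::vector<std::size_t> colIdx;  // column index of each stored value
    std::vector<double> values;       // stored non-zero values
};

// y = A * x (sparse matrix times dense vector).
std::vector<double> spmv(const CsrMatrix& A, const std::vector<double>& x) {
    std::vector<double> y(A.rows, 0.0);
    for (std::size_t r = 0; r < A.rows; ++r)
        for (std::size_t k = A.rowPtr[r]; k < A.rowPtr[r + 1]; ++k)
            y[r] += A.values[k] * x[A.colIdx[k]];
    return y;
}

double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Solves A x = b for symmetric positive-definite A, starting from x = 0.
std::vector<double> conjugateGradient(const CsrMatrix& A,
                                      const std::vector<double>& b,
                                      double tol = 1e-10,
                                      std::size_t maxIter = 1000) {
    std::vector<double> x(b.size(), 0.0);
    std::vector<double> r = b;  // residual b - A*x, with x = 0
    std::vector<double> p = r;  // initial search direction
    double rr = dot(r, r);
    for (std::size_t it = 0; it < maxIter && std::sqrt(rr) > tol; ++it) {
        const std::vector<double> Ap = spmv(A, p);
        const double alpha = rr / dot(p, Ap);  // step length along p
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const double rrNew = dot(r, r);
        const double beta = rrNew / rr;  // keeps directions A-conjugate
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
        rr = rrNew;
    }
    return x;
}
```

As a quick check, the 2x2 system with rows (4, 1) and (1, 3) and right-hand side (1, 2) has the exact solution (1/11, 7/11), which this sketch should reach to within the tolerance.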

    CUDA Backend

    • When using CUDA 8.0,
      computes 2.x are no longer in the default compute list.
      • This follows CUDA 8.0
        deprecating computes 2.x.
      • Default computes for CUDA 8.0 will be 30, 50, 60.
    • When using CUDA pre-8.0, the default selection remains 20, 30, 50.
    • CUDA backend now uses -arch=sm_30 for PTX compilation as default.
      • Unless compute 2.0 is enabled.
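
The selection rules above can be summarized in a short sketch. The function names are hypothetical and do not come from ArrayFire's build scripts. Two points are assumptions rather than statements from these notes: that CUDA versions newer than 8.0 follow the 8.0 defaults, and that the PTX flag becomes -arch=sm_20 when compute 2.0 is enabled (the notes only say sm_30 is not used in that case).

```cpp
#include <string>
#include <vector>

// Hypothetical helpers for illustration; not part of ArrayFire.

// Default compute capabilities for a given CUDA major version.
// Assumption: versions above 8 behave like 8.0.
std::vector<int> defaultComputes(int cudaMajorVersion) {
    if (cudaMajorVersion >= 8) return {30, 50, 60};
    return {20, 30, 50};
}

// PTX architecture flag for a compute list.
// Assumption: sm_20 is used when compute 2.0 is enabled.
std::string ptxArchFlag(const std::vector<int>& computes) {
    for (int c : computes)
        if (c == 20) return "-arch=sm_20";
    return "-arch=sm_30";
}
```

Under these assumptions, a CUDA 8.0 build with default settings yields computes 30, 50, 60 and the flag -arch=sm_30.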

    Known Issues

    • af::lu() on CPU is known to give incorrect results when built and run on
      OS X 10.11 or 10.12 and compiled with Accelerate Framework.
      • Since the OS X Installer libraries use MKL rather than Accelerate
        Framework, this issue does not affect those libraries.