ArrayFire/CHANGELOG and ArrayFire Releases

Changelog History

Page 2

v3.5.1 Changes
September 19, 2017

The source code with submodules can be downloaded directly from the following
link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.5.1.tar.bz2

Installer CUDA Version: 8.0 (Required)
Installer OpenCL Version: 1.2 (Minimum)

👌 Improvements
- 😌 Relaxed af::unwrap() function's arguments.
- 🔄 Changed behavior of af::array::allocated() to specify memory allocated.
- ✂ Removed restriction on the number of bins for af::histogram() on CUDA and
  OpenCL kernels.
🐎 Performance
- 👌 Improved JIT performance.
- 👌 Improved CPU element-wise operation performance.
- 👌 Improved regions performance using texture objects.
🐛 Bug fixes
- 🛠 Fixed overflow issues in mean.
- 🛠 Fixed memory leak when chaining indexing operations.
- 🛠 Fixed bug in array assignment when using an empty array to index.
- 🛠 Fixed bug with af::matmul() which occurred when its RHS argument was an
  indexed vector.
- 🛠 Fixed deadlock bug when sparse array was used with a JIT Array.
- 🛠 Fixed pixel tests for FAST kernels.
- 🛠 Fixed af::replace so that it is now copy-on-write.
- 🛠 Fixed launch configuration issues in CUDA JIT.
- 🛠 Fixed segfaults and "Pure Virtual Call" error warnings when exiting on
  Windows.
- ↪ Workaround for clEnqueueReadBuffer bug on OSX.
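The notes above do not say how the overflow in mean was fixed. As a general illustration only, the sketch below shows why accumulating a sum in a narrow integer type can overflow, and one common remedy (an incremental running mean in floating point). It is plain C++ and does not use ArrayFire; the function names are hypothetical, and both functions assume a non-empty input.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive mean: the sum is kept in the same 8-bit type as the data,
// so it wraps modulo 256 as soon as it exceeds 255.
std::uint8_t naive_mean_u8(const std::vector<std::uint8_t>& v) {
    std::uint8_t sum = 0;
    for (std::uint8_t x : v) {
        sum = static_cast<std::uint8_t>(sum + x);  // wraps around on overflow
    }
    return static_cast<std::uint8_t>(sum / v.size());
}

// Running mean: m_k = m_{k-1} + (x_k - m_{k-1}) / k.
// The accumulator never grows beyond the range of the data itself.
double running_mean(const std::vector<std::uint8_t>& v) {
    double mean = 0.0;
    std::size_t k = 0;
    for (std::uint8_t x : v) {
        ++k;
        mean += (static_cast<double>(x) - mean) / static_cast<double>(k);
    }
    return mean;
}
```

For four samples of value 200, the naive version wraps and returns 8, while the running mean returns 200.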
🏗 Build
- Fixed issues when compiling with GCC 7.1.
- Eliminated unnecessary Boost dependency from CPU and CUDA backends.
Misc
- ⚡️ Updated support links to point to Slack instead of Gitter.
v3.5.0 Changes
June 23, 2017

The source code with submodules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.5.0.tar.bz2

Installer CUDA Version: 8.0 (Required)
Installer OpenCL Version: 1.2 (Minimum)

⚡️ Major Updates
- 👍 ArrayFire now supports threaded applications.
- ➕ Added Canny edge detector.
- ➕ Added Sparse-Dense arithmetic operations.
🔋 Features
- ArrayFire Threading
  - af::array can be read by multiple threads
  - All ArrayFire functions can be executed concurrently by multiple threads
  - Threads can operate on different devices to simplify multi-device workloads
- 🆕 New Canny edge detector function, af::canny().
  - Can automatically calculate high threshold with AF_CANNY_THRESHOLD_AUTO_OTSU
  - Supports both L1 and L2 Norms to calculate gradients
- 🆕 New tuned OpenCL BLAS backend, CLBlast.
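The Canny entry above mentions that gradients can be computed with either the L1 or the L2 norm. The sketch below shows the difference between the two for a single pixel, given its horizontal and vertical gradient components. It is standalone C++ with hypothetical function names, not ArrayFire's implementation.

```cpp
#include <cmath>

// L1 norm of the gradient: |gx| + |gy|. Cheaper, but overestimates
// the magnitude of diagonal edges.
double gradient_magnitude_l1(double gx, double gy) {
    return std::fabs(gx) + std::fabs(gy);
}

// L2 norm of the gradient: sqrt(gx^2 + gy^2), the Euclidean magnitude.
double gradient_magnitude_l2(double gx, double gy) {
    return std::hypot(gx, gy);
}
```

For gx = 3 and gy = -4, the L1 magnitude is 7 and the L2 magnitude is 5, so the same threshold selects different edge pixels depending on the norm.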
👌 Improvements
- 📄 Converted CUDA JIT to use NVRTC instead of NVVM.
- 🐎 Performance improvements in af::reorder(). 1
- 🐎 Performance improvements in array::scalar(). 1
- 👌 Improved unified backend performance. 1
- ArrayFire now depends on Forge v1.0. 1
- Can now specify the FFT plan cache size using the af::setFFTPlanCacheSize() function.
- Get the number of physical bytes allocated by the memory manager af_get_allocated_bytes(). 1
- af::dot() can now return a scalar value to the host. 1
🐛 Bug Fixes
- 🛠 Fixed improper release of default Mersenne random engine.
- 🛠 Fixed af::randu() and af::randn() ranges for floating point types.
- 🛠 Fixed assignment bug in CPU backend.
- 🛠 Fixed complex (c32,c64) multiplication in OpenCL convolution kernels.
- Fixed inconsistent behavior with af::replace() and replace_scalar().
- Fixed memory leak in af_fir().
- 📜 Fixed memory leaks in af_cast for sparse arrays.
- Fixed correctness of af_pow for complex numbers by using Cartesian form.
- Corrected af::select() with indexing in CUDA and OpenCL backends.
- ↪ Workaround for VS2015 compiler ternary bug.
- 🛠 Fixed memory corruption in cuda::findPlan().
- Argument checks in af_create_sparse_array now avoid inputs of type int64.
🏗 Build fixes
- On OSX, utilize new GLFW package from the brew package manager.
- 🛠 Fixed CUDA PTX names generated by CMake v3.7.
- 👌 Support gcc > 5.x for CUDA.
Examples
- 🆕 New genetic algorithm example.
📚 Documentation
- ⚡️ Updated README.md to improve readability and formatting.
- ⚡️ Updated README.md to mention Julia and Nim wrappers.
- 👌 Improved installation instructions - docs/pages/install.md.
Miscellaneous
- 👍 A few improvements for ROCm support.
- ✂ Removed CUDA 6.5 support.
Known issues
- 🏁 Windows
  - The Windows NVIDIA driver version 37x.xx contains a bug which causes fftconvolve_opencl to fail. Upgrade or downgrade to a different version of the driver to avoid this failure.
  - The following tests fail on Windows with NVIDIA hardware: threading_cuda, qr_dense_opencl, solve_dense_opencl.
- 🍎 macOS
  - The Accelerate framework, used by the CPU backend on macOS, leverages Intel graphics cards (Iris) when there are no discrete GPUs available. This OpenCL implementation is known to give incorrect results on the following tests: lu_dense_{cpu,opencl}, solve_dense_{cpu,opencl}, inverse_dense_{cpu,opencl}.
  - Certain tests intermittently fail on macOS with NVIDIA GPUs apparently due to inconsistent driver behavior: fft_large_cuda and svd_dense_cuda.
  - The following tests are currently failing on macOS with AMD GPUs: cholesky_dense_opencl and scan_by_key_opencl.
v3.4.2 Changes
December 21, 2016

The source code with submodules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.4.2.tar.bz2

Installer CUDA Version: 8.0 (Required)
Installer OpenCL Version: 1.2 (Minimum)

🗄 Deprecation Announcement

This release supports CUDA 6.5 and higher. The next ArrayFire release will
support CUDA 7.0 and higher, dropping support for CUDA 6.5. Reasons for no
longer supporting CUDA 6.5 include:
- 👍 CUDA 7.0 NVCC supports the C++11 standard (whereas CUDA 6.5 does not), which
  is used by ArrayFire's CPU and OpenCL backends.
- Very few ArrayFire users still use CUDA 6.5.

As a result, the older Jetson TK1 / Tegra K1 will no longer be supported in
the next ArrayFire release. The newer Jetson TX1 / Tegra X1 will continue to
have full capability with ArrayFire.

🐳 Docker
- 🐳 ArrayFire has been Dockerized.
👌 Improvements
- Implemented sparse storage format conversions between AF_STORAGE_CSR
  and AF_STORAGE_COO.
  - Directly convert between AF_STORAGE_COO <--> AF_STORAGE_CSR
    using the af::sparseConvertTo() function.
  - af::sparseConvertTo() now also supports converting to dense.
- 📜 Added cast support for sparse arrays.
  - Casting only changes the values array and the type. The row and column
    index arrays are not changed.
- Reintroduced automated computation of chart axes limits for graphics functions.
  - The axes limits will always be the minimum/maximum of the current and new
    limit.
  - The user can still set limits from API calls. If the user sets a limit
    from the API call, then the automatic limit setting will be disabled.
- Using boost::scoped_array instead of boost::scoped_ptr when managing
  array resources.
- 🐎 Internal performance improvements to getInfo() by using const references
  to avoid unnecessary copying of ArrayInfo objects.
- ➕ Added support for scalar af::array inputs for af::convolve() and
  set functions.
- 🐎 Performance fixes in af::fftConvolve() kernels.
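To make the CSR and COO conversion mentioned in the first improvement above more concrete, here is a minimal standalone sketch of what such a conversion involves. It is not ArrayFire's implementation: the `Csr` and `Coo` structs and both function names are hypothetical, and `coo_to_csr` assumes the COO entries are already sorted by row.

```cpp
#include <vector>

// Compressed Sparse Row: row_ptr has rows + 1 entries; the non-zeros of
// row r live at positions [row_ptr[r], row_ptr[r + 1]).
struct Csr {
    int rows;
    std::vector<int> row_ptr;
    std::vector<int> col_idx;
    std::vector<double> values;
};

// Coordinate format: one (row, column, value) triple per non-zero.
struct Coo {
    std::vector<int> row_idx;
    std::vector<int> col_idx;
    std::vector<double> values;
};

// CSR -> COO: expand each row range into explicit row indices.
Coo csr_to_coo(const Csr& a) {
    Coo out;
    out.col_idx = a.col_idx;
    out.values = a.values;
    out.row_idx.reserve(a.values.size());
    for (int r = 0; r < a.rows; ++r) {
        for (int k = a.row_ptr[r]; k < a.row_ptr[r + 1]; ++k) {
            out.row_idx.push_back(r);
        }
    }
    return out;
}

// COO -> CSR: count the entries per row, then prefix-sum the counts.
// Assumes the COO entries are sorted by row index.
Csr coo_to_csr(const Coo& a, int rows) {
    Csr out;
    out.rows = rows;
    out.row_ptr.assign(rows + 1, 0);
    for (int r : a.row_idx) {
        ++out.row_ptr[r + 1];
    }
    for (int r = 0; r < rows; ++r) {
        out.row_ptr[r + 1] += out.row_ptr[r];
    }
    out.col_idx = a.col_idx;
    out.values = a.values;
    return out;
}
```

In both directions the column indices and values are copied unchanged; only the row information is re-encoded. An unsorted COO input would need a sort by row first.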
🏗 Build
- 👌 Support for Visual Studio 2015 compilation.
- 🛠 Fixed FindCBLAS.cmake when PkgConfig is used.
🐛 Bug fixes
- 🛠 Fixes to JIT when tree is large.
- 🛠 Fixed indexing bug when converting dense to sparse af::array as
  AF_STORAGE_COO.
- 🛠 Fixed af::bilateral() OpenCL kernel compilation on OS X.
- 🛠 Fixed memory leak in af::regions() (CPU) and af::rgb2ycbcr().
Installers
- 🛠 Major OS X installer fixes.
  - Fixed installation scripts.
  - Fixed installation symlinks for libraries.
- 🏁 Windows installer now ships with more pre-built examples.
Examples
- ➕ Added af::choleskyInPlace() calls to cholesky.cpp example.
📚 Documentation
- ➕ Added u8 as supported data type in getting_started.md.
- 🛠 Fixed typos.
CUDA 8 on OSX
- 👍 CUDA 8.0.55 supports Xcode 8.
Known Issues
- Known failures with CUDA 6.5. These include all functions that use
  sorting. As a result, sparse storage format conversion between
  AF_STORAGE_COO and AF_STORAGE_CSR has been disabled for CUDA 6.5.
v3.4.1 Changes
October 15, 2016

The source code with submodules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.4.1.tar.bz2

Installer CUDA Version: 8.0 (Required)
Installer OpenCL Version: 1.2 (Minimum)

Installers
- 🐧 Installers for Linux, OS X and Windows
  - CUDA backend now uses CUDA 8.0.
  - Uses Intel MKL 2017.
  - CUDA Compute 2.x (Fermi) is no longer compiled into the library.
- Installer for OS X
  - The libraries shipping in the OS X Installer are now compiled with Apple
    Clang v7.3.1 (previously v6.1.0).
  - The OS X version used is 10.11.6 (previously 10.10.5).
- Installer for Jetson TX1 / Tegra X1
  - Requires JetPack for L4T 2.3
    (containing Linux for Tegra r24.2 for TX1).
  - CUDA backend now uses CUDA 8.0 64-bit.
  - Using CUDA's cusolver instead of CPU fallback.
  - Uses OpenBLAS for CPU BLAS.
  - All ArrayFire libraries are now 64-bit.
👌 Improvements
- ➕ Add sparse array support to af::eval().
- ➕ Add OpenCL-CPU fallback support for sparse af::matmul() when running on
  a unified memory device. Uses MKL Sparse BLAS.
- When using CUDA libdevice, pick the correct compute version based on device.
- 👍 OpenCL FFT now also supports prime factors 7, 11 and 13.
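The last improvement above concerns which FFT lengths the OpenCL backend can handle. The sketch below checks whether a length factors entirely into a given set of primes. It is standalone C++ with a hypothetical function name. The full set {2, 3, 5, 7, 11, 13} used in the usage note is an assumption: the entry only names 7, 11 and 13 as the newly supported factors.

```cpp
#include <cstddef>
#include <vector>

// Returns true if n can be written as a product of the given primes only.
bool factors_into(std::size_t n, const std::vector<std::size_t>& primes) {
    if (n == 0) {
        return false;
    }
    for (std::size_t p : primes) {
        while (n % p == 0) {
            n /= p;  // strip every occurrence of this prime factor
        }
    }
    return n == 1;  // anything left over is an unsupported factor
}
```

For example, 1001 = 7 × 11 × 13 passes with the set {2, 3, 5, 7, 11, 13} but fails with {2, 3, 5}, and a prime length such as 17 fails with both.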
🐛 Bug Fixes
- 👍 Allow CUDA libdevice to be detected from custom directory.
- 🛠 Fix aarch64 detection on Jetson TX1 64-bit OS.
- Add missing definition of af_set_fft_plan_cache_size in unified backend.
- 🛠 Fix initial values for af::min() and af::max() operations.
- 🛠 Fix distance calculation in af::nearestNeighbour for CUDA and OpenCL backend.
- 🛠 Fix OpenCL bug where scalars were passed incorrectly to compile options.
- 🛠 Fix bug in af::Window::surface() with respect to dimensions and ranges.
- Fix possible double free corruption in af_assign_seq().
- ➕ Add missing eval for key in af::scanByKey in CPU backend.
- Fixed creation of sparse values array using AF_STORAGE_COO.
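The entry on initial values for af::min() and af::max() does not describe the original defect. As a general illustration of why the seed of a reduction matters, the sketch below seeds a maximum with the lowest representable value and a minimum with the largest. Seeding either with 0 would give wrong results for all-negative or all-positive inputs. This is standalone C++ with hypothetical function names, not ArrayFire code.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Maximum reduction seeded with the identity for max: the lowest
// representable value of T.
template <typename T>
T reduce_max(const std::vector<T>& v) {
    T acc = std::numeric_limits<T>::lowest();
    for (const T& x : v) {
        acc = std::max(acc, x);
    }
    return acc;
}

// Minimum reduction seeded with the identity for min: the largest
// representable value of T.
template <typename T>
T reduce_min(const std::vector<T>& v) {
    T acc = std::numeric_limits<T>::max();
    for (const T& x : v) {
        acc = std::min(acc, x);
    }
    return acc;
}
```

With the input {-3, -1, -2}, `reduce_max` returns -1, whereas a reduction seeded with 0 would incorrectly return 0.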
Examples
- ➕ Add a Conjugate Gradient solver example
  to demonstrate sparse and dense matrix operations.
CUDA Backend
- When using CUDA 8.0, compute 2.x is no longer in the default compute list.
  - This follows CUDA 8.0 deprecating computes 2.x.
  - Default computes for CUDA 8.0 will be 30, 50, 60.
- 0️⃣ When using CUDA pre-8.0, the default selection remains 20, 30, 50.
- 0️⃣ CUDA backend now uses -arch=sm_30 for PTX compilation as default.
  - Unless compute 2.0 is enabled.
Known Issues
- af::lu() on CPU is known to give incorrect results when built and run on
  OS X 10.11 or 10.12 and compiled with Accelerate Framework.
  - Since the OS X Installer libraries use MKL rather than Accelerate
    Framework, this issue does not affect those libraries.
v3.4.0 Changes
September 13, 2016

The source code with submodules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.4.0.tar.bz2

Installer CUDA Version: 7.5 (Required)
Installer OpenCL Version: 1.2 (Minimum)

⚡️ Major Updates
- 📜 [Sparse Matrix and BLAS](ref sparse_func).
- Faster JIT for CUDA and OpenCL.
- 👌 Support for [random number generator engines](ref af::randomEngine).
- 👌 Improvements to graphics.
🔋 Features
- 📜 [Sparse Matrix and BLAS](ref sparse_func)
  - Support for [CSR](ref AF_STORAGE_CSR) and [COO](ref AF_STORAGE_COO)
    [storage types](ref af_storage).
  - Sparse-Dense Matrix Multiplication and Matrix-Vector Multiplication as a
    part of af::matmul() using AF_STORAGE_CSR format for sparse.
  - Conversion to and from [dense](ref AF_STORAGE_DENSE) matrix to [CSR](ref AF_STORAGE_CSR)
    and [COO](ref AF_STORAGE_COO) [storage types](ref af_storage).
- Faster JIT
  - Performance improvements for CUDA and OpenCL JIT functions.
  - Support for evaluating multiple outputs in a single kernel. See af::array::eval() for more.
- [Random Number Generation](ref af::randomEngine)
  - af::randomEngine(): A random engine class to handle setting the type and seed
    for random number generator engines.
  - Supported engine types are:
    - Philox
    - Threefry
    - Mersenne Twister
- Graphics
  - Using Forge v0.9.0
  - [Vector Field](ref af::Window::vectorField) plotting functionality.
  - Removed GLEW and replaced with glbinding.
    - Removed usage of GLEW after support for MX (multithreaded) was dropped in v2.0.
  - Multiple overlays on the same window are now possible.
    - Overlays support for same type of object (2D/3D)
    - Supported by af::Window::plot, af::Window::hist, af::Window::surface,
      af::Window::vectorField.
  - New API to set axes limits for graphs.
    - Draw calls do not automatically compute the limits. This is now under user control.
    - af::Window::setAxesLimits can be used to set axes limits automatically or manually.
    - af::Window::setAxesTitles can be used to set axes titles.
  - New API for plot and scatter:
    - af::Window::plot() and af::Window::scatter() now can handle 2D and 3D and determine appropriate order.
    - af_draw_plot_nd()
    - af_draw_plot_2d()
    - af_draw_plot_3d()
    - af_draw_scatter_nd()
    - af_draw_scatter_2d()
    - af_draw_scatter_3d()
- 🆕 New [interpolation methods](ref af_interp_type)
  - Applies to
    - af::resize()
    - af::transform()
    - af::approx1()
    - af::approx2()
- 👌 Support for [complex mathematical functions](ref mathfunc_mat)
  - Add complex support for trig_mat, af::sqrt(), af::log().
- 🚦 af::medfilt1(): Median filter for 1-d signals
- Generalized scan functions: scan_func_scan and scan_func_scanbykey
  - Now supports inclusive or exclusive scans
  - Supports binary operations defined by af_binary_op.
- [Image Moments](ref moments_mat) functions
- ➕ Add af::getSizeOf() function for af_dtype
- Explicitly instantiate af::array::device() for `void *`
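The generalized scan entry in the feature list above distinguishes inclusive from exclusive scans and allows different binary operations. The sketch below illustrates both ideas with a sequential scan in standalone C++. The `scan` template and its parameters are hypothetical and unrelated to ArrayFire's actual signatures.

```cpp
#include <cstddef>
#include <vector>

// Sequential scan over `in` with a user-supplied binary operation.
// inclusive: out[i] = in[0] op ... op in[i]
// exclusive: out[i] = identity op in[0] op ... op in[i - 1]
template <typename T, typename Op>
std::vector<T> scan(const std::vector<T>& in, Op op, T identity, bool inclusive) {
    std::vector<T> out(in.size());
    T acc = identity;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (inclusive) {
            acc = op(acc, in[i]);
            out[i] = acc;  // element i is included in its own result
        } else {
            out[i] = acc;  // element i sees only the elements before it
            acc = op(acc, in[i]);
        }
    }
    return out;
}
```

With addition and the input {1, 2, 3, 4}, the inclusive scan yields {1, 3, 6, 10} and the exclusive scan yields {0, 1, 3, 6}. Passing a different operation, such as a maximum with a suitable identity, gives a running maximum instead.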
🐛 Bug Fixes
- 🛠 Fixes to edge-cases in morph_mat.
- 👉 Makes JIT tree size consistent between devices.
- Delegate higher-dimension in convolve_mat to correct dimensions.
- Indexing fixes with C++11.
- 🖐 Handle empty arrays as inputs in various functions.
- 🛠 Fix bug with single-element input to af::median.
- 🛠 Fix bug in calculation of time from af::timeit().
- 🛠 Fix bug in floating point numbers in af::seq.
- 🛠 Fixes for OpenCL graphics interop on NVIDIA devices.
- 🛠 Fix bug when compiling large kernels for AMD devices.
- 🛠 Fix bug in af::bilateral when shared memory is over the limit.
- 🛠 Fix bug in kernel header compilation tool bin2cpp.
- 🛠 Fix initial values for morph_mat functions.
- 🛠 Fix bugs in af::homography() CPU and OpenCL kernels.
- 🛠 Fix bug in CPU TNJ.
👌 Improvements
- CUDA 8 and compute 6.x (Pascal) support; the current installer ships with CUDA 7.5.
- 👉 User controlled FFT plan caching.
- CUDA performance improvements for image_func_wrap, image_func_unwrap and approx_mat.
- 👍 Fallback for CUDA-OpenGL interop when the device does not support OpenGL.
- Additional forms of batching with the transform_func_transform functions.
  New behavior defined here.
- ⚡️ Update to OpenCL2 headers.
- 👌 Support for integration with external OpenCL contexts.
- 🐎 Performance improvements to internal copy in CPU Backend.
- 🐎 Performance improvements to af::select and af::replace CUDA kernels.
- 0️⃣ Enable OpenCL-CPU offload by default for devices with Unified Host Memory.
  - To disable, use the environment variable AF_OPENCL_CPU_OFFLOAD=0.
🏗 Build
- Compilation speedups.
- 🏗 Build fixes with MKL.
- Error message when CMake CUDA Compute Detection fails.
- 🏗 Several CMake build issues with Xcode generator fixed.
- 🛠 Fix multiple OpenCL definitions at link time.
- 🛠 Fix lapacke detection in CMake.
- ⚡️ Update build tags of
  - clBLAS
  - clFFT
  - Boost.Compute
  - Forge
  - glbinding
- 🛠 Fix builds with GCC 6.1.1 and GCC 5.3.0.
Installers
- 🏗 All installers now ship with ArrayFire libraries built with MKL 2016.
- All installers now ship with Forge development files and examples included.
- 🚚 CUDA Compute 2.0 has been removed from the installers. Please contact us
  directly if you have a special need.
Examples
- ➕ Added [example simulating gravity](ref graphics/field.cpp) for
  demonstration of vector field.
- Improvements to financial/black_scholes_options.cpp example.
- 👌 Improvements to graphics/gravity_sim.cpp example.
- 🛠 Fix graphics examples to use af::Window::setAxesLimits and
  af::Window::setAxesTitles functions.
📚 Documentation & Licensing
- ArrayFire copyright and trademark policy
- 🛠 Fixed grammar in license.
- ➕ Add license information for glbinding.
- ✂ Remove license information for GLEW.
- Random123 now applies to all backends.
- Random number functions are now under random_mat.
🗄 Deprecations

The following functions have been deprecated and may be modified or removed
permanently from future versions of ArrayFire.
- af::Window::plot3(): Use af::Window::plot instead.
- af_draw_plot(): Use af_draw_plot_nd or af_draw_plot_2d instead.
- af_draw_plot3(): Use af_draw_plot_nd or af_draw_plot_3d instead.
- af::Window::scatter3(): Use af::Window::scatter instead.
- af_draw_scatter(): Use af_draw_scatter_nd or af_draw_scatter_2d instead.
- af_draw_scatter3(): Use af_draw_scatter_nd or af_draw_scatter_3d instead.
Known Issues

Certain CUDA functions are known to be broken on Tegra K1. The following ArrayFire tests are currently failing:
- assign_cuda
- harris_cuda
- homography_cuda
- median_cuda
- orb_cuda
- sort_cuda
- sort_by_key_cuda
- sort_index_cuda
