Changelog History
v3.8.rc Changes
October 05, 2020 (v3.8.0 Release Candidate)
New Functions
- Ragged max reduction - #2786
- Initialization list constructor for array class - #2829, #2987
- New API for the following statistics functions: cov, var and stdev - #2986
- Bit-wise operator support for array and C API (af_bitnot) - #2865
- allocV2 and freeV2, which return cl_mem on the OpenCL backend - #2911
- Move constructor and move assignment operator for Dim4 class - #2946
Improvements
- Add f16 support for histogram - #2984
- Update confidence connected components example for better illustration - #2968
- Enable disk caching of OpenCL kernel binaries - #2970
- Refactor extension of kernel binaries stored to disk (.bin) - #2970
- Add minimum driver versions for CUDA toolkit 11 in internal map - #2982
- Improve warning messages from run-time kernel compilation functions - #2996
Fixes
- Fix bias factor of variance in var_all and cov functions - #2986
- Fix a race condition in confidence connected components function for OpenCL backend - #2969
- Safely ignore disk cache failures in CUDA backend for compiled kernel binaries - #2970
- Fix randn by passing in correct values to Box-Muller - #2980
- Fix rounding issues in Box-Muller function used for RNG - #2980
- Fix problems in RNG for older compute architectures with fp16 - #2980, #2996
- Fix performance regression of approx functions - #2977
- Remove assert that checks that signal/filter types have to be the same - #2993
- Fix checkAndSetDevMaxCompute when the device cc is greater than max - #2996
- Fix documentation errors and warnings - #2973, #2987
- Add missing opencl-arrayfire interoperability functions in unified backend - #2981
Contributions
Special thanks to our contributors: P. J. Reed

v3.7.3 Changes
November 23, 2020
Improvements
- Add f16 support for histogram - #2984
- Update confidence connected components example for better illustration - #2968
- Enable disk caching of OpenCL kernel binaries - #2970
- Refactor extension of kernel binaries stored to disk (.bin) - #2970
- Add minimum driver versions for CUDA toolkit 11 in internal map - #2982
- Improve warning messages from run-time kernel compilation functions - #2996
Fixes
- Fix bias factor of variance in var_all and cov functions - #2986
- Fix a race condition in confidence connected components function for OpenCL backend - #2969
- Safely ignore disk cache failures in CUDA backend for compiled kernel binaries - #2970
- Fix randn by passing in correct values to Box-Muller - #2980
- Fix rounding issues in Box-Muller function used for RNG - #2980
- Fix problems in RNG for older compute architectures with fp16 - #2980, #2996
- Fix performance regression of approx functions - #2977
- Remove assert that checks that signal/filter types have to be the same - #2993
- Fix checkAndSetDevMaxCompute when the device cc is greater than max - #2996
- Fix documentation errors and warnings - #2973, #2987
- Add missing opencl-arrayfire interoperability functions in unified backend - #2981
- Fix constexpr-related compilation error with VS2019 and Clang compilers - #3049
Contributions
Special thanks to our contributors: P. J. Reed

v3.7.2 Changes
July 13, 2020
Improvements
- Cache CUDA kernels to disk to improve load times (thanks to @cschreib-ibex) #2848
- Statically link against CUDA libraries #2785
- Make cuDNN an optional build dependency #2836
- Improve support for different compilers and OS #2876 #2925 #2942 #2943 #2945
- Improve performance of join and transpose on CPU #2849
- Improve documentation #2816 #2821 #2846 #2918 #2928 #2947
- Reduce binary size using NVRTC and by reducing template instantiations #2849 #2861 #2890
- Improve reduceByKey performance on OpenCL by using builtin functions #2851
- Improve support for Intel OpenCL GPUs #2855
- Allow statically linking against MKL #2877 (sponsored by SDL)
- Better support for older CUDA toolkits #2923
- Add support for CUDA 11 #2939
- Add support for ccache for faster builds #2931
- Add support for the conan package manager on Linux #2875
- Propagate build errors up the stack in AFError exceptions #2948 #2957
- Improve runtime dependency library loading #2954
- Improve cuDNN runtime checks and warnings #2960
- Document af_memory_manager_* native memory return values #2911
- Add support for cuDNN 8 #2963
Fixes
- Fix crash when allocating large arrays #2827
- Fix various compiler warnings #2827 #2849 #2872 #2876
- Fix minor leaks in OpenCL functions #2913
- Various continuous integration related fixes #2819
- Fix zero padding with convolve2NN #2820
- Fix af_get_memory_pressure_threshold return value #2831
- Increased the max filter length for morph
- Handle empty array inputs for LU, QR, and Rank functions #2838
- Fix FindMKL.cmake script for sequential threading library #2840
- Various internal refactoring #2839 #2861 #2864 #2873 #2890 #2891 #2913
- Fix OpenCL 2.0 builtin function name conflict #2851
- Fix error caused when releasing memory with multiple devices #2867
- Fix missing set stacktrace symbol from unified API #2915
- Fix bugs in ReduceByKey #2957
- Add clblast patch to handle custom context with multiple devices #2967
Contributions
Special thanks to our contributors:
Corentin Schreiber
Jacob Kahn
Paul Jurczak
Christoph Junghans

v3.7.1 Changes
March 28, 2020
Improvements
- Improve mtx download for test data #2742
- Improve documentation #2754 #2792 #2797
- Remove verbose messages in older CMake versions #2773
- Reduce binary size with the use of NVRTC #2790
- Use texture memory to load LUT in orb and fast #2791
- Add missing print function for f16 #2784
- Add checks for f16 support in the CUDA backend #2784
- Create a thrust policy to intercept temporary buffer allocations #2806
Fixes
- Fix segfault on exit when ArrayFire is not initialized in the main thread
- Fix support for CMake 3.5.1 #2771 #2772 #2760
- Fix evalMultiple if the input array sizes aren't the same #2766
- Fix error when AF_BACKEND_DEFAULT is passed directly to backend #2769
- Work around name collision with AMD OpenCL implementation #2802
- Fix on-exit errors with the unified backend #2769
- Fix check for f16 compatibility in OpenCL #2773
- Fix matmul on Intel OpenCL when passing same array as input #2774
- Fix CPU OpenCL blas batching #2774
- Fix memory pressure in the default memory manager #2801
Contributions
Special thanks to our contributors:
padentomasello
glavaux2

v3.7.0 Changes
February 13, 2020
Major Updates
- Added the ability to customize the memory manager (thanks to jacobkahn and flashlight) [#2461]
- Added 16-bit floating point support for several functions [#2413] [#2587] [#2585] [#2583]
- Added sumByKey, productByKey, minByKey, maxByKey, allTrueByKey, anyTrueByKey, countByKey [#2254]
- Added confidence connected components [#2748]
- Added neural network based convolution and gradient functions [#2359]
- Added a padding function [#2682]
- Added pinverse for pseudo inverse [#2279]
- Added support for uniform ranges in approx1 and approx2 functions [#2297]
- Added support to write to preallocated arrays for some functions [#2599] [#2481] [#2328] [#2327]
- Added meanvar function [#2258]
- Added support for sparse-sparse arithmetic [#2312]
- Added rsqrt function for reciprocal square root [#2500]
- Added a lower level af_gemm function for general matrix multiplication [#2481]
- Added a function to set the cuBLAS math mode for the CUDA backend [#2584]
- Separate debug symbols into separate files [#2535]
- Print stacktraces on errors [#2632]
- Support move constructor for af::array [#2595]
- Expose events in the public API [#2461]
- Add setAxesLabelFormat to format labels on graphs [#2495]
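The *ByKey functions listed above reduce values grouped by an accompanying key array. As a rough illustration of the idea only, the sketch below sums values by key using the C++ standard library. The name `sumByKeySketch` is hypothetical, and the grouping rule (each run of consecutive equal keys forms one group, as in Thrust-style reduce-by-key) is an assumption rather than something these notes state; consult the ArrayFire documentation for the actual semantics and signatures.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sum-by-key reduction; NOT ArrayFire's implementation.
// Assumption: each run of consecutive equal keys forms one group.
// Returns the reduced keys and the per-group sums, in input order.
std::pair<std::vector<int>, std::vector<double>>
sumByKeySketch(const std::vector<int>& keys, const std::vector<double>& vals) {
    std::vector<int> outKeys;
    std::vector<double> outVals;
    // Only process positions that have both a key and a value.
    const std::size_t n = keys.size() < vals.size() ? keys.size() : vals.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (outKeys.empty() || outKeys.back() != keys[i]) {
            // A new run of keys starts here: open a new group.
            outKeys.push_back(keys[i]);
            outVals.push_back(vals[i]);
        } else {
            // Same key as the previous element: accumulate into the open group.
            outVals.back() += vals[i];
        }
    }
    return {outKeys, outVals};
}
```

Under that assumption, keys {0, 0, 1, 1, 1, 0} with values {1, 2, 3, 4, 5, 6} reduce to keys {0, 1, 0} and sums {3, 12, 6}; the trailing 0 is not merged with the leading run because the two are not adjacent.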
Improvements
- Better error messages for systems with driver or device incompatibilities [#2678] [#2448] [#2761]
- Optimized unified backend function calls [#2695]
- Optimized anisotropic smoothing [#2713]
- Optimized canny filter for CUDA and OpenCL [#2727]
- Better MKL search script [#2738] [#2743] [#2745]
- Better logging of different submodules in ArrayFire [#2670] [#2669]
- Improve documentation [#2665] [#2620] [#2615] [#2639] [#2628] [#2633] [#2622] [#2617] [#2558] [#2326] [#2515]
- Optimized af::array assignment [#2575]
- Update the k-means example to display the result [#2521]
Fixes
- Fix multi-config generators [#2736]
- Fix access errors in canny [#2727]
- Fix segfault in the unified backend if no backends are available [#2720]
- Fix access errors in scan-by-key [#2693]
- Fix sobel operator [#2600]
- Fix an issue with the random number generator and s16 [#2587]
- Fix issue with boolean product reduction [#2544]
- Fix array_proxy move constructor [#2537]
- Fix convolve3 launch configuration [#2519]
- Fix an issue where the fft function modified the input array [#2520]
- Added a workaround for the nvidia-opencl runtime if forge dependencies are missing [#2761]
Contributions
Special thanks to our contributors:
@jacobkahn
@WilliamTambellini
@lehins
@r-barnes
@gaika
@ShalokShalom

v3.6.4 Changes
May 20, 2019
The source code with sub-modules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.4.tar.bz2
Fixes

v3.6.3 Changes
April 22, 2019
The source code with sub-modules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.3.tar.bz2
Improvements
- Graphics are now a runtime dependency instead of a link time dependency #2365
- Reduce the CUDA backend binary size using runtime compilation of kernels #2437
- Improved batched matrix multiplication on the CPU backend by using Intel MKL's cblas_Xgemm_batched #2206
- Print JIT kernels to disk or stream using the AF_JIT_KERNEL_TRACE environment variable #2404
- void* pointers are now allowed as arguments to af::array::write() #2367
- Slightly improve the efficiency of JITed tile operations #2472
- Make the random number generation on the CPU backend consistent with CUDA and OpenCL #2435
- Handled very large JIT tree generations #2484 #2487
Bug Fixes
- Fixed af::array::array_proxy move assignment operator #2479
- Fixed input array dimensions validation in svdInplace() #2331
- Fixed the typedef declaration for window resource handle #2357
- Increase compatibility with GCC 8 #2379
- Fixed af::write tests #2380
- Fixed a bug in broadcast step of 1D exclusive scan #2366
- Fixed OpenGL related build errors on OSX #2382
- Fixed multiple array evaluation. Performance improvement. #2384
- Fixed buffer overflow and expected output of kNN SSD small test #2445
- Fixed MKL linking order to enable threaded BLAS #2444
- Added validations for forge module plugin availability before calling resource cleanup #2443
- Improve compatibility on MSVC toolchain (_MSC_VER > 1914) with the CUDA backend #2443
- Fixed BLAS gemm func generators for newest MSVC 19 on VS 2017 #2464
- Fix errors on exit when using the CUDA backend with unified #2470
Documentation
- Updated svdInplace() documentation following a bugfix #2331
- Fixed a typo in matrix multiplication documentation #2358
- Fixed a code snippet demonstrating C-API use #2406
- Updated hamming matcher implementation limitation #2434
- Added illustration for the rotate function #2453
Misc
- Use cudaMemcpyAsync instead of cudaMemcpy throughout the codebase #2362
- Display a more informative error message if CUDA driver is incompatible #2421 #2448
- Changed forge resource management to use smart pointers #2452
- Deprecated intl and uintl typedefs in API #2360
- Enabled graphics by default for all builds starting with v3.6.3 #2365
- Fixed several warnings #2344 #2356 #2361
- Refactored initArray() calls to use createEmptyArray(). initArray() is for internal use only by Array class. #2361
- Refactored void* memory allocations to use unsigned char type #2459
- Replaced deprecated MKL API with in-house implementations for sparse to sparse/dense conversions #2312
- Reorganized and fixed some internal backend API #2356
- Updated compilation order of CUDA files to speed up compile time #2368
- Removed conditional graphics support builds after enabling runtime loading of graphics dependencies #2365
- Marked graphics dependencies as optional in CPack RPM config #2365
- Refactored a sparse arithmetic backend API #2379
- Fixed const correctness of af_device_array API #2396
- Update Forge to v1.0.4 #2466
- Manage Forge resources from the DeviceManager class #2381
- Fixed non-mkl & non-batch blas upstream call arguments #2401
- Link MKL with OpenMP instead of TBB by default
- Use clang-format to format source code
Contributions
Special thanks to our contributors:
Alessandro Bessi
zhihaoy
Jacob Kahn
William Tambellini

v3.6.2 Changes
November 29, 2018
The source code with sub-modules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.2.tar.bz2
Features
- Batching support for cond argument in select() [#2243]
- Broadcast batching for matmul [#2315]
- Add support for multiple nearest neighbours from nearestNeighbour() [#2280]
Improvements
- Performance improvements in morph() [#2238]
- Fix linking errors when compiling without Freeimage/Graphics [#2248]
- Fixes to improve the usage of ArrayFire as a sub-project [#2290]
- Allow custom library path for loading dynamic backend libraries [#2302]
Bug fixes
- Fix overflow in dim4::ndims [#2289]
- Remove setDevice from af::array destructor [#2319]
- Fix pow precision for integral types [#2305]
- Fix issues with tile with a large repeat dimension [#2307]
- Fix grid based indexing calculation in af_draw_hist [#2230]
- Fix bug when using an af::array for indexing [#2311]
- Fix CLBlast errors on exit on Windows [#2222]
Documentation
- Improve unwrap documentation [#2301]
- Improve wrap documentation [#2320]
- Fix and improve accum documentation [#2298]
- Improve tile documentation [#2293]
- Clarify approx* indexing in documentation [#2287]
- Update examples of select in detailed documentation [#2277]
- Update lookup examples [#2288]
- Update set documentation [#2299]
Misc
- New ArrayFire ASSERT utility functions [#2249] [#2256] [#2257] [#2263]
- Improve error messages in JIT [#2309]
- af* library and dependencies directory changed to lib64 [#2186]
Contributions
Thank you to our contributors:
Jacob Kahn
Vardan Akopian

v3.6.1 Changes
July 06, 2018
The source code for this release can be downloaded here:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.1.tar.bz2
Improvements
- FreeImage is now a run-time dependency [#2164]
- Reduced binary size by setting the symbol visibility to hidden [#2168]
- Add logging to memory manager and unified loader using the AF_TRACE environment variable [#2169] [#2216]
- Improved CPU Anisotropic Diffusion performance [#2174]
- Perform normalization after FFT for improved accuracy [#2185, #2192]
- Updated CLBlast to v1.4.0 [#2178]
- Added additional validation when using af::seq for indexing [#2153]
- Perform checks for unsupported cards by the CUDA implementation [#2182]
- Avoid selecting backend if no devices are found [#2218]
Bug Fixes
- Fixed region when all pixels were the foreground or background [#2152]
- Fixed several memory leaks [#2202, #2201, #2180, #2179, #2177, #2175]
- Fixed bug in setDevice which didn't allow you to select the last device [#2189]
- Fixed bug in min/max where the first element of the array was a NaN value [#2155]
- Fixed graphics window indexing [#2207]
- Fixed renaming issue when installing CUDA libraries on OSX [#2221]
- Fixed NSIS installer PATH variable [#2223]

v3.6.0 Changes
May 04, 2018
The source code with submodules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.0.tar.bz2
Major Updates
- Added the topk() function.
- Added batched matrix multiply support.
- Added anisotropic diffusion, anisotropicDiffusion().
Features
- Added support for batched matrix multiply.
- New anisotropic diffusion function, anisotropicDiffusion().
- New topk() function, which returns the top k elements along a given dimension of the input.
- New gradient diffusion example.
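To make the "top k elements" description above concrete, here is a minimal standard-library sketch of a top-k selection on a one-dimensional input. The name `topKSketch` is hypothetical and this is not ArrayFire's topk(), which works along a chosen dimension of an array; the descending output order is an assumption made for the illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical 1-D illustration of a top-k selection; NOT ArrayFire's topk().
// Returns the k largest values of `v`, largest first (ordering is an assumption).
std::vector<double> topKSketch(std::vector<double> v, std::size_t k) {
    k = std::min(k, v.size());  // clamp k to the input length
    const auto mid = v.begin() + static_cast<std::ptrdiff_t>(k);
    // partial_sort places the k largest elements, sorted, in [begin, mid).
    std::partial_sort(v.begin(), mid, v.end(), std::greater<double>());
    v.resize(k);  // drop everything after the first k
    return v;
}
```

Calling `topKSketch({3, 1, 4, 1, 5, 9, 2, 6}, 3)` yields {9, 6, 5}. A partial sort avoids fully ordering the input, which is the usual reason to offer a dedicated top-k routine instead of sorting and slicing.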
Improvements
- JITed select() and shift() functions for CUDA and OpenCL backends.
- Significant CMake improvements.
- Improved the quality of the random number generator.
- Corrected assert function calls in select() tests.
- Modified af_colormap struct to match forge's definition.
- Improved Black Scholes example.
- Used CPack to generate installers. We will be using CPack to generate installers beginning with this release.
- Refactored black_scholes_options example to use built-in af::erfc function for cumulative normal distribution.
- Reduced the scope of mutexes in memory manager.
- Official installers do not require the CUDA toolkit to be installed starting with v3.6.0.
Bug fixes
- Fixed shfl_down() warnings with CUDA 9.
- Disabled CUDA JIT debug flags on ARM architecture.
- Fixed CLBlast install lib dir for Linux platforms where the lib directory has an arch (64) suffix.
- Fixed assert condition in 3D morph OpenCL kernel.
- Fixed JIT errors with large non-linear kernels.
- Fixed bug in CPU JIT after moddims was called.
- Fixed a deadlock scenario caused by the method MemoryManager::nativeFree.
Documentation
- Fixed variable name typo in vectorization.md.
- Fixed AF_API_VERSION value in Doxygen config file.
Known issues
- NVCC does not currently support platform toolset v141 (Visual Studio 2017 R15.6). Use the v140 platform toolset instead. You may pass the toolset version to CMake via the -T flag, like so: cmake -G "Visual Studio 15 2017 Win64" -T v140. To download and install other platform toolsets, visit https://blogs.msdn.microsoft.com/vcblog/2017/11/15/side-by-side-minor-version-msvc-toolsets-in-visual-studio-2017
- Several OpenCL tests fail on OSX: canny_opencl, fft_opencl, gen_assign_opencl, homography_opencl, reduce_opencl, scan_by_key_opencl, solve_dense_opencl, sparse_arith_opencl, sparse_convert_opencl, where_opencl
Contributions
Special thanks to our contributors:
Adrien F. Vincent, Cedric Nugteren, Felix, Filip Matzner, HoneyPatouceul, Patrick Lavin, Ralf Stubner, William Tambellini