ArrayFire v3.6.3 Release Notes
Release Date: 2019-04-22
v3.6.3
The source code with sub-modules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.3.tar.bz2
Improvements
- Graphics are now a runtime dependency instead of a link time dependency #2365
- Reduce the CUDA backend binary size using runtime compilation of kernels #2437
- Improved batched matrix multiplication on the CPU backend by using Intel MKL's cblas_Xgemm_batched #2206
- Print JIT kernels to disk or stream using the AF_JIT_KERNEL_TRACE environment variable #2404
- void* pointers are now allowed as arguments to af::array::write() #2367
- Slightly improve the efficiency of JITed tile operations #2472
- Make the random number generation on the CPU backend consistent with CUDA and OpenCL #2435
- Handled very large JIT tree generations #2484 #2487
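To illustrate what a batched matrix multiplication computes, here is a minimal reference sketch in plain C++. It is not ArrayFire's implementation and does not call MKL; the function name `batched_matmul`, the column-major layout, and the back-to-back batch storage are assumptions made for this example only.

```cpp
#include <cstddef>
#include <vector>

// Reference batched GEMM (illustrative only): for every batch b,
// C[b] = A[b] * B[b]. Matrices are column-major; A is m x k, B is k x n,
// C is m x n, and batches are stored back to back in one buffer.
std::vector<float> batched_matmul(const std::vector<float>& A,
                                  const std::vector<float>& B,
                                  std::size_t m, std::size_t k, std::size_t n,
                                  std::size_t batches) {
    std::vector<float> C(m * n * batches, 0.0f);
    for (std::size_t b = 0; b < batches; ++b) {
        // Locate the b-th matrix of each operand.
        const float* a = A.data() + b * m * k;
        const float* bm = B.data() + b * k * n;
        float* c = C.data() + b * m * n;
        // Plain triple loop; a tuned BLAS would block and vectorize this.
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t p = 0; p < k; ++p)
                for (std::size_t i = 0; i < m; ++i)
                    c[i + j * m] += a[i + p * m] * bm[p + j * k];
    }
    return C;
}
```

A batched BLAS routine performs the same per-batch products in a single call, which gives the library room to schedule the small multiplications together instead of paying call overhead for each one.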
Bug Fixes
- Fixed af::array::array_proxy move assignment operator #2479
- Fixed input array dimensions validation in svdInplace() #2331
- Fixed the typedef declaration for window resource handle #2357
- Increase compatibility with GCC 8 #2379
- Fixed af::write tests #2380
- Fixed a bug in broadcast step of 1D exclusive scan #2366
- Fixed OpenGL related build errors on OSX #2382
- Fixed multiple array evaluation. Performance improvement. #2384
- Fixed buffer overflow and expected output of kNN SSD small test #2445
- Fixed MKL linking order to enable threaded BLAS #2444
- Added validations for forge module plugin availability before calling resource cleanup #2443
- Improve compatibility on MSVC toolchain (_MSC_VER > 1914) with the CUDA backend #2443
- Fixed BLAS gemm func generators for newest MSVC 19 on VS 2017 #2464
- Fix errors on exit when using the CUDA backend with the unified backend #2470
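For readers unfamiliar with the "broadcast step" of an exclusive scan mentioned in the list above, the following is a minimal CPU sketch of a blocked 1D exclusive prefix sum. It is a generic illustration of the technique, not ArrayFire's kernel; the function name `blocked_exclusive_scan` and the `block_size` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Blocked exclusive prefix sum over a 1D sequence (illustrative sketch).
std::vector<int> blocked_exclusive_scan(const std::vector<int>& in,
                                        std::size_t block_size) {
    if (block_size == 0) throw std::invalid_argument("block_size must be > 0");
    const std::size_t n = in.size();
    std::vector<int> out(n, 0);
    if (n == 0) return out;

    const std::size_t num_blocks = (n + block_size - 1) / block_size;
    std::vector<int> block_totals(num_blocks, 0);

    // Step 1: exclusive scan inside each block, recording the block total.
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t begin = b * block_size;
        const std::size_t end = std::min(begin + block_size, n);
        int running = 0;
        for (std::size_t i = begin; i < end; ++i) {
            out[i] = running;
            running += in[i];
        }
        block_totals[b] = running;
    }

    // Step 2: exclusive scan of the block totals yields each block's offset.
    int offset = 0;
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const int total = block_totals[b];
        block_totals[b] = offset;
        offset += total;
    }

    // Step 3 (broadcast): add each block's offset to all of its elements.
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t begin = b * block_size;
        const std::size_t end = std::min(begin + block_size, n);
        for (std::size_t i = begin; i < end; ++i) out[i] += block_totals[b];
    }
    return out;
}
```

On a GPU the first and third steps run in parallel across blocks, so an error in the broadcast step typically shows up only when the input spans more than one block.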
Documentation
- Updated svdInplace() documentation following a bugfix #2331
- Fixed a typo in matrix multiplication documentation #2358
- Fixed a code snippet demonstrating C-API use #2406
- Updated hamming matcher implementation limitation #2434
- Added illustration for the rotate function #2453
Misc
- Use cudaMemcpyAsync instead of cudaMemcpy throughout the codebase #2362
- Display a more informative error message if CUDA driver is incompatible #2421 #2448
- Changed forge resource management to use smart pointers #2452
- Deprecated intl and uintl typedefs in API #2360
- Enabled graphics by default for all builds starting with v3.6.3 #2365
- Fixed several warnings #2344 #2356 #2361
- Refactored initArray() calls to use createEmptyArray(). initArray() is for internal use only by Array class. #2361
- Refactored void* memory allocations to use unsigned char type #2459
- Replaced deprecated MKL API with in-house implementations for sparse to sparse/dense conversions #2312
- Reorganized and fixed some internal backend API #2356
- Updated compilation order of CUDA files to speed up compile time #2368
- Removed conditional graphics support builds after enabling runtime loading of graphics dependencies #2365
- Marked graphics dependencies as optional in CPack RPM config #2365
- Refactored a sparse arithmetic backend API #2379
- Fixed const correctness of af_device_array API #2396
- Update Forge to v1.0.4 #2466
- Manage Forge resources from the DeviceManager class #2381
- Fixed non-MKL and non-batch BLAS upstream call arguments #2401
- Link MKL with OpenMP instead of TBB by default
- Use clang-format to format source code
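The notes do not explain why raw allocations were moved from void* to unsigned char. One common motivation, assumed here, is that unsigned char buffers allow byte-level pointer arithmetic, which standard C++ does not permit on void*. The sketch below illustrates that general idea; `ByteBuffer` and `round_trip` are hypothetical names and are not part of ArrayFire.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Byte buffer owned through unsigned char rather than void* (illustrative).
struct ByteBuffer {
    std::unique_ptr<unsigned char[]> data;
    std::size_t size;

    // Allocate `bytes` zero-initialized bytes.
    explicit ByteBuffer(std::size_t bytes)
        : data(new unsigned char[bytes]()), size(bytes) {}

    // Pointer to the byte at `offset`. The same arithmetic on a void*
    // is ill-formed in standard C++.
    unsigned char* at(std::size_t offset) { return data.get() + offset; }
};

// Copy a float into the buffer at a byte offset and read it back.
// The caller must ensure offset + sizeof(float) <= buf.size.
float round_trip(ByteBuffer& buf, std::size_t offset, float value) {
    std::memcpy(buf.at(offset), &value, sizeof(value));
    float result = 0.0f;
    std::memcpy(&result, buf.at(offset), sizeof(result));
    return result;
}
```

Owning the allocation through a typed smart pointer also means the buffer is released automatically, with no manual free call.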
Contributions
Special thanks to our contributors:
Alessandro Bessi
zhihaoy
Jacob Khan
William Tambellini